AMD Silo AI's Continued Pretraining Approach Provides a Blueprint for Global Language Adaptation

Jun 18, 2025

Silo AI Continued Pretraining for Global Language Adaptation

Author: AMD Silo AI

AMD Silo AI, in collaboration with TurkuNLP, has released high-performing open-weight Finnish language models, Poro 2, alongside comprehensive documentation enabling organizations worldwide to adapt powerful language models for their native languages using AMD Instinct™ GPU hardware.

The initiative delivers two Finnish language models (70B and 8B parameters) that, in AMD internal testing, showed dramatically improved Finnish performance through continued pretraining of strong English language models. More significantly, the release includes detailed step-by-step instructions, data generation techniques, recipes, and code that allow developers to replicate this approach for any language.

"These model releases represent an accessible and efficient route to customizing language models for languages underserved by existing open-source solutions. This enables upgrading of, for instance, vital public sector services by leveraging the personalization and efficiency potential of AI, regardless of language." said Peter Sarlin, CEO and Co-founder of AMD Silo AI. "By employing powerful English foundation models and applying our continued pretraining methodology, we believe organizations can achieve equal or better capability in their target language while using less compute, compared to when training from scratch."

Continued Pretraining

Poro 2 was created using continued pretraining (CPT), an attractive option for adding new capabilities, such as support for a new language, to existing models at a fraction of the compute cost of training from scratch. Done correctly, a model's strong English capabilities contribute cross-lingually to performance in the target language, so only a relatively small amount of target-language data is needed and the model's existing capabilities are not disrupted. Poro 2 is based on the Llama 3.1 8B and 70B base models and shows good performance in Finnish.
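To make the idea concrete, the sketch below shows what a continued pretraining run can look like with the Hugging Face Transformers Trainer. This is a minimal illustration, not the actual Poro 2 recipe: the dataset files, data mixture, and hyperparameters are placeholder assumptions, and the real training setup is documented in the materials linked below.

```python
# Minimal continued-pretraining sketch with Hugging Face Transformers.
# The checkpoint name matches the Poro 2 starting point, but the dataset
# files, data mixture, and hyperparameters below are illustrative
# placeholders, not the actual Poro 2 recipe.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "meta-llama/Llama-3.1-8B"  # strong English base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Mix target-language (Finnish) data with some original-language (English)
# data so the model's existing capabilities are not disrupted.
# Both .jsonl files are hypothetical and stand in for a real corpus.
raw = load_dataset(
    "json",
    data_files={"train": ["finnish_corpus.jsonl", "english_replay.jsonl"]},
)["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

train_ds = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="cpt-sketch",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=32,
        learning_rate=2e-5,  # far lower than a from-scratch pretraining rate
        lr_scheduler_type="cosine",
        warmup_ratio=0.01,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=train_ds,
    # Causal LM collator (mlm=False) builds the labels from input_ids.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice the recipe matters: the amount of target-language data, the share of original-language data retained in the mixture, and the learning rate schedule all influence how well existing capabilities are preserved, and the released instructions and recipes cover these choices in detail.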

The released materials include:

  • Two production-ready Finnish language models (70B and 8B parameters)
  • Complete source code for model adaptation
  • Detailed data preparation workflows and accompanying post-training datasets
  • Comprehensive technical documentation

Language diversity remains a significant challenge in AI development, with many regions lacking robust language models in their native tongues. The continued pretraining approach demonstrated by AMD Silo AI, carried out on AMD Instinct MI250X GPUs on the LUMI supercomputer, provides an open-source blueprint for countries to develop domestic AI capabilities that support linguistic sovereignty while lowering the barriers to entry for local AI innovation.

This release includes two models (8B and 70B) that have been post-trained as instruction-following AI assistants in Finnish and English. The base models from the continued pretraining process, as well as intermediate post-training checkpoints, are also released.

Models and datasets are available at https://huggingface.co/collections/LumiOpen/poro-2-6835bec8186e98712b061f02. For a more technical deep dive, data generation details, recipes, and code are available on the AMD ROCm™ blog.


AMD Silo AI: Pushing the Frontier of AI on AMD Compute Platforms. AMD Silo AI is a leading AI lab that helps customers develop and deploy advanced AI models and solutions optimized for leadership compute platforms. As a global AI center of excellence with over 300 AI scientists and 125+ PhDs, AMD Silo AI combines deep scientific knowledge with practical technology understanding to help organizations integrate, deploy, and scale AI effectively. Its work spans from groundbreaking research to enterprise-ready AI solutions.
