Translation Isn't Localization: The AI Culture Challenge

Nov 19, 2025

a world map with human figures in different places

Digital sovereignty is the ability for a nation or region to make independent technology choices without external constraints. True sovereignty enables economic independence and resilience, strengthens national security and strategic autonomy, and allows nations to uphold cultural values in an AI-enabled, digital world.

In other words, digital sovereignty is not just a tech concern: technology choices ultimately shape how nations preserve culture, identity, and autonomy. In this second post of the series, we examine the relationship between large language models (LLMs) and culture.

Language is a vessel of culture. It enables us to interact with each other and across generations, passing along acquired knowledge as well as customs. Language and culture shape each other in a dynamic and reciprocal relationship. 

Language Models Need Cultural Context, Not Just Translation

Today, large generative language models (LLMs) are among the most visible instantiations of AI. Since interactive chatbots entered mainstream use, companies and research institutions have been competing to build the best language models. How well a generative language model handles spelling and sentence structure can be measured handily, but assessing whether it strikes the right tone of voice, or balances, say, helpfulness against conciseness, is a trickier challenge.

As LLMs are integrated at the core of products and services across regions, nuances that might seem minor from a technical standpoint will have a defining effect on the perceived cultural alignment, on acceptability and trustworthiness, and on the end-to-end usefulness of the service in question.

The critical phase for producing LLMs that reflect specific cultural features is post-training, which incorporates techniques such as fine-tuning, instruction training, and alignment to coach the model's output toward an envisioned use case.

Currently, the most commonly used post-training datasets are authored entirely in English and originate from fairly homogeneous cultural areas. To produce datasets for other languages, these instruction sets and their attendant evaluation test suites are typically translated with automated translation tools. Unfortunately, most translation tools today do not adapt instruction datasets to local cultural patterns and sensitivities.

The result is a set of translated example interactions that, while linguistically correct, convey the wrong tone of voice or the wrong attitude, or begin from non-local cultural starting points. An advisory system that incorporates enthusiastic and encouraging material into every response may seem natural in some languages but obsequious and untrustworthy in others.
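To make this concrete, here is a minimal, hypothetical sketch of the problem. The schema and the `needs_cultural_review` heuristic are illustrative inventions, not part of any real post-training pipeline: the point is that a machine-translated example keeps its source register, so cross-locale examples with a strongly marked tone should be routed to native-speaker review rather than used as-is.

```python
from dataclasses import dataclass

@dataclass
class InstructionExample:
    """One post-training example with metadata that survives translation.

    Hypothetical schema: 'tone' and the locale fields are illustrative,
    not part of any standard instruction-dataset format.
    """
    prompt: str
    response: str
    source_locale: str   # locale the example was authored in, e.g. "en-US"
    target_locale: str   # locale the deployed model will serve, e.g. "fi-FI"
    tone: str            # register of the response, e.g. "enthusiastic"

def needs_cultural_review(ex: InstructionExample) -> bool:
    """Toy heuristic: flag machine-translated examples whose tone may not transfer.

    Translation changes the words but leaves the register intact, so any
    cross-locale example with a strongly marked register is flagged for
    review by native speakers instead of being used verbatim.
    """
    marked_registers = {"enthusiastic", "encouraging", "casual"}
    return ex.source_locale != ex.target_locale and ex.tone in marked_registers

# A translated example: the words are correct Finnish, but the effusive
# register of the English source came along unchanged.
ex = InstructionExample(
    prompt="Miten voin parantaa ansioluetteloani?",   # "How can I improve my CV?"
    response="Loistava kysymys! Olet jo oikealla tiellä...",  # "Great question! You're on the right track..."
    source_locale="en-US",
    target_locale="fi-FI",
    tone="enthusiastic",
)
print(needs_cultural_review(ex))  # True: route to local reviewers
```

In a real pipeline the review decision would of course rest on local stakeholders' judgment, not a keyword set; the sketch only shows that tone metadata, unlike words, is not translated automatically.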

Open Ecosystems: Roadmap for Sovereign and Culturally Sensitive AI

Technology is everywhere, intertwined with everyday life, enabling many of the comforts we take for granted. When language is used as an interface between people and technology, language becomes a proxy through which technology influences culture. This is why the world needs LLMs that are trained locally in various languages.

Training LLMs is one of the most compute-intensive activities, making it imperative for nations and public institutions to ensure the availability of the needed AI infrastructure. Numerous initiatives for LLMs catering to smaller languages are underway across the globe, by both commercial and non-commercial actors, aiming to provide tools for essential services in local languages. Effective post-training of these LLMs requires that both the training data and the training methods are open, and that instruction training datasets are locally produced to align with local culture. Without this openness, AI models remain “black boxes”, making it impossible to effectively instruction-train a model for a language other than its original foundation language.

A truly locally grounded model is built from the ground up with the participation of local actors: instruction training sets should be designed to fit local concerns, with advice, consultation, and active guidance from native speakers and local stakeholders. This is a necessary step in going from a foundation model to a post-trained model with a trustworthy voice. Enabling this sort of participation from non-commercial partners will require open architectures and an open-source approach to model development.

As mentioned in the previous blog post of this series, much like in the world of open science, open ecosystems for AI enable different actors to benefit from each other’s advances, while contributing to further progress, ultimately leading to AI sovereignty.

Article By


Principal Member of Technical Staff, Software Development
