Building Trust Through Open Innovation

Oct 23, 2025

As part of our commitment to open innovation, AMD actively supports an open methodology and ecosystem that fosters transparency, safety, interoperability and access for all. 

The AMD Open Innovation strategy enables AI safety and embodies the principles of responsible development.

  • Transparency & Explainability: By releasing full model details, AMD supports the development of explainable AI systems, aligning with global regulatory trends.
  • Interoperability & Collaboration: AMD's collaborations with companies such as Zyphra, HCLTech, and other innovators promote open ecosystems, reducing vendor lock-in and enabling safer, more flexible AI deployments.
  • Open and Scalable: The AMD open-source ROCm™ software stack allows developers to build AI applications from cloud to client, fostering innovation and reducing the risks associated with proprietary systems.

AMD is making significant strides in the realm of open-source AI models and model safety by embracing transparency, collaboration, and responsible AI development.

AMD Open-Source AI Models

AMD has launched a suite of open-source Generative AI (GenAI) models trained on AMD Instinct™ GPUs using the ROCm™ software stack. These models span various domains:

  • Instella-Long: A 3B-parameter language model with a 128K context length, optimized for long-context AI research.
  • Instella-Math: A reasoning-focused model trained with chain-of-thought reinforcement learning. 
  • Instella-T2I & Hummingbird-T2V: Text-to-image and text-to-video generation models with efficient training and high-quality output.
  • Nitro-T & Hummingbird-I2V: Image and image-to-video generation models designed for fast training and resource efficiency.
  • AMD OLMo & AMD-135M: Instruction-following and small language models with speculative decoding for efficient inference.

All models come with open weights, training configurations, datasets, and code, enabling full reproducibility and community-driven innovation.
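
As one example of what these open releases make possible, here is a minimal sketch (in Python, using Hugging Face transformers) of the speculative decoding mentioned for AMD-135M, where a small draft model proposes tokens that a larger target model verifies. The checkpoint IDs are illustrative assumptions, and the draft and target models must share a tokenizer; substitute the actual published model names.

```python
# Minimal sketch: speculative (assisted) decoding with a small draft model.
# Model IDs below are illustrative assumptions; use the actual published
# checkpoints, and make sure the draft shares the target's tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Llama-2-7b-chat-hf"  # assumed Llama-family target
draft_id = "amd/AMD-Llama-135m"              # assumed small AMD draft model

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id)
draft = AutoModelForCausalLM.from_pretrained(draft_id)

inputs = tokenizer("Explain KV caching in one sentence.", return_tensors="pt")

# assistant_model enables assisted generation: the draft proposes several
# tokens per step and the target verifies them, reducing decode latency.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```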

AMD Hybrid Models

Zebra-Llama is a family of hybrid Large Language Models (LLMs) proposed by AMD that composes Multi-head Latent Attention (MLA) and Mamba2 for KV cache compression and computational efficiency.

This combination achieves Transformer-level accuracy with near-State Space Model (SSM) efficiency. While standard Transformers are limited by the quadratic complexity of self-attention and the large memory footprint of their key-value (KV) cache, Zebra-Llama offers a practical and scalable solution.
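
To make the KV cache argument concrete, here is a back-of-the-envelope sketch, using illustrative layer counts and dimensions rather than our published analysis, that compares cache growth for a fully attention-based 16-layer decoder against a hybrid that keeps only four attention layers:

```python
# Back-of-the-envelope KV cache comparison (illustrative assumptions only).
def kv_cache_bytes(attn_layers: int, seq_len: int,
                   kv_heads: int = 8, head_dim: int = 64,
                   bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache K and V across all attention layers."""
    return attn_layers * 2 * kv_heads * head_dim * seq_len * bytes_per_elem

SEQ_LEN = 32_768  # a long-context request
full_transformer = kv_cache_bytes(attn_layers=16, seq_len=SEQ_LEN)

# Hybrid: only 4 attention layers grow with sequence length; the 12 Mamba2
# (SSM) layers keep a small constant-size recurrent state instead. MLA would
# shrink the remaining per-layer cache further via latent compression, which
# is not modeled here.
hybrid = kv_cache_bytes(attn_layers=4, seq_len=SEQ_LEN)

print(f"full attention cache : {full_transformer / 2**20:.1f} MiB")
print(f"hybrid (4 attn) cache: {hybrid / 2**20:.1f} MiB")
```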

This model, `Zebra-Llama-1B-4MLA-12M2-DPO`, is created by efficiently adapting the pre-trained `Llama-3.2-1B-Instruct` model through post-training on AMD Instinct MI300X GPUs. This training approach bypasses the need for costly pre-training from scratch.

The family of hybrid language models (1B, 3B, and 8B) is directly composed from existing pre-trained Transformers without full retraining. Together, these building blocks, along with our initialization and distillation strategy, allow AMD-HybridLM to dramatically reduce memory usage and inference cost without sacrificing safety or performance.
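
The initialization and distillation strategy is described here only at a high level. As a generic illustration rather than our exact recipe, the snippet below shows the standard soft-label distillation objective that lets a composed hybrid student match the output distribution of its pre-trained Transformer teacher:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student outputs."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean reduction with T^2 scaling is the usual soft-label formulation.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Toy usage with random logits of shape (batch, vocab_size).
student = torch.randn(4, 32000)
teacher = torch.randn(4, 32000)
print(distillation_loss(student, teacher))
```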

Our goal is to design a hybrid model that’s more efficient without compromising on safety and performance, ideal for real-world applications where safety, speed, and intelligence matter. Microsoft's ToxiGen is a large-scale, machine-generated dataset and benchmarking tool designed to improve the detection of adversarial and implicit hate speech, including the kinds of toxicity that often evade basic content moderation systems. We run Microsoft ToxiGen on the AMD Zebra family of hybrid large language models to balance performance with safety.
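
As a simplified illustration of this kind of screening, and not our full evaluation harness, the snippet below scores a candidate completion with a publicly released ToxiGen classifier; the classifier checkpoint name is an assumption based on the ToxiGen project's released models.

```python
# Simplified toxicity screening of a model completion (illustrative only).
# "tomh/toxigen_roberta" is assumed to be the RoBERTa classifier released
# alongside the ToxiGen project; the production pipeline may use other tools.
from transformers import pipeline

toxicity_clf = pipeline("text-classification", model="tomh/toxigen_roberta")

completion = "That neighborhood is full of people who can't be trusted."
result = toxicity_clf(completion)[0]
print(result)  # label and score; high-scoring toxic labels would be flagged
```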

AMD Open Model Development Methodology

As open-source AI models become more powerful and widely adopted, ensuring their safe and responsible use has never been more important. That’s why we’re sharing our methodology of model alignment and the safety benchmarks we run as part of our model release process. By making our evaluation approach transparent, we aim to build trust with developers, researchers, and users, while setting a foundation for continuous improvement.

When releasing models, AMD focuses on the following safety dimensions:

  • Harmful content generation – testing for toxicity, bias, and misuse in disallowed contexts.
  • Jailbreak resilience – measuring how well models withstand adversarial prompts designed to bypass safeguards.
  • Hallucination detection – monitoring factual accuracy and reducing fabricated outputs. 

We’re also exploring broader dimensions such as fairness, robustness, and sustainability as part of our long-term safety framework.

Model Alignment during Training

Our models adopt a multi-stage alignment pipeline aimed at making their output safer, fairer, and more truthful. After pretraining on a large corpus (trillions of tokens), the model goes through supervised fine-tuning (SFT) on high-quality instruction datasets (e.g., TuluV2, OpenHermes-2.5, WebInstructSub, Code-Feedback), followed by a preference-alignment stage using Direct Preference Optimization (DPO) on human preference data (UltraFeedback). In this pipeline, the role of DPO is to push the model toward outputs that a reasonable human would judge as helpful, safe, and accurate.
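
For readers unfamiliar with DPO, the snippet below is a minimal, generic sketch of its pairwise objective, which rewards the policy for preferring the chosen response over the rejected one relative to a frozen reference model; it is illustrative rather than our exact training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over summed per-sequence log-probabilities of each response."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with per-sequence log-probs for a batch of 8 preference pairs.
policy_c, policy_r = torch.randn(8), torch.randn(8)
ref_c, ref_r = torch.randn(8), torch.randn(8)
print(dpo_loss(policy_c, policy_r, ref_c, ref_r))
```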

During and after alignment, the models are evaluated on responsible AI benchmarks such as Microsoft ToxiGen (for toxicity and adversarial hate speech; a lower score is better), NYU CrowS-Pairs (for social bias across sensitive categories; a higher score is better), and Cornell/OpenAI TruthfulQA (for resisting falsehoods and misconception mimicry; a higher score is better). These metrics help signal whether the alignment steps succeeded in reducing harmful or biased behavior while preserving or improving factual accuracy.

Model Evaluation and Benchmarks

Our evaluations combine structured datasets, adversarial testing, and red teaming. We use standardized test suites where possible and supplement them with scenarios relevant to real-world use. The goal isn’t just to pass tests; it’s to uncover weak spots early and address them before models are widely deployed.

Metrics we track include safety thresholds for harmful outputs, jailbreak success rates, and hallucination precision/recall. These benchmarks guide how we improve both the models and the safeguards around them.
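
As a simple illustration of how this tracking can work, the snippet below uses hypothetical helpers, not our internal tooling, to compute a jailbreak success rate and hallucination-detection precision/recall from labeled evaluation records.

```python
# Hypothetical metric helpers for safety tracking (illustrative only).
def jailbreak_success_rate(attempts: list[bool]) -> float:
    """Fraction of adversarial prompts that produced a disallowed response."""
    return sum(attempts) / len(attempts) if attempts else 0.0

def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Precision/recall of a hallucination detector against human labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

print(jailbreak_success_rate([False, False, True, False]))          # 0.25
print(precision_recall([True, False, True], [True, True, False]))   # (0.5, 0.5)
```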

  • Microsoft ToxiGen: For Adversarial and Implicit Hate Speech Detection.
  • NYU CrowS-Pairs: A common dataset for evaluating bias in LLMs. It contains 1,508 instance pairs across nine categories: Race/Color, Gender, Socioeconomic Status, Nationality, Religion, Age, Sexual Orientation, Physical Appearance, and Disability.
  • Cornell/OpenAI TruthfulQA(mc2): Measuring How Models Mimic Human Falsehoods.

| Responsible AI Benchmarks | TinyLlama-1.1B-Chat-v1.0 (1.1B) | MobiLlama-1B-Chat (1.2B) | OpenELM-1_1B-Instruct (1.1B) | AMD-OLMo-1B-SFT (1.2B) | AMD-OLMo-1B-SFT-DPO (1.2B) |
|---|---|---|---|---|---|
| Microsoft ToxiGen | 41.70 | 37.23 | 42.34 | 39.04 | 39.68 |
| NYU crows_pairs | 60.35 | 58.50 | 59.93 | 60.29 | 61.00 |
| Cornell/OpenAI TruthfulQA (mc2) | 37.92 | 38.46 | 45.84 | 37.45 | 40.06 |
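
For those who want to reproduce numbers like these, here is a minimal sketch using EleutherAI's lm-evaluation-harness Python API; the task names and model ID are assumptions based on common harness conventions and may differ from the exact configuration used to produce the table above.

```python
# Minimal sketch of reproducing responsible AI benchmark scores with
# lm-evaluation-harness. Task names and the model ID are assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=amd/AMD-OLMo-1B-SFT-DPO,dtype=bfloat16",
    tasks=["toxigen", "crows_pairs_english", "truthfulqa_mc2"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```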

What We’ve Learned So Far

The results show strong performance in many areas, particularly in handling harmful content responsibly. At the same time, there are clear opportunities for improvement, including reducing rare, but impactful, jailbreak cases and improving reliability in factual grounding. 

We believe that sharing both strengths and limitations openly is critical. Transparency doesn’t mean perfection; it means accountability while reinforcing trust and demonstrating a roadmap for progress. By publishing our benchmarks, we invite scrutiny and collaboration from the community. Developers and researchers get a clearer picture of model behavior, which supports safer adoption and more informed innovation.

What’s Next for AI Safety in AMD Open Models 

We’re expanding our evaluation suite to include broader domains of risk, such as fairness and robustness under stress. We’re also working to integrate safety frameworks directly into training, so that with every model release, developers using our models know exactly what safeguards they’re working with. As the developers of these models, we continually update our safety processes to reflect the latest research methodologies and risk frameworks, ensuring that our safeguards evolve alongside the rapidly advancing AI landscape.

We see this as the start of an ongoing conversation. Safety in open-source AI is a shared responsibility, and we welcome feedback, contributions, and red-team collaboration from the community. We are actively building a community around AMD Open and Trustworthy AI efforts.

Join the AMD Community

Together, we can drive Open and Trustworthy AI.
