AMD Ryzen™ AI Software 1.7 Release
Jan 26, 2026
The Ryzen™ AI Software 1.7 release introduces several updates aimed at improving model coverage, reducing friction in local development workflows, and delivering more predictable performance on AMD Accelerated Processing Units (NPU + iGPU). This release adds new architecture support, expands context length for LLMs, integrates Stable Diffusion into the unified Ryzen AI installer, and improves BF16 inference latency.
New Architectures: GPT‑OSS (MoE) and Gemma‑3 4B VLM
RAI 1.7 adds support for the Mixture-of-Experts (MoE) GPT-OSS model and the Gemma 3 4B vision-language model (VLM), expanding the set of NPU-executable architectures available to developers.
- MoE efficiency: MoE models route tokens through a subset of expert networks, letting developers run larger, more capable models without paying the full compute cost of a dense architecture. This can translate into better throughput and more responsive local LLM pipelines.
- VLM capability: VLM support enables multimodal tasks such as image-grounded reasoning, captioning, and lightweight visual search, as well as multimodal agent components.
- Broader experimentation: Developers can now benchmark and compare dense, MoE, and VLM architectures under the same NPU constraints, making it easier to choose models for production; a minimal generation sketch follows this list.
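To make the loop concrete, here is a minimal token-streaming sketch using the onnxruntime-genai (OGA) Python API that Ryzen AI's LLM flow builds on. The model folder path is hypothetical and the available search options can vary by release, so treat this as a starting point under those assumptions rather than the official recipe.

```python
# Minimal token-streaming sketch with onnxruntime-genai (OGA).
# The model path below is a hypothetical local folder holding a
# Ryzen AI hybrid build of GPT-OSS; substitute your own.
import onnxruntime_genai as og

model = og.Model("./models/gpt-oss-hybrid")  # hypothetical path
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)  # total token budget

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Explain MoE routing in two sentences."))

# Decode tokens as they arrive so responsiveness is directly visible.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```

The same decode loop applies to the Gemma 3 4B VLM once image inputs are preprocessed, though the multimodal entry points differ; see the 1.7 documentation for the exact VLM interface.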
Stable Diffusion Integrated Into the Main Ryzen AI Installer
Stable Diffusion is now built directly into the primary Ryzen AI installer instead of requiring a separate environment.
- Predictable environment setup: Developers no longer need to manage SD‑specific Python environments, dependencies, or build steps.
- Unified toolchain: LLM, VLM, and SD workflows now live in a common environment, simplifying development for those building mixed-modality applications.
- Faster iteration: Faster setup means developers can quickly prototype text-to-image, image-to-image, or hybrid workflows without wrestling with environment fragmentation; a stand-in text-to-image sketch follows this list.
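For illustration only, here is the general shape of a text-to-image loop. This sketch uses Hugging Face's optimum.onnxruntime pipeline as a stand-in, not the Ryzen AI Stable Diffusion flow itself (whose entry points are covered in the 1.7 documentation), and the checkpoint id is a placeholder.

```python
# Stand-in text-to-image sketch using Hugging Face optimum's ONNX
# Runtime pipeline. This is NOT the Ryzen AI SD API; it only shows
# the shape of the workflow. The checkpoint id is a placeholder.
from optimum.onnxruntime import ORTStableDiffusionPipeline

pipe = ORTStableDiffusionPipeline.from_pretrained(
    "your-sd-checkpoint-or-hub-id",  # placeholder
    export=True,                     # convert PyTorch weights to ONNX
)
image = pipe(
    "a watercolor sketch of a laptop on a desk",
    num_inference_steps=25,
).images[0]
image.save("out.png")
```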
LLM Support for Up to 16K Context Length on Hybrid
Most LLMs in RAI 1.7 now support up to 16K tokens of context when running in hybrid mode (iGPU + NPU).
- Long‑form reasoning: Developers can build applications involving longer documents, extended multi‑turn conversations, or workflows requiring persistent memory.
- Local RAG stacks: Longer context directly improves the effectiveness of on-device retrieval-augmented generation, reducing truncation and improving model grounding; a simple token-budgeting sketch follows this list.
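As a rough illustration of how you might exploit the larger window, the sketch below greedily packs retrieved chunks into a 16K-token budget using the OGA tokenizer. The model path is hypothetical, and how much headroom to reserve for output depends on your application.

```python
# Sketch: pack retrieved chunks into a 16K-token hybrid context window.
# Assumes an OGA model folder as in the earlier sketch (hypothetical path).
import onnxruntime_genai as og

CONTEXT_LIMIT = 16 * 1024    # 16K-token window in hybrid mode
RESERVED_FOR_OUTPUT = 1024   # headroom for the model's reply (assumption)

model = og.Model("./models/llm-hybrid")  # hypothetical path
tokenizer = og.Tokenizer(model)

def pack_prompt(question: str, chunks: list[str]) -> str:
    """Greedily append retrieved chunks until the token budget is spent."""
    prompt = question
    budget = CONTEXT_LIMIT - RESERVED_FOR_OUTPUT
    for chunk in chunks:
        candidate = prompt + "\n\n" + chunk
        if len(tokenizer.encode(candidate)) > budget:
            break
        prompt = candidate
    return prompt
```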
BF16 Pipeline With ~2x Lower Latency vs. RAI 1.6
The BF16 implementation in RAI 1.7 roughly halves inference latency relative to RAI 1.6, which for token generation translates into approximately double the effective throughput.
- Faster interactive LLMs: Lower token latency improves user-perceived responsiveness, especially for chat-style applications or agent loops.
- Better baseline for fine-tuned models: BF16 improvements benefit both pretrained and custom fine-tuned models, reducing time-to-first-token and overall inference duration; a simple benchmarking sketch follows this list.
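If you want to verify the improvement on your own models, here is a minimal sketch that measures time-to-first-token and decode throughput with the OGA API. The model path is hypothetical, and a real benchmark would add warm-up passes and average several runs.

```python
# Sketch: measure time-to-first-token (TTFT) and decode rate so BF16
# builds can be compared across RAI versions. Hypothetical model path;
# a real benchmark would add warm-up passes and average several runs.
import time
import onnxruntime_genai as og

model = og.Model("./models/llm-bf16")  # hypothetical path
tokenizer = og.Tokenizer(model)
params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Summarize BF16 in one sentence."))

start = time.perf_counter()
first_token_at, tokens = None, 0
while not generator.is_done():
    generator.generate_next_token()
    tokens += 1
    if first_token_at is None:
        first_token_at = time.perf_counter()

elapsed = time.perf_counter() - start
print(f"TTFT: {first_token_at - start:.3f} s")
print(f"Decode rate: {tokens / elapsed:.1f} tok/s")
```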
RAI 1.7 focuses on the things that smooth out day-to-day development: more model choices (MoE + VLM), a single installer that includes Stable Diffusion, longer LLM context windows in hybrid mode, and noticeably lower BF16 latency. The result is less friction in setup, quicker feedback loops when you test changes, and a more capable local stack for shipping LLM/VLM features.
For a detailed overview of the new features and enhancements in the 1.7 software release, check out the official release notes.
Subscribe to be notified of future Ryzen AI software updates and get the latest tools and resources to help you explore the limits of what's possible on AI PCs.
Additional Resources
- Ryzen AI Video Tutorials
- AMD Ryzen AI Developer Hub
- Ryzen AI 1.7 Release Notes
- Ryzen AI 1.7 Models
- Hybrid (iGPU + NPU): Ryzen AI 1.7 Hybrid Models on Hugging Face
- NPU-Only: Ryzen AI 1.7 NPU-Only Models on Hugging Face
Visit the Ryzen™ AI documentation to learn more about supported architectures, setup instructions, and how to start building with RAI 1.7.