Unlocking On-Device ASR with Whisper on Ryzen AI NPUs
Sep 29, 2025

If you’ve used speech transcription or voice assistants, chances are you’ve relied on cloud-based speech recognition. It works — until you run out of API credits, lose internet, worry about privacy, or your CPU starts to lag.
What if you could run powerful speech recognition entirely on your device — efficiently, privately, and without cloud dependencies?
With the latest AMD Ryzen™ AI software, you can deploy Whisper base, small, and medium models for real-time speech-to-text using Neural Processing Unit (NPU) acceleration. These models are part of Whisper, an open-source automatic speech recognition (ASR) and speech translation system developed by OpenAI. Whisper supports multilingual transcription and translation, converting spoken audio into text with high accuracy.
This blog is primarily intended for users with Ryzen AI 300 series PCs, but if you're using a standard CPU, you can still follow along and run Whisper locally.
Why Run Whisper on the NPU?
Running Whisper locally on the NPU offers several compelling advantages:
- Privacy First: Your audio stays on your device — no cloud uploads, no streaming, no risk of third-party eavesdropping.
- Performance: The NPU runs inference using Block Floating Point 16 (BFP16) precision, which is nearly as fast as INT8 but more accurate. This means instant voice commands and real-time captions.
- Power Efficiency: NPUs are purpose-built for AI workloads and consume significantly less power than CPUs or GPUs. This translates to better battery life and cooler, quieter devices.
- Freeing up the CPU/GPU: Offloading automatic speech recognition (ASR) to the NPU frees up your CPU and GPU for other tasks—whether you're gaming, browsing, or compiling code.
Ready to Try It Yourself?
Before diving in, we recommend familiarizing yourself with the Ryzen AI documentation to understand the platform and its capabilities. For a full, detailed walkthrough on exporting, optimizing, and running Whisper models on the Ryzen AI NPU (including example scripts, evaluation tools, and configuration files), check out the official RyzenAI-SW GitHub repository, which hosts the ASR demo.
👉 https://github.com/amd/RyzenAI-SW
👉 https://github.com/amd/RyzenAI-SW/tree/main/demo/ASR/Whisper
This repo contains everything you need to get started quickly, including step-by-step instructions, pre-built demos, and performance benchmarks.
We use the Hugging Face Optimum toolkit to export Whisper models optimized for the Ryzen AI NPU. Follow the instructions in the repository linked above to export the models with the Optimum CLI and set them up for NPU execution.
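If you prefer the Python API over the CLI, a minimal export sketch looks like the following. The repository's instructions remain authoritative, and any NPU-specific export flags used by the demo may differ from this generic example.

```python
# Minimal sketch: export Whisper to ONNX with Hugging Face Optimum.
# The RyzenAI-SW demo drives this via the optimum CLI with its own
# configuration; this is only the generic Python-API equivalent.
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from transformers import WhisperProcessor

model_id = "openai/whisper-base"  # base, small, and medium are supported on the NPU

# export=True converts the PyTorch checkpoint into ONNX encoder/decoder graphs
model = ORTModelForSpeechSeq2Seq.from_pretrained(model_id, export=True)
processor = WhisperProcessor.from_pretrained(model_id)

# Save the ONNX files; these become the inputs to the NPU compilation step
model.save_pretrained("whisper-base-onnx")
processor.save_pretrained("whisper-base-onnx")
```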
Support Details
Whisper base, small, and medium (multilingual versions) are currently supported. Whisper large exceeds the practical limits of current NPU hardware and is not supported at this time.
For optimal performance, the NPU prefers static input shapes. Unlike dynamic shapes, which are harder to optimize and can introduce latency, static shapes allow the NPU to run faster and more efficiently.
- Live Transcription: Use shorter static sequence lengths to minimize delay and improve responsiveness.
- Longer Audio: For offline transcription, set the sequence length up to 448 tokens for better throughput.
Tuning the sequence length to match your use case—whether real-time or offline—helps you get the most out of Whisper on the NPU.
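To make "static shapes" concrete, here is a minimal sketch: Whisper's feature extractor always pads audio to a fixed 30-second mel spectrogram, and the decoder token buffer can be padded to one fixed length so the compiled graph never sees a new shape. The pad_tokens helper below is illustrative, not part of the demo.

```python
# Sketch: preparing fixed-shape inputs for an NPU session.
# WhisperFeatureExtractor pads/truncates audio to 30 s by default,
# producing a static (1, 80, 3000) mel input for base/small models.
import numpy as np
from transformers import WhisperFeatureExtractor

extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-base")
audio = np.zeros(16000 * 7, dtype=np.float32)  # a 7 s clip at 16 kHz

features = extractor(audio, sampling_rate=16000, return_tensors="np")
print(features.input_features.shape)  # (1, 80, 3000) -- same shape for any clip

# Pad the decoder token buffer to one fixed length so the compiled NPU
# graph is reused for every request. 448 is Whisper's maximum; a live
# captioning setup would pick something much smaller.
MAX_TOKENS = 448

def pad_tokens(ids, pad_id):
    out = np.full((1, MAX_TOKENS), pad_id, dtype=np.int64)
    out[0, : len(ids)] = ids
    return out
```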
The first time you target the NPU, Whisper’s encoder and decoder models undergo compilation. This process applies all necessary optimizations—including kernel fusion—and stores the results in a cache location specified via provider options. Initial compilation can take 5 to 15 minutes per model, depending on model size, but it only happens once. After that, inference loads the compiled model from the cache and starts immediately.
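In code, targeting the NPU and naming a cache location looks roughly like the sketch below. The provider option keys follow the Ryzen AI documentation for the Vitis AI execution provider; treat the exact keys, file names, and paths as placeholders to verify against your installed version.

```python
# Sketch: creating an ONNX Runtime session on the NPU with a compile cache.
# Provider option keys follow the Ryzen AI docs for VitisAIExecutionProvider;
# verify them against the Ryzen AI version you have installed.
import onnxruntime as ort

session = ort.InferenceSession(
    "whisper-base-onnx/encoder_model.onnx",
    providers=["VitisAIExecutionProvider"],
    provider_options=[{
        "config_file": "vaip_config.json",   # compiler config shipped with Ryzen AI
        "cacheDir": "./npu_cache",           # where compiled artifacts are stored
        "cacheKey": "whisper_base_encoder",  # reused on later runs, skipping recompilation
    }],
)
```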
Evaluating Whisper on Ryzen AI NPU
Evaluating Whisper on the NPU is essential because:
- The NPU uses BFP16 precision and custom kernels, which can affect both speed and accuracy. For more information on quantizing to BFP16, see the AMD Quark Guide.
- Performance on NPU differs significantly from CPU inference.
How we evaluate:
- Word Error Rate (WER) for English, measured on the LibriSpeech test-clean dataset.
- Character Error Rate (CER) for Chinese, measured on datasets such as AISHELL-1.
These metrics compare model outputs against true transcripts to measure accuracy.
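As a small illustration of the metric computation itself, here is a sketch using the Hugging Face evaluate library. The reference and hypothesis strings are made up, and real evaluations typically normalize text (casing, punctuation) before scoring.

```python
# Sketch: computing WER and CER with the Hugging Face `evaluate` library.
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

references = ["the quick brown fox jumps over the lazy dog"]
predictions = ["the quick brown fox jumped over the lazy dog"]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```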
Performance Highlights: Ryzen AI NPU vs CPU
We tested Whisper base and small models running on the Ryzen AI NPU (without KV caching). Here’s how they compare to CPU-only runs on 30-second audio transcriptions.
| Model | Device | Real-Time Factor (RTF) | Time to First Token (TTFT) |
|---|---|---|---|
| Whisper Base | NPU | 0.35 | 0.45 s |
| Whisper Base | CPU | 0.7 | 1.8 s |
| Whisper Small | NPU | 1.2 | 0.85 s |
| Whisper Small | CPU | 2.2 | 3.1 s |
Table 1: Whisper Model Performance on Ryzen AI NPU vs CPU for 30s audio
An RTF below 1 means faster than real time (e.g., 0.35 means a 30-second clip is processed in about a third of its duration, roughly 3x faster than real time). Note that these runs do not use KV caching; future releases will focus on improved performance with KV caching enabled.
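Both metrics reduce to simple time ratios. Here is a hedged sketch of how you might measure them around your own inference calls; run_encoder and decode_step are placeholders, not functions from the demo.

```python
# Sketch: measuring Real-Time Factor (RTF) and Time To First Token (TTFT).
# `run_encoder` and `decode_step` stand in for your own inference calls.
import time

def measure(audio_seconds, run_encoder, decode_step):
    start = time.perf_counter()
    state = run_encoder()                  # encoder pass over the 30 s window
    token = decode_step(state)             # first decoded token
    ttft = time.perf_counter() - start
    while token is not None:               # decode until end-of-text
        token = decode_step(state)
    total = time.perf_counter() - start
    return total / audio_seconds, ttft     # RTF < 1.0 means faster than real time
```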
Test Configuration for the results in Table 1:
- Processor: AMD Ryzen™ AI 9 HX 370 (12 cores, base clock 2000 MHz) with integrated Radeon™ 890M graphics
- Memory: 32 GB RAM
- Software: Ryzen AI 1.5.0
- NPU MCDM driver: 32.0.203.280, Date: 5/16/2025
- Test Date: 09/20/2025
- OS: Windows 11
Conclusion
Running Whisper on the Ryzen AI NPU unlocks fast, private, and power-efficient speech recognition on-device. With Hugging Face Optimum exports and static input shapes, you can get Whisper models running locally with ease. Performance beats CPU-only setups by a wide margin, making real-time ASR practical on portable and desktop devices alike.
Try it out on GitHub and experience local Whisper for yourself!
