AMD Ryzen AI Max+ 395: A Leap Forward in Generative AI Performance on Consumer PCs
Jun 03, 2025

Key Takeaways
- AMD Ryzen™ AI Max+ 395 with 128GB of unified memory (up to 112GB allocatable by the GPU) provides the unique capability of running generative AI workloads in a laptop form factor
- Powerful RDNA™ 3.5 based GPU technology with up to 40 compute units
- Up to 3.9x performance advantage over MacBook Pro 48GB with M4 Pro silicon running Stable Diffusion models[1]
- On-device 70-billion-parameter LLMs on AMD Ryzen AI Max+ 395, enabled by its unified memory architecture
Brief Overview of AMD Ryzen AI Max+ 395, Codenamed Strix Halo
AMD Ryzen AI Max+ 395 is a groundbreaking SoC (system on a chip) that sets new standards for an APU in memory capacity and generative AI capabilities. Codenamed Strix Halo, it is designed to excel in demanding GenAI workloads that need both a large memory space and substantial AI compute, making it an ideal choice for high-end client PCs that demand best-in-class generative AI experiences.
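As a rough illustration of why memory capacity matters here, the sketch below estimates the weight-only footprint of a 70-billion-parameter model at a few precisions and compares it with the 112 GB GPU-allocatable figure quoted in the key takeaways. The precisions, the decimal-gigabyte convention, and the function names are assumptions made for illustration. The estimate ignores the KV cache, activations, and runtime overhead, so real requirements will be higher.

```python
# Back-of-the-envelope estimate of model weight memory.
# Assumptions (not from the article): decimal gigabytes, weights only,
# and the specific bit widths tried below.

BYTES_PER_GB = 1e9  # decimal GB; the article does not say whether GB or GiB is meant


def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Return the memory needed for the weights alone, in GB."""
    return num_params * bits_per_weight / 8 / BYTES_PER_GB


def fits_in_budget(num_params: float, bits_per_weight: float, budget_gb: float) -> bool:
    """True if the weights alone fit in the given memory budget."""
    return weight_footprint_gb(num_params, bits_per_weight) <= budget_gb


if __name__ == "__main__":
    params = 70e9        # 70 billion parameters (model size named in the article)
    gpu_budget_gb = 112  # GPU-allocatable memory quoted in the article
    for bits in (16, 8, 4):  # hypothetical precisions
        size = weight_footprint_gb(params, bits)
        verdict = "fits" if fits_in_budget(params, bits, gpu_budget_gb) else "does not fit"
        print(f"{bits}-bit weights: {size:.0f} GB, {verdict} in {gpu_budget_gb} GB")
```

Under these assumptions, 16-bit weights alone (about 140 GB) exceed the 112 GB budget, while 8-bit (about 70 GB) and 4-bit (about 35 GB) weights fit with room left for caches and other workloads.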

Figure 1: AMD Ryzen AI Max PRO Series Processors

Figure 2: AMD Ryzen AI Max PRO Series Processors with Unified Memory Architecture
Ryzen AI Max+ 395 has several key features that distinguish it in the market. With leadership memory capacity in its class, it supports seamless multitasking and efficient handling of large datasets. Its GPU, with up to 40 RDNA™ 3.5 based compute units, together with an optimized software stack, provides a significant performance boost for generative AI and creator workloads. With unified memory, data can be shared between the different accelerators, as shown in Figure 2, without expensive memory copies, which also reduces memory utilization for demanding generative AI workloads.
In today's rapidly evolving tech landscape, the importance of GenAI workloads in client PCs cannot be overstated. These workloads drive innovation across various PC client applications and experiences. AMD Ryzen AI Max+ 395 is optimized to meet these demands, providing users with the performance and reliability they need to stay ahead.
Comparative Analysis with Apple MacBook Pro
GenAI Performance Comparison
- Generative AI Performance Advantage: Ryzen AI Max+ 395 demonstrates a 3.9x performance advantage over the 16-inch Apple MacBook Pro equipped with M4 Pro silicon and 48GB of unified memory when running Stable Diffusion 3.5 image generation. The advantage is attributed to software optimization, efficient memory management, and leadership memory capacity.
- Concurrent Workloads: It achieves up to 2.6x faster token generation[2] and 3.3x faster image generation[3] when running concurrent GenAI workloads, thanks to its large memory pool and enhanced GPU capabilities.
Image Generation with Stable Diffusion 3.5
For the image generation test, we used Stability AI’s state-of-the-art text-to-image model, Stable Diffusion 3.5, and ran it on both the AMD Ryzen AI Max+ 395 (Strix Halo) and Apple MacBook Pro M4 Pro machines. Running this model on Strix Halo shows a 3.9x performance advantage over the MacBook Pro M4 Pro 48GB system.
For details on how AMD worked with Stability AI to enhance generative AI workloads, including Stable Diffusion 3.x, on all AMD GPU platforms, refer to the AMD technical blog listed in the References.

Figure 3: Speedup vs Apple M4 Pro MacBook
On AMD Ryzen AI Max+ 395, we used Amuse, while the Apple M4 Pro (48GB) system used ComfyUI; both ran Stability AI Stable Diffusion 3.5 Large. The key software optimizations are based on fused attention nodes, unpacked attention nodes, and weight pruning. For more details on the software optimizations that contributed to the 3.9x performance benefit on AMD devices, please refer to the Stability AI blog and AMD technical blog listed in the References.
Concurrent Stable Diffusion and Large Language Models (LLM)
Strix Halo achieves up to 2.6x faster token generation and 3.3x faster image generation when running concurrent GenAI workloads.

Figure 4: Speedup vs. Apple M4 Pro MacBook
During testing, the M4 Pro 48GB system was observed to rely on swap memory, which significantly slowed its performance. When running Stable Diffusion 3.5 Large and the Phi4 14-billion-parameter model concurrently, the AMD Ryzen™ AI Max+ 395 demonstrated a clear advantage due to its large memory pool, superior GPU capabilities, and optimized software stack.
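The speedups reported in this section are ratios against the baseline system. A minimal sketch of how such ratios are typically computed follows. The function names and the sample numbers are hypothetical placeholders, not measurements from these tests. For time-based metrics such as iteration time (lower is better), the baseline goes in the numerator; for throughput metrics such as tokens per second (higher is better), it goes in the denominator.

```python
# Speedup ratios for two kinds of metrics.
# All numbers used in the example are made-up placeholders.


def speedup_from_times(baseline_seconds: float, candidate_seconds: float) -> float:
    """Speedup for a lower-is-better metric, e.g. time per iteration."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("times must be positive")
    return baseline_seconds / candidate_seconds


def speedup_from_rates(baseline_rate: float, candidate_rate: float) -> float:
    """Speedup for a higher-is-better metric, e.g. tokens per second."""
    if baseline_rate <= 0 or candidate_rate <= 0:
        raise ValueError("rates must be positive")
    return candidate_rate / baseline_rate


if __name__ == "__main__":
    # Hypothetical values: baseline takes 10.0 s per iteration, candidate 4.0 s.
    print(f"image generation speedup: {speedup_from_times(10.0, 4.0):.1f}x")
    # Hypothetical values: baseline 12.0 tokens/s, candidate 30.0 tokens/s.
    print(f"token generation speedup: {speedup_from_rates(12.0, 30.0):.1f}x")
```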
Real-World Applications
Generative AI workloads are transforming various industries by enabling new applications and driving innovation. For instance, in the healthcare sector, AI models can assist in diagnosing diseases and personalizing treatment plans. In the entertainment industry, AI can generate realistic graphics and animations, enhancing the user experience. The AMD Ryzen™ AI Max+ 395, with its superior performance in concurrent AI workloads, is well-suited for these and many other applications.
Configurations Tested
To evaluate the performance of AMD Ryzen AI Max+ and Apple MacBook Pro platforms, we conducted a series of tests using various configurations and environments. Below are the details of the setups used:
| | Apple M4 Pro | AMD Ryzen™ AI Max+ 395 |
| --- | --- | --- |
| System | Apple MacBook Pro 16-inch | ASUS ROG Flow Z13 |
| Operating System | macOS Sequoia 15.4.1 | Microsoft Windows 11 24H2 (OS Build 26100.3775) |
| GPU | 20-core Apple GPU | AMD Radeon™ 8060S |
| Graphics Driver | 15.4.1 | 32.0.21001.9024 |
| Installed Memory | 48GB | 128GB |
| Stable Diffusion Application | ComfyUI 0.4.48 | Amuse |
| LLM Application | Ollama | ONNX-GenAI |
Workloads Tested
Stable Diffusion:
- Models: Stability AI Stable Diffusion 3.5 Large, 3.5 Medium, 3.0 Medium
- Image Size: 1024x1024
- Configuration: Classifier-Free Guidance (CFG) scale=4.5, steps=20
- For more details on the models, you can visit:
https://huggingface.co/stabilityai/stable-diffusion-3.5-large/tree/main
https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/tree/main
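To make the CFG setting above concrete, the sketch below shows the standard classifier-free guidance combination applied at each sampling step: the model's unconditional prediction is pushed toward the prompt-conditioned prediction by the guidance scale. This is the general technique, not the specific implementation in Amuse or ComfyUI. The settings dictionary uses key names borrowed from common diffusion tooling as an assumption; the two applications may name these options differently.

```python
# Classifier-free guidance (CFG), shown on plain Python lists.
# The settings mirror the configuration listed above; the key names
# are an assumption and may differ from what Amuse or ComfyUI use.

SETTINGS = {
    "width": 1024,
    "height": 1024,
    "guidance_scale": 4.5,      # CFG scale
    "num_inference_steps": 20,  # sampling steps
}


def cfg_combine(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """Blend unconditional and conditional predictions element-wise.

    guided = uncond + scale * (cond - uncond)
    A scale of 1.0 returns the conditional prediction unchanged;
    larger values push the result further toward the prompt.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    # Toy two-element "predictions" standing in for real model outputs.
    unconditional = [0.0, 1.0]
    conditional = [1.0, 3.0]
    print(cfg_combine(unconditional, conditional, SETTINGS["guidance_scale"]))
```

In a real pipeline the same formula is applied to full latent tensors once per step, so the listed configuration implies 20 such combinations per generated image.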
Large Language Models (LLM):
- Phi4 14B
  - Windows: https://huggingface.co/microsoft/phi-4-onnx
  - macOS: https://ollama.com/library/phi4
- DeepSeek-R1-Distill-Llama-70B
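For readers who want to approximate a tokens-per-second measurement on the macOS side, here is a minimal sketch against Ollama's local REST API. It assumes Ollama is running on its default port with the `phi4` model already pulled, and it relies on the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama returns for a non-streaming request. The article does not describe how its own numbers were collected, so treat this as one possible approach rather than the method used in the tests; the helper names are hypothetical.

```python
# Estimate generation throughput from an Ollama /api/generate response.
# Assumes a local Ollama server at the default address with "phi4" available.
import json
import urllib.request


def tokens_per_second(response: dict) -> float:
    """Generated tokens divided by generation time (eval_duration is in ns)."""
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return response["eval_count"] / (duration_ns / 1e9)


def generate_with_ollama(prompt: str, model: str = "phi4",
                         host: str = "http://localhost:11434") -> dict:
    """Send one non-streaming generation request and return the parsed JSON."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Prompt taken from the test footnotes.
    result = generate_with_ollama("Tell me a story about a dog")
    print(f"{tokens_per_second(result):.1f} tokens/s")
```

Averaging several runs gives a more stable figure, since a single run can be skewed by model loading or background activity.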
Concurrent Models:
- Stable Diffusion 3.5 Large and Phi4 14B parameter model (referenced above)
Conclusion
In summary, the AMD Ryzen AI Max+ 395 APU sets new standards in memory capacity and generative AI performance. Its ability to handle demanding GenAI workloads with ease makes it a powerful platform for developers and researchers. Whether you need superior performance for image generation or for large language models, Ryzen AI Max+ 395 is optimized to meet your needs.
References
- https://gpuopen.com/learn/accelerating_generative_ai_on_amd_radeon_gpus/
- https://stability.ai/news/stable-diffusion-now-optimized-for-amd-radeon-gpus
- https://www.amd.com/en/products/graphics/radeon-ai.html
- https://www.amd.com/en/blogs/2024/llm-on-amd-gpu-memory-footprint-and-performance-i.html

Footnotes
- Testing as of May 2025 using the ASUS ROG Flow Z13 compared to Apple MacBook Pro 16-inch laptop. Configuration for AMD Ryzen™ AI MAX+395: AMD reference board, Radeon™ 8060S graphics, 128GB RAM, Windows 11, diffusion framework. Configuration for Apple M4 Pro 16”/14 core CPU: Apple MacBook Pro 2024, 20 core GPU, 48GB RAM, macOS Sequoia (x64) Build 15.1.1, diffusion framework, ComfyUI 0.4.48. Application(s) tested: Stable Diffusion 3.5 (large), 3.0 (medium), and 3.5 (medium). "Iteration time" is defined as time to complete 1 transformer iteration. Laptop manufacturers may vary configurations yielding different results. SHO-35.
- Testing as of May 2025 using the ASUS ROG Flow Z13 compared to Apple MacBook Pro 16-inch laptop. Configuration for AMD Ryzen™ AI MAX+395: AMD reference board, Radeon™ 8060S graphics, 128GB RAM, Windows 11, LLM framework, ONNX-GenAI. Configuration for Apple M4 Pro 16”/14 core CPU: Apple MacBook Pro 2024, 20 core GPU, 48GB RAM, macOS Sequoia (x64) Build 15.1.1, LLM Framework Ollama. Applications tested: Tokens per second, DeepSeek R1- Llama 70B, Phi4, Concurrency Test: Stable Diffusion 3.5 Large and Phi4-14B. LLM Prompt: "Tell me a story about a dog", SD Prompt: "a Bengal Tiger running in the Lake Taho snow covered mountains and deep blue lake." Laptop manufacturers may vary configurations yielding different results. SHO-36.
- SHO-31: Testing as of April 2025 by AMD. All tests conducted using Amuse 3.0 RC and Adrenalin 24.30.31.05 Driver. Sustained performance average (elapsed time) of multiple runs using the specimen prompt: “Craft an image of a cozy fireplace with crackling flames, flickering candles, and a pile of cozy blankets nearby”. Models tested: Stable Diffusion 1.5, Stable Diffusion XL 1.0, SDXL Turbo, SD 3.0 Medium, Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo against Microsoft Olive conversion of the base model. ASUS ROG Flow Z13 equipped with an AMD Ryzen™ AI MAX+ 395 processor and 64GB of DDR5 8000 MT/s memory and Windows 11 Pro 24H2. Variable Graphics Memory set to 48GB. Performance may vary. SHO-31.