
The Most Advanced AMD AI Software Stack
Latest Algorithms and Models
Enhanced reasoning, attention algorithms, and sparse MoE for improved efficiency
AMD Instinct™ MI350 Series Support
AMD CDNA 4 architecture, supporting new datatypes with advanced HBM
Advanced Features for Scaling AI
Seamless distributed inference, MoE training, reinforcement learning at scale
AI Lifecycle
Simplified Enterprise AI and Cluster Management for scalability across diverse industries
AMD Ryzen™ AI & AMD Radeon™ Graphics Support
Comprehensive endpoint AI solution for versatile application needs
Generational Leap in Performance
ROCm 7 vs. ROCm 6: Inference¹ and Training² performance comparison
AMD Instinct™ MI350 Series Support
Powering AMD Instinct™ MI350 Series GPUs
ROCm 7 seamlessly integrates AMD Instinct MI350X platforms with open rack infrastructure, enabling rapid deployment and optimized AI performance at scale.

Scaling Enterprise AI

Distributed Inference with Open Ecosystem
With vLLM-d, DeepEP, SGLang, and GPU direct access, the ROCm software platform enables the highest-throughput serving at rack scale: across batches, across nodes, and across models.
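For example, here is a minimal sketch of serving a model with vLLM on ROCm using tensor parallelism across two GPUs. The model name and tensor_parallel_size=2 mirror the TP2 configurations cited in the footnotes and are illustrative assumptions, not AMD's benchmark harness.

# Minimal vLLM offline-inference sketch on ROCm (illustrative; not AMD's benchmark setup).
from vllm import LLM, SamplingParams

# Shard the model across two GPUs, matching the TP2 setting referenced in the footnotes.
llm = LLM(model="meta-llama/Llama-3.1-70B-Instruct", tensor_parallel_size=2)
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarize mixture-of-experts routing in one paragraph."], params)
print(outputs[0].outputs[0].text)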

ROCm for AI Lifecycle
ROCm software integrates with enterprise AI frameworks to provide a fully open-source, end-to-end workflow for production AI, encompassing ROCm Enterprise AI, including an operations platform and cluster management.

AI at the Endpoint
Expanding ROCm Ecosystem Across AMD Ryzen™ AI and AMD Radeon™ Graphics
The ROCm endpoint AI ecosystem supports Linux and Windows on AMD Radeon products, including the latest Radeon RX 9000 Series, as well as the class-leading Ryzen AI MAX products.
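As a quick check, the following sketch verifies that a ROCm build of PyTorch can see an AMD GPU (Radeon or Ryzen AI) from Python; on ROCm, the HIP backend is exposed through the familiar torch.cuda namespace. This is a minimal illustration, not an official validation procedure.

import torch

# On a ROCm build of PyTorch, torch.version.hip is set and torch.cuda maps to the AMD GPU.
print("HIP runtime:", torch.version.hip)
print("GPU available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    x = torch.randn(1024, 1024, device="cuda")  # "cuda" targets the AMD GPU on ROCm
    print("Matmul result shape:", (x @ x).shape)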


Get Started Today
Accelerate your AI/ML, high-performance computing, and data analytics workloads with the AMD Developer Cloud.
Stay Informed
Stay up to date with the latest ROCm news.
Footnotes
- MI300-080 - Testing by AMD Performance Labs as of May 15, 2025, measuring the inference performance in tokens per second (TPS) of AMD ROCm 6.x software, vLLM 0.3.3 vs. AMD ROCm 7.0 preview version SW, vLLM 0.8.5 on a system with (8) AMD Instinct MI300X GPUs running Llama 3.1-70B (TP2), Qwen 72B (TP2), and Deepseek-R1 (FP16) models with batch sizes of 1-256 and sequence lengths of 128-204. Stated performance uplift is expressed as the average TPS over the (3) LLMs tested.
Hardware Configuration
1P AMD EPYC™ 9534 CPU server with 8x AMD Instinct™ MI300X (192GB, 750W) GPUs, Supermicro AS-8125GS-TNMR2, NPS1 (1 NUMA per socket), 1.5 TiB (24 DIMMs, 4800 MT/s memory, 64 GiB/DIMM), 4x 3.49TB Micron 7450 storage, BIOS version: 1.8
Software Configuration(s)
Ubuntu 22.04 LTS with Linux kernel 5.15.0-119-generic
Qwen 72B and Llama 3.1-70B: ROCm 7.0 preview version SW, PyTorch 2.7.0; Deepseek-R1: ROCm 7.0 preview version, SGLang 0.4.6, PyTorch 2.6.0
vs.
Qwen 72B and Llama 3.1-70B: ROCm 6.x GA SW, PyTorch 2.7.0 and 2.1.1, respectively; Deepseek-R1: ROCm 6.x GA SW, SGLang 0.4.1, PyTorch 2.5.0
Server manufacturers may vary configurations, yielding different results. Performance may vary based on configuration, software, vLLM version, and the use of the latest drivers and optimizations.
- MI300-081 - Testing conducted by AMD Performance Labs as of May 15, 2025, to measure the training performance (TFLOPS) of ROCm 7.0 preview version software, Megatron-LM, on (8) AMD Instinct MI300X GPUs running Llama 2-70B (4K), Qwen1.5-14B, and Llama 3.1-8B models, and a custom Docker container vs. a similarly configured system with AMD ROCm 6.0 software.
Hardware Configuration
1P AMD EPYC™ 9454 CPU, 8x AMD Instinct MI300X (192GB, 750W) GPUs, American Megatrends International LLC BIOS version: 1.8.
Software Configuration
Ubuntu 22.04 LTS with Linux kernel 5.15.0-70-generic
ROCm 7.0 preview version, Megatron-LM, PyTorch 2.7.0
vs.
ROCm 6.0 public release SW, Megatron-LM code branches hanl/disable_te_llama2 for Llama 2-7B, guihong_dev for Llama 2-70B, renwuli/disable_te_qwen1.5 for Qwen1.5-14B, PyTorch 2.2.
Server manufacturers may vary configurations, yielding different results. Performance may vary based on configuration, software, vLLM version, and the use of the latest drivers and optimizations.