The Most Advanced AMD AI Software Stack

Latest Algorithms and Models

Enhanced reasoning, attention algorithms, and sparse MoE for improved efficiency

AMD Instinct™ MI350 Series Support

AMD CDNA 4 architecture, supporting new datatypes with advanced HBM

Advanced Features for Scaling AI

Seamless distributed inference, MoE training, reinforcement learning at scale

AI Lifecycle

Simplified Enterprise AI and Cluster Management for scalability across diverse industries

AMD Ryzen™ AI & AMD Radeon™ Graphics Support

Comprehensive endpoint AI solution for versatile application needs

Generational Leap in Performance

ROCm 7 vs. ROCm 6

3.5x Average Inference Performance Improvement¹
Llama 3.1 70B: 3.2x | Qwen2-72B: 3.4x | DeepSeek R1: 3.8x

3x Average Training Performance Improvement²
Llama 2 70B: 3x | Llama 3.1 8B: 3x | Qwen1.5 7B: 3.1x

AMD Instinct™ MI350 Series Support

Powering AMD Instinct™ MI350 Series GPUs

ROCm 7 enables seamless integration of AMD Instinct MI350X platforms with open rack infrastructure, supporting rapid deployment and optimized AI performance at scale.

AMD Instinct™ MI350X Accelerators

Scaling Enterprise AI


Distributed Inference with Open Ecosystem

With vLLM-d, DeepEP, SGLang, and GPU direct access, the ROCm software platform enables the highest-throughput serving at rack scale: across batches, across nodes, across models.
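As a minimal sketch of what multi-GPU serving on ROCm looks like in practice, the snippet below shards a model across eight GPUs with vLLM's offline API. The model name and GPU count are illustrative placeholders, and a ROCm-enabled vLLM build is assumed:

    # Minimal multi-GPU serving sketch with vLLM on a ROCm build (illustrative).
    from vllm import LLM, SamplingParams

    # Shard the model across 8 GPUs via tensor parallelism; the model name
    # and tensor_parallel_size here are placeholders, not a tested config.
    llm = LLM(model="meta-llama/Llama-3.1-70B-Instruct", tensor_parallel_size=8)

    params = SamplingParams(temperature=0.7, max_tokens=256)
    outputs = llm.generate(["Summarize mixture-of-experts routing."], params)
    print(outputs[0].outputs[0].text)

The same script runs unchanged on CUDA or ROCm builds of vLLM, since the backend is selected at install time rather than in user code.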


ROCm for AI Lifecycle

ROCm software integrates with enterprise AI frameworks to provide a fully open-source, end-to-end workflow for production AI, encompassing ROCm Enterprise AI, including an operations platform and cluster management.


AI at the Endpoint

Expanding ROCm Ecosystem Across AMD Ryzen™ AI and AMD Radeon™ Graphics

The ROCm endpoint AI ecosystem supports Linux and Windows on AMD Radeon products, including the latest Radeon RX 9000 Series, as well as the class-leading Ryzen AI Max products.
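On a ROCm build of PyTorch, the HIP backend is exposed through the familiar torch.cuda API, so verifying that a Radeon or Ryzen AI GPU is visible takes only a few lines. A minimal sketch, assuming a ROCm build of PyTorch is installed; device names will vary by system:

    # Sanity-check that a ROCm build of PyTorch can see the local GPU.
    # On ROCm, the HIP backend reuses the torch.cuda namespace.
    import torch

    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))  # e.g. a Radeon RX 9000 Series part
        print("HIP:", torch.version.hip)              # set on ROCm builds, None on CUDA builds
    else:
        print("No ROCm-visible GPU found.")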


Get Started Today

Accelerate your AI/ML, high-performance computing, and data analytics tasks with the AMD Developer Cloud.

Stay Informed

Stay up to date with the latest ROCm news.

Footnotes
  1. MI300-080 - Testing by AMD Performance Labs as of May 15, 2025, measuring the inference performance in tokens per second (TPS) of AMD ROCm 6.x software with vLLM 0.3.3 vs. AMD ROCm 7.0 preview version software with vLLM 0.8.5, on a system with (8) AMD Instinct MI300X GPUs running Llama 3.1-70B (TP2), Qwen 72B (TP2), and DeepSeek-R1 (FP16) models with batch sizes of 1-256 and sequence lengths of 128-204. Stated performance uplift is expressed as the average TPS over the (3) LLMs tested.

    Hardware Configuration

    1P AMD EPYC™ 9534 CPU server with 8x AMD Instinct™ MI300X (192GB, 750W) GPUs, Supermicro AS-8125GS-TNMR2, NPS1 (1 NUMA per socket), 1.5 TiB (24 DIMMs, 4800 MT/s memory, 64 GiB/DIMM), 4x 3.49 TB Micron 7450 storage, BIOS version 1.8

    Software Configuration(s)

    Ubuntu 22.04 LTS with Linux kernel 5.15.0-119-generic

    Qwen 72B and Llama 3.1-70B: ROCm 7.0 preview version SW, PyTorch 2.7.0

    DeepSeek-R1: ROCm 7.0 preview version SW, SGLang 0.4.6, PyTorch 2.6.0

    vs.

    Qwen 72B and Llama 3.1-70B: ROCm 6.x GA SW, PyTorch 2.7.0 and 2.1.1, respectively

    DeepSeek-R1: ROCm 6.x GA SW, SGLang 0.4.1, PyTorch 2.5.0

    Server manufacturers may vary configurations, yielding different results. Performance may vary based on configuration, software, vLLM version, and the use of the latest drivers and optimizations.

  2. MI300-081 - Testing conducted by AMD Performance Labs as of May 15, 2025, to measure the training performance (TFLOPS) of ROCm 7.0 preview version software with Megatron-LM on (8) AMD Instinct MI300X GPUs running Llama 2-70B (4K), Qwen1.5-14B, and Llama 3.1-8B models in a custom docker container, vs. a similarly configured system with AMD ROCm 6.0 software.

    Hardware Configuration

    1P AMD EPYC™ 9454 CPU, 8x AMD Instinct MI300X (192GB, 750W) GPUs, American Megatrends International LLC BIOS version 1.8.

    Software Configuration

    Ubuntu 22.04 LTS with Linux kernel 5.15.0-70-generic

    ROCm 7.0, Megatron-LM, PyTorch 2.7.0

    vs.

    ROCm 6.0 public release SW, Megatron-LM code branches hanl/disable_te_llama2 for Llama 2-7B, guihong_dev for Llama 2-70B, renwuli/disable_te_qwen1.5 for Qwen1.5-14B, PyTorch 2.2.

    Server manufacturers may vary configurations, yielding different results. Performance may vary based on configuration, software versions, and the use of the latest drivers and optimizations.