Performance Results with AMD ROCm™ Software

This page summarizes performance measurements on AMD Instinct™ GPUs running popular AI models.

This page covers both inference and training benchmarks, organized as follows:

AI Inference: vLLM, xDiT
AI Training: PyTorch, Megatron-LM, and JAX MaxText

Results cover AMD Instinct™ MI355X, MI325X, and MI300X GPUs for each framework where data is available.

The data in the following tables is intended as a reference point to help users evaluate observed performance. It does not represent the peak performance that AMD GPUs and ROCm™ software can deliver.

AI Inference


vLLM

Results on AMD Instinct™ MI300X Platform

The following results are based on:

Docker container: rocm/vllm:rocm7.0.0_vllm_0.11.2_20251210
Release date: December 11, 2025
Server: Dual AMD EPYC 9554 64-core processor-based production server with 8x AMD MI300X (192GB HBM3 750W) GPUs, 1 NUMA node per socket, System BIOS 1.8, Ubuntu® 22.04, amdgpu driver 6.14.14

Throughput Measurements

The table below shows throughput under maximum load in a client-server setup: a local inference client sends requests to the server at an infinite request rate.

Model	Precision	TP¹ Size	Input	Output	No. Prompts	Max. Seqs	Throughput²
Llama 3.1 70B (amd/Llama-3.1-70B-Instruct-FP8-KV)	FP8	8	128	2048	3200	3200	13562.4
			128	4096	1500	1500	11800.9
			500	2000	2000	2000	11249.5
			2048	2048	1500	1500	7753.1
Llama 3.1 405B (amd/Llama-3.1-405B-Instruct-FP8-KV)	FP8	8	128	2048	1500	1500	3822.8
			128	4096	1500	1500	3085.8
			500	2000	2000	2000	3059.9
			2048	2048	500	500	2192.3

¹ TP stands for tensor parallelism.

² Throughput is measured in tokens per second.
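As a rough worked example of what the throughput column means, the sketch below estimates how long a run takes from its prompt count, sequence lengths, and reported throughput. The page does not state whether Throughput counts prompt (input) tokens, so the hypothetical helper `implied_run_seconds` computes both readings.

```python
def implied_run_seconds(num_prompts: int, input_len: int, output_len: int,
                        throughput_tok_s: float, count_input: bool = True) -> float:
    """Estimate the wall-clock duration of a throughput run.

    Assumption: throughput = total tokens / elapsed seconds, where total tokens
    include the prompt (input) tokens when count_input is True.
    """
    tokens_per_prompt = output_len + (input_len if count_input else 0)
    total_tokens = num_prompts * tokens_per_prompt
    return total_tokens / throughput_tok_s


# Llama 3.1 70B FP8, TP 8: 3200 prompts, 128 input / 2048 output tokens, 13562.4 tokens/s
seconds = implied_run_seconds(3200, 128, 2048, 13562.4)
output_only = implied_run_seconds(3200, 128, 2048, 13562.4, count_input=False)
print(f"about {seconds:.0f} s if input tokens are counted")
print(f"about {output_only:.0f} s if only output tokens are counted")
```

Either way, the run lasts several minutes, which is long enough for steady-state throughput to dominate start-up effects.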

Latency Results

The table below shows end-to-end latency: the time from when the system receives an input to when the model has produced its complete output.

Model	Precision	TP¹ Size	Batch Size	Input	Output	Latency²
Llama 3.1 70B (amd/Llama-3.1-70B-Instruct-FP8-KV)	FP8	8	1	128	2048	16.015
			2	128	2048	18.683
			4	128	2048	19.245
			8	128	2048	20.468
			16	128	2048	22.137
			32	128	2048	25.571
			64	128	2048	32.987
			128	128	2048	46.426
			1	2048	2048	16.421
			2	2048	2048	19.035
			4	2048	2048	20.221
			8	2048	2048	21.483
			16	2048	2048	24.350
			32	2048	2048	29.776
			64	2048	2048	40.625
			128	2048	2048	63.671
Llama 3.1 405B (amd/Llama-3.1-405B-Instruct-FP8-KV)	FP8	8	1	128	2048	48.618
			2	128	2048	50.980
			4	128	2048	52.760
			8	128	2048	55.864
			16	128	2048	58.795
			32	128	2048	69.482
			64	128	2048	89.384
			128	128	2048	122.601
			1	2048	2048	49.106
			2	2048	2048	51.664
			4	2048	2048	54.220
			8	2048	2048	58.904
			16	2048	2048	65.389
			32	2048	2048	83.387
			64	2048	2048	115.575
			128	2048	2048	177.779

¹ TP stands for tensor parallelism.

² Latency is measured in seconds.
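Batch latency can be converted into generation rates. The sketch below assumes each latency figure covers prefill plus generation of all output tokens for every sequence in the batch; the helper `decode_rates` is hypothetical. It uses three Llama 3.1 70B rows (128 input, 2048 output tokens) from the latency table.

```python
def decode_rates(batch_size: int, output_len: int, latency_s: float) -> tuple[float, float]:
    """Return (aggregate output tokens/s, per-sequence output tokens/s).

    Assumption: latency_s covers producing output_len tokens for every sequence
    in the batch, with prefill time folded into the same figure.
    """
    per_sequence = output_len / latency_s
    aggregate = batch_size * per_sequence
    return aggregate, per_sequence


# Llama 3.1 70B FP8, 128 input / 2048 output tokens
for batch, latency in [(1, 16.015), (32, 25.571), (128, 46.426)]:
    agg, per_seq = decode_rates(batch, 2048, latency)
    print(f"batch {batch:>3}: {agg:8.1f} tok/s total, {per_seq:6.1f} tok/s per sequence")
```

The aggregate rate grows with batch size while the per-sequence rate falls, which is the usual throughput versus latency trade-off.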

Reproduce these results on your system by following these instructions:

Inference Performance with vLLM

Previous Versions

This table lists previous versions of the ROCm vLLM inference Docker image for inference performance testing. For detailed information about available models for benchmarking, see the version-specific documentation.

Docker image tag	Components	Resources
rocm/vllm:rocm7.0.0_vllm_0.11.2_20251210 (latest)	ROCm 7.0.0 vLLM 0.11.2 PyTorch 2.9.0	Documentation Docker Hub
rocm/vllm:rocm7.0.0_vllm_0.11.1_20251024	ROCm 7.0.0 vLLM 0.11.1 PyTorch 2.9.0	Documentation Docker Hub
rocm/vllm:rocm7.0.0_vllm_0.10.2_20251006	ROCm 7.0.0 vLLM 0.10.2 PyTorch 2.9.0	Documentation Docker Hub
rocm/vllm:rocm6.4.1_vllm_0.10.0_20250812	ROCm 6.4.1 vLLM 0.9.1 PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.4.1_vllm_0.9.1_20250715	ROCm 6.4.1 vLLM 0.9.1 PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.4.1_vllm_0.9.1_20250702	ROCm 6.4.1 vLLM 0.9.1 PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.4.1_vllm_0.9.0.1_20250605	ROCm 6.4.1 vLLM 0.9.0.1 PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.3.1_vllm_0.8.5_20250521	ROCm 6.3.1 vLLM 0.8.5 (0.8.6.dev) PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.3.1_vllm_0.8.5_20250513	ROCm 6.3.1 vLLM 0.8.5 PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.3.1_instinct_vllm0.8.3_20250415	ROCm 6.3.1 vLLM 0.8.3 PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.3.1_instinct_vllm0.7.3_20250325	ROCm 6.3.1 vLLM 0.7.3 PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6	ROCm 6.3.1 vLLM 0.6.6 PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4	ROCm 6.2.1 vLLM 0.6.4 PyTorch 2.5.0	Documentation Docker Hub
rocm/vllm:rocm6.2_mi300_ubuntu22.04_py3.9_vllm_7c5fd50	ROCm 6.2.0 vLLM 0.4.3 PyTorch 2.4.0	Documentation Docker Hub

xDiT

Results on AMD Instinct™ MI355X Platform

The following results are based on:

Docker container: rocm/pytorch-xdit:v25.12
Release date: Dec 8, 2025
Server: Dual AMD EPYC 9575F 64-core processor-based production server with 8x AMD MI355X (288GB HBM3E 1400W) GPUs, 1 NUMA node per socket, System BIOS 1.4a, Ubuntu® 22.04.3 LTS, Host GPU driver ROCm 7.10.0_preview.

Models	Precision	Batch Size	Configuration	Latency¹
Hunyuan Video	BF16	1	720p, 129 Frames, 50 steps	86.74
Wan2.1	BF16	1	720p, 80 Frames, 40 steps	71.60
Wan2.2	BF16	1	720p, 80 Frames, 40 steps	66.69
Flux.1	BF16	1	1024x1240, 25 steps	0.94

¹ Latency is measured in seconds.
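Each configuration fixes the number of denoising steps, so dividing latency by steps gives a rough per-step cost. This sketch spreads fixed costs (for example text encoding and decoding latents back to pixels) evenly across steps, so it somewhat overstates the time of a single denoising step. `seconds_per_step` is a hypothetical helper.

```python
# xDiT results on MI355X: (model, denoising steps, end-to-end latency in seconds)
MI355X_RESULTS = [
    ("Hunyuan Video", 50, 86.74),
    ("Wan2.1", 40, 71.60),
    ("Wan2.2", 40, 66.69),
    ("Flux.1", 25, 0.94),
]


def seconds_per_step(steps: int, latency_s: float) -> float:
    """Average latency per denoising step, including any fixed per-request overhead."""
    return latency_s / steps


for model, steps, latency in MI355X_RESULTS:
    print(f"{model:<14} {seconds_per_step(steps, latency):.3f} s/step")
```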

Reproduce these results on your system by following these instructions:

xDiT Diffusion Inference on AMD GPUs User Guide

Results on AMD Instinct™ MI300X Platform

The following results are based on:

Docker container: rocm/pytorch-xdit:v25.12
Release date: Dec 8, 2025
Server: Dual AMD EPYC 9554 64-core processor-based production server with 8x AMD MI300X (192GB HBM3 750W) GPUs, 1 NUMA node per socket, System BIOS 1.8, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 7.10.0_preview.

Models	Precision	Batch Size	Configuration	Latency¹
Hunyuan Video	BF16	1	720p, 129 Frames, 50 steps	181.05
Wan2.1	BF16	1	720p, 80 Frames, 40 steps	151.25
Wan2.2	BF16	1	720p, 80 Frames, 40 steps	142.17
Flux.1	BF16	1	1024x1240, 25 steps	1.33

¹ Latency is measured in seconds.

Reproduce these results on your system by following these instructions:

xDiT Diffusion Inference on AMD GPUs User Guide
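Both xDiT tables use the same container, batch size, and configurations, so their latencies can be compared directly. The host systems differ (EPYC 9575F versus EPYC 9554, different Ubuntu point releases), so treat the ratios below as indicative. The `speedup` helper is illustrative.

```python
# End-to-end latency in seconds, copied from the MI355X and MI300X xDiT tables
LATENCY_S = {
    "Hunyuan Video": {"MI355X": 86.74, "MI300X": 181.05},
    "Wan2.1": {"MI355X": 71.60, "MI300X": 151.25},
    "Wan2.2": {"MI355X": 66.69, "MI300X": 142.17},
    "Flux.1": {"MI355X": 0.94, "MI300X": 1.33},
}


def speedup(baseline_s: float, candidate_s: float) -> float:
    """Latency speedup of candidate over baseline (> 1 means the candidate is faster)."""
    return baseline_s / candidate_s


for model, lat in LATENCY_S.items():
    print(f"{model:<14} MI355X vs MI300X: {speedup(lat['MI300X'], lat['MI355X']):.2f}x")
```

The video models show roughly a 2x reduction in latency, while the much shorter Flux.1 image run shows a smaller gain.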

Previous Versions

This table lists previous versions of the xDiT Docker image for diffusion inference performance testing. For detailed information about available models for benchmarking, see the version-specific documentation.

Docker image tag	Components	Resources
rocm/pytorch-xdit:v25.12 (latest)	ROCm 7.10.0 preview TheRock 3e3f834	Documentation Docker Hub
rocm/pytorch-xdit:v25.11	ROCm 7.10.0 preview TheRock 3e3f834	Documentation Docker Hub
rocm/pytorch-xdit:v25.10	ROCm 7.9.0 preview TheRock 7afbe45	Documentation Docker Hub

AI Training

The tables below show training performance on AMD Instinct™ platforms. Each row reports text-generation model training throughput for a specific sequence length and batch size, measured in tokens per second per GPU.
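Tokens per second per GPU ties together batch size, sequence length, and step time. The sketch below assumes the Batch Size column is the per-GPU micro-batch with no gradient accumulation, which this page does not state; under that assumption each GPU processes batch × sequence-length tokens per step. `implied_step_seconds` is a hypothetical helper, and the example uses the Llama 3.1 8B FP8 row from the MI355X PyTorch table.

```python
def implied_step_seconds(batch_size: int, seq_len: int, tokens_per_sec_per_gpu: float) -> float:
    """Seconds per training step implied by a tokens/sec/GPU figure.

    Assumption: batch_size is the per-GPU micro-batch and there is no gradient
    accumulation, so each GPU consumes batch_size * seq_len tokens per step.
    """
    tokens_per_gpu_per_step = batch_size * seq_len
    return tokens_per_gpu_per_step / tokens_per_sec_per_gpu


# Llama 3.1 8B, FP8, batch 8, sequence length 8192, 30,330 tokens/sec/GPU (MI355X, PyTorch)
print(f"{implied_step_seconds(8, 8192, 30_330):.2f} s/step")
```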


PyTorch

Results on AMD Instinct™ MI355X Platform

The following results are based on:

Docker container: rocm/primus:v26.1
Release date: Jan 21, 2026
Server: Dual AMD EPYC 9575F 64-core processor-based production server with 8x AMD MI355X (288GB HBM3E 1400W) GPUs, 1 NUMA node per socket, System BIOS 1.4a, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 7.0.1.

Model	# nodes	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	Tokens/sec/GPU
Llama 3.1 8B	1	FP8	8	8192	FALSE	1	1	1	30,330
Llama 3.1 8B	1	BF16	6	8192	FALSE	1	1	1	22,037
Llama 3.1 70B	1	FP8	6	8192	TRUE	1	1	1	3,774
Llama 3.1 70B	8	FP8	10	8192	TRUE	1	1	1	3,727
Llama 3.1 70B	1	BF16	8	8192	TRUE	1	1	1	2,325
Llama 3.1 405B	8	FP8	2	8192	TRUE	1	1	1	625
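The table reports per-GPU throughput, so cluster throughput and multi-node scaling efficiency follow from simple arithmetic. The sketch compares the Llama 3.1 70B FP8 rows on 1 and 8 nodes; note that their batch sizes differ (6 versus 10), so this is not a strict like-for-like scaling test. The helper names are illustrative.

```python
GPUS_PER_NODE = 8  # every server described on this page has 8 GPUs


def cluster_tokens_per_sec(tokens_per_sec_per_gpu: float, nodes: int) -> float:
    """Aggregate throughput across all GPUs in the job."""
    return tokens_per_sec_per_gpu * nodes * GPUS_PER_NODE


def scaling_efficiency(single_node: float, multi_node: float) -> float:
    """Fraction of per-GPU throughput retained when scaling out (1.0 = linear scaling)."""
    return multi_node / single_node


# Llama 3.1 70B FP8 on MI355X (PyTorch): 3,774 tokens/sec/GPU on 1 node, 3,727 on 8 nodes
print(f"8-node aggregate: {cluster_tokens_per_sec(3_727, 8):,.0f} tokens/s")
print(f"scaling efficiency: {scaling_efficiency(3_774, 3_727):.1%}")
```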

Reproduce these results on your system by following these instructions:

Training Performance with PyTorch on AMD GPUs User Guide

Results on AMD Instinct™ MI325X Platform

The following results are based on:

Docker container: rocm/primus:v26.1
Release date: Jan 21, 2026
Server: Dual AMD EPYC 9655 96-core processor-based production server with 8x AMD MI325X (256GB HBM3E 1000W) GPUs, 1 NUMA node per socket, System BIOS 3B03, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 7.0.1.

Model	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	Tokens/sec/GPU
Llama 3.1 8B	FP8	7	8192	FALSE	1	1	1	15,750
Llama 3.1 8B	BF16	6	8192	FALSE	1	1	1	11,648
Llama 3.1 70B	FP8	5	8192	TRUE	1	1	1	1,794
Llama 3.1 70B	BF16	6	8192	TRUE	1	1	1	1,196
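A quick way to read this table is the FP8-over-BF16 throughput ratio for each model. The FP8 and BF16 rows use different batch sizes (7 versus 6 for 8B, 5 versus 6 for 70B), so the ratio mixes the effect of precision with the effect of batch size. `precision_gain` is a hypothetical helper.

```python
def precision_gain(fp8_tok_s: float, bf16_tok_s: float) -> float:
    """Ratio of FP8 to BF16 throughput (> 1 means FP8 is faster)."""
    return fp8_tok_s / bf16_tok_s


# MI325X PyTorch results: model -> (FP8 tokens/sec/GPU, BF16 tokens/sec/GPU)
MI325X_PYTORCH = {
    "Llama 3.1 8B": (15_750, 11_648),
    "Llama 3.1 70B": (1_794, 1_196),
}

for model, (fp8, bf16) in MI325X_PYTORCH.items():
    print(f"{model:<14} FP8 vs BF16: {precision_gain(fp8, bf16):.2f}x")
```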

Reproduce these results on your system by following these instructions:

Training Performance with PyTorch on AMD GPUs User Guide

Results on AMD Instinct™ MI300X Platform

The following results are based on:

Docker container: rocm/primus:v26.1
Release date: Jan 21, 2026
Server: Dual AMD EPYC 9554 64-core processor-based production server with 8x AMD MI300X (192GB HBM3 750W) GPUs, 1 NUMA node per socket, System BIOS 1.8, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 6.4.2-120.

Model	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	Tokens/sec/GPU
Llama 3.1 8B	FP8	5	8192	FALSE	1	1	1	12,666
Llama 3.1 8B	BF16	4	8192	FALSE	1	1	1	9,411
Llama 3.1 70B	FP8	3	8192	TRUE	1	1	1	1,386
Llama 3.1 70B	BF16	4	8192	TRUE	1	1	1	931

Reproduce these results on your system by following these instructions:

Training Performance with PyTorch on AMD GPUs User Guide

Previous Versions

This table lists previous versions of the PyTorch training Docker image for training performance testing. For detailed information about available models for benchmarking, see the version-specific documentation.

Image version	ROCm version	PyTorch version	Resources
v26.1 (latest)	7.1.0	PyTorch 2.10.0.dev20251112+rocm7.1	Primus PyTorch training documentation PyTorch training (legacy) documentation Docker Hub
v25.11	7.1.0	PyTorch 2.10.0.dev20251112+rocm7.1	Primus PyTorch training documentation PyTorch training (legacy) documentation Docker Hub
v25.10	7.1.0	PyTorch 2.10.0.dev20251112+rocm7.1	Primus PyTorch training documentation PyTorch training (legacy) documentation Docker Hub
v25.9	7.0.0	Primus 0.3.0 PyTorch 2.9.0.dev20250821+rocm7.0.0.lw.git125803b7	Primus PyTorch training documentation PyTorch training (legacy) documentation Docker Hub (gfx950) Docker Hub (gfx942)
v25.8	6.4.3	2.8.0a0+gitd06a406	Primus PyTorch Training documentation PyTorch training (legacy) documentation
v25.7	6.4.2	2.8.0a0+gitd06a406	Documentation Docker Hub
v25.6	6.3.4	2.8.0a0+git7d205b2	Documentation Docker Hub
v25.5	6.3.4	2.7.0a0+git637433	Documentation Docker Hub
v25.4	6.3.0	2.7.0a0+git637433	Documentation Docker Hub

Megatron-LM

Results on AMD Instinct™ MI355X Platform

The following results are based on:

Docker container: rocm/primus:v26.1
Release date: Jan 21, 2026
Server: Dual AMD EPYC 9575F 64-core processor-based production server with 8x AMD MI355X (288GB HBM3E 1400W) GPUs, 1 NUMA node per socket, System BIOS 1.4a, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 7.0.1.

Model	# nodes	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	EP	Tokens/sec/GPU
Llama 3.1 8B	1	FP8	4	8192	FALSE	1	1	1	-	32,888
Llama 3.1 8B	1	BF16	4	8192	FALSE	1	1	1	-	22,374
Llama 3.1 70B	1	BF16	4	8192	TRUE	1	1	1	-	2,133
Llama 3.3 70B	1	BF16	6	8192	TRUE	1	1	1	-	2,031
Mixtral 8x7B	1	BF16	4	4096	FALSE	1	1	1	8	13,803
Mixtral 8x22B	8	BF16	2	8192	FALSE	1	1	4	8	3,534
DeepSeekV2 Lite	1	BF16	12	4096	FALSE	1	1	1	8	39,786
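The parallelism columns must tile the available GPUs. The sketch below checks a layout under a simplified, classic Megatron-style assumption: world size = TP × CP × PP × DP, and expert parallelism (EP) is carved out of the data-parallel ranks, so EP must divide DP. Recent Megatron-Core releases allow more flexible expert layouts, and FSDP is not modeled, so treat this as a rough sanity check rather than the framework's exact validation rule. The `Layout` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    nodes: int
    tp: int
    cp: int
    pp: int
    ep: int = 1
    gpus_per_node: int = 8

    def data_parallel_size(self) -> int:
        """Derive DP from the world size, raising if the layout does not tile the GPUs."""
        world = self.nodes * self.gpus_per_node
        model_parallel = self.tp * self.cp * self.pp
        if world % model_parallel:
            raise ValueError(f"{world} GPUs not divisible by TP*CP*PP={model_parallel}")
        dp = world // model_parallel
        if dp % self.ep:
            raise ValueError(f"DP={dp} not divisible by EP={self.ep}")
        return dp


# Mixtral 8x22B row: 8 nodes, TP 1, CP 1, PP 4, EP 8
mixtral_8x22b = Layout(nodes=8, tp=1, cp=1, pp=4, ep=8)
print(mixtral_8x22b.data_parallel_size())  # 16
```

Under this assumption the 64 GPUs split into 4 pipeline stages with 16 data-parallel replicas, and the 8 expert-parallel ranks fit evenly inside them.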

Reproduce these results on your system by following these instructions:

Training Performance with Megatron-LM on AMD GPUs User Guide

Results on AMD Instinct™ MI325X Platform

The following results are based on:

Docker container: rocm/primus:v26.1
Release date: Jan 21, 2026
Server: Dual AMD EPYC 9655 96-core processor-based production server with 8x AMD MI325X (256GB HBM3E 1000W) GPUs, 1 NUMA node per socket, System BIOS 3B03, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 7.0.1.
Server (multi-node runs): Dual AMD EPYC 9575F 64-core processor-based production server with 8x AMD Instinct MI325X (256GB HBM3E 1000W) GPUs, 1 NUMA node per socket, System BIOS 1.5, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 6.4.2.60402-120~22.04

Model	# nodes	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	EP	Tokens/sec/GPU
Llama 3.1 8B	1	FP8	2	8192	FALSE	1	1	1	-	16,224
Llama 3.1 8B	8	FP8	2	8192	FALSE	1	1	1	-	16,186
Llama 3.1 8B	1	BF16	4	8192	FALSE	1	1	1	-	11,842
Llama 3.1 70B	1	BF16	4	8192	TRUE	1	1	1	-	1,135
Llama 3.1 70B	8	FP8	4	8192	TRUE	1	1	1	-	1,726
Llama 3.1 70B	8	BF16	1	8192	TRUE	1	1	1	-	1,174
Llama 3.3 70B	1	BF16	5	8192	TRUE	1	1	1	-	1,095
Mixtral 8x7B	1	BF16	4	4096	FALSE	1	1	1	8	7,046
DeepSeekV2 Lite	1	BF16	10	4096	FALSE	1	1	1	8	21,277
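Per-GPU throughput also translates into a rough training-time budget. The sketch assumes ideal, uninterrupted training (no restarts, evaluation, or checkpoint pauses), so real runs take longer. The 100B-token budget is a hypothetical figure chosen only for illustration, and the helpers are illustrative.

```python
SECONDS_PER_DAY = 86_400


def tokens_per_day(tokens_per_sec_per_gpu: float, nodes: int, gpus_per_node: int = 8) -> float:
    """Ideal tokens processed per day across the whole job."""
    return tokens_per_sec_per_gpu * nodes * gpus_per_node * SECONDS_PER_DAY


def days_to_train(total_tokens: float, tokens_per_sec_per_gpu: float, nodes: int) -> float:
    """Ideal days needed to process total_tokens at the given per-GPU rate."""
    return total_tokens / tokens_per_day(tokens_per_sec_per_gpu, nodes)


# Llama 3.1 70B, FP8, 8 nodes of MI325X (Megatron-LM): 1,726 tokens/sec/GPU
daily = tokens_per_day(1_726, nodes=8)
print(f"{daily / 1e9:.2f}B tokens/day")
# Hypothetical 100B-token budget, for illustration only
print(f"{days_to_train(100e9, 1_726, nodes=8):.1f} days for 100B tokens")
```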

Reproduce these results on your system by following these instructions:

Training Performance with Megatron-LM on AMD GPUs User Guide

Results on AMD Instinct™ MI300X Platform

The following results are based on:

Docker container: rocm/primus:v26.1
Release date: Jan 21, 2026
Server: Dual AMD EPYC 9554 64-core processor-based production server with 8x AMD MI300X (192GB HBM3 750W) GPUs, 1 NUMA node per socket, System BIOS 1.8, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 6.4.2-120.
Server (multi-node runs): Dual Intel Xeon Platinum 8480+ processors with 8x AMD MI300X (192GB HBM3 750W) GPUs, 1 NUMA node per socket, System BIOS 79007700, Ubuntu® 22.04, Host GPU driver ROCm 6.3.0-39.

Model	# nodes	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	EP	Tokens/sec/GPU
Llama 3.1 8B	1	FP8	2	8192	FALSE	1	1	1	-	13,669
Llama 3.1 8B	1	BF16	2	8192	FALSE	1	1	1	-	9,563
Llama 3.1 70B	1	BF16	3	8192	TRUE	1	1	1	-	856
Llama 3.3 70B	1	BF16	2	8192	TRUE	1	1	1	-	843
Mixtral 8x7B	1	BF16	2	4096	FALSE	1	1	1	8	5,542
DeepSeekV2 Lite	1	BF16	4	4096	FALSE	1	1	1	8	17,403
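A common back-of-the-envelope check converts tokens/sec/GPU into approximate model FLOPs utilization (MFU) with the rule of thumb of about 6N training FLOPs per token for a dense model with N parameters. This ignores attention FLOPs (significant at 8192 tokens) and activation recomputation, so it tends to understate the work actually done. The parameter count (8.03e9 for Llama 3.1 8B) and the MI300X dense BF16 peak (1307.4 TFLOPS) are assumptions to verify against the model card and the MI300X specification sheet; `approx_mfu` is a hypothetical helper.

```python
def approx_mfu(tokens_per_sec_per_gpu: float, n_params: float, peak_flops: float) -> float:
    """Approximate MFU using 6 * N FLOPs per training token.

    Ignores attention FLOPs and recomputation, so it underestimates hardware work.
    """
    achieved_flops = 6 * n_params * tokens_per_sec_per_gpu
    return achieved_flops / peak_flops


LLAMA_31_8B_PARAMS = 8.03e9          # assumption: published parameter count
MI300X_PEAK_BF16_DENSE = 1307.4e12   # assumption: dense BF16 peak, verify on the spec sheet

# Llama 3.1 8B, BF16, MI300X (Megatron-LM): 9,563 tokens/sec/GPU
mfu = approx_mfu(9_563, LLAMA_31_8B_PARAMS, MI300X_PEAK_BF16_DENSE)
print(f"approx. MFU: {mfu:.1%}")
```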

Reproduce these results on your system by following these instructions:

Training Performance with Megatron-LM on AMD GPUs User Guide

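Previous Versions

This table lists previous versions of the Megatron-LM training Docker image for training performance testing. For detailed information about available models for benchmarking, see the version-specific documentation.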

Image version	ROCm version	PyTorch version	Resources
v26.1 (latest)	7.1.0	PyTorch 2.10.0.dev20251112+rocm7.1	Primus Megatron documentation Megatron-LM (legacy) documentation Docker Hub
v25.11	7.1.0	PyTorch 2.10.0.dev20251112+rocm7.1	Primus Megatron documentation Megatron-LM (legacy) documentation Docker Hub
v25.10	7.1.0	PyTorch 2.10.0.dev20251112+rocm7.1	Primus Megatron documentation Megatron-LM (legacy) documentation Docker Hub
v25.9	7.0.0	Primus 0.3.0 PyTorch 2.9.0.dev20250821+rocm7.0.0.lw.git125803b7	Primus Megatron documentation Megatron-LM (legacy) documentation Docker Hub (gfx950) Docker Hub (gfx942)
v25.8	6.4.3	2.8.0a0+gitd06a406	Primus Megatron documentation Megatron-LM (legacy) documentation Docker Hub (py310)
v25.7	6.4.2	2.8.0a0+gitd06a406	Primus Megatron documentation Megatron-LM (legacy) documentation Docker Hub (py310)
v25.6	6.4.1	2.8.0a0+git7d205b2	Documentation Docker Hub (py312) Docker Hub (py310)
v25.5	6.3.4	2.8.0a0+gite2f9759	Documentation Docker Hub (py312) Docker Hub (py310)
v25.4	6.3.0	2.7.0a0+git637433	Documentation Docker Hub

JAX MaxText

Results on AMD Instinct™ MI355X Platform

The following results are based on:

Docker container: rocm/jax-training:maxtext-v26.1
Release date: Jan 21, 2026
Server: Dual AMD EPYC 9575F 64-core processor-based production server with 8x AMD MI355X (288GB HBM3E 1400W) GPUs, 1 NUMA node per socket, System BIOS 1.4a, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 7.1.1.

Models	# nodes	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	EP	Tokens/Sec/GPU
Llama 3.1 8B	1	BF16	9	8192	TRUE	1	1	1	1	20,588
Llama 3.1 8B	1	FP8	9	8192	TRUE	1	1	1	1	27,138
Llama 3.1 8B	8	BF16	9	8192	TRUE	1	1	1	1	19,513
Llama 3.1 70B	1	BF16	10	8192	TRUE	1	1	1	1	2,281
Llama 3.1 70B	1	FP8	10	8192	TRUE	1	1	1	1	3,763
Llama 3.1 70B	8	BF16	10	8192	TRUE	1	1	1	1	2,198
Llama 3.1 405B	8	FP8	4	8192	TRUE	1	1	1	1	652
Llama 3.3 70B	1	BF16	10	8192	TRUE	1	1	1	1	2,281
Mixtral 8x7B	1	BF16	12	4096	FALSE	1	1	1	8	11,584
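MaxText configures batch size per device (its `per_device_batch_size` setting). Assuming the Batch Size column here is that per-device value, which this page does not state, the global batch of a multi-node run follows directly. `global_batch` is a hypothetical helper.

```python
def global_batch(per_device_batch: int, nodes: int, seq_len: int,
                 gpus_per_node: int = 8) -> tuple[int, int]:
    """Return (sequences per step, tokens per step) across all devices.

    Assumption: the table's Batch Size is a per-device value.
    """
    devices = nodes * gpus_per_node
    sequences = per_device_batch * devices
    return sequences, sequences * seq_len


# Llama 3.1 8B, BF16, 8 nodes of MI355X: batch 9, sequence length 8192
seqs, tokens = global_batch(9, nodes=8, seq_len=8192)
print(f"{seqs} sequences, {tokens:,} tokens per step")  # 576 sequences, 4,718,592 tokens per step
```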

Reproduce these results on your system by following these instructions:

Training Performance with JaxMaxText on AMD GPUs User Guide

Results on AMD Instinct™ MI325X Platform

The following results are based on:

Docker container: rocm/jax-training:maxtext-v26.1
Release date: Jan 21, 2026
Server: Dual AMD EPYC 9655 96-core processor-based production server with 8x AMD MI325X (256GB HBM3E 1000W) GPUs, 1 NUMA node per socket, System BIOS 3B03, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 7.0.1.

Models	# nodes	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	EP	Tokens/Sec/GPU
Llama 3.1 8B	1	BF16	4	8192	TRUE	1	1	1	1	9,953
Llama 3.1 8B	1	FP8	4	8192	TRUE	1	1	1	1	12,661
Llama 3.1 70B	1	BF16	7	8192	TRUE	1	1	1	1	1,127
Llama 3.1 70B	1	FP8	7	8192	TRUE	1	1	1	1	1,712
Llama 3.3 70B	1	BF16	7	8192	TRUE	1	1	1	1	1,127
Mixtral 8x7B	1	BF16	9	4096	FALSE	1	1	1	8	6,083

Reproduce these results on your system by following these instructions:

Training Performance with JaxMaxText on AMD GPUs User Guide

Results on AMD Instinct™ MI300X Platform

The following results are based on:

Docker container: rocm/jax-training:maxtext-v26.1
Release date: Jan 21, 2026
Server: Dual AMD EPYC 9554 64-core processor-based production server with 8x AMD MI300X (192GB HBM3 750W) GPUs, 1 NUMA node per socket, System BIOS 1.8, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 7.1.1.
Server (multi-node runs): Dual AMD EPYC 9654 processors with 8x AMD MI300X (192GB HBM3 750W) GPUs, 1 NUMA node per socket, System BIOS 3.10, Ubuntu® 22.04, Host GPU driver ROCm 6.3.1-48.

Models	# nodes	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	EP	Tokens/Sec/GPU
Llama 3.1 8B	1	BF16	4	8192	TRUE	1	1	1	1	8,720
Llama 3.1 8B	1	FP8	4	8192	TRUE	1	1	1	1	11,138
Llama 3.1 70B	1	FP8	5	8192	TRUE	1	1	1	1	1,472
Llama 3.1 70B	1	BF16	5	8192	TRUE	1	1	1	1	963
Llama 3.3 70B	1	BF16	5	8192	TRUE	1	1	1	1	962
Mixtral 8x7B	1	BF16	12	4096	FALSE	1	1	1	8	5,382

Reproduce these results on your system by following these instructions:

Training Performance with JaxMaxText on AMD GPUs User Guide

Previous Versions


This table lists previous versions of the ROCm JAX MaxText Docker image for training performance testing. For detailed information about available models for benchmarking, see the version-specific documentation.

Image version	ROCm version	JAX version	Resources
v26.1 (latest)	7.1.1	0.8.2	Documentation Docker Hub
v25.11	7.1.0	0.7.1	Documentation Docker Hub
v25.9	7.0.01	0.6.2	Documentation Docker Hub
v25.7	6.4.1	0.6.0, 0.5.0	Documentation Docker Hub (JAX 0.6.0) Docker Hub (JAX 0.5.0)
v25.5	6.3.4	0.4.35	Documentation Docker Hub
v25.4	6.3.0	0.4.31	Documentation Docker Hub
