Performance Results with AMD ROCm™ Software

This page summarizes performance measurements on AMD Instinct™ GPUs running popular AI models.

The results found on this page highlight both Inference and Training benchmarks. The results are organized by the following:

AI Inference: vLLM, xDiT
AI Training: pyTorch, Megatron-LM, and JAX MaxText

The hardware platforms include Instinct MI355X/MI325X/MI300X GPUs, with benchmark insights provided for each framework where data is available.

The data in the following tables are a reference point to help users evaluate observed performance. It should not be considered as the peak performance that AMD GPUs and ROCm™ software can deliver.

AI Inference

vLLM
xDiT

vLLM

Results on AMD Instinct™ MI300X Platform

The following results are based on:

Docker container: rocm/vllm:rocm7.0.0_vllm_0.11.2_20251210
Release date: December 11, 2025
Server: Dual AMD EPYC 9554 64-core processor-based production server with 8x AMD MI300X (192GB HBM3 750W) GPUs, 1 NUMA node per socket, System BIOS 1.8, Ubuntu® 22.04, amdgpu driver 6.14.14

Throughput Measurements

The table below shows performance data where a local inference client is fed requests at an infinite rate and shows the throughput client-server scenario under maximum load.

Model	Precision	TP¹ Size	Input	Output	No. Prompts	Max. Seqs	Throughput²
Llama 3.1 70B (amd/Llama-3.1-70B-Instruct-FP8-KV)	FP8	8	128	2048	3200	3200	13562.4
			128	4096	1500	1500	11800.9
			500	2000	2000	2000	11249.5
			2048	2048	1500	1500	7753.1
Llama 3.1 405B (amd/Llama-3.1-405B-Instruct-FP8-KV)	FP8	8	128	2048	1500	1500	3822.8
			128	4096	1500	1500	3085.8
			500	2000	2000	2000	3059.9
			2048	2048	500	500	2192.3

_{¹TP stands for Tensor Parallelism.

²Throughput is measured in tokens/second}

Latency results

The table below shows latency measurement, which typically involves assessing the time from when the system receives an input to when the model produces a result.

Model	Precision	TP¹ Size	Batch Size	Input	Output	Latency²
Llama 3.1 70B (amd/Llama-3.1-70B-Instruct-FP8-KV)	FP8	8	1	128	2048	16.015
			2	128	2048	18.683
			4	128	2048	19.245
			8	128	2048	20.468
			16	128	2048	22.137
			32	128	2048	25.571
			64	128	2048	32.987
			128	128	2048	46.426
			1	2048	2048	16.421
			2	2048	2048	19.035
			4	2048	2048	20.221
			8	2048	2048	21.483
			16	2048	2048	24.350
			32	2048	2048	29.776
			64	2048	2048	40.625
			128	2048	2048	63.671
Llama 3.1 405B (amd/Llama-3.1-70B-Instruct-FP8-KV)	FP8	8	1	128	2048	48.618
			2	128	2048	50.980
			4	128	2048	52.760
			8	128	2048	55.864
			16	128	2048	58.795
			32	128	2048	69.482
			64	128	2048	89.384
			128	128	2048	122.601
			1	2048	2048	49.106
			2	2048	2048	51.664
			4	2048	2048	54.220
			8	2048	2048	58.904
			16	2048	2048	65.389
			32	2048	2048	83.387
			64	2048	2048	115.575
			128	2048	2048	177.779

_{¹TP stands for Tensor Parallelism.

²Latency is measured in seconds}

Reproduce these results on your system by following these instructions:

Inference Performance with vLLM

Previous Versions

This table lists previous versions of the ROCm vLLM inference Docker image for inference performance testing. For detailed information about available models for benchmarking, see the version-specific documentation.

Docker image tag	Components	Resources
rocm/vllm:rocm7.0.0_vllm_0.11.2_20251210 (latest)	ROCm 7.0.0 vLLM 0.11.2 PyTorch 2.9.0	Documentation Docker Hub
rocm/vllm:rocm7.0.0_vllm_0.11.1_20251024	ROCm 7.0.0 vLLM 0.11.1 PyTorch 2.9.0	Documentation Docker Hub
rocm/vllm:rocm7.0.0_ vllm_ 0.10.2_ 20251006	ROCm 7.0.0 vLLM 0.10.2 PyTorch 2.9.0	Documentation Docker Hub
rocm/vllm:rocm6.4.1_ vllm_ 0.10.0_ 20250812	ROCm 6.4.1 vLLM 0.9.1 PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.4.1_vllm_0.9.1_20250715	ROCm 6.4.1 vLLM 0.9.1 PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.4.1_vllm_0.9.1_20250702	ROCm 6.4.1 vLLM 0.9.1 PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.4.1_vllm_0.9.0.1_20250605	ROCm 6.4.1 vLLM 0.9.0.1 PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.3.1_vllm_0.8.5_20250521	ROCm 6.3.1 0.8.5 vLLM (0.8.6.dev) PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.3.1_vllm_0.8.5_20250513	ROCm 6.3.1 vLLM 0.8.5 PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.3.1_instinct_vllm0.8.3_20250415	ROCm 6.3.1 vLLM 0.8.3 PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.3.1_instinct_vllm0.7.3_20250325	ROCm 6.3.1 vLLM 0.7.3 PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6	ROCm 6.3.1 vLLM 0.6.6 PyTorch 2.7.0	Documentation Docker Hub
rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4	ROCm 6.2.1 vLLM 0.6.4 PyTorch 2.5.0	Documentation Docker Hub
rocm/vllm:rocm6.2_mi300_ubuntu22.04_py3.9_vllm_7c5fd50	ROCm 6.2.0 vLLM 0.4.3 PyTorch 2.4.0	Documentation Docker Hub

xDiT

Results on AMD Instinct™ MI355X Platform

The following results are based on:

Docker container: rocm/pytorch-xdit:v25.12
Release date: Dec 8, 2025
Server: Dual AMD EPYC 9575F 96-core processor-based production server with 8x AMD MI355X (288GB HBM3E 1400W) GPUs,1 NUMA node per socket, System BIOS 1.4a, Ubuntu® 22.04.3 LTS Host GPU driver ROCm 7.10.0_preview.

Models	Precision	Batch Size	Configuration	Latency¹
Hunyuan Video	BF16	1	720p, 129 Frames, 50 steps	86.74
Wan2.1	BF16	1	720p, 80 Frames, 40 steps	71.60
Wan2.2	BF16	1	720p, 80 Frames, 40 steps	66.69
Flux.1	BF16	1	1024x1240, 25 steps	0.94

_{¹ Latency is measured in seconds}

Reproduce these results on your system by following these instructions:

xDiT Diffusion Inference on AMD GPUs User Guide

Results on the AMD Instinct™ MI300X platform

The following results are based on:

Docker container: rocm/pytorch-xdit:v25.12
Release date: Dec 8, 2025
Server: Dual AMD EPYC 9554 64-core processor-based production server with 8x AMD MI300X (192GB HBM3 750W) GPUs, 1 NUMA node per socket, System BIOS 1.8, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 7.10.0_preview.

Models	Precision	Batch Size	Configuration	Latency¹
Hunyuan Video	BF16	1	720p, 129 Frames, 50 steps	181.05
Wan2.1	BF16	1	720p, 80 Frames, 40 steps	151.25
Wan2.2	BF16	1	720p, 80 Frames, 40 steps	142.17
Flux.1	BF16	1	1024x1240, 25 steps	1.33

_{¹ Latency is measured in seconds}

Reproduce these results on your system by following these instructions:

xDiT Diffusion Inference on AMD GPUs User Guide

Previous versions

This table lists previous versions of the Megatron-LM training Docker image for training performance testing. For detailed information about available models for benchmarking, see the version-specific documentation.

Docker image tag	Components	Resources
rocm/pytorch-xdit:v25.12(latest)	ROCm 7.10.0 preview TheRock 3e3f834	Documentation Docker Hub
rocm/pytorch-xdit:v25.11(latest)	ROCm 7.10.0 preview TheRock 3e3f834	Documentation Docker Hub
rocm/pytorch-xdit:v25.10	ROCm 7.9.0 preview TheRock 7afbe45	Documentation Docker Hub

AI Training

The table below shows training performance data, where the AMD Instinct™ platform measures text generation training throughput with a unique sequence length and batch size. It focuses on Tokens per second per GPU.

PyTorch
Megatron-LM
JaxMaxText

PyTorch

Results on the AMD Instinct MI355X Platform

The following results are based on:

Docker container: rocm/primus:v26.1
Release date: Jan 21, 2026
Server: Dual AMD EPYC 9575F 96-core processor-based production server with 8x AMD MI355X (288GB HBM3E 1400W)GPUs,1 NUMA node per socket, System BIOS 1.4a, Ubuntu® 22.04.5 LTS Host GPU driver ROCm 7.0.1.

Model	# nodes	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	Tokens/sec/GPU
Llama 3.1 8B	1	FP8	8	8192	FALSE	1	1	1	30,330
Llama 3.1 8B	1	BF16	6	8192	FALSE	1	1	1	22,037
Llama 3.1 70B	1	FP8	6	8192	TRUE	1	1	1	3,774
Llama 3.1 70B	8	FP8	10	8192	TRUE	1	1	1	3,727
Llama 3.1 70B	1	BF16	8	8192	TRUE	1	1	1	2,325
Llama 3.1 405B	8	FP8	2	8192	TRUE	1	1	1	625

Reproduce these results on your system by following these instructions:

Training Performance with PyTorch on AMD GPUs User Guide

Results on AMD Instinct™ MI325X Platform

The following results are based on:

Docker container: rocm/primus:v26.1
Release date: Jan 21, 2026
Server: Dual AMD EPYC 9655 96-core processor-based production server with 8x AMD MI325X (256GB HBM3E 1000W) GPUs, 1 NUMA node per socket, System BIOS 3B03, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 7.0.1.

Model	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	Tokens/sec/GPU
Llama 3.1 8B	FP8	7	8192	FALSE	1	1	1	15,750
Llama 3.1 8B	BF16	6	8192	FALSE	1	1	1	11,648
Llama 3.1 70B	FP8	5	8192	TRUE	1	1	1	1,794
Llama 3.1 70B	BF16	6	8192	TRUE	1	1	1	1,196

Reproduce these results on your system by following these instructions:

Training Performance with PyTorch on AMD GPUs User Guide

Results on AMD Instinct™ MI300X Platform

The following results are based on:

Docker container: rocm/primus:v26.1
Release date: Jan 21, 2026
Server: Server: Dual AMD EPYC 9554 64-core processor-based production server with 8x AMD MI300X (192GB HBM3 750W) GPUs, 1 NUMA node per socket, System BIOS 1.8, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 6.4.2-120.

Model	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	Tokens/sec/GPU
Llama 3.1 8B	FP8	5	8192	FALSE	1	1	1	12,666
Llama 3.1 8B	BF16	4	8192	FALSE	1	1	1	9,411
Llama 3.1 70B	FP8	3	8192	TRUE	1	1	1	1,386
Llama 3.1 70B	BF16	4	8192	TRUE	1	1	1	931

Reproduce these results on your system by following these instructions:

Training Performance with PyTorch on AMD GPUs User Guide

Previous Versions

This table lists previous versions of the PyTorch training Docker image for training performance testing. For detailed information about available models for benchmarking, see the version-specific documentation.

Image version	ROCm version	PyTorch version	Resources
v26.1(latest)	ROCm 7.1.0	PyTorch 2.10.0.dev20251112+rocm7.1	Primus PyTorch training documentation PyTorch training (legacy) documentation Docker Hub
v25.11	ROCm 7.1.0	PyTorch 2.10.0.dev20251112+rocm7.1	Primus PyTorch Training documentation PyTorch training (legacy) documentation Docker Hub
v25.10	7.1.0	PyTorch 2.10.0.dev20251112+rocm7.1	Primus PyTorch Training documentation PyTorch training (legacy) documentation Docker Hub
V25.9	7.0.0	Primus 0.3.0 PyTorch 2.9.0.dev20250821+rocm7.0.0.lw.git125803b7	Primus PyTorch Training documentation PyTorch training (legacy) documentation Docker Hub (gfx950) Docker Hub (gfx942)
v25.8	6.4.3	2.8.0a0+gitd06a406	Primus PyTorch Training documentation PyTorch training (legacy) documentation
v25.7	6.4.2	2.8.0a0+gitd06a406	Documentation Docker Hub
v25.6	6.3.4	2.8.0a0+git7d205b2	Documentation Docker Hub
v25.5	6.3.4	2.7.0a0+git637433	Documentation Docker Hub
v25.4	6.3.0	2.7.0a0+git637433	Documentation Docker Hub

Megatron-LM

Results on AMD Instinct™ MI355X Platform

The following results are based on:

Docker container: rocm/primus:v26.1
Release date: Jan 21, 2026
Server: Dual AMD EPYC 9575F 64-core processor-based production server with 8x AMD MI355X (288GB HBM3E 1400W）GPUs,1 NUMA node per socket, System BIOS 1.4a, Ubuntu® 22.04.5 LTS Host GPU driver ROCm 7.0.1.

Model	# nodes	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	EP	Tokens/sec/GPU
Llama 3.1 8B	1	FP8	4	8192	FALSE	1	1	1	-	32,888
Llama 3.1 8B	1	BF16	4	8192	FALSE	1	1	1	-	22,374
Llama 3.1 70B	1	BF16	4	8192	TRUE	1	1	1	-	2,133
Llama 3.3 70B	1	BF16	6	8192	TRUE	1	1	1	-	2,031
Mixtral 8x7B	1	BF16	4	4096	FALSE	1	1	1	8	13,803
Mixtral 8x22B	8	BF16	2	8192	FALSE	1	1	4	8	3,534
DeepSeekV2 Lite	1	BF16	12	4096	FALSE	1	1	1	8	39,786

Reproduce these results on your system by following these instructions:

Training Performance with Megatron-LM on AMD GPUs User Guide

Results on AMD Instinct™ MI325X Platform

The following results are based on:

Docker container: rocm/primus:v26.1
Release date: Jan 21, 2026
Server: Dual AMD EPYC 9655 96-core processor-based production server with 8x AMD MI325X (256GB HBM3E 1000W) GPUs, 1 NUMA node per socket, System BIOS 3B03, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 7.0.1.
For multi-mode run, Server: Dual AMD EPYC 9575F 64-Core processor-based production server with 8x AMD Instinct MI325 (256GB HBM3E 1000W) GPUs, 1 NUMA node per socket, System BIOS 1.5, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 6.4.2.60402-120~22.04

Model	# nodes	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	EP	Tokens/sec/GPU
Llama 3.1 8B	1	FP8	2	8192	FALSE	1	1	1	-	16,224
Llama 3.1 8B	8	FP8	2	8192	FALSE	1	1	1	-	16,186
Llama 3.1 8B	1	BF16	4	8192	FALSE	1	1	1	-	11,842
Llama 3.1 70B	1	BF16	4	8192	TRUE	1	1	1	-	1,135
Llama 3.1 70B	8	FP8	4	8192	TRUE	1	1	1	-	1,726
Llama 3.1 70B	8	BF16	1	8192	TRUE	1	1	1	-	1,174
Llama 3.3 70B	1	BF16	5	8192	TRUE	1	1	1	-	1,095
Mixtral 8x7B	1	BF16	4	4096	FALSE	1	1	1	8	7,046
DeepSeekV2 Lite	1	BF16	10	4096	FALSE	1	1	1	8	21,277

Reproduce these results on your system by following these instructions:

Training Performance with Megatron-LM on AMD GPUs User Guide

Results on AMD Instinct™ MI300X Platform

The following results are based on:

Docker container: rocm/primus:v26.1
Release date: Jan 21, 2026
Server: Server: Dual AMD EPYC 9554 64-core processor-based production server with 8x AMD MI300X (192GB HBM3 750W) GPUs, 1 NUMA node per socket, System BIOS 1.8, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 6.4.2-120.
For multi-mode run, Server: Dual Intel Xeon Platinum 8480+ Processors with 8x AMD MI300X (192GB HBM3 750W) GPUs, 1 NUMA node per socket, System BIOS 79007700 Ubuntu® 22.04, Host GPU driver ROCm 6.3.0-39.

Model	# nodes	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	EP	Tokens/sec/GPU
Llama 3.1 8B	1	FP8	2	8192	FALSE	1	1	1	-	13,669
Llama 3.1 8B	1	BF16	2	8192	FALSE	1	1	1	-	9,563
Llama 3.1 70B	1	BF16	3	8192	TRUE	1	1	1	-	856
Llama 3.3 70B	1	BF16	2	8192	TRUE	1	1	1	-	843
Mixtral 8x7B	1	BF16	2	4096	FALSE	1	1	1	8	5,542
DeepSeekV2 Lite	1	BF16	4	4096	FALSE	1	1	1	8	17,403

Reproduce these results on your system by following these instructions:

Training Performance with Megatron-LM on AMD GPUs User Guide

Previous Versions

Image version	ROCm version	PyTorch version	Resources
v26.1(latest)	7.1.0	PyTorch 2.10.0.dev20251112+rocm7.1	Primus Megatron documentation Megatron-LM (legacy) documentation Docker Hub
v25.11	7.1.0	PyTorch 2.10.0.dev20251112+rocm7.1	Primus Megatron documentation Megatron-LM (legacy) documentation Docker Hub
v25.10	7.1.0	PyTorch 2.10.0.dev20251112+rocm7.1	Primus Megatron documentation Megatron-LM (legacy) documentation Docker Hub
v25.9	7.0.0	Primus 0.3.0 PyTorch 2.9.0.dev20250821+rocm7.0.0.lw.git125803b7	Primus Megatron documentation Megatron-LM (legacy) documentation Docker Hub (gfx950) Docker Hub (gfx942)
v25.8	6.4.3	2.8.0a0+gitd06a406	Primus Megatron documentation Megatron-LM (legacy) documentation Docker Hub (py310)
v25.7	6.4.2	2.8.0a0+gitd06a406	Primus Megatron documentation Megatron-LM (legacy) documentation Docker Hub (py310)
v25.6	6.4.1	2.8.0a0+git7d205b2	Documentation Docker Hub (py312) Docker Hub (py310)
v25.5	6.3.4	2.8.0a0+gite2f9759	Documentation Docker Hub (py312) Docker Hub (py310)
v25.4	6.3.0	2.7.0a0+git637433	Documentation Docker Hub

JaxMaxText

Results on AMD Instinct™ MI355X Platform

The following results are based on:

Docker container: rocm/jax-training:maxtext-v26.1
Release date: Jan 21, 2026
Server: Dual AMD EPYC 9575F 96-core processor-based production server with 8x AMD MI355X (288GB HBM3E 1400W）GPUs,1 NUMA node per socket, System BIOS 1.4a, Ubuntu® 22.04.5 LTS Host GPU driver ROCm 7.1.1.

Models	# nodes	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	EP	Tokens/Sec/GPU
Llama 3.1 8B	1	BF16	9	8192	TRUE	1	1	1	1	20,588
Llama 3.1 8B	1	FP8	9	8192	TRUE	1	1	1	1	27,138
Llama 3.1 8B	8	BF16	9	8192	TRUE	1	1	1	1	19,513
Llama 3.1 70B	1	BF16	10	8192	TRUE	1	1	1	1	2,281
Llama 3.1 70B	1	FP8	10	8192	TRUE	1	1	1	1	3,763
Llama 3.1 70B	8	BF16	10	8192	TRUE	1	1	1	1	2,198
Llama 3.1 405B	8	FP8	4	8192	TRUE	1	1	1	1	652
Llama 3.3 70B	1	BF16	10	8192	TRUE	1	1	1	1	2,281
Mixtral 8x7B	1	BF16	12	4096	FALSE	1	1	1	8	11,584

Reproduce these results on your system by following these instructions:

Training Performance with JaxMaxText on AMD GPUs User Guide

Results on AMD Instinct™ MI325X Platform

The following results are based on:

Docker container: rocm/jax-training:maxtext-v26.1
Release date: Jan 21, 2026
Server: Dual AMD EPYC 9655 96-core processor-based production server with 8x AMD MI325X (256GB HBM3E 1000W) GPUs, 1 NUMA node per socket, System BIOS 3B03, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 7.0.1.

Models	# nodes	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	EP	Tokens/Sec/GPU
Llama 3.1 8B	1	BF16	4	8192	TRUE	1	1	1	1	9,953
Llama 3.1 8B	1	FP8	4	8192	TRUE	1	1	1	1	12,661
Llama 3.1 70B	1	BF16	7	8192	TRUE	1	1	1	1	1,127
Llama 3.1 70B	1	FP8	7	8192	TRUE	1	1	1	1	1,712
Llama 3.3 70B	1	BF16	7	8192	TRUE	1	1	1	1	1,127
Mixtral 8x7B	1	BF16	9	4096	FALSE	1	1	1	8	6,083

Reproduce these results on your system by following these instructions:

Training Performance with JaxMaxText on AMD GPUs User Guide

Results on AMD Instinct™ MI300X Platform

The following results are based on:

Docker container: rocm/jax-training:maxtext-v26.1
Release date: Jan 21, 2026
Server: Dual AMD EPYC 9554 64-core processor-based production server with 8x AMD MI300X (192GB HBM3 750W) GPUs, 1 NUMA node per socket, System BIOS 1.8, Ubuntu® 22.04.5 LTS, Host GPU driver ROCm 7.1.1.
For multi-mode run, Server: Dual AMD EPYC 9654 Processors with 8x AMD MI300X (192GB HBM3 750W) GPUs, 1 NUMA node per socket, System BIOS 3.10 Ubuntu® 22.04, Host GPU driver ROCm 6.3.1-48.

Models	# nodes	Precision	Batch Size	Sequence Length	FSDP	TP	CP	PP	EP	Tokens/Sec/GPU
Llama 3.1 8B	1	BF16	4	8192	TRUE	1	1	1	1	8,720
Llama 3.1 8B	1	FP8	4	8192	TRUE	1	1	1	1	11,138
Llama 3.1 70B	1	FP8	5	8192	TRUE	1	1	1	1	1,472
Llama 3.1 70B	1	BF16	5	8192	TRUE	1	1	1	1	963
Llama 3.3 70B	1	BF16	5	8192	TRUE	1	1	1	1	962
Mixtral 8x7B	1	BF16	12	4096	FALSE	1	1	1	8	5,382

Reproduce these results on your system by following these instructions:

Training Performance with JaxMaxText on AMD GPUs User Guide

Previous Versions

The following results are based on:

This table lists previous versions of the ROCm JAX MaxText Docker image for training performance testing. For detailed information about available models for benchmarking, see the version-specific documentation.

Image version	ROCm version	JAX version	Resources
v26.1 (latest)	7.1.1	0.8.2	Documentation Docker Hub
v25.11	7.1.0	0.7.1	Documentation Docker Hub
v25.9	7.0.01	0.6.2	Documentation Docker Hub
v25.7	6.4.1	0.6.0, 0.5.0	Documentation Docker Hub (JAX 0.6.0) Docker Hub (JAX 0.5.0)
v25.5	6.3.4	0.4.35	Documentation Docker Hub
v25.4	6.3.0	0.4.31	Documentation Docker Hub

각주

TP stands for Tensor Parallelism.
Throughput is measured in tokens/second

데이터 센터

비즈니스 시스템

개인 및 게이밍

Embedded

리소스

GPU 가속기

적응형 가속기

DPU 가속기

이더넷 어댑터

워크스테이션

데스크탑

랩탑

리소스

FPGA 및 적응형 SoC

시스템 온 모듈(SOM)

기술

개발자 리소스

평가 보드 및 킷

프로세서 툴

그래픽 툴 및 앱

FPGA 및 적응형 SoC 툴

지적 재산 및 앱

GPU 가속기 툴 및 앱

이더넷 어댑터 도구

개관

데이터 센터 및 클라우드용

에지 및 엔드포인트용

개발자용

업계

업계

업계

업계

Industrias

워크로드

게이밍

시스템

기술

리소스

EPYC 프로세서

Radeon 그래픽 및 AMD 칩셋

FPGA 및 적응형 SoC

Alveo 가속기 및 Kria SOM

Ryzen 프로세서

이더넷 어댑터

개관

프로세서

가속기

임베디드 제품

그래픽

개관

제품별 리소스

유형별 리소스

파트너 정보

AMD 글로벌 지원

프로세서 및 그래픽

가속기

FPGA 및 적응형 SoC

게이밍 및 개인 컴퓨팅

적응형 및 임베디드 컴퓨팅

Get AMD Fan Gear

Buy Direct From AMD

Buy Direct From AMD

Buy Direct From AMD

Buy Direct From AMD

Buy Direct From AMD

This page summarizes performance measurements on AMD Instinct™ GPUs running popular AI models.

AI Inference

vLLM

Results on AMD Instinct™ MI300X Platform

Results on AMD Instinct™ MI300X Platform

Previous Versions

Previous Versions

xDiT

Results on the AMD Instinct MI355X platform

Results on AMD Instinct™ MI355X Platform

Results on the AMD Instinct™ MI300X platform

Results on the AMD Instinct™ MI300X platform

Previous versions

Previous versions

AI Training