EPYC 9005 for AI Inferencing

Overview

Deploy small and mid-size models on AMD EPYC™ 9005 server CPUs—on prem or in the cloud—and help maximize value from your computing investments.

Cost-Effective Inference for Enterprise AI

As the industry shifts from training models to running them, CPUs can pull double duty: run AI and general-purpose workloads side by side.

Read the Blog

Up to 10X Better Performance as a Host CPU¹

In GPU-based systems, the host CPU can affect overall AI system performance. When used as a host CPU, high-frequency AMD EPYC 9575F CPUs significantly improve latency-constrained inference serving.

Read the Blog

See How We Improved Llama Performance up to 16X²

Speculative decoding predicts multiple future tokens and verifies them in parallel. AMD engineers enhanced this process to improve performance for large language models (LLMs) on 5th Gen AMD EPYC™ server CPUs.

Read the Technical Article
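
To make the predict-then-verify idea concrete, here is a minimal sketch of greedy speculative decoding in Python. It is a generic illustration, not the AMD implementation described in the technical article: the two toy "models" and every name in it (`speculative_decode`, `target_model`, `draft_model`) are hypothetical, and a real engine would verify all proposed positions in one batched forward pass rather than in a loop.

```python
from typing import Callable, List, Optional

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def greedy_decode(model: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one call to the (expensive) model per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding; output equals target-only greedy decoding."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. Draft step: the cheap model proposes up to k future tokens.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. Verify step: compare each proposed token with the target's choice.
        accepted = 0
        correction: Optional[Token] = None
        for i, token in enumerate(proposal):
            expected = target(out + proposal[:i])
            if expected != token:
                correction = expected  # first disagreement ends acceptance
                break
            accepted += 1
        out.extend(proposal[:accepted])
        # 3. On a mismatch, keep the target's token so progress is guaranteed.
        if correction is not None:
            out.append(correction)
    return out


# Hypothetical toy models: the target counts upward modulo 10; the draft
# agrees except right after a 4, where it guesses wrong.
def target_model(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_decode(target_model, draft_model, [0], max_new=12))
```

Because every accepted token is checked against the target model, the result is identical to plain greedy decoding with the target alone; the potential speedup comes from accepting several draft tokens per verification step.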

Which Hardware Is Best for Different Inference Workloads?

To avoid overprovisioning and get the best return on your AI investments, it’s important to match your model size and latency requirements to the right hardware. The latest generations of AMD EPYC server CPUs can handle a range of AI tasks alongside general-purpose workloads. As model sizes grow, volumes go up, and lower latencies become critical, GPUs become more efficient and cost effective.

Start with CPUs for Cost-Effective Inference

The latest AMD EPYC server CPUs can run small to medium AI inference workloads with sub-second latency, making them a good fit for small and mid-size models. Use CPUs for batch or offline processing where latency is not critical, and for workloads that need mid-latency (seconds to minutes) or low-latency (500 ms to seconds) response times.

5th Generation AMD EPYC™ CPUs

Add GPUs for Larger Models and Faster Responses

As model sizes grow or response times shrink, you may need to add a purpose-built data center AI GPU. High-frequency AMD EPYC CPUs combined with AMD Instinct™ GPUs are a great fit for model sizes from ~20 billion to ~450 billion parameters. Together, they can deliver low-latency and near-real-time (100 ms to 500 ms) responses.

AMD EPYC Server CPUs as a Host for GPUs

Use GPU Clusters for Large-Scale Deployments

For large models, real-time workloads, and complex, multi-agent pipelines, GPU clusters can deliver high performance per dollar. AMD Instinct platforms use multiple GPUs and are optimal for models with more than approximately 450 billion parameters. These GPU clusters can deliver near-real-time and real-time responses.

AMD Instinct GPUs
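
The sizing guidance above can be summarized as a rough rule of thumb. The sketch below encodes only the approximate boundaries quoted in the text (~20 billion and ~450 billion parameters, 500 ms and 100 ms); the function name `suggest_tier` and the choice to treat any budget under 100 ms as "real time" are assumptions made for illustration. It is a starting point for discussion, not a sizing tool.

```python
# Approximate boundaries quoted in the text; treat them as rough guidance only.
CPU_MAX_PARAMS_B = 20          # CPUs alone: models up to ~20 billion parameters
CPU_GPU_MAX_PARAMS_B = 450     # CPU + GPU: ~20 to ~450 billion parameters
CPU_MIN_LATENCY_MS = 500       # CPUs alone: responses of 500 ms or slower
CPU_GPU_MIN_LATENCY_MS = 100   # CPU + GPU: responses of 100 ms to 500 ms


def suggest_tier(params_billion: float, latency_budget_ms: float) -> str:
    """Return the smallest hardware tier that covers both requirements."""
    # Assumption: budgets under 100 ms count as "real time" (GPU cluster tier).
    if params_billion > CPU_GPU_MAX_PARAMS_B or latency_budget_ms < CPU_GPU_MIN_LATENCY_MS:
        return "GPU cluster"
    if params_billion > CPU_MAX_PARAMS_B or latency_budget_ms < CPU_MIN_LATENCY_MS:
        return "CPU + GPU"
    return "CPU"


if __name__ == "__main__":
    # Hypothetical workloads, chosen only to exercise each branch.
    for name, size_b, budget_ms in [
        ("8B chatbot, 2 s budget", 8, 2000),
        ("70B assistant, 300 ms budget", 70, 300),
        ("671B agent pipeline, 300 ms budget", 671, 300),
    ]:
        print(f"{name}: {suggest_tier(size_b, budget_ms)}")
```

The stricter of the two requirements decides the tier, so a small model with a very tight latency budget can still land on GPUs.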

AI Inference Workload	Good fit for CPUs	Good fit for CPUs + PCIe-Based GPU	Good fit for GPU Clusters
Document processing and classification	✓
Data mining and analytics	✓		✓
Scientific simulations	✓
Translation	✓
Indexing	✓
Content moderation	✓
Predictive maintenance	✓		✓
Virtual assistants	✓	✓
Chatbots	✓	✓
Expert agents	✓	✓
Video captioning	✓	✓
Fraud detection		✓	✓
Decision-making		✓	✓
Dynamic pricing		✓	✓
Audio and video filtering		✓	✓
Financial trading			✓
Telecommunications and networking			✓
Autonomous systems			✓

Find the Best Inference Hardware

Depending on your workload requirements, either high-core-count CPUs alone or a combination of CPUs and GPUs works best for inference. Learn more about which infrastructure fits your model size and latency needs.

See Infographic

5 AI Inference Workloads that Run on a CPU

The latest AMD EPYC server CPUs can meet the performance requirements of a range of AI workloads, including classic machine learning, computer vision, and AI agents. Read about five popular workloads that run great on CPUs.

Read Listicle

Fast, Efficient Inference with AMD EPYC Server CPUs

Whether deployed in a CPU-only server or used as a host for GPUs executing larger models, AMD EPYC server CPUs are designed with the latest open standard technologies to accelerate enterprise AI inference workloads.

5th Gen AMD EPYC Server CPUs Outperform Intel Xeon 6 in Inference, End-to-End AI, and Machine Learning

Claims compare 5th Gen AMD EPYC 9965 server CPUs versus Intel Xeon 6980P.

Up to 89% Better Chatbot Performance on DeepSeek³
Up to 33% Better Inference Performance for Translation Use Case with Llama 3.1 8B⁴
Up to 36% Better Inference Performance for Translation Use Case on Llama 3.2 1B⁵

Relative performance of 5th Gen AMD EPYC 9965 (Intel Xeon 6980P = 1.00x)

Category	Workload	5th Gen AMD EPYC 9965 vs. Intel Xeon 6980P
Small Language Models	Translation on Llama 3.2 1B⁵	~1.36x
Small Language Models	Essay on Llama 3.2 1B⁵	~1.27x
Medium Language Models	Translation on Llama 3.1 8B⁴	~1.33x
Medium Language Models	Summarization on GPT-J 6B⁶	~1.28x
Large Language Models	Chatbot on DeepSeek-R1 671B³	~1.89x
Large Language Models	Essay on DeepSeek-R1 671B³	~1.71x
Large Language Models	Summary on DeepSeek-R1 671B³	~1.41x
Large Language Models	Rewrite on DeepSeek-R1 671B³	~1.20x
End-to-End AI Performance	TPCx-AI@SF30 derivative¹⁰	~1.70x
Classic Machine Learning	XGBoost (Higgs)¹¹	~1.93x
Classic Machine Learning	Facebook AI Similarity Search (FAISS)¹²	~1.60x
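
The multipliers in this comparison are throughput ratios against the Intel Xeon 6980P baseline, and the "up to N% better" callouts are the same ratios expressed as a percentage uplift. The short sketch below reproduces that arithmetic from throughput values listed in the footnotes; the helper names are hypothetical and the code is only a worked example of the calculation.

```python
def relative_throughput(candidate: float, baseline: float) -> float:
    """Ratio of candidate throughput to baseline throughput (baseline = 1.0)."""
    return candidate / baseline


def uplift_percent(candidate: float, baseline: float) -> float:
    """The same comparison expressed as 'N% better than baseline'."""
    return (relative_throughput(candidate, baseline) - 1.0) * 100.0


# (EPYC 9965 throughput, Xeon 6980P throughput), copied from the footnotes.
FOOTNOTE_RESULTS = {
    "DeepSeek-R1 671B chatbot, batch size 4": (143.496, 76.01),
    "TPCx-AI@SF30 derivative (AIUCpm)": (6067.53, 3550.50),
    "XGBoost on Higgs (runs/hour)": (771.0, 400.0),
    "FAISS on sift1m (runs/hour)": (58.6, 36.63),
}

if __name__ == "__main__":
    for workload, (epyc, xeon) in FOOTNOTE_RESULTS.items():
        ratio = relative_throughput(epyc, xeon)
        pct = uplift_percent(epyc, xeon)
        print(f"{workload}: {ratio:.3f}x ({pct:.0f}% better)")
```

For the DeepSeek chatbot case this works out to roughly 1.89x, which is where the "up to 89%" callout comes from.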

Frequently Asked Questions

How do we improve inference performance without runaway computing costs?

First, determine your performance needs. How fast do you need responses: in minutes, seconds, or milliseconds? How large are the models you’re running, measured in parameters? You may be able to meet performance requirements simply by upgrading to a 5th Gen AMD EPYC CPU, avoiding the cost of GPU hardware.

Which workloads are best suited to batch inference vs. real-time inference? How does that drive our infrastructure choices?

If you don’t need responses in real time, batch inference is cost-efficient for large-scale and long-term analysis, such as analyzing campaign performance or predictive maintenance. Real-time inference that supports interactive use cases like financial trading and autonomous systems may need GPU accelerators. While CPUs alone are excellent for batch inference, GPUs are best for real-time inference.

Are CPUs enough for our inference workloads? When do we need to add accelerators?

CPUs alone offer enough performance for inference on models up to ~20 billion parameters and for mid-latency response times (seconds to minutes). This is sufficient for many AI assistants, chatbots, and agents. Consider adding GPU accelerators when models are larger or response times must be faster than this.

Does AMD EPYC or Intel Xeon have better inference performance?

The short answer is that it depends. Extracting maximum performance depends heavily on the workload and on tuning expertise. That said, select 5th Gen AMD EPYC server CPUs outperform comparable Intel Xeon 6 CPUs in inference for many popular AI workloads, including large language models (DeepSeek-R1 671B),³ medium language models (Llama 3.1 8B⁴ and GPT-J 6B⁶), and small language models (Llama 3.2 1B).⁵

How do we ensure our inference workloads are secure?

AMD EPYC server CPUs include AMD Infinity Guard, which provides a silicon-based set of security features.⁷ AMD Infinity Guard includes AMD Secure Encrypted Virtualization (AMD SEV), a widely adopted confidential computing solution that uses confidential virtual machines (VMs) to help protect data, AI models, and workloads at runtime.

AMD Powers the Full Spectrum of AI

Match your infrastructure needs to your AI ambitions. AMD offers the broadest AI portfolio, open standards-based platforms, and a powerful ecosystem—all backed by performance leadership.

AMD EPYC™ Server CPUs

As the leading CPU for AI,¹³ AMD EPYC server CPUs deliver exceptional performance as inference processors and hosts for GPU platforms.

Explore AMD EPYC Server CPUs

AMD Instinct™ GPUs

Available in a PCIe form factor or as an integrated cluster, AMD Instinct™ GPUs bring exceptional efficiency and performance to generative AI and are ideal for training massive models and for high-speed inference.

Explore AMD Instinct GPUs

AMD Pensando™ Networking

Specifically engineered for AI, AMD Pensando™ open networking solutions enable high-speed, interoperable Ethernet that scales out to meet evolving demands.

Explore AMD Pensando Networking Solutions

AMD Versal™ Adaptive SoCs

This highly integrated compute platform for embedded applications includes real-time CPU cores, programmable logic, and a network on chip (NoC), plus AI engines for machine learning, providing outstanding system-level performance in use cases that demand customized hardware.

Explore AMD Versal Adaptive SoCs

Data Security for AI Workloads

As AI fuels data growth, advanced security becomes even more critical. This need is further amplified by the increasing emphasis on privacy regulations and data sovereignty, and by severe penalties for breaches. Built in at the silicon level, AMD Infinity Guard offers the security capabilities required for AI, including AMD Secure Encrypted Virtualization (SEV), the industry’s most mature confidential computing solution.⁷

Explore AMD Infinity Guard

AMD EPYC Deployment Options

Broad Ecosystem for AI On-Premises

Find enterprise AI hardware from our OEM partners, including servers with high-core-count and high-frequency CPUs, a premier line of GPUs, and interoperable networking solutions.

See All Hardware Partners

Scale AI in the Cloud

Get the most from your cloud by choosing AMD technology-based virtual machines (VMs) for AI workloads.

See All Cloud Partners

Inference Frameworks for Open Software Development

With AMD ZenDNN and AMD ROCm™ software, developers can optimize their application performance while using their choice of frameworks.

Resources

AI Webinars

Watch on-demand webinars to learn more about the advantages of inference on AMD EPYC server CPUs.

See Webinars

AI Documentation

Find solution briefs, white papers, and more on deploying AI inference on AMD EPYC server CPUs.

See All Documentation

Technical Articles and Blogs

Get technical details and guidance on using AMD EPYC server CPU features, tools, and tuning for your inference workloads.

Visit Technical Articles and Blogs

AMD TechTalk Podcasts

Hear about the latest trends in AI from leading technology experts.

Listen Now

Subscribe to Data Center Insights from AMD

Subscribe Now

Request Contact from an AMD EPYC Sales Expert

Contact AMD

Footnotes

9xx5-169: Llama-3.3-70B latency constrained throughput (goodput) results based on AMD internal testing as of 05/14/2025. Configurations: Llama-3.3-70B, vLLM API server v1.0, data set: Sonnet3.5-SlimOrcaDedupCleaned, TP8, 512 max requests (dynamic batching), latency constrained time to first token (300ms, 400ms, 500ms, 600ms), OpenMP 128, results in tokens/s.
2P AMD EPYC 9575F (128 Total Cores, 400W TDP, production system, 1.5TB 24x64GB DDR5-6400 running at 6000 MT/s, 2 x 25 GbE ConnectX-6 Lx MT2894, 4x 3.84TB Samsung MZWLO3T8HCLS-00A07 NVMe; Micron_7450_MTFDKCC800TFS 800GB NVMe for OS, Ubuntu 22.04.3 LTS, kernel=5.15.0-117-generic, BIOS 3.2, SMT=OFF, Determinism=power, mitigations=off) with 8x NVIDIA H100.
2P Intel Xeon 8592+ (128 Total Cores, 350W TDP, production system, 1TB 16x64GB DDR5-5600, 2 x 25 GbE ConnectX-6 Lx (MT2894), 4x 3.84TB Samsung MZWLO3T8HCLS-00A07 NVMe, Micron_7450_MTFDKBA480TFR 480GB NVMe, Ubuntu 22.04.3 LTS, kernel-5.15.0-118-generic, SMT=OFF, Performance Bias, Mitigations=off) with 8x NVIDIA H100.
Results (tokens/s at each time-to-first-token limit):
CPU 300ms 400ms 500ms 600ms
8592+ 0 126.43 1565.65 1987.19
9575F 346.11 2326.21 2531.38 2572.42
Relative NA 18.40 1.62 1.29
Results may vary due to factors including system configurations, software versions, and BIOS settings. TDP information from ark.intel.com
Parallel draft models (PARD) technology on Llama-3.2-1B-Instruct. See configurations: https://www.amd.com/en/developer/resources/technical-articles/2025/speculative-llm-inference-on-the-5th-gen-amd-epyc-processors-wit.html
9xx5-152A: Deepseek-R1-671B throughput results based on AMD internal testing as of 01/28/2025. Configurations: llama.cpp framework, 1.58 bit quantization (UD_IQ1_S, MoE at 1.56 bit), batch sizes 1 and 4, 16C Instances, Use Case Input/Output token configurations: [Chatbot = 128/128, Essay = 128/1024, Summary = 1024/128, Rewrite = 1024/1024].
2P AMD EPYC 9965 (384 Total Cores, 500W TDP, reference system, 3TB 24x128GB DDR5-6400, 2 x 40 GbE Mellanox CX-7 (MT2910), 3.84TB Samsung MZWLO3T8HCLS-00A07 NVMe, Ubuntu® 22.04.3 LTS | 5.15.0-105-generic, SMT=ON, Determinism=power, Mitigations=on)
2P AMD EPYC 9755 (256 Total Cores, 500W TDP, reference system, 3TB 24x128GB DDR5-6400, 2 x 40 GbE Mellanox CX-7 (MT2910), 3.84TB Samsung MZWLO3T8HCLS-00A07 NVMe, Ubuntu® 22.04.3 LTS | 5.15.0-105-generic, SMT=ON, Determinism=power, Mitigations=on)
2P Intel Xeon 6980P (256 Total Cores, 500W TDP, production system, 3TB 24x64GB DDR5-6400, 4 x 1GbE Broadcom NetXtreme BCM5719 Gigabit Ethernet PCIe, 3.84TB SAMSUNG MZWLO3T8HCLS-00A07 NVMe, Ubuntu 24.04.2 LTS | 6.13.2-061302-generic, SMT=ON, Performance Bias, Mitigations=on)
Results:
BS=1 6980P 9755 9965 Rel9755 Rel9965
Chatbot 47.31 61.88 70.344 1.308 1.487
Essay 42.97 56.04 61.608 1.304 1.434
Summary 44.99 59.39 62.304 1.32 1.385
Rewrite 41.8 68.44 55.08 1.637 1.318
BS=4 6980P 9755 9965 Rel9755 Rel9965
Chatbot 76.01 104.46 143.496 1.374 1.888
Essay 67.89 93.68 116.064 1.38 1.71
Summary 70.88 103.39 99.96 1.459 1.41
Rewrite 65 87.9 78.12 1.352 1.202
Results may vary due to factors including system configurations, software versions, and BIOS settings.
9xx5-156: Llama3.1-8B throughput results based on AMD internal testing as of 04/08/2025. Llama3.1-8B configurations: BF16, batch size 32, 32C Instances, Use Case Input/Output token configurations: [Summary = 1024/128, Chatbot = 128/128, Translate = 1024/1024, Essay = 128/1024].
2P AMD EPYC 9965 (384 Total Cores), 1.5TB 24x64GB DDR5-6400, 1.0 Gbps NIC, 3.84 TB Samsung MZWLO3T8HCLS-00A07, Ubuntu® 22.04.5 LTS, Linux 6.9.0-060900-generic, BIOS RVOT1004A, (SMT=off, mitigations=on, Determinism=Power), NPS=1, ZenDNN 5.0.1
2P AMD EPYC 9755 (256 Total Cores), 1.5TB 24x64GB DDR5-6400, 1.0 Gbps NIC, 3.84 TB Samsung MZWLO3T8HCLS-00A07, Ubuntu® 22.04.4 LTS, Linux 6.8.0-52-generic, BIOS RVOT1004A, (SMT=off, mitigations=on, Determinism=Power), NPS=1, ZenDNN 5.0.1
2P Xeon 6980P (256 Total Cores), AMX On, 1.5TB 24x64GB DDR5-8800 MRDIMM, 1.0 Gbps Ethernet Controller X710 for 10GBASE-T, Micron_7450_MTFDKBG1T9TFR 2TB, Ubuntu 22.04.1 LTS Linux 6.8.0-52-generic, BIOS 1.0 (SMT=off, mitigations=on, Performance Bias), IPEX 2.6.0
Results:
CPU 6980P 9755 9965
Summary 1 n/a 1.093
Translate 1 1.062 1.334
Essay 1 n/a 1.14
Results may vary due to factors including system configurations, software versions, and BIOS settings.
9xx5-166: Llama3.2-1B throughput results based on AMD internal testing as of 04/08/2025. Llama3.2-1B configurations: BF16, batch size 32, 32C Instances, Use Case Input/Output token configurations: [Summary = 1024/128, Chatbot = 128/128, Translate = 1024/1024, Essay = 128/1024].
2P AMD EPYC 9965 (384 Total Cores), 1.5TB 24x64GB DDR5-6400, 1.0 Gbps NIC, 3.84 TB Samsung MZWLO3T8HCLS-00A07, Ubuntu® 22.04.5 LTS, Linux 6.9.0-060900-generic, BIOS RVOT1004A, (SMT=off, mitigations=on, Determinism=Power), NPS=1, ZenDNN 5.0.1, Python 3.10.2
2P Xeon 6980P (256 Total Cores), AMX On, 1.5TB 24x64GB DDR5-8800 MRDIMM, 1.0 Gbps Ethernet Controller X710 for 10GBASE-T, Micron_7450_MTFDKBG1T9TFR 2TB, Ubuntu 22.04.1 LTS Linux 6.8.0-52-generic, BIOS 1.0 (SMT=off, mitigations=on, Performance Bias), IPEX 2.6.0, Python 3.12.3
Results:
CPU 6980P 9965
Summary 1 1.213
Translation 1 1.364
Essay 1 1.271
Results may vary due to factors including system configurations, software versions, and BIOS settings.
9xx5-158: GPT-J-6B throughput results based on AMD internal testing as of 04/08/2025. GPT-J-6B configurations: BF16, batch size 32, 32C Instances, Use Case Input/Output token configurations: [Summary = 1024/128, Chatbot = 128/128, Translate = 1024/1024, Essay = 128/1024].
2P AMD EPYC 9965 (384 Total Cores), 1.5TB 24x64GB DDR5-6400, 1.0 Gbps NIC, 3.84 TB Samsung MZWLO3T8HCLS-00A07, Ubuntu® 22.04.5 LTS, Linux 6.9.0-060900-generic, BIOS RVOT1004A, (SMT=off, mitigations=on, Determinism=Power), NPS=1, ZenDNN 5.0.1, Python 3.10.12
2P AMD EPYC 9755 (256 Total Cores), 1.5TB 24x64GB DDR5-6400, 1.0 Gbps NIC, 3.84 TB Samsung MZWLO3T8HCLS-00A07, Ubuntu® 22.04.4 LTS, Linux 6.8.0-52-generic, BIOS RVOT1004A, (SMT=off, mitigations=on, Determinism=Power), NPS=1, ZenDNN 5.0.1, Python 3.10.12
2P Xeon 6980P (256 Total Cores), AMX On, 1.5TB 24x64GB DDR5-8800 MRDIMM, 1.0 Gbps Ethernet Controller X710 for 10GBASE-T, Micron_7450_MTFDKBG1T9TFR 2TB, Ubuntu 22.04.1 LTS Linux 6.8.0-52-generic, BIOS 1.0 (SMT=off, mitigations=on, Performance Bias), IPEX 2.6.0, Python 3.12.3
Results:
CPU 6980P 9755 9965
Summary 1 1.034 1.279
Chatbot 1 0.975 1.163
Translate 1 1.021 0.93
Essay 1 0.978 1.108
Caption 1 0.913 1.12
Overall 1 0.983 1.114
Results may vary due to factors including system configurations, software versions, and BIOS settings.
GD-183A AMD Infinity Guard features vary by EPYC™ Processor generations and/or series. Infinity Guard security features must be enabled by server OEMs and/or Cloud Service Providers to operate. Check with your OEM or provider to confirm support of these features. Learn more about Infinity Guard at https://www.amd.com/en/products/processors/server/epyc/infinity-guard.html.
9xx5-002F: SPECrate®2017_int_base comparison based on published scores from www.spec.org as of 12/4/2025. Results and configurations below are in the format of: [processor], [cores], [TDP], [1Ku price in USD], [SPECrate®2017_int_base score], [SPECrate®2017_int_base score / CPU W], [SPECrate®2017_int_base score / 1Ku price in USD], [Link to score]
2P AMD EPYC 9654, 96C, 360W, $8452 USD, 1830, 5.083, 0.217, https://www.spec.org/cpu2017/results/res2025q3/cpu2017-20250727-49206.html
2P AMD EPYC 9754, 128C, 360W, $10631 USD, 1950, 5.417, 0.183, https://www.spec.org/cpu2017/results/res2023q2/cpu2017-20230522-36617.html
2P AMD EPYC 9755, 128C, 500W, $10931 USD, 2840, 5.680, 0.260, https://www.spec.org/cpu2017/results/res2025q2/cpu2017-20250324-47223.html
2P AMD EPYC 9965, 192C, 500W, $11988 USD, 3230, 6.460, 0.269, https://www.spec.org/cpu2017/results/res2025q2/cpu2017-20250324-47086.html
2P Intel Xeon 6780E, 144C, 330W, $8513 USD, 1410, 4.273, 0.166, https://www.spec.org/cpu2017/results/res2024q3/cpu2017-20240811-44406.html
2P Intel Xeon 6980P, 128C, 500W, $12460 USD, 2510, 5.020, 0.201, https://www.spec.org/cpu2017/results/res2025q2/cpu2017-20250324-47099.html
2P Intel Xeon Platinum 8592+, 64C, 350W, $11600 USD, 1130, 3.229, 0.097, https://www.spec.org/cpu2017/results/res2023q4/cpu2017-20231127-40064.html
SPEC®, SPEC CPU®, and SPECrate® are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information. AMD CPU prices as of 12/9/2025. Intel CPU W and prices at https://ark.intel.com/ as of 12/9/2025
9xx5-001: Based on AMD internal testing as of 9/10/2024, geomean performance improvement (IPC) at fixed frequency. 5th Gen EPYC generational ML/HPC Server Workloads IPC Uplift of 1.369x (geomean) using a select set of 24 workloads; it is the geomean of representative ML Server Workloads (geomean) and representative HPC Server Workloads (geomean). “Genoa” config (all NPS1): EPYC 9654 BIOS TQZ1005D 12c12t (1c1t/CCD in 12+1), FF 3GHz, 12x DDR5-4800 (2Rx4 64GB), 32Gbps xGMI; “Turin” config (all NPS1): EPYC 9V45 BIOS RVOT1000F 12c12t (1c1t/CCD in 12+1), FF 3GHz, 12x DDR5-6000 (2Rx4 64GB), 32Gbps xGMI. Utilizing Performance Determinism and the Performance governor on Ubuntu 22.04 w/ 6.8.0-40-generic kernel OS for all workloads except LAMMPS, HPCG, NAMD, OpenFOAM, and Gromacs, which utilize 24.04 w/ 6.8.0-40-generic kernel. SPEC® and SPECrate® are registered trademarks of the Standard Performance Evaluation Corporation. Learn more at spec.org.
9xx5-151: TPCxAI @SF30 Multi-Instance, 32C Instance Size throughput results based on AMD internal testing as of 04/01/2025 running multiple VM instances. The aggregate end-to-end AI throughput test is derived from the TPCx-AI benchmark and as such is not comparable to published TPCx-AI results, as the end-to-end AI throughput test results do not comply with the TPCx-AI Specification. 2P AMD EPYC 9965 (6067.53 Total AIUCpm, 384 Total Cores, 500W TDP, AMD reference system, 1.5TB 24x64GB DDR5-6400, 2 x 40 GbE Mellanox CX-7 (MT2910), 3.84TB Samsung MZWLO3T8HCLS-00A07 NVMe, Ubuntu® 24.04 LTS kernel 6.13, SMT=ON, Determinism=power, Mitigations=on) 2P AMD EPYC 9755 (4073.42 Total AIUCpm, 256 Total Cores, 500W TDP, AMD reference system, 1.5TB 24x64GB DDR5-6400, 2 x 40 GbE Mellanox CX-7 (MT2910) 3.84TB Samsung MZWLO3T8HCLS-00A07 NVMe, Ubuntu 24.04 LTS kernel 6.13, SMT=ON, Determinism=power, Mitigations=on) 2P Intel Xeon 6980P (3550.50 Total AIUCpm, 256 Total Cores, 500W TDP, Production system, 1.5TB 24x64GB DDR5-6400, 4 x 1GbE Broadcom NetXtreme BCM5719 Gigabit Ethernet PCIe 3.84TB SAMSUNG MZWLO3T8HCLS-00A07 NVMe, Ubuntu 24.04 LTS kernel 6.13, SMT=ON, Performance Bias, Mitigations=on) Results may vary based on factors including but not limited to system configurations, software versions, and BIOS settings. TPC, TPC Benchmark, and TPC-H are trademarks of the Transaction Processing Performance Council.
9xx5-162: XGBoost (Runs/Hour) throughput results based on AMD internal testing as of 04/08/2025. XGBoost Configurations: v1.7.2, Higgs Data Set, 32 Core Instances, FP32
2P AMD EPYC 9965 (384 Total Cores), 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1.0 Gbps NIC, 3.84 TB Samsung MZWLO3T8HCLS-00A07, Ubuntu® 22.04.5 LTS, Linux 5.15 kernel, BIOS RVOT1004A, (SMT=off, mitigations=on, Determinism=Power), NPS=1
2P AMD EPYC 9755 (256 Total Cores), 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1.0 Gbps NIC, 3.84 TB Samsung MZWLO3T8HCLS-00A07, Ubuntu® 22.04.4 LTS, Linux 5.15 kernel, BIOS RVOT1004A, (SMT=off, mitigations=on, Determinism=Power), NPS=1
2P Xeon 6980P (256 Total Cores), 1.5TB 24x64GB DDR5-8800 MRDIMM, 1.0 Gbps Ethernet Controller X710 for 10GBASE-T, Micron_7450_MTFDKBG1T9TFR 2TB, Ubuntu 22.04.1 LTS Linux 6.8.0-52-generic, BIOS 1.0 (SMT=off, mitigations=on, Performance Bias)
Results:
CPU Throughput Relative
2P 6980P 400 1
2P 9755 436 1.090
2P 9965 771 1.928
Results may vary due to factors including system configurations, software versions and BIOS settings.
9xx5-164: FAISS (Runs/Hour) throughput results based on AMD internal testing as of 04/08/2025. FAISS Configurations: v1.8.0, sift1m Data Set, 32 Core Instances, FP32
2P AMD EPYC 9965 (384 Total Cores), 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1.0 Gbps NIC, 3.84 TB Samsung MZWLO3T8HCLS-00A07, Ubuntu® 22.04.5 LTS, Linux 5.15 kernel, BIOS RVOT1004A, (SMT=off, mitigations=on, Determinism=Power), NPS=1
2P AMD EPYC 9755 (256 Total Cores), 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1.0 Gbps NIC, 3.84 TB Samsung MZWLO3T8HCLS-00A07, Ubuntu® 22.04.4 LTS, Linux 5.15 kernel, BIOS RVOT1004A, (SMT=off, mitigations=on, Determinism=Power), NPS=1
2P Xeon 6980P (256 Total Cores), 1.5TB 24x64GB DDR5-8800 MRDIMM, 1.0 Gbps Ethernet Controller X710 for 10GBASE-T, Micron_7450_MTFDKBG1T9TFR 2TB, Ubuntu 22.04.1 LTS Linux 6.8.0-52-generic, BIOS 1.0 (SMT=off, mitigations=on, Performance Bias)
Results:
CPU Throughput Relative
2P 6980P 36.63 1
2P 9755 46.86 1.279
2P 9965 58.6 1.600
Results may vary due to factors including system configurations, software versions and BIOS settings.
9xx5-012: TPCxAI @SF30 Multi-Instance 32C Instance Size throughput results based on AMD internal testing as of 09/05/2024 running multiple VM instances. The aggregate end-to-end AI throughput test is derived from the TPCx-AI benchmark and as such is not comparable to published TPCx-AI results, as the end-to-end AI throughput test results do not comply with the TPCx-AI Specification.
2P AMD EPYC 9965 (384 Total Cores), 12 32C instances, NPS1, 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1DPC, 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu® 22.04.4 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198096812, ulimit -n 1024, ulimit -s 8192), BIOS RVOT1000C (SMT=off, Determinism=Power, Turbo Boost=Enabled)
2P AMD EPYC 9755 (256 Total Cores), 8 32C instances, NPS1, 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1DPC, 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu 22.04.4 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198096812, ulimit -n 1024, ulimit -s 8192), BIOS RVOT0090F (SMT=off, Determinism=Power, Turbo Boost=Enabled)
2P AMD EPYC 9654 (192 Total cores) 6 32C instances, NPS1, 1.5TB 24x64GB DDR5-4800, 1DPC, 2 x 1.92 TB Samsung MZQL21T9HCJR-00A07 NVMe, Ubuntu 22.04.3 LTS, BIOS 1006C (SMT=off, Determinism=Power)
Versus 2P Xeon Platinum 8592+ (128 Total Cores), 4 32C instances, AMX On, 1TB 16x64GB DDR5-5600, 1DPC, 1.0 Gbps NetXtreme BCM5719 Gigabit Ethernet PCIe, 3.84 TB KIOXIA KCMYXRUG3T84 NVMe, Ubuntu 22.04.4 LTS, 6.5.0-35 generic (tuned-adm profile throughput-performance, ulimit -l 132065548, ulimit -n 1024, ulimit -s 8192), BIOS ESE122V (SMT=off, Determinism=Power, Turbo Boost = Enabled)
Results:
CPU Median Relative Generational
Turin 192C, 12 Inst 6067.531 3.775 2.278
Turin 128C, 8 Inst 4091.85 2.546 1.536
Genoa 96C, 6 Inst 2663.14 1.657 1
EMR 64C, 4 Inst 1607.417 1 NA
Results may vary due to factors including system configurations, software versions and BIOS settings. TPC, TPC Benchmark and TPC-C are trademarks of the Transaction Processing Performance Council.

Frequently Asked Questions

How do we improve inference performance without runaway computing costs?

Which workloads are best suited to batch inference vs. real-time inference? How does that drive our infrastructure choices?

Are CPUs enough for our inference workloads? When do we need to add accelerators?

Does AMD EPYC or Intel Xeon have better inference performance?

How do we ensure our inference workloads are secure?
