Featured Webinar

Learn how AMD EPYC™ 9004 Series processors empower you to deploy CPU-based AI processing that is efficient for small AI models, many classical machine learning and inference workloads, and traditional compute workloads that are being augmented by AI.


Leadership portfolio to address Enterprise AI Inference workloads

AI inference uses a trained AI model to make predictions on new data. AMD offers a range of solutions for AI inference depending on your model size and application requirements. AMD EPYC™ CPUs excel for small to medium AI models and for workloads where proximity to data matters. AMD Instinct™ GPUs shine for large models and dedicated AI deployments that demand very high performance and scale. Both offer impressive performance and efficiency, letting you choose the right solution for your needs.

Model Size: Small

Type: Classical
  Pros: Faster inference; lower resource requirements
  Cons: Lower accuracy; limited to less complex tasks
  Typical use cases: Basic image recognition; sentiment analysis; spam detection

Type: Generative
  Pros: Creative content generation (e.g., music, art); personalization
  Cons: Limited control over output; potential for bias
  Typical use cases: Short-form text generation; chatbots

Model Size: Medium

Type: Classical
  Pros: Balance between speed and accuracy; suitable for moderately complex tasks
  Cons: May require more training data; less efficient for very large datasets
  Typical use cases: Object detection in videos; machine translation; customer service chatbots

Type: Predictive
  Pros: Accurate predictions for a variety of tasks; scalable to larger datasets
  Cons: Can be computationally expensive; requires careful data preparation
  Typical use cases: Fraud detection; risk assessment; sales forecasting

Model Size: Large

Type: Generative
  Pros: Highly realistic and complex content generation; advanced language understanding
  Cons: Very resource-intensive; high risk of bias and ethical concerns
  Typical use cases: Complex text generation; image and video generation; creative content design

Type: Classical
  Pros: High accuracy for complex tasks; can handle large and diverse datasets
  Cons: Extremely resource-intensive; difficult to interpret and explain
  Typical use cases: Medical diagnosis; self-driving cars; facial recognition

Type: Predictive
  Pros: Highly accurate predictions with large datasets; handles complex relationships and patterns
  Cons: Expensive to train and run; requires extensive data and expertise
  Typical use cases: Personalized recommendations; financial market analysis; scientific discovery

Applications and Industries

AI models integrated within computer vision, natural language processing, and recommendation systems have significantly impacted businesses across multiple industries. These models help companies recognize objects, classify anomalies, understand written and spoken words, and make recommendations. By accelerating the development of these models, businesses can reap the benefits, regardless of their industry.


Automotive

Computer vision models help propel self-driving cars, recognizing signage, pedestrians, and other vehicles to be avoided. Natural language processing models can help in-car telematics systems recognize spoken commands.


Financial Services

AI-powered anomaly detection helps stop credit-card fraud, while computer vision models watch for suspicious documents, including customer checks.


Retail

Automate checkout lines by recognizing products, or even create autonomous shopping experiences where the models link customers with the items they choose and put into their bags. Use product recommendation engines to offer alternatives, whether online or in the store.


Manufacturing

Use computer vision models to monitor the quality of manufactured products, from food items to printed circuit boards. Feed telemetry data into recommendation engines to suggest proactive maintenance: Are disk drives about to fail? Is the engine using too much oil?


Medical

Detect anomalies including fractures and tumors with computer vision models. Use the same models in research to assess in vitro cell growth and proliferation.


Service Automation

Where IT meets customers, natural language processing can help take action based on spoken requests, and recommendation engines can help point customers to satisfactory solutions and product alternatives.

The Ideal Choice for Enterprise AI Inference Workloads

Whether deployed as CPU only or used as a host for GPUs executing larger models, AMD EPYC™ 9004 Series processors are designed with the latest open standard technologies to accelerate Enterprise AI inference workloads.

Architected for AI Inference

Up to 128 AMD “Zen 4” cores with AVX-512 instruction support deliver substantial parallelism for AI inference workloads, reducing the need for GPU acceleration (a quick way to verify these instruction-set features on a target host is sketched below).

Exceptional Power Efficiency: AMD EPYC processors power the most energy-efficient servers, delivering exceptional performance and helping reduce energy costs.1

Fast Processing and I/O: 14% generational increase in instructions per clock cycle, DDR5 memory, and PCIe® Gen 5 I/O for fast data processing.2
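
As a quick sanity check before sizing a CPU-only inference deployment, the CPU flags on a Linux host can confirm that the AVX-512 support mentioned above (including its VNNI and BF16 extensions) is present. This is a minimal sketch that simply parses /proc/cpuinfo; the flag names are the standard Linux kernel strings, and the script is illustrative rather than an AMD-provided tool.

```python
# Minimal sketch: confirm AVX-512 / BF16 CPU flags on a Linux host
# before planning CPU-based inference. Assumes a readable /proc/cpuinfo;
# flag names are the standard kernel strings, not AMD-specific tooling.
def cpu_flags(path="/proc/cpuinfo"):
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

if __name__ == "__main__":
    flags = cpu_flags()
    for feature in ("avx512f", "avx512_vnni", "avx512_bf16"):
        print(f"{feature}: {'present' if feature in flags else 'missing'}")
```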


AMD Software Optimizations for AI Inference

Framework Support: AMD supports the most popular AI frameworks, including TensorFlow, PyTorch, and ONNX Runtime, covering diverse use cases like image classification and recommendation engines.
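
To make the framework support concrete, the sketch below runs a small image-classification inference entirely on the CPU with PyTorch. The model (torchvision's resnet50 with random weights) and the random input batch are illustrative stand-ins rather than an AMD-supplied example; TensorFlow or ONNX Runtime could be used the same way.

```python
# Minimal CPU inference sketch with PyTorch; the framework choice,
# model, and synthetic input are illustrative stand-ins.
import torch
from torchvision.models import resnet50

model = resnet50(weights=None).eval()   # random weights stand in for a trained model
batch = torch.randn(8, 3, 224, 224)     # synthetic image batch

with torch.inference_mode():
    logits = model(batch)               # runs on the CPU by default
    print(logits.argmax(dim=1).tolist())
```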

Open Source and Compatibility: Optimizations are integrated into popular frameworks, offering broad compatibility and upstream-friendly open-source development. AMD is also working with Hugging Face to enable their open-source models out of the box with ZenDNN.

ZenDNN Plug-ins: These plug-ins accelerate AI inference workloads by optimizing operators, leveraging microkernels, and implementing efficient multithreading on AMD EPYC cores.
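
In practice, enabling the ZenDNN PyTorch plug-in is a small change to an existing inference path. The sketch below assumes the plug-in is installed as the zentorch package and registers a "zentorch" backend for torch.compile, which is how recent ZenDNN releases document it; verify both the package name and the backend string against the ZenDNN release you install.

```python
# Hedged sketch: enable the ZenDNN PyTorch plug-in for an existing model.
# Assumes the 'zentorch' package is installed and exposes a torch.compile
# backend named "zentorch"; falls back to plain CPU execution otherwise.
import torch
from torchvision.models import resnet50

model = resnet50(weights=None).eval()
batch = torch.randn(8, 3, 224, 224)

try:
    import zentorch  # noqa: F401  (importing registers the backend)
    model = torch.compile(model, backend="zentorch")
except ImportError:
    pass  # plug-in not installed; eager CPU inference still works

with torch.inference_mode():
    print(model(batch).argmax(dim=1).tolist())
```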

AMD Software Optimizations Diagram

Data security is even more important in the era of AI

As digitization, cloud computing, AI, and other emerging technologies fuel the growth of data, the need for advanced security measures becomes even more pressing. That need is amplified by the growing global emphasis on privacy regulations and the severe penalties for breaches, underscoring the value of data amid rising security risks.

Built-in at the silicon level, AMD Infinity Guard offers the advanced capabilities required to defend against internal and external threats and help keep your data safe.3


AMD EPYC™ 9004 processor-based servers and cloud instances enable fast, efficient AI-enabled solutions close to your customers and data.

 

Small / Medium AI Workload Models

2P servers running Llama2-7B-CHAT-HF and Llama2-13B-CHAT-HF LLMs⁴ (relative tokens/second)
  AMD EPYC™ 9654: 1.36x
  Intel® Xeon® Platinum 8480+: 1.0x

2P servers running Phi-3 Mini (4K)⁵ (relative tokens/second)
  AMD EPYC™ 9654: 1.24x
  Intel® Xeon® Platinum 8592+: 1.0x

AWS instances running DLRMv2 at Int8 precision⁶
  4th Gen EPYC (hpc7a.96xlarge): ~1.44x
  4th Gen Xeon (m7i.48xlarge): 1.0x

AWS instances running MiniLM with PyTorch and the Neural Magic DeepSparse engine at FP32 precision⁷
  4th Gen EPYC (m7a.48xlarge): ~1.78x
  4th Gen Xeon (m7i.48xlarge): 1.0x

AWS instances running Llama2-7B at BF16 precision⁸
  4th Gen EPYC (m7a.8xlarge): ~1.19x
  4th Gen Xeon (m7i.8xlarge): 1.0x
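
The relative results above come from the specific hardware and software configurations detailed in the footnotes. For a directional tokens-per-second check on your own CPU, a measurement can be approximated with Hugging Face transformers, as in the sketch below; the model name, bfloat16 precision, and token counts are illustrative stand-ins, not the footnoted benchmark setups.

```python
# Rough tokens/second measurement for CPU LLM inference with Hugging Face
# transformers. Model, precision, and token counts are illustrative only,
# not the footnoted benchmark configurations.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-125m"  # small stand-in model for a quick CPU run
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).eval()

inputs = tokenizer("The quick brown fox", return_tensors="pt")

with torch.inference_mode():
    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    elapsed = time.perf_counter() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.2f} tokens/sec")
```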

Classical ML Workload Models

2P servers running FAISS clustering⁹ (clusterings/second)
  AMD EPYC™ 9654: up to ~2.0x
  Intel® Xeon® Platinum 8592+: 1.0x

2P servers running XGBoost on the Higgs boson particle dataset¹⁰ (throughput)
  AMD EPYC™ 9654: up to ~1.7x
  Intel® Xeon® Platinum 8592+: 1.0x

2P servers running random decision forest classification (SciKit-Learning Random Forest, airline_ohe throughput)¹¹
  AMD EPYC™ 9654: up to ~1.36x
  Intel® Xeon® Platinum 8592+: 1.0x

2P servers running OpenVINO™ Road Segmentation inference¹² (frames/sec per CPU watt)
  AMD EPYC™ 9754: up to ~2.4x
  Intel® Xeon® Platinum 8592+: 1.0x

2P servers running TPCx-AI @ SF30¹³ (AI use cases/min)
  AMD EPYC™ 9654: up to ~1.65x
  Intel® Xeon® Platinum 8592+: 1.0x
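
As with the LLM results, the classical ML comparisons above reflect the footnoted configurations. For a simple, directional gauge of random-forest inference throughput on your own cores, the sketch below uses scikit-learn with synthetic data in place of the airline_ohe or Higgs datasets; the sample counts and model parameters are illustrative choices.

```python
# Simple random-forest inference throughput check on CPU, using synthetic
# data in place of the airline_ohe / Higgs datasets. Counts and parameters
# are illustrative, not the footnoted benchmark configurations.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200_000, n_features=50, random_state=0)
X_train, y_train, X_test = X[:50_000], y[:50_000], X[50_000:]

clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)                      # uses all available cores

start = time.perf_counter()
clf.predict(X_test)                            # batch inference on the remaining rows
elapsed = time.perf_counter() - start
print(f"{X_test.shape[0] / elapsed:.0f} rows/sec")
```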

Resources

AMD EPYC Enterprise AI Briefs

Find AMD and partner documentation describing AI and machine learning innovation.

AMD ZenDNN Library

Open-source library for enhanced deep learning performance on AMD EPYC processors.

Podcasts

Listen to leading technologists from AMD and the industry discuss the latest trending topics regarding servers, cloud computing, AI, HPC, and more.

Footnotes
  1. EPYC-028D: SPECpower_ssj® 2008, SPECrate®2017_int_energy_base, and SPECrate®2017_fp_energy_base based on results published on SPEC’s website as of 2/21/24. VMmark® server power-performance / server and storage power-performance (PPKW) based results published at https://www.vmware.com/products/vmmark/results3x.1.html?sort=score. The first 105 ranked SPECpower_ssj®2008 publications with the highest overall efficiency (overall ssj_ops/W) results were all powered by AMD EPYC processors. For SPECrate®2017 Integer (Energy Base), AMD EPYC CPUs power the first 8 top SPECrate®2017_int_energy_base performance/system W scores. For SPECrate®2017 Floating Point (Energy Base), AMD EPYC CPUs power the first 12 SPECrate®2017_fp_energy_base performance/system W scores. For VMmark® server power-performance (PPKW), AMD EPYC CPUs have the top 5 results for 2- and 4-socket matched pairs, outperforming all other socket results, and for VMmark® server and storage power-performance (PPKW), they have the top overall score. See https://www.amd.com/en/claims/epyc4#faq-EPYC-028D for the full list. For additional information on AMD sustainability goals see: https://www.amd.com/en/corporate/corporate-responsibility/data-center-sustainability.html. More information about SPEC® is available at http://www.spec.org. SPEC, SPECrate, and SPECpower are registered trademarks of the Standard Performance Evaluation Corporation. VMmark is a registered trademark of VMware in the US or other countries.
  2. EPYC-038: Based on AMD internal testing as of 09/19/2022, geomean performance improvement at the same fixed-frequency on a 4th Gen AMD EPYC™ 9554 CPU compared to a 3rd Gen AMD EPYC™ 7763 CPU using a select set of workloads (33) including est. SPECrate®2017_int_base, est. SPECrate®2017_fp_base, and representative server workloads. SPEC® and SPECrate® are registered trademarks of Standard Performance Evaluation Corporation. Learn more at spec.org.
  3. GD-183A AMD Infinity Guard features vary by EPYC™ Processor generations and/or series. Infinity Guard security features must be enabled by server OEMs and/or Cloud Service Providers to operate. Check with your OEM or provider to confirm support of these features. Learn more about Infinity Guard at https://www.amd.com/en/technologies/infinity-guard
  4. SP5-222: Llama2 tokens/sec workload claim based on AMD internal testing as of 12/1/2023. 2P server configurations: 2P EPYC 9654 (96C/192T), BIOS AMI RTI1001C (NPS=1, Power Determinism, SMT=OFF), Memory: 1.5TB (24x 64GB DDR5-4800), Storage: NVMe 3.2T x 5 + NVMe 1T, OS: Ubuntu 22.04.2 LTS (Linux 5.15.0-84-generic), Software: Python 3.9.18, conda 4.12.0, huggingface-hub 0.17.3, intel-openmp 2023.2.0, mkl 2023.2.0, numpy 1.26.1, sentencepiece 0.1.99, tokenizers 0.14.1 torch 2.1.0+cpu, tpp-pytorch-extension 0.0.1, transformers 4.35.0, running 24 instances scoring up to 27.24 avg. token/sec (Llama2-13B-CHAT-HF, input token size: 8, bfloat16), and up to 52.89 avg. token/sec (Llama2-7B-CHAT-HF, input size: 8, bfloat16), is 1.36x the performance of 2P Xeon Platinum 8480+ (56C/112T), BIOS ESE110Q-1.10 (Profile=Maximum Performance, HT=OFF), 1TB (16x 64GB DDR5-4800), Storage: NVMe 3.2T x 4, OS: Ubuntu 22.04.3 LTS (Linux 5.15.0-88-generic), Software: Python 3.9.18, conda 4.12.0, huggingface-hub 0.17.3, intel-openmp 2023.2.0, mkl 2023.2.0, numpy 1.26.1, sentencepiece 0.1.99, tokenizers 0.14.1 torch 2.1.0+cpu, tpp-pytorch-extension 0.0.1, transformers 4.35.0, running 14 instances scoring up to 20.08 avg. token/sec (Llama2-13B-CHAT-HF, input token size: 8, bfloat16), and up to 38.98 avg. token/sec (Llama2-7B-CHAT-HF, input token size: 8, bfloat16). Results may vary due to factors including system configurations, software versions and BIOS settings.
  5. SP5-289: Phi-3-mini throughput results based on AMD internal testing as of 6/10/2024.
    Phi-3-mini configuration: single instance, IPEX 2.3.0, BF16, batch size 1, input tokens 16, output tokens 32.
    Server configurations:
    2P EPYC 9654 (96C/192T), Lenovo ThinkSystem SR665 V3 (SMT=off, NPS=1, Power Determinism, BIOS 1.56), 1.5TB (24x 64GB DDR5-5600 running at 4800 MT/s), 3.2TB SSD, Ubuntu® 22.04.4 LTS.
    2P Xeon Platinum 8592+ (64C/128T), Lenovo ThinkSystem SR650 V3 (HT=off, NPS-1, Turbo Enabled, Profile=Maximum Performance, BIOS ESE122V-3.10), 1TB (16x 64GB DDR5-4800), 3.2TB NVMe, Ubuntu 22.04.4 LTS, AMX on.
    Results, Phi-3-mini 4K (median score, relative to EMR): Intel Xeon Platinum 8592+: 12.63 (1.00); AMD EPYC 9654: 15.68 (1.241).
    Results, Phi-3-mini 128K (median score, relative to EMR): Intel Xeon Platinum 8592+: 13.92 (1.00); AMD EPYC 9654: 15.21 (1.093).
  6. SP5C-065: AWS HPC7a.96xlarge average scores and Cloud OpEx savings comparison to m7i.48xl running Deep Learning Recommendation Model (dlrm-v2.99) with batch size = 2000 at Int8 precision with OneDNN library with IPEX extension using on-demand pricing US-East (Ohio) Linux® as of 6/11/2024 of m7i.48xl: $9.6768 / hr. HPC7a.96xlarge: $7.20 / hr. AWS pricing: https://aws.amazon.com/ec2/pricing/on-demand/
    Cloud performance results presented are based on the test date in the configuration. Results may vary due to changes to the underlying configuration, and other conditions such as the placement of the VM and its resources, optimizations by the cloud service provider, accessed cloud regions, co-tenants, and the types of other workloads exercised at the same time on the system.
  7. SP5C-070: AWS m7a.48xl average score and Cloud OpEx savings comparison to m7i.48xl running Hugging Face's all-MiniLM-L6-v2 model on PyTorch and the Neural Magic DeepSparse engine with 24 parallel runs and batch size = 1, input token size = 512, output token size = 128 at FP32 precision, using on-demand pricing US-East (Ohio) Linux® as of 7/15/2024 of m7i.48xl: $9.6768 / hr and m7a.48xl: $11.12832 / hr.
    AWS Pricing: https://aws.amazon.com/ec2/pricing/on-demand/
    Cloud performance results presented are based on the test date in the configuration. Results may vary due to changes to the underlying configuration, and other conditions such as the placement of the VM and its resources, optimizations by the cloud service provider, accessed cloud regions, co-tenants, and the types of other workloads exercised at the same time on the system
  8. SP5C-071: AWS M7a.8xl average score and Cloud OpEx savings comparison to m7i.8xl running Llama2 model with 7B parameters at BF16 on a single instance on Intel TPP library with batch size = 4, input token size = 2016, output token size = 256.
    Cloud performance results presented are based on the test date in the configuration. Results may vary due to changes to the underlying configuration, and other conditions such as the placement of the VM and its resources, optimizations by the cloud service provider, accessed cloud regions, co-tenants, and the types of other workloads exercised at the same time on the system.
  9. SP5C-060: AWS m7a.4xl average score and Cloud OpEx savings comparison to M7i.4xl running BERT-Large- pruned80_quant-none.vnni model at FP32 with batch size = 1, 128, 256, input token size = 512, output token size = 512 using on-demand pricing US-East (Ohio) Linux® as of 6/11/2024 of M7i.4xl: $0.8064 / hr. M7a.4xl: $0.92736/ hr. Cloud performance results presented are based on the test date in the configuration. Results may vary due to changes to the underlying configuration, and other conditions such as the placement of the VM and its resources, optimizations by the cloud service provider, accessed cloud regions, co-tenants, and the types of other workloads exercised at the same time on the system.
  10. SP5-185A: FAISS v1.7.4 1000 throughput workload claim based on AMD internal testing as of 4/19/2024. 2P server configurations: 2P EPYC 9654 (96C/96T), BIOS 1006C (SMT=off, NPS=1, Power Determinism), 1.5TB (24x 64GB DDR5-4800), Samsung MZQL21T9HCJR-00A07 1.92 TB, Ubuntu® 22.04.3 LTS running 8 instances/24 cores/instance scoring 39.6 median throughput is 2.04x the performance of 2P Xeon Platinum 8592+ (64C/64T), BIOS 1.4.4 (HT=off, Profile=Maximum Performance), 1TB (16x 64GB DDR5-4800), Intel SSDPF2KE032T1O 3.2TB NVMe, Ubuntu 22.04.3 LTS running 8 instances/16 cores/instance scoring 19.4 median throughput. Results may vary due to factors including system configurations, software versions and BIOS settings.
  11. SP5-251: XGBoost 2.0.3 throughput workload claim based on AMD internal testing as of 4/19/2024. 2P server configurations: 2P EPYC 9654 (96C/192T), BIOS 1006C (SMT=off, NPS=1, Power Determinism), 1.5TB (24x 64GB DDR5-4800), Samsung MZQL21T9HCJR-00A07 1.92 TB, Ubuntu 22.04.3 LTS scoring 203 Airline median throughput (running 16 instances/12 cores/instance) and 2057 Higgs median throughput (running 32 instances/6 cores/instance) for 1.38x and 1.71x the performance, respectively, of 2P Xeon Platinum 8592+ (64C/128T), BIOS 1.4.4 (HT=off, Profile=Maximum Performance), 1TB (16x 64GB DDR5-4800), Intel SSDPF2KE032T1O 3.2TB NVMe, Ubuntu 22.04.3 LTS running 8 instances/16 cores/instance scoring 147 Airline median throughput and 4 instances/32 cores/instance scoring 1200 Higgs median throughput. Results may vary due to factors including system configurations, software versions and BIOS settings.
  12. SP5-184A: SciKit-Learning Random Forest v2023.2 airline_ohe data set throughput workload claim based on AMD internal testing as of 4/19/2024. 2P server configurations: 2P EPYC 9654 (96C/96T), BIOS 1006C (SMT=off, NPS=1, Power Determinism), 1.5TB (24x 64GB DDR5-4800), 2x Samsung MZQL21T9HCJR-00A07 1.7 TB, Ubuntu® 22.04.3 LTS running 12 instances/16 cores/instance scoring 166.8 median throughput is 1.36x the performance of 2P Xeon Platinum 8592+ (64C/64T), BIOS 1.4.4 (HT=off, Profile=Maximum Performance), 1TB (16x 64GB DDR5-4800), Intel SSDPF2KE032T1O 3.2TB NVMe, Ubuntu 22.04.3 LTS running 8 instances/16 cores/instance scoring 123.1 median throughput. Results may vary due to factors including system configurations, software versions and BIOS settings.
  13. SP5-252: Third-party testing OpenVINO 2023.2.dev FPS comparison based on Phoronix review https://www.phoronix.com/review/intel-xeon-platinum-8592/9 as of 12/14/2023 of select OpenVINO tests: Vehicle Detection FP16, Person Detection FP16, Person Vehicle Bike Detection FP16, Road Segmentation ADAS FP16 and Face Detection Retail FP16. Road Segmentation ADAS FP16 was max uplift of 2.36x. Testing not independently verified by AMD. Scores will vary based on system configuration and determinism mode used (Power Determinism used). OpenVINO is a trademark of Intel Corporation or its subsidiaries.
  14. SP5-051A: TPCx-AI SF30 derivative workload comparison based on AMD internal testing running multiple VM instances as of 4/13/2024. The aggregate end-to-end AI throughput test is derived from the TPCx-AI benchmark and as such is not comparable to published TPCx-AI results, as the end-to-end AI throughput test results do not comply with the TPCx-AI Specification. AMD system configuration: Processors: 2 x AMD EPYC 9654; Frequencies: 2.4 GHz | 3.7 GHz; Cores: 96 cores per socket (1 NUMA domain per socket); L3 Cache: 384MB/socket (768MB total); Memory: 1.5TB (24) Dual-Rank DDR5-5600 64GB DIMMs, 1DPC (Platform supports up to 4800MHz); NIC: 2 x 100 GbE Mellanox CX-5 (MT28800); Storage: 3.2 TB Samsung MO003200KYDNC U.3 NVMe; BIOS: 1.56; BIOS Settings: SMT=ON, Determinism=Power, NPS=1, PPL=400W, Turbo Boost=Enabled; OS: Ubuntu® 22.04.3 LTS; Test config: 6 instances, 64 vCPUs/instance, 2663 aggregate AI use cases/min vs. Intel system configuration: Processors: 2 x Intel® Xeon® Platinum 8592+; Frequencies: 1.9 GHz | 3.9 GHz; Cores: 64 cores per socket (1 NUMA domain per socket); L3 Cache: 320MB/socket (640MB total); Memory: 1TB (16) Dual-Rank DDR5-5600 64GB DIMMs, 1DPC; NIC: 4 x 1GbE Broadcom NetXtreme BCM5719 Gigabit Ethernet PCIe; Storage: 3.84TB KIOXIA KCMYXRUG3T84 NVMe; BIOS: ESE124B-3.11; BIOS Settings: Hyperthreading=Enabled, Turbo boost=Enabled, SNC=Disabled; OS: Ubuntu® 22.04.3 LTS; Test config: 4 instances, 64 vCPUs/instance, 1607 aggregate AI use cases/min. Results may vary due to factors including system configurations, software versions and BIOS settings.  TPC, TPC Benchmark and TPC-C are trademarks of the Transaction Processing Performance Council.