Featured Webinar

Learn how AMD EPYC™ 9004 Series processors empower you to deploy CPU-based AI processing that is efficient for small AI models, many classical machine learning and inference workloads, and traditional compute workloads that are being augmented by AI.


Leadership portfolio to address Enterprise AI Inference workloads

AI inference uses a trained AI model to make predictions on new data. AMD offers a range of solutions for AI inference depending on your model size and application requirements. AMD EPYC™ CPUs excel for small to medium AI models and for workloads where proximity to data matters. AMD Instinct™ GPUs shine for large models and dedicated AI deployments that demand very high performance and scale. Both offer impressive performance and efficiency, letting you choose the right solution for your needs.

Model Size: Small

Type: Classical
  Pros: Faster inference; lower resource requirements
  Cons: Lower accuracy; limited to less complex tasks
  Typical use cases: Basic image recognition; sentiment analysis; spam detection

Type: Generative
  Pros: Creative content generation (e.g., music, art); personalization
  Cons: Limited control over output; potential for bias
  Typical use cases: Short-form text generation; chatbots

Model Size: Medium

Type: Classical
  Pros: Balance between speed and accuracy; suitable for moderately complex tasks
  Cons: May require more training data; less efficient for very large datasets
  Typical use cases: Object detection in videos; machine translation; customer service chatbots

Type: Predictive
  Pros: Accurate predictions for a variety of tasks; scalable to larger datasets
  Cons: Can be computationally expensive; requires careful data preparation
  Typical use cases: Fraud detection; risk assessment; sales forecasting

Model Size: Large

Type: Generative
  Pros: Highly realistic and complex content generation; advanced language understanding
  Cons: Very resource-intensive; high risk of bias and ethical concerns
  Typical use cases: Complex text generation; image and video generation; creative content design

Type: Classical
  Pros: High accuracy for complex tasks; can handle large and diverse datasets
  Cons: Extremely resource-intensive; difficult to interpret and explain
  Typical use cases: Medical diagnosis; self-driving cars; facial recognition

Type: Predictive
  Pros: Highly accurate predictions with large datasets; handles complex relationships and patterns
  Cons: Expensive to train and run; requires extensive data and expertise
  Typical use cases: Personalized recommendations; financial market analysis; scientific discovery

Applications and Industries

AI models integrated within computer vision, natural language processing, and recommendation systems have significantly impacted businesses across multiple industries. These models help companies recognize objects, classify anomalies, understand written and spoken words, and make recommendations. By accelerating the development of these models, businesses can reap the benefits, regardless of their industry.


Automotive

Computer vision models help propel self-driving cars, recognizing signage, pedestrians, and other vehicles to be avoided. Natural language processing models can help in-car telematics systems recognize spoken commands.


Financial Services

AI-powered anomaly detection helps stop credit-card fraud, while computer vision models watch for suspicious documents, including customer checks.


Retail

Automate checkout lines by recognizing products, or even create autonomous shopping experiences where the models link customers with the items they choose and put into their bags. Use product recommendation engines to offer alternatives, whether online or in the store.


Manufacturing

Use computer vision models to monitor the quality of manufactured products, from food items to printed circuit boards. Feed telemetry data into recommendation engines to suggest proactive maintenance: Are disk drives about to fail? Is the engine using too much oil?


Medical

Detect anomalies including fractures and tumors with computer vision models. Use the same models in research to assess in vitro cell growth and proliferation.


Service Automation

Where IT meets customers, natural language processing can help take action based on spoken requests, and recommendation engines can help point customers to satisfactory solutions and product alternatives.

The Ideal Choice for Enterprise AI Inference Workloads

Whether deployed as CPU only or used as a host for GPUs executing larger models, AMD EPYC™ 9004 Series processors are designed with the latest open standard technologies to accelerate Enterprise AI inference workloads.

Architected for AI Inference

Up to 128 AMD “Zen 4” cores with AVX-512 instruction support deliver substantial parallelism for AI inference workloads, reducing the need for GPU acceleration (a quick way to verify these instruction-set features on a target host is sketched below).

Exceptional Power Efficiency: AMD EPYC processors power the most energy-efficient servers, delivering exceptional performance and helping reduce energy costs.1

Fast Processing and I/O: 14% generational increase in instructions per clock cycle, DDR5 memory, and PCIe® Gen 5 I/O for fast data processing.2
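
As a quick sanity check before sizing a CPU-only inference deployment, the CPU flags on a Linux host can confirm that the AVX-512 support mentioned above (including its VNNI and BF16 extensions) is present. This is a minimal sketch that simply parses /proc/cpuinfo; the flag names are the standard Linux kernel strings, and the script is illustrative rather than an AMD-provided tool.

```python
# Minimal sketch: confirm AVX-512 / BF16 CPU flags on a Linux host
# before planning CPU-based inference. Assumes a readable /proc/cpuinfo;
# flag names are the standard kernel strings, not AMD-specific tooling.
def cpu_flags(path="/proc/cpuinfo"):
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

if __name__ == "__main__":
    flags = cpu_flags()
    for feature in ("avx512f", "avx512_vnni", "avx512_bf16"):
        print(f"{feature}: {'present' if feature in flags else 'missing'}")
```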


AMD Software Optimizations for AI Inference

Framework Support: AMD supports the most popular AI frameworks, including TensorFlow, PyTorch, and ONNX Runtime, covering diverse use cases like image classification and recommendation engines.
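
To make the framework support concrete, the sketch below runs a small image-classification inference entirely on the CPU with PyTorch. The model (torchvision's resnet50 with random weights) and the random input batch are illustrative stand-ins rather than an AMD-supplied example; TensorFlow or ONNX Runtime could be used the same way.

```python
# Minimal CPU inference sketch with PyTorch; the framework choice,
# model, and synthetic input are illustrative stand-ins.
import torch
from torchvision.models import resnet50

model = resnet50(weights=None).eval()   # random weights stand in for a trained model
batch = torch.randn(8, 3, 224, 224)     # synthetic image batch

with torch.inference_mode():
    logits = model(batch)               # runs on the CPU by default
    print(logits.argmax(dim=1).tolist())
```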

Open Source and Compatibility: Optimizations are integrated into popular frameworks, offering broad compatibility and upstream-friendly open-source development. AMD is also working with Hugging Face to enable their open-source models out of the box with ZenDNN.

ZenDNN Plug-ins: These plug-ins accelerate AI inference workloads by optimizing operators, leveraging microkernels, and implementing efficient multithreading on AMD EPYC cores.
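
In practice, enabling the ZenDNN PyTorch plug-in is a small change to an existing inference path. The sketch below assumes the plug-in is installed as the zentorch package and registers a "zentorch" backend for torch.compile, which is how recent ZenDNN releases document it; verify both the package name and the backend string against the ZenDNN release you install.

```python
# Hedged sketch: enable the ZenDNN PyTorch plug-in for an existing model.
# Assumes the 'zentorch' package is installed and exposes a torch.compile
# backend named "zentorch"; falls back to plain CPU execution otherwise.
import torch
from torchvision.models import resnet50

model = resnet50(weights=None).eval()
batch = torch.randn(8, 3, 224, 224)

try:
    import zentorch  # noqa: F401  (importing registers the backend)
    model = torch.compile(model, backend="zentorch")
except ImportError:
    pass  # plug-in not installed; eager CPU inference still works

with torch.inference_mode():
    print(model(batch).argmax(dim=1).tolist())
```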

AMD Software Optimizations Diagram

Data security is even more important in the era of AI

As digitization, cloud computing, AI, and other emerging technologies fuel the growth of data, the need for advanced security measures becomes even more pressing. That need is amplified by the growing global emphasis on privacy regulations and the severe penalties for breaches, underscoring the value of data amid rising security risks.

Built-in at the silicon level, AMD Infinity Guard offers the advanced capabilities required to defend against internal and external threats and help keep your data safe.3


AMD EPYC™ 9004 processor-based servers and cloud instances enable fast, efficient AI-enabled solutions close to your customers and data.

 

Small / Medium AI Workload Models

2P servers running Llama2-7B-CHAT-HF and Llama2-13B-CHAT-HF LLMs⁴ (relative tokens/second)
  AMD EPYC™ 9654: 1.36x
  Intel® Xeon® Platinum 8480+: 1.0x

2P servers running Phi-3 Mini (4K)⁵ (relative tokens/second)
  AMD EPYC™ 9654: 1.24x
  Intel® Xeon® Platinum 8592+: 1.0x

AWS instances running DLRMv2 at Int8 precision⁶
  4th Gen EPYC (hpc7a.96xlarge): ~1.44x
  4th Gen Xeon (m7i.48xlarge): 1.0x

AWS instances running MiniLM with PyTorch and the Neural Magic DeepSparse engine at FP32 precision⁷
  4th Gen EPYC (m7a.48xlarge): ~1.78x
  4th Gen Xeon (m7i.48xlarge): 1.0x

AWS instances running Llama2-7B at BF16 precision⁸
  4th Gen EPYC (m7a.8xlarge): ~1.19x
  4th Gen Xeon (m7i.8xlarge): 1.0x
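
The relative results above come from the specific hardware and software configurations detailed in the footnotes. For a directional tokens-per-second check on your own CPU, a measurement can be approximated with Hugging Face transformers, as in the sketch below; the model name, bfloat16 precision, and token counts are illustrative stand-ins, not the footnoted benchmark setups.

```python
# Rough tokens/second measurement for CPU LLM inference with Hugging Face
# transformers. Model, precision, and token counts are illustrative only,
# not the footnoted benchmark configurations.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-125m"  # small stand-in model for a quick CPU run
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).eval()

inputs = tokenizer("The quick brown fox", return_tensors="pt")

with torch.inference_mode():
    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    elapsed = time.perf_counter() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.2f} tokens/sec")
```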

Classical ML Workload Models

2P servers running FAISS clustering⁹ (clusterings/second)
  AMD EPYC™ 9654: up to ~2.0x
  Intel® Xeon® Platinum 8592+: 1.0x

2P servers running XGBoost on the Higgs boson particle dataset¹⁰ (throughput)
  AMD EPYC™ 9654: up to ~1.7x
  Intel® Xeon® Platinum 8592+: 1.0x

2P servers running random decision forest classification (SciKit-Learning Random Forest, airline_ohe throughput)¹¹
  AMD EPYC™ 9654: up to ~1.36x
  Intel® Xeon® Platinum 8592+: 1.0x

2P servers running OpenVINO™ Road Segmentation inference¹² (frames/sec per CPU watt)
  AMD EPYC™ 9754: up to ~2.4x
  Intel® Xeon® Platinum 8592+: 1.0x

2P servers running TPCx-AI @ SF30¹³ (AI use cases/min)
  AMD EPYC™ 9654: up to ~1.65x
  Intel® Xeon® Platinum 8592+: 1.0x
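
As with the LLM results, the classical ML comparisons above reflect the footnoted configurations. For a simple, directional gauge of random-forest inference throughput on your own cores, the sketch below uses scikit-learn with synthetic data in place of the airline_ohe or Higgs datasets; the sample counts and model parameters are illustrative choices.

```python
# Simple random-forest inference throughput check on CPU, using synthetic
# data in place of the airline_ohe / Higgs datasets. Counts and parameters
# are illustrative, not the footnoted benchmark configurations.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200_000, n_features=50, random_state=0)
X_train, y_train, X_test = X[:50_000], y[:50_000], X[50_000:]

clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)                      # uses all available cores

start = time.perf_counter()
clf.predict(X_test)                            # batch inference on the remaining rows
elapsed = time.perf_counter() - start
print(f"{X_test.shape[0] / elapsed:.0f} rows/sec")
```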

Resources

AMD EPYC Enterprise AI Briefs

Find AMD and partner documentation describing AI and machine learning innovation.

AMD ZenDNN Library

Open-source library for enhanced deep learning performance on AMD EPYC processors.

Podcasts

Listen to leading technologists from AMD and the industry discuss the latest trending topics regarding servers, cloud computing, AI, HPC, and more.

Footnotes
  1. EPYC-028D: SPECpower_ssj® 2008, SPECrate®2017_int_energy_base, and SPECrate®2017_fp_energy_base based on results published on SPEC’s website as of 2/21/24. VMmark® server power-performance / server and storage power-performance (PPKW) based results published at https://www.vmware.com/products/vmmark/results3x.1.html?sort=score. The first 105 ranked SPECpower_ssj®2008 publications with the highest overall efficiency (overall ssj_ops/W) results were all powered by AMD EPYC processors. For SPECrate®2017 Integer (Energy Base), AMD EPYC CPUs power the first 8 top SPECrate®2017_int_energy_base performance/system W scores. For SPECrate®2017 Floating Point (Energy Base), AMD EPYC CPUs power the first 12 SPECrate®2017_fp_energy_base performance/system W scores. For VMmark® server power-performance (PPKW), AMD EPYC CPUs have the top 5 results for 2- and 4-socket matched pairs, outperforming all other socket results, and for VMmark® server and storage power-performance (PPKW), they have the top overall score. See https://www.amd.com/en/claims/epyc4#faq-EPYC-028D for the full list. For additional information on AMD sustainability goals see: https://www.amd.com/en/corporate/corporate-responsibility/data-center-sustainability.html. More information about SPEC® is available at http://www.spec.org. SPEC, SPECrate, and SPECpower are registered trademarks of the Standard Performance Evaluation Corporation. VMmark is a registered trademark of VMware in the US or other countries.
  2. EPYC-038: Based on AMD internal testing as of 09/19/2022, geomean performance improvement at the same fixed-frequency on a 4th Gen AMD EPYC™ 9554 CPU compared to a 3rd Gen AMD EPYC™ 7763 CPU using a select set of workloads (33) including est. SPECrate®2017_int_base, est. SPECrate®2017_fp_base, and representative server workloads. SPEC® and SPECrate® are registered trademarks of Standard Performance Evaluation Corporation. Learn more at spec.org.
  3. GD-183A AMD Infinity Guard features vary by EPYC™ Processor generations and/or series. Infinity Guard security features must be enabled by server OEMs and/or Cloud Service Providers to operate. Check with your OEM or provider to confirm support of these features. Learn more about Infinity Guard at https://www.amd.com/en/technologies/infinity-guard
  4. SP5-222: Llama2 tokens/sec workload claim based on AMD internal testing as of 12/1/2023. 2P server configurations: 2P EPYC 9654 (96C/192T), BIOS AMI RTI1001C (NPS=1, Power Determinism, SMT=OFF), Memory: 1.5TB (24x 64GB DDR5-4800), Storage: NVMe 3.2T x 5 + NVMe 1T, OS: Ubuntu 22.04.2 LTS (Linux 5.15.0-84-generic), Software: Python 3.9.18, conda 4.12.0, huggingface-hub 0.17.3, intel-openmp 2023.2.0, mkl 2023.2.0, numpy 1.26.1, sentencepiece 0.1.99, tokenizers 0.14.1 torch 2.1.0+cpu, tpp-pytorch-extension 0.0.1, transformers 4.35.0, running 24 instances scoring up to 27.24 avg. token/sec (Llama2-13B-CHAT-HF, input token size: 8, bfloat16), and up to 52.89 avg. token/sec (Llama2-7B-CHAT-HF, input size: 8, bfloat16), is 1.36x the performance of 2P Xeon Platinum 8480+ (56C/112T), BIOS ESE110Q-1.10 (Profile=Maximum Performance, HT=OFF), 1TB (16x 64GB DDR5-4800), Storage: NVMe 3.2T x 4, OS: Ubuntu 22.04.3 LTS (Linux 5.15.0-88-generic), Software: Python 3.9.18, conda 4.12.0, huggingface-hub 0.17.3, intel-openmp 2023.2.0, mkl 2023.2.0, numpy 1.26.1, sentencepiece 0.1.99, tokenizers 0.14.1 torch 2.1.0+cpu, tpp-pytorch-extension 0.0.1, transformers 4.35.0, running 14 instances scoring up to 20.08 avg. token/sec (Llama2-13B-CHAT-HF, input token size: 8, bfloat16), and up to 38.98 avg. token/sec (Llama2-7B-CHAT-HF, input token size: 8, bfloat16). Results may vary due to factors including system configurations, software versions and BIOS settings.
  5. SP5-289: Phi-3-mini throughput results based on AMD internal testing as of 6/10/2024.
    Phi-3-mini configuration: single instance, IPEX 2.3.0, BF16, batch size 1, input tokens 16, output tokens 32.
    Server configurations:
    2P EPYC 9654 (96C/192T), Lenovo ThinkSystem SR665 V3 (SMT=off, NPS=1, Power Determinism, BIOS 1.56), 1.5TB (24x 64GB DDR5-5600 running at 4800 MT/s), 3.2TB SSD, Ubuntu® 22.04.4 LTS.
    2P Xeon Platinum 8592+ (64C/128T), Lenovo ThinkSystem SR650 V3 (HT=off, NPS-1, Turbo Enabled, Profile=Maximum Performance, BIOS ESE122V-3.10), 1TB (16x 64GB DDR5-4800), 3.2TB NVMe, Ubuntu 22.04.4 LTS, AMX on.
    Results, Phi-3-mini 4K (median score, relative to EMR): Intel Xeon Platinum 8592+: 12.63 (1.00); AMD EPYC 9654: 15.68 (1.241).
    Results, Phi-3-mini 128K (median score, relative to EMR): Intel Xeon Platinum 8592+: 13.92 (1.00); AMD EPYC 9654: 15.21 (1.093).
  6. SP5C-065: AWS HPC7a.96xlarge average scores and Cloud OpEx savings comparison to m7i.48xl running Deep Learning Recommendation Model (dlrm-v2.99) with batch size = 2000 at Int8 precision with OneDNN library with IPEX extension using on-demand pricing US-East (Ohio) Linux® as of 6/11/2024 of m7i.48xl: $9.6768 / hr. HPC7a.96xlarge: $7.20 / hr. AWS pricing: https://aws.amazon.com/ec2/pricing/on-demand/
    Cloud performance results presented are based on the test date in the configuration. Results may vary due to changes to the underlying configuration, and other conditions such as the placement of the VM and its resources, optimizations by the cloud service provider, accessed cloud regions, co-tenants, and the types of other workloads exercised at the same time on the system.
  7. SP5C-070: AWS m7a.48xl average score and Cloud OpEx savings comparison to m7i.48xl running Hugging Face's all-MiniLM-L6-v2 model on PyTorch and the Neural Magic DeepSparse engine with 24 parallel runs and batch size = 1, input token size = 512, output token size = 128 at FP32 precision, using on-demand pricing US-East (Ohio) Linux® as of 7/15/2024 of m7i.48xl: $9.6768 / hr and m7a.48xl: $11.12832 / hr.
    AWS Pricing: https://aws.amazon.com/ec2/pricing/on-demand/
    Cloud performance results presented are based on the test date in the configuration. Results may vary due to changes to the underlying configuration, and other conditions such as the placement of the VM and its resources, optimizations by the cloud service provider, accessed cloud regions, co-tenants, and the types of other workloads exercised at the same time on the system
  8. SP5C-071: AWS M7a.8xl average score and Cloud OpEx savings comparison to m7i.8xl running Llama2 model with 7B parameters at BF16 on a single instance on Intel TPP library with batch size = 4, input token size = 2016, output token size = 256.
    Cloud performance results presented are based on the test date in the configuration. Results may vary due to changes to the underlying configuration, and other conditions such as the placement of the VM and its resources, optimizations by the cloud service provider, accessed cloud regions, co-tenants, and the types of other workloads exercised at the same time on the system.
  9. SP5C-060: AWS m7a.4xl average score and Cloud OpEx savings comparison to M7i.4xl running BERT-Large- pruned80_quant-none.vnni model at FP32 with batch size = 1, 128, 256, input token size = 512, output token size = 512 using on-demand pricing US-East (Ohio) Linux® as of 6/11/2024 of M7i.4xl: $0.8064 / hr. M7a.4xl: $0.92736/ hr. Cloud performance results presented are based on the test date in the configuration. Results may vary due to changes to the underlying configuration, and other conditions such as the placement of the VM and its resources, optimizations by the cloud service provider, accessed cloud regions, co-tenants, and the types of other workloads exercised at the same time on the system.
  10. SP5-185A: FAISS v1.7.4 1000 throughput workload claim based on AMD internal testing as of 4/19/2024. 2P server configurations: 2P EPYC 9654 (96C/96T), BIOS 1006C (SMT=off, NPS=1, Power Determinism), 1.5TB (24x 64GB DDR5-4800), Samsung MZQL21T9HCJR-00A07 1.92 TB, Ubuntu® 22.04.3 LTS running 8 instances/24 cores/instance scoring 39.6 median throughput is 2.04x the performance of 2P Xeon Platinum 8592+ (64C/64T), BIOS 1.4.4 (HT=off, Profile=Maximum Performance), 1TB (16x 64GB DDR5-4800), Intel SSDPF2KE032T1O 3.2TB NVMe, Ubuntu 22.04.3 LTS running 8 instances/16 cores/instance scoring 19.4 median throughput. Results may vary due to factors including system configurations, software versions and BIOS settings.
  11. SP5-251: XGBoost 2.0.3 throughput workload claim based on AMD internal testing as of 4/19/2024. 2P server configurations: 2P EPYC 9654 (96C/192T), BIOS 1006C (SMT=off, NPS=1, Power Determinism), 1.5TB (24x 64GB DDR5-4800), Samsung MZQL21T9HCJR-00A07 1.92 TB, Ubuntu 22.04.3 LTS scoring 203 Airline median throughput (running 16 instances/12 cores/instance) and 2057 Higgs median throughput (running 32 instances/6 cores/instance) for 1.38x and 1.71x the performance, respectively, of 2P Xeon Platinum 8592+ (64C/128T), BIOS 1.4.4 (HT=off, Profile=Maximum Performance), 1TB (16x 64GB DDR5-4800), Intel SSDPF2KE032T1O 3.2TB NVMe, Ubuntu 22.04.3 LTS running 8 instances/16 cores/instance scoring 147 Airline median throughput and 4 instances/32 cores/instance scoring 1200 Higgs median throughput. Results may vary due to factors including system configurations, software versions and BIOS settings.
  12. SP5-184A: SciKit-Learning Random Forest v2023.2 airline_ohe data set throughput workload claim based on AMD internal testing as of 4/19/2024. 2P server configurations: 2P EPYC 9654 (96C/96T), BIOS 1006C (SMT=off, NPS=1, Power Determinism), 1.5TB (24x 64GB DDR5-4800), 2x Samsung MZQL21T9HCJR-00A07 1.7 TB, Ubuntu® 22.04.3 LTS running 12 instances/16 cores/instance scoring 166.8 median throughput is 1.36x the performance of 2P Xeon Platinum 8592+ (64C/64T), BIOS 1.4.4 (HT=off, Profile=Maximum Performance), 1TB (16x 64GB DDR5-4800), Intel SSDPF2KE032T1O 3.2TB NVMe, Ubuntu 22.04.3 LTS running 8 instances/16 cores/instance scoring 123.1 median throughput. Results may vary due to factors including system configurations, software versions and BIOS settings.
  13. SP5-252: Third-party testing OpenVINO 2023.2.dev FPS comparison based on Phoronix review https://www.phoronix.com/review/intel-xeon-platinum-8592/9 as of 12/14/2023 of select OpenVINO tests: Vehicle Detection FP16, Person Detection FP16, Person Vehicle Bike Detection FP16, Road Segmentation ADAS FP16 and Face Detection Retail FP16. Road Segmentation ADAS FP16 was max uplift of 2.36x. Testing not independently verified by AMD. Scores will vary based on system configuration and determinism mode used (Power Determinism used). OpenVINO is a trademark of Intel Corporation or its subsidiaries.
  14. SP5-051A: TPCx-AI SF30 derivative workload comparison based on AMD internal testing running multiple VM instances as of 4/13/2024. The aggregate end-to-end AI throughput test is derived from the TPCx-AI benchmark and as such is not comparable to published TPCx-AI results, as the end-to-end AI throughput test results do not comply with the TPCx-AI Specification. AMD system configuration: Processors: 2 x AMD EPYC 9654; Frequencies: 2.4 GHz | 3.7 GHz; Cores: 96 cores per socket (1 NUMA domain per socket); L3 Cache: 384MB/socket (768MB total); Memory: 1.5TB (24) Dual-Rank DDR5-5600 64GB DIMMs, 1DPC (Platform supports up to 4800MHz); NIC: 2 x 100 GbE Mellanox CX-5 (MT28800); Storage: 3.2 TB Samsung MO003200KYDNC U.3 NVMe; BIOS: 1.56; BIOS Settings: SMT=ON, Determinism=Power, NPS=1, PPL=400W, Turbo Boost=Enabled; OS: Ubuntu® 22.04.3 LTS; Test config: 6 instances, 64 vCPUs/instance, 2663 aggregate AI use cases/min vs. Intel system configuration: Processors: 2 x Intel® Xeon® Platinum 8592+; Frequencies: 1.9 GHz | 3.9 GHz; Cores: 64 cores per socket (1 NUMA domain per socket); L3 Cache: 320MB/socket (640MB total); Memory: 1TB (16) Dual-Rank DDR5-5600 64GB DIMMs, 1DPC; NIC: 4 x 1GbE Broadcom NetXtreme BCM5719 Gigabit Ethernet PCIe; Storage: 3.84TB KIOXIA KCMYXRUG3T84 NVMe; BIOS: ESE124B-3.11; BIOS Settings: Hyperthreading=Enabled, Turbo boost=Enabled, SNC=Disabled; OS: Ubuntu® 22.04.3 LTS; Test config: 4 instances, 64 vCPUs/instance, 1607 aggregate AI use cases/min. Results may vary due to factors including system configurations, software versions and BIOS settings.  TPC, TPC Benchmark and TPC-C are trademarks of the Transaction Processing Performance Council.