AMD EPYC™ Processors Help Maximize the Value of Large GPU Investments

GPU accelerators have become the workhorse for modern AI, excelling in training large, complex models and supporting efficient real-time inference at scale. However, maximizing the potential of your GPU investment requires a powerful CPU partner.

Why GPUs for AI Workloads?

GPUs are the right tool for many AI workloads.

  • AI Training: GPUs accelerate the training of large and medium-sized models with their parallel processing capabilities.
  • Dedicated AI Deployments: GPUs offer the speed and scalability needed for real-time inference in large-scale deployments.

The CPU Advantage

Combining the power of GPUs with the right CPU can significantly enhance AI efficiency for certain workloads. Look for these key CPU features:

  • High Frequency EPYC Processors: Handle extensive data preparation and post-processing tasks quickly and efficiently.
  • Large Cache Size: Facilitates fast access to massive datasets.
  • High Memory Bandwidth and High Performance I/O: Enable fast, seamless data exchange between the CPU and GPUs (see the sketch after this list).
  • Energy-Efficient Cores: Free up power budget for the GPUs and can help reduce overall energy consumption.
  • Compatibility with the GPU and Software Ecosystem: Enables optimized performance, efficiency, and smooth operation.
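
Several of these features show up directly in how fast the host can feed the accelerators. As a minimal sketch (assuming a CUDA- or ROCm-enabled PyTorch build; the buffer size and iteration count are arbitrary and not taken from any AMD test configuration), the snippet below compares host-to-device copy throughput from pageable versus pinned (page-locked) host memory, the path where host memory bandwidth and I/O performance are most visible:

```python
# Sketch only: measure host-to-device copy throughput from pageable vs.
# pinned host memory. Pinned buffers allow direct DMA, so results track
# the host's memory bandwidth and I/O capability more closely.
import time
import torch

size_mb = 512
n = size_mb * 1024 * 1024 // 4  # number of float32 elements

pageable = torch.empty(n, dtype=torch.float32)
pinned = torch.empty(n, dtype=torch.float32, pin_memory=True)

def h2d_gibps(src: torch.Tensor, iters: int = 20) -> float:
    dst = torch.empty_like(src, device="cuda")
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        dst.copy_(src, non_blocking=True)
    torch.cuda.synchronize()
    return (size_mb / 1024) * iters / (time.perf_counter() - start)

print(f"pageable: {h2d_gibps(pageable):.1f} GiB/s")
print(f"pinned:   {h2d_gibps(pinned):.1f} GiB/s")
```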

AMD EPYC 9005 Processors

High frequency AMD EPYC 9005 Series processors are an ideal choice for unlocking the true potential of your GPUs for large AI workloads. As the host CPU, they help ensure the GPUs have the right data at the right time to keep processing, which is critical to achieving the best AI workload throughput and system efficiency. Their high core frequency and large memory capacity are the key factors that make AMD EPYC high frequency processors stand out. To understand how these factors deliver increased GPU throughput, read the article.

Applications and Industries

GPU accelerator-based solutions fueled by AMD EPYC CPUs power many of the world's fastest supercomputers and cloud instances, offering enterprises a proven platform for optimizing data-driven workloads and achieving groundbreaking results in AI.

AMD EPYC 9005 Series Processors: The Right Choice to Maximize the Value of Large GPU Investments

CPUs play a crucial role in orchestrating and synchronizing data transfers between GPUs, handling kernel launch overheads, and managing data preparation. This "conductor" function helps GPUs operate at peak efficiency.
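
A rough illustration of that conductor pattern (a sketch under stated assumptions: a toy dataset and model, and a CUDA- or ROCm-enabled PyTorch build, none of it drawn from the configurations cited below) is having CPU worker processes prepare and pin batches while the GPU computes, with non-blocking copies overlapping transfers and kernel execution:

```python
# Sketch only: CPU workers prepare batches in parallel while the GPU
# computes; pinned staging buffers and non-blocking copies overlap the
# host-to-device transfers with queued GPU work.
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda")

# Toy stand-ins; real pipelines also decode, tokenize, and augment on the
# CPU side, which is where host cores and frequency pay off.
dataset = TensorDataset(torch.randn(10_000, 1024),
                        torch.randint(0, 10, (10_000,)))
loader = DataLoader(dataset, batch_size=256,
                    num_workers=8,    # CPU workers run data prep in parallel
                    pin_memory=True)  # stage batches in page-locked memory

model = torch.nn.Linear(1024, 10).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for x, y in loader:
    x = x.to(device, non_blocking=True)  # async copy, overlaps GPU work
    y = y.to(device, non_blocking=True)
    opt.zero_grad()
    torch.nn.functional.cross_entropy(model(x), y).backward()
    opt.step()
```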

Optimize GPU Investment Value With High Performance CPUs

Many AI workloads benefit from high CPU clock speeds, which streamline data processing, data transfer, and concurrent execution to keep the GPUs fully fed. The EPYC 9575F is purpose-built to be a high-performing AI host-node processor, running at speeds of up to 5 GHz.
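
One way to see why host clock speed matters (a toy measurement, not AMD's benchmark methodology) is that each kernel launch costs CPU time, so a loop of many small kernels is typically bound by how fast a single host thread can issue work rather than by the GPU itself:

```python
# Sketch only: the launch rate of tiny kernels is usually limited by
# single-thread host speed, not by GPU throughput.
import time
import torch

x = torch.randn(256, device="cuda")  # tiny tensor: each kernel is near-instant

torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(10_000):
    x = x * 1.0001  # one small elementwise kernel per iteration
torch.cuda.synchronize()
rate = 10_000 / (time.perf_counter() - start)
print(f"{rate:,.0f} small-kernel launches/s (host-bound)")
```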

Comparing 2P Servers with 8 GPU Accelerators

AMD vs. Intel Host Node CPU with AMD Instinct GPUs
MLPerf® v4.1 Inference Llama 2-70B Benchmark¹
  • 8x AMD Instinct™ MI300X + 2P 5th Gen EPYC 9575F (64 cores, 5 GHz): 1.11x
  • 8x AMD Instinct™ MI300X + 2P Xeon 8460Y+ (40 cores, 3.7 GHz): 1.0x (baseline)

AMD vs. Intel Host Node CPU with NVIDIA GPUs
Inference: Llama3.1-70B Inference Benchmark (FP8)²
  • 8x NVIDIA H100 + 2P 5th Gen EPYC 9575F (64 cores): ~1.20x
  • 8x NVIDIA H100 + 2P Xeon 8592+ (64 cores): 1.0x (baseline)

Training: Llama3.1-8B Training Benchmark (BF16)³
  • 8x NVIDIA H100 + 2P 5th Gen EPYC 9575F (64 cores): ~1.15x
  • 8x NVIDIA H100 + 2P Xeon 8592+ (64 cores): 1.0x (baseline)

Deploy Enterprise AI Efficiently

Processors like 5th Gen AMD EPYC combine high performance, low power consumption, efficient data handling, and effective power management, enabling your AI infrastructure to run at peak performance while optimizing energy consumption and cost.

AMD EPYC processors power energy-efficient servers, delivering exceptional performance and helping reduce energy costs. Deploy them with confidence to create energy-efficient solutions and help optimize your AI journey.

In AMD EPYC 9005 Series processors, AMD Infinity Power Management offers excellent default performance and allows fine-tuning for workload-specific behavior.
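
Infinity Power Management itself is set through the platform BIOS (the footnoted test systems use Determinism=Power, for example), while the footnoted OS-level tuning uses cpupower frequency-set -g performance. As a small, hypothetical helper (not an AMD tool), the sketch below reads the resulting Linux cpufreq governors from sysfs so you can verify a host is configured as expected:

```python
# Hypothetical helper: count logical CPUs per cpufreq governor via sysfs.
from pathlib import Path

def governors() -> dict[str, int]:
    counts: dict[str, int] = {}
    base = Path("/sys/devices/system/cpu")
    for f in base.glob("cpu[0-9]*/cpufreq/scaling_governor"):
        g = f.read_text().strip()
        counts[g] = counts.get(g, 0) + 1
    return counts

if __name__ == "__main__":
    # e.g. {'performance': 128} on a tuned 2P EPYC 9575F host
    print(governors())
```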


Peace of Mind: Adopt AI With Trusted Solutions

Choose from several certified or validated GPU-accelerated solutions hosted by AMD EPYC CPUs to supercharge your AI workloads.

Using other GPUs? Ask for AMD EPYC CPU powered solutions available from leading platform solution providers including Asus, Dell, Gigabyte, HPE, Lenovo, and Supermicro.

Growing Ecosystem of AMD EPYC CPU + GPU Cloud AI/ML Instance Options

Ask for instances that combine AMD EPYC CPUs with GPUs for AI/ML workloads from major cloud providers including AWS, Azure, Google, IBM Cloud, and OCI.


Resources

AMD Instinct Accelerators

Uniquely well-suited to advance your most demanding AI workloads.

AMD EPYC Enterprise AI Briefs

Find AMD and partner documentation describing AI and machine learning innovation using CPUs and GPUs.

Podcasts

Listen to leading technologists from AMD and across the industry discussing the latest topics in servers, cloud computing, AI, HPC, and more.

Footnotes
  1. 9xx5-013: Official MLPerf™ Inference v4.1 Llama2-70B-99.9 server tokens/s and offline tokens/s results retrieved from https://mlcommons.org/benchmarks/inference-datacenter/ on 09/01/2024, from entries 4.1-0070 (preview) and 4.1-0022. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.
  2. 9xx5-014: Llama3.1-70B inference throughput results based on AMD internal testing as of 09/01/2024. Llama3.1-70B configuration: TensorRT-LLM 0.9.0, nvidia/cuda 12.5.0-devel-ubuntu22.04, FP8, input/output token configurations (use cases): [BS=1024 I/O=128/128, BS=1024 I/O=128/2048, BS=96 I/O=2048/128, BS=64 I/O=2048/2048]. Results in tokens/second. 2P AMD EPYC 9575F (128 Total Cores) with 8x NVIDIA H100 80GB HBM3, 1.5TB 24x64GB DDR5-6000, 1.0 Gbps 3TB Micron_9300_MTFDHAL3T8TDP NVMe®, BIOS T20240805173113 (Determinism=Power, SR-IOV=On), Ubuntu 22.04.3 LTS, kernel 5.15.0-117-generic (mitigations=off, cpupower frequency-set -g performance, cpupower idle-set -d 2, echo 3 > /proc/sys/vm/drop_caches), versus 2P Intel Xeon Platinum 8592+ (128 Total Cores) with 8x NVIDIA H100 80GB HBM3, 1TB 16x64GB DDR5-5600, 3.2TB Dell Ent NVMe® PM1735a MU, Ubuntu 22.04.3 LTS, kernel 5.15.0-118-generic (processor.max_cstate=1, intel_idle.max_cstate=0, mitigations=off, cpupower frequency-set -g performance), BIOS 2.1 (Maximum performance, SR-IOV=On). Results (tokens/second):
       I/O tokens   Batch size   EMR (Xeon 8592+)   Turin (EPYC 9575F)   Relative
       128/128      1024         814.678            1101.966             1.353
       128/2048     1024         2120.664           2331.776             1.100
       2048/128     96           114.954            146.187              1.272
       2048/2048    64           333.325            354.208              1.063
     Average throughput increase: 1.197x. Results may vary due to factors including system configurations, software versions, and BIOS settings.
  3. 9xx5-015: Llama3.1-8B (BF16, max sequence length 1024) training throughput results based on AMD internal testing as of 09/05/2024. Llama3.1-8B configuration: max sequence length 1024, BF16, Docker: huggingface/transformers-pytorch-gpu:latest. 2P AMD EPYC 9575F (128 Total Cores) with 8x NVIDIA H100 80GB HBM3, 1.5TB 24x64GB DDR5-6000, 1.0 Gbps 3TB Micron_9300_MTFDHAL3T8TDP NVMe®, BIOS T20240805173113 (Determinism=Power, SR-IOV=On), Ubuntu 22.04.3 LTS, kernel 5.15.0-117-generic (mitigations=off, cpupower frequency-set -g performance, cpupower idle-set -d 2, echo 3 > /proc/sys/vm/drop_caches): 31.79 train samples/second. 2P Intel Xeon Platinum 8592+ (128 Total Cores) with 8x NVIDIA H100 80GB HBM3, 1TB 16x64GB DDR5-5600, 3.2TB Dell Ent NVMe® PM1735a MU, Ubuntu 22.04.3 LTS, kernel 5.15.0-118-generic (processor.max_cstate=1, intel_idle.max_cstate=0, mitigations=off, cpupower frequency-set -g performance), BIOS 2.1 (Maximum performance, SR-IOV=On): 27.74 train samples/second, for an average throughput increase of 1.146x. Results may vary due to factors including system configurations, software versions, and BIOS settings.