Accelerators for High Performance Compute
A new era of heterogeneous compute for Machine Intelligence and HPC has arrived with EPYC™ server processors and Radeon Instinct™ GPU accelerators.
Empowering a new era of scale-out compute for HPC and Deep Learning
Truly accelerating the pace of deep learning and addressing the broad needs of the datacenter requires a combination of high-performance compute and GPU acceleration optimized for handling massive amounts of data with heavy floating-point computation that can be spread across many cores. Large system designers today also need the ability to design efficient systems, with the flexibility and openness to configure systems that meet the challenges of today's very demanding workloads.
AMD is empowering designers with those capabilities, allowing them to raise the bar on achievable compute densities by enabling optimized server designs with higher performance, reduced latencies and improved efficiencies in an open, flexible environment. With the introduction of new EPYC processor-based servers with Radeon Instinct GPU accelerators, combined with our ROCm open software platform, AMD is ushering in a new era of heterogeneous compute for HPC and Deep Learning.
Radeon Instinct™ MI25 Server Accelerators
AMD is changing the game with the introduction of its open standards-based Radeon Instinct family of products. Radeon Instinct accelerators, combined with our open ecosystem approach to heterogeneous compute, raise the bar on achievable performance, efficiency and the flexibility needed to design systems capable of meeting the challenges of today's data-centric workloads.
The new Radeon Instinct MI25 accelerator, based on AMD's Next-Gen "Vega" architecture, with its powerful parallel compute engine, is the world's ultimate training accelerator for large-scale deep learning applications and a workhorse for HPC workloads, delivering 24.6 TFLOPS of FP16 and 12.3 TFLOPS of FP32 peak floating-point performance.[1] Combine this power with the open ROCm software platform and the world's most advanced GPU memory architecture, with 16GB of HBM2 and up to 484 GB/s of memory bandwidth, and you get the ultimate solution for today's compute workloads.
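The peak figures above follow the calculation method described in the TFLOPS note: engine clock × compute units × stream processors per CU × FLOPS per clock. The sketch below encodes that method. The note leaves the CU and stream-processor counts unspecified, so the inputs used here (64 CUs, 64 stream processors per CU, a 1500 MHz peak engine clock) are our own illustrative assumption, chosen because they reproduce the 12.3 and 24.6 TFLOPS figures; `peak_tflops` is a hypothetical helper name.

```python
def peak_tflops(engine_clock_mhz: float, compute_units: int,
                stream_processors_per_cu: int, flops_per_clock: int) -> float:
    """Peak throughput in TFLOPS: clock x CUs x SPs per CU x FLOPS per clock."""
    flops = (engine_clock_mhz * 1e6 * compute_units
             * stream_processors_per_cu * flops_per_clock)
    return flops / 1e12


# Assumed, illustrative inputs (not stated in the TFLOPS note).
CLOCK_MHZ = 1500
CUS = 64
SPS_PER_CU = 64

fp32 = peak_tflops(CLOCK_MHZ, CUS, SPS_PER_CU, flops_per_clock=2)  # 2 FLOPS/clock
fp16 = peak_tflops(CLOCK_MHZ, CUS, SPS_PER_CU, flops_per_clock=4)  # 4 FLOPS/clock
fp64 = fp32 / 16  # FP64 at 1/16 of the FP32 rate

print(f"FP32: {fp32:.1f} TFLOPS")  # 12.3
print(f"FP16: {fp16:.1f} TFLOPS")  # 24.6
print(f"FP64: {fp64:.2f} TFLOPS")  # 0.77
```

Keeping the FLOPS-per-clock factor as a parameter makes the FP16/FP32 relationship explicit: doubling it from 2 to 4 is what doubles the FP16 peak relative to FP32.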
Radeon Instinct MI25 Highlights:
- Built on AMD's Next-Gen "Vega" architecture with the world's most advanced GPU memory architecture
- Superior FP16 and FP32 performance for HPC and Deep Learning
- ROCm open software platform for HPC-class rack-scale deployments
- Large BAR support for multi-GPU (mGPU) peer-to-peer communication
- MxGPU hardware technologies for optimized datacenter utilization
Superior compute density and performance per node when combining new AMD EPYC™ processor-based servers and Radeon Instinct MI25 accelerators
EPYC™ Memory-Bound HPC Performance
The AMD EPYC processor provides excellent performance for memory-bound HPC workloads.
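A workload is "memory-bound" when memory bandwidth, rather than arithmetic throughput, limits its speed. The standard roofline model makes this concrete: attainable performance is the lower of the compute peak and bandwidth × arithmetic intensity (FLOPs per byte moved). The sketch below is a minimal illustration under stated assumptions. This page does not quantify EPYC's memory bandwidth, so the inputs simply reuse the MI25 FP32 peak and bandwidth quoted earlier to exercise the model. The helper names are hypothetical, and the triad traffic count ignores caches and write-allocate traffic.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      arithmetic_intensity: float) -> float:
    """Roofline ceiling: the lower of the compute peak and bandwidth x intensity.

    arithmetic_intensity is FLOPs performed per byte moved to or from memory.
    """
    return min(peak_gflops, bandwidth_gbs * arithmetic_intensity)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Intensity (FLOPs/byte) below which a kernel is memory-bound."""
    return peak_gflops / bandwidth_gbs


# STREAM-style FP32 triad a[i] = b[i] + s * c[i]:
# 2 FLOPs per element; 3 x 4 bytes moved (read b, read c, write a),
# ignoring caches and write-allocate traffic.
TRIAD_AI = 2 / 12

# Illustrative inputs only: the MI25 FP32 peak and memory bandwidth quoted
# earlier. Substitute measured figures for the system being studied.
PEAK_GFLOPS = 12_300.0
BANDWIDTH_GBS = 484.0

ceiling = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, TRIAD_AI)
ridge = ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS)
print(f"triad ceiling: {ceiling:.1f} GFLOPS")    # 80.7
print(f"ridge point:   {ridge:.1f} FLOPs/byte")  # 25.4
print("memory-bound:", TRIAD_AI < ridge)         # True
```

Because the triad's intensity sits far below the ridge point, its speed scales with memory bandwidth, not peak FLOPS. That is why memory-bound HPC codes reward platforms with high sustained bandwidth.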
ROCm Open Software Platform
The ROCm open software platform delivers an open-source foundation for HPC-class heterogeneous compute and world-class datacenter system designs. The ROCm platform provides performance-optimized Linux® drivers, compilers, tools and libraries. ROCm's software design philosophy offers programming choice, minimalism and a modular software development approach to allow for more optimized GPU accelerator computing.
Combine this approach with AMD's secure hardware-virtualized MxGPU technologies, and system designers are now able to change how they design systems to achieve higher efficiencies and to drive optimized datacenter utilization and capacity.
ROCm foundational elements:
- Open Headless Linux® 64-bit driver and rich system runtime stack optimized for Hyperscale & HPC-class compute
- Multi-GPU compute supporting intra-node and inter-node communication through RDMA, with direct RDMA peer-sync support in the driver
- Simpler programming model giving developers control when needed
- HCC: true single-source C++ heterogeneous compilers addressing the whole system, not just a single device
- HIP: CUDA conversion tool providing platform choice for GPU computing APIs
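Much of what a CUDA-to-HIP conversion does at the source level is renaming runtime API identifiers, since HIP's runtime calls mirror CUDA's with a `hip` prefix. The toy sketch below illustrates only that renaming step for a handful of calls. It is not AMD's conversion tooling, which covers far more of the API; `toy_hipify` and its mapping table are hypothetical and deliberately incomplete.

```python
import re

# Hypothetical, deliberately tiny subset of CUDA -> HIP runtime renames.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
    "cudaError_t": "hipError_t",
    "cudaSuccess": "hipSuccess",
}

# Swap the CUDA runtime header for the HIP runtime header.
HEADER_RE = re.compile(r'#include\s*[<"]cuda_runtime\.h[>"]')

# Whole-identifier matches only (\b), longest names tried first.
NAME_RE = re.compile(
    r"\b("
    + "|".join(re.escape(n) for n in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def toy_hipify(source: str) -> str:
    """Rename a few CUDA runtime identifiers to their HIP equivalents."""
    source = HEADER_RE.sub("#include <hip/hip_runtime.h>", source)
    return NAME_RE.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


cuda_src = """#include <cuda_runtime.h>
float *d_x;
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
cudaDeviceSynchronize();
cudaFree(d_x);
"""

print(toy_hipify(cuda_src))
```

The word-boundary match keeps a short name like `cudaMemcpy` from clobbering the prefix of `cudaMemcpyHostToDevice`. A real converter must also handle libraries, kernel-launch syntax and APIs with no one-to-one equivalent.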
The ROCm open software platform provides a solid foundation for large-scale Machine Intelligence and HPC datacenter deployments, with an optimized open Linux driver and a rich ROCr System Runtime, which is language-independent and makes heavy use of the Heterogeneous System Architecture (HSA) Runtime API. This provides a rich foundation for executing programming languages such as HCC C++, Khronos Group's OpenCL™, Continuum's Anaconda Python and the HIP CUDA conversion tool.[2]
AMD continues to embrace an open approach, extending support for critical features required for NUMA-class acceleration to our Radeon™ GPU accelerators for HPC and deep learning deployments. The ROCm platform now supports our new Radeon Instinct GPU accelerator family, and continues to support a number of our other AMD FirePro™ S Series, Radeon™ RX Series and Radeon™ Pro Duo graphics cards. Please visit the ROCm website for a full list of supported GPU cards.
OpenCL™, OpenMP and OpenACC Support
AMD continues to support these standards across our product offerings.[3] We believe that most people in the HPC community want open standards as the de facto way of running their projects and simulations; AMD is committed to supporting this goal and is working extensively with the community to drive open standards forward.
- [1] TFLOPS calculations: FLOPS are calculated by taking the engine clock from the highest DPM state and multiplying it by the xx CUs per GPU, then by the xx stream processors in each CU, and then by 2 FLOPS per clock for FP32. For FP16, 4 FLOPS per clock are used. The FP64 TFLOPS rate is calculated at 1/16 of the FP32 rate.
- [2] Support for Python is planned but still under development.
- [3] Some S Series cards may not support all of the standards listed. Please refer to the product specifications of each card for details on supported APIs.