Powering the Exascale Era
Shattering research barriers – AMD announces plans to power a new exascale-class supercomputer in collaboration with the U.S. Department of Energy, Oak Ridge National Laboratory, and Cray Inc.
Solutions for High Performance Compute
A new era of heterogeneous compute for Machine Intelligence and HPC has arrived with EPYC™ server processors and Radeon Instinct™ GPU accelerators.
Empowering a new era of scale-out compute for HPC and Deep Learning
Truly accelerating the pace of deep learning and addressing the broad needs of the datacenter requires a combination of high performance compute and GPU acceleration optimized for massive amounts of data and intensive floating-point computation that can be spread across many cores. Large system designers today also need efficient systems with the flexibility and openness to meet the challenge of demanding workloads.
AMD is raising the bar on compute density by enabling optimized server designs with high performance, low latency, and excellent efficiency in an open, flexible environment. With the introduction of new EPYC processor-based servers with Radeon Instinct GPU accelerators, combined with our ROCm (Radeon Open eCosystem) open software platform, AMD is ushering in a new era of heterogeneous compute for HPC and Deep Learning.
University of Notre Dame
The University of Notre Dame Center for Research Computing leverages AMD EPYC™ processors to drive better HPC density and faster results.
Oregon State University
Oregon State University requires high processor core and thread counts for genome-enabled and data-driven research in the life and environmental sciences.
Unleash Discovery on the World’s Fastest Double Precision PCIe® Accelerator¹
The Radeon Instinct™ MI60 compute card is designed to deliver high levels of performance for deep learning, high performance computing (HPC), cloud computing, and rendering systems. This new accelerator is designed with optimized deep learning operations, leading-edge double precision performance¹, and hyper-fast HBM2 memory delivering up to 1 TB/s of memory bandwidth. Quickly achieve reliable and accurate results in large-scale system deployments with full-chip ECC and RAS capabilities.
Combine this finely balanced, ultra-scalable solution with our ROCm open ecosystem of Radeon Instinct optimized drivers, compilers, libraries, and performance tools, and you have a solution ready for the next era of compute and machine intelligence.
ROCm – The open software ecosystem for GPU compute.
The ROCm ecosystem delivers an open-source foundation for HPC-class heterogeneous compute and world-class datacenter system designs. The ROCm software design philosophy offers programming choice, minimalism, and a modular software development approach to allow for highly optimized GPU accelerator computing. In addition, AMD's hardware-virtualized MxGPU technology can drive higher efficiency and optimize datacenter utilization.
ROCm foundational elements include:
- Open, headless Linux® 64-bit driver and rich system runtime stack optimized for hyperscale and HPC-class compute
- Multi-GPU compute supporting communication within and across server nodes through RDMA, with direct peer-sync RDMA support in the driver
- Simple programming model that gives developers control when needed
- HCC, a true single-source C++ heterogeneous compiler that addresses the whole system, not just a single device
- HIP CUDA conversion tool that provides platform choice
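The HIP conversion tooling mentioned above works largely by mechanically renaming CUDA runtime API calls to their HIP equivalents. The following toy sketch illustrates the idea only; it is not the actual hipify tool, and the mapping table is a small illustrative subset of the real API coverage:

```python
# Toy illustration of CUDA-to-HIP source translation. The real hipify
# tools cover far more of the API and handle headers, kernels, etc.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
}

def toy_hipify(source: str) -> str:
    # Replace longer identifiers first so cudaMemcpyHostToDevice is not
    # partially rewritten by the shorter cudaMemcpy rule.
    for cuda_name in sorted(CUDA_TO_HIP, key=len, reverse=True):
        source = source.replace(cuda_name, CUDA_TO_HIP[cuda_name])
    return source

cuda_snippet = "cudaMalloc(&d_a, size); cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);"
print(toy_hipify(cuda_snippet))
# -> hipMalloc(&d_a, size); hipMemcpy(d_a, a, size, hipMemcpyHostToDevice);
```

Because the translated code targets HIP rather than CUDA directly, the same source can then be built for either AMD or NVIDIA platforms, which is the "platform choice" the tool provides.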
AMD continues to embrace an open approach, extending support for the critical features required for NUMA-class acceleration to our Radeon™ GPU accelerators in HPC and deep learning deployments.
OpenMP, HIP, OpenCL™, and Python Programming Support
AMD continues to support these standards on our product offerings². We believe most of the HPC community wants open standards as the de facto way of running their projects and simulations, and AMD is committed to supporting this goal, working extensively with the community to drive open standards forward.
1. Calculated on Oct 22, 2018, the Radeon Instinct MI60 GPU resulted in 7.4 TFLOPS peak theoretical double precision floating-point (FP64) performance. AMD TFLOPS calculations are performed by taking the engine clock from the highest DPM state, multiplying it by xx CUs per GPU, multiplying that number by the xx stream processors in each CU, and then multiplying that number by 1/2 FLOPS per clock for FP64. TFLOP calculations for the MI60 can be found at https://www.amd.com/en/products/professional-graphics/instinct-mi60 External results on the NVIDIA Tesla V100 (16GB card) GPU accelerator resulted in 7 TFLOPS peak double precision (FP64) floating-point performance. Results found at: https://images.nvidia.com/content/technologies/volta/pdf/437317-Volta-V100-DS-NV-US-WEB.pdf AMD has not independently tested or verified external/third-party results/data and bears no responsibility for any errors or omissions therein. RIV-3
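The peak-TFLOPS formula described in the footnote above can be sketched in a few lines. The specific clock and unit counts passed in below are illustrative assumptions consistent with AMD's published 7.4 TFLOPS figure, not values taken from the footnote:

```python
def peak_fp64_tflops(engine_clock_ghz: float, cus: int, sps_per_cu: int) -> float:
    # Per the footnote's formula: engine clock (highest DPM state)
    # x CUs per GPU x stream processors per CU, at 1/2 FLOPS per clock for FP64.
    fp32_flops_per_clock = 2  # a fused multiply-add counts as two operations
    fp64_rate = 0.5           # FP64 runs at half the FP32 rate
    flops = engine_clock_ghz * 1e9 * cus * sps_per_cu * fp32_flops_per_clock * fp64_rate
    return flops / 1e12

# Illustrative (assumed) values: 1.8 GHz engine clock, 64 CUs,
# 64 stream processors per CU.
print(round(peak_fp64_tflops(1.8, 64, 64), 1))  # -> 7.4
```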
2. Some cards may not support all of the standards listed. Please refer to each card's product specifications for details on support.
OpenCL is a trademark of Apple, Inc. used by permission by Khronos Group Inc.