Accelerators for High Performance Compute


A new era of heterogeneous compute for Machine Intelligence and HPC has arrived with EPYC™ server processors and Radeon Instinct™ GPU accelerators.


Empowering a new era of scale-out compute for HPC and Deep Learning

Truly accelerating the pace of deep learning and addressing the broad needs of the datacenter requires a combination of high-performance compute and GPU acceleration optimized for handling massive amounts of data with substantial floating-point computation that can be spread across many cores. Large-scale system designers today also need the ability to design efficient systems with the flexibility and openness to configure systems that meet the challenges of today’s most demanding workloads.

AMD is empowering designers with those capabilities, allowing them to raise the bar on achievable compute densities by enabling optimized server designs with higher performance, reduced latencies and improved efficiencies in an open, flexible environment. With the introduction of new EPYC™ processor-based servers with Radeon Instinct GPU accelerators, combined with our ROCm open software platform, AMD is ushering in a new era of heterogeneous compute for HPC and Deep Learning.

Radeon Instinct™ MI25 Server Accelerators

AMD is changing the game with the introduction of its open standards-based Radeon Instinct family of products. Radeon Instinct accelerators, combined with our open ecosystem approach to heterogeneous compute, raise the bar on achievable performance and efficiency and provide the flexibility needed to design systems capable of meeting the challenges of today’s data-centric workloads.

The new Radeon Instinct MI25 accelerator, based on AMD’s Next-Gen “Vega” architecture with its powerful parallel compute engine, is the world’s ultimate training accelerator for large-scale deep learning applications and a workhorse for HPC workloads, delivering 24.6 TFLOPS of FP16 and 12.3 TFLOPS of FP32 peak floating-point performance.1 Combine this power with the open ROCm software platform, the world’s most advanced GPU memory architecture (16GB of HBM2), and up to 484 GB/s of memory bandwidth, and you get the ultimate solution for today’s compute workloads.
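The quoted peak numbers follow directly from the shader count and clock. As a rough sanity check (assuming the MI25’s publicly listed 4,096 stream processors and ~1.5 GHz peak engine clock, figures not stated in the text above), each stream processor retires one fused multiply-add (2 FLOPs) per clock, and “Vega” packed math runs FP16 at double rate:

```python
# Back-of-the-envelope peak-throughput check for the Radeon Instinct MI25.
# Assumptions (not stated in the text above): 4096 stream processors and a
# ~1.5 GHz peak engine clock, per AMD's public spec sheets.
STREAM_PROCESSORS = 4096
PEAK_CLOCK_HZ = 1.5e9
FLOPS_PER_CLOCK = 2            # one fused multiply-add counts as 2 FLOPs

fp32_tflops = STREAM_PROCESSORS * FLOPS_PER_CLOCK * PEAK_CLOCK_HZ / 1e12
fp16_tflops = 2 * fp32_tflops  # "Vega" packed math: FP16 at double rate

# HBM2 bandwidth: 2048-bit bus at an effective 1.89 GT/s (assumed memory speed)
hbm2_gb_per_s = (2048 / 8) * 1.89e9 / 1e9

print(f"FP32: {fp32_tflops:.1f} TFLOPS")       # ~12.3
print(f"FP16: {fp16_tflops:.1f} TFLOPS")       # ~24.6
print(f"Bandwidth: {hbm2_gb_per_s:.0f} GB/s")  # ~484
```

The computed 12.3/24.6 TFLOPS and ~484 GB/s line up with the figures quoted above.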

Radeon Instinct MI25 Highlights:

  • Built on AMD’s Next-Gen “Vega” architecture with the world’s most advanced GPU memory architecture
  • Superior FP16 and FP32 performance for HPC and Deep Learning
  • ROCm open software platform for HPC-class rack-scale deployments
  • Large BAR support for multi-GPU (mGPU) peer-to-peer communication
  • MxGPU hardware virtualization technologies for optimized datacenter utilization

Superior compute density and performance per node when combining new AMD EPYC™ processor-based servers and Radeon Instinct MI25 accelerators




ROCm Open Software Platform

The ROCm open software platform delivers an open-source foundation for HPC-class heterogeneous compute and world-class datacenter system designs. The ROCm platform provides performance-optimized Linux® drivers, compilers, tools and libraries. ROCm’s software design philosophy offers programming choice, minimalism and a modular software development approach to allow for more optimized GPU accelerator computing.
Combine this approach with AMD’s secure hardware-virtualized MxGPU technologies, and system designers can change how they design systems to achieve higher efficiencies and to drive optimized datacenter utilization and capacities.

ROCm foundational elements:

  • Open headless Linux® 64-bit driver and rich system runtime stack optimized for hyperscale and HPC-class compute
  • Multi-GPU compute supporting communication within and across server nodes through RDMA, with direct RDMA peer-sync support in the driver
  • Simpler programming model giving developers control when needed
  • HCC true single-source C++ heterogeneous compiler addressing the whole system, not just a single device
  • HIP CUDA conversion tool providing platform choice for GPU computing APIs

The ROCm open software platform provides a solid foundation for large-scale Machine Intelligence and HPC datacenter deployments, with an optimized open Linux driver and the rich ROCr System Runtime, which is language-independent and makes heavy use of the Heterogeneous System Architecture (HSA) Runtime API. This provides a rich foundation for executing programming languages such as HCC C++ and Khronos Group’s OpenCL™, as well as Continuum’s Anaconda Python and the HIP CUDA conversion tool.2
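The HIP conversion tool mentioned above is, at its core, a source-to-source translator that rewrites CUDA runtime names into their HIP equivalents so existing CUDA code can target either platform. The following is only a toy sketch of that idea; the mapping table is a tiny illustrative subset, not the real tool’s full dictionary:

```python
# Toy sketch of hipify-style source-to-source translation: a small,
# illustrative subset of the CUDA -> HIP name mapping. The real HIP tool
# covers far more of the CUDA API than this.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
}

def hipify(source: str) -> str:
    # Replace longer names first so cudaMemcpyHostToDevice is not
    # partially rewritten by the shorter cudaMemcpy rule.
    for cuda_name in sorted(CUDA_TO_HIP, key=len, reverse=True):
        source = source.replace(cuda_name, CUDA_TO_HIP[cuda_name])
    return source

cuda_src = """#include <cuda_runtime.h>
cudaMalloc(&d_buf, n);
cudaMemcpy(d_buf, h_buf, n, cudaMemcpyHostToDevice);
cudaFree(d_buf);
"""
print(hipify(cuda_src))
```

After translation, the source includes `hip/hip_runtime.h` and calls `hipMalloc`/`hipMemcpy`/`hipFree`, which is what lets the same codebase be compiled for either vendor’s GPUs.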

AMD continues to embrace an open approach, extending support for critical features required for NUMA-class acceleration to our Radeon™ GPU accelerators for HPC and deep learning deployments. The ROCm platform now supports the new Radeon Instinct GPU accelerator family, and continues to support a number of our other AMD FirePro™ S-Series, Radeon™ RX Series, and Radeon™ Pro Duo graphics cards. Please visit the ROCm web site for a full list of supported GPU cards.

OpenCL™, OpenMP and OpenACC Support


AMD continues to support these standards on our product offerings.3 We believe that most people in the HPC community want open standards as the de facto way of running their projects and simulations; AMD is committed to supporting this goal and is working extensively with the community to drive open standards forward.

 AMD FirePro™ S-Series Accelerators

AMD FirePro™ S9300 x2

The World’s First GPU Accelerator with 1TB/s Memory Bandwidth

Accelerate your most complex HPC workloads in data analytics or seismic processing on the world’s fastest single-precision compute GPU accelerator, the AMD FirePro™ S9300 x2 Server GPU.4,5 Take advantage of the numerous tools and libraries at your disposal, including ROCm tools, from our developer page.

A recent test was undertaken by one of our customers, CGG, a leader in cutting-edge geoscience. CGG conducted proprietary wave-equation modelling benchmarks on several different GPU accelerators, including the new AMD FirePro™ S9300 x2 GPU. As the complexity of the wave equation increased, the performance advantage grew in favor of the AMD FirePro™ S9300 x2 GPU, to the point where it was 2x faster than any other card tested.6

[Chart: wave-equation modelling benchmark results. Chart provided by CGG]

AMD FirePro™ S9100, S9150 and S9170 Accelerators

Those who are looking for great double precision performance can turn to the AMD FirePro™ S9100 series of accelerators. The AMD FirePro™ S9150, powering the #1 ranked supercomputer on the 2014 Green500 list, easily surpasses the competition by offering over 50% more double precision performance than the comparable Tesla K40.7

Watch the video interview of Dr. David Rohr and Professor Lindenstruth talking about the L-CSC cluster, #1 ranked supercomputer on the 2014 Green500.



DGEMM, or double-precision general matrix-matrix multiplication, measures the floating-point execution rate for double-precision, real matrix-matrix multiplication. Many real-world applications take advantage of double-precision matrix operations, including computational fluid dynamics, finite element analysis and structural modelling, and molecular dynamics.

With our AMD OpenCL BLAS implementation, we are able to achieve 2 TFLOPS of sustained DGEMM performance with the AMD FirePro™ S9150 GPU, while the Tesla K40 achieves 1.3 TFLOPS DGEMM.
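Sustained DGEMM rates like those above are measured the same way at any scale: multiplying an n×n matrix by an n×n matrix costs 2·n³ floating-point operations (each of the n² output elements needs n multiplies and n adds), so sustained FLOPS is just that count divided by wall-clock time. A minimal CPU-side sketch using NumPy (the matrix size here is arbitrary, and absolute numbers will of course be far below GPU rates):

```python
import time
import numpy as np

def dgemm_gflops(n: int = 512) -> float:
    """Time C = A @ B for n x n float64 matrices; return sustained GFLOPS."""
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n))   # float64 by default, i.e. "D" in DGEMM
    b = rng.standard_normal((n, n))
    flops = 2 * n**3                  # n*n dot products of length n: n mul + n add each
    start = time.perf_counter()
    a @ b
    elapsed = time.perf_counter() - start
    return flops / elapsed / 1e9

print(f"{dgemm_gflops():.1f} GFLOPS sustained (CPU, n=512)")
```

The same 2·n³ accounting, applied to a GPU BLAS run, is how figures such as the 2 TFLOPS sustained number above are derived.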

The AMD FirePro™ S9170 GPU is great for those who need large matrix-matrix multiplication capabilities, where one can take advantage of the large 32GB GDDR5 memory that this card possesses. The Nvidia K80 and K40, with 24GB and 12GB memory, respectively, cannot compute matrices that are larger than what their smaller onboard memory can handle.
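The capacity advantage is easy to quantify: computing C = A·B entirely on-board in double precision keeps three n×n matrices resident, each taking 8n² bytes, so the largest square problem that fits is n = ⌊√(capacity / 24)⌋. A back-of-the-envelope sketch (treating GB as 10⁹ bytes, an assumption about how the capacities are counted, and ignoring workspace overhead):

```python
import math

def max_square_dgemm_n(capacity_bytes: int) -> int:
    """Largest n such that three n x n float64 matrices (A, B, C) fit on-board."""
    return math.isqrt(capacity_bytes // (3 * 8))  # 8 bytes per float64

# Capacities from the cards discussed above; GB taken as 10**9 bytes.
for name, gb in [("S9170 (32GB)", 32), ("K80 (24GB)", 24), ("K40 (12GB)", 12)]:
    n = max_square_dgemm_n(gb * 10**9)
    print(f"{name}: n up to ~{n:,}")
```

Under these assumptions the 32GB card holds square problems up to roughly n ≈ 36,500, versus roughly n ≈ 22,400 for a 12GB card, without spilling to host memory.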

 AMD FirePro™ S-Series Specifications

AMD FirePro™ Server GPU Solutions for High Performance Compute

From academic research in computational fluid dynamics, to oil and gas companies looking into seismic processing and reservoir simulation, AMD FirePro™ S-Series server GPUs provide a complete product stack that can cater to practically any of your needs. With cutting-edge single-precision and double-precision compute performance, AMD FirePro server GPUs are the solution for any computationally complex project requiring the massive parallel processing capabilities of a GPU.3

| | S9100 | S9150 | S9170 | S9300 x2 |
|---|---|---|---|---|
| GCN Stream Processors | 2560 | 2816 | 2816 | 8192 |
| Single-Precision (GFLOPS) | 4220 | 5070 | 5240 | 13900 |
| Double-Precision (GFLOPS) | 2110 | 2530 | 2620 | 870 |
| On-board Memory | 12GB GDDR5 | 16GB GDDR5 | 32GB GDDR5 | 8GB HBM |
| ECC | Yes (external) | Yes (external) | Yes (external) | No |
| Memory Bandwidth (GB/s) | 320 | 320 | 320 | 1024 |
| Interface | PCIe 3.0, dual slot | PCIe 3.0, dual slot | PCIe 3.0, dual slot | PCIe 3.0, dual slot |
| Max Power | 225W | 235W | 275W | 300W |
| Cooling | Passive heatsink | Passive heatsink | Passive heatsink | Passive heatsink |
| Recommended For | Double-precision workflows: academic and government clusters; oil & gas reservoir simulation | Double-precision workflows: academic and government clusters; oil & gas reservoir simulation | Double-precision workflows: academic and government clusters; oil & gas reservoir simulation | Single-precision workloads: molecular dynamics; deep neural networks/machine learning |

 Where to Buy

The AMD FirePro™ accelerators are available from a number of OEMs and SIs (system integrators), including Dell, HPE and SuperMicro, amongst others.


For more information on AMD FirePro™ GPU-equipped Dell servers, visit

For AMD FirePro™ GPU-equipped HPE servers, visit