AOCL-DLP (Deep Learning Primitives) is a high-performance library that provides optimized deep learning primitives for AMD processors. The library implements Low Precision GEMM (LPGEMM) operations for deep learning applications with support for multiple data types, post-operations, and quantization techniques. Select kernels have been optimized for AMD EPYC™ processors, leveraging AVX2, AVX512, AVX512_VNNI, and AVX512_BF16 instruction sets.

AOCL-DLP provides APIs for GEMM operations with various precision formats, comprehensive post-operations for fused computations, batch GEMM support, symmetric quantization routines, and parallel execution via OpenMP.
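To illustrate what symmetric quantization means in general, the following is a minimal plain-Python sketch. It is not the AOCL-DLP API: the function names `symmetric_quantize` and `dequantize` are hypothetical, and the per-tensor scale derived from the largest absolute value is one common convention that is assumed here, not taken from the library.

```python
def symmetric_quantize(values, num_bits=8):
    """Map floats to signed integers using one scale and a zero-point of 0.

    Hypothetical helper for illustration only; not an AOCL-DLP function.
    """
    qmax = 2 ** (num_bits - 1) - 1              # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric range [-qmax, qmax].
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the quantized integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = symmetric_quantize(weights)
restored = dequantize(q, scale)
```

Because the zero-point is fixed at 0, the value 0.0 is always represented exactly, and the round-trip error of each element is bounded by half the scale.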

Highlights of AOCL-DLP 5.2

  • Supports GEMM and BatchGEMM APIs for F32, BF16, and INT8 data types
  • Supports symmetric quantized INT8 APIs
  • All APIs support fused elementwise post-operations such as add-bias, ReLU, GeLU (both ERF and tanh variants), Sigmoid, and elementwise matrix addition and multiplication
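As a reference for what a fused post-operation computes, here is a small plain-Python sketch of a GEMM followed by add-bias and an activation applied to each output element before it is stored. It only models the arithmetic; the names `gemm_with_post_ops`, `gelu_erf`, and `gelu_tanh` are hypothetical, the GeLU formulas are the standard textbook definitions, and none of this reflects the library's actual interface or kernel design.

```python
import math


def relu(x):
    return max(0.0, x)


def gelu_erf(x):
    # Exact GeLU expressed with the Gauss error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    # Widely used tanh approximation of GeLU.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gemm_with_post_ops(a, b, bias, activation):
    """Compute activation(A @ B + bias) one output element at a time.

    Hypothetical reference routine; bias holds one value per output column.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            # Post-ops are applied to the accumulator before the result is stored.
            c[i][j] = activation(acc + bias[j])
    return c


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[1.0, 0.0], [0.0, -1.0]]
bias = [0.5, 0.5]
out = gemm_with_post_ops(a, b, bias, relu)
```

Applying the post-operations inside the same loop that produces each accumulator avoids a second pass over the output matrix, which is the general motivation for fusing them.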

The Downloads section provides the AOCL-DLP binary packages, which include the library binaries optimized for AMD processors, examples, and documentation.

Documentation

Downloads

File Name | Version | Size | Launch Date | OS | Bitness | Description

Binary packages compiled with AOCC 5.1
aocl-dlp-linux-aocc-5.2.0.tar.gz | 5.2 | 10 MB | 12/31/2025 | RHEL, Ubuntu, SLES | 64-bit | AOCC-compiled AOCL-DLP library binary package
SHA-256 checksum: f734147fc65518cae199d431cdf435d9545d572acfff0795511830fa9e122f51

Binary packages compiled with GCC 14.2.1
aocl-dlp-linux-gcc-5.2.0.tar.gz | 5.2 | 10 MB | 12/31/2025 | RHEL, Ubuntu, SLES | 64-bit | GCC-compiled AOCL-DLP library binary package
SHA-256 checksum: b5adc2a27422502e06e9c830b6b369d5be5f745f08e0a9472c340fb67ff5f264
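One way to check a downloaded package against the SHA-256 checksums listed above is sketched below using Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_checksum` are illustrative, and the sketch assumes the archive has been saved locally under the file name shown in the table.

```python
import hashlib

# Checksums copied from the Downloads table on this page.
EXPECTED_SHA256 = {
    "aocl-dlp-linux-aocc-5.2.0.tar.gz":
        "f734147fc65518cae199d431cdf435d9545d572acfff0795511830fa9e122f51",
    "aocl-dlp-linux-gcc-5.2.0.tar.gz":
        "b5adc2a27422502e06e9c830b6b369d5be5f745f08e0a9472c340fb67ff5f264",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example (assumes the archive is in the current directory):
# name = "aocl-dlp-linux-gcc-5.2.0.tar.gz"
# print(matches_checksum(name, EXPECTED_SHA256[name]))
```

On Linux, the `sha256sum` command-line tool prints the same digest for a file and can be compared against the listed value by eye.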