AOCL-DLP (Deep Learning Primitives) is a high-performance library that provides optimized deep learning primitives for AMD processors. The library implements Low Precision GEMM (LPGEMM) operations for deep learning applications with support for multiple data types, post-operations, and quantization techniques. Select kernels have been optimized for AMD EPYC™ processors, leveraging AVX2, AVX512, AVX512_VNNI, and AVX512_BF16 instruction sets.

AOCL-DLP provides APIs for GEMM operations with various precision formats, comprehensive post-operations for fused computations, batch GEMM support, symmetric quantization routines, and parallel execution via OpenMP.
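
To make the terms above concrete, the following is a minimal pure-Python sketch of what a low-precision GEMM with a fused post-operation computes. It does not use the AOCL-DLP API; the function names `quantize_symmetric` and `gemm_s32_with_postops` are hypothetical and chosen only for illustration. It assumes per-tensor symmetric quantization (zero point fixed at 0) and a bias-plus-ReLU post-op.

```python
def quantize_symmetric(values, num_bits=8):
    """Symmetric per-tensor quantization: zero point is 0, scale maps max |x| to qmax."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8-bit signed values
    max_abs = max(abs(v) for row in values for v in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [[max(-qmax, min(qmax, round(v / scale))) for v in row] for row in values]
    return q, scale


def gemm_s32_with_postops(a_q, b_q, a_scale, b_scale, bias):
    """Integer GEMM with integer accumulation, then fused rescale + bias + ReLU."""
    m, k, n = len(a_q), len(b_q), len(b_q[0])
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            # Accumulate products of quantized values as integers.
            acc = sum(a_q[i][p] * b_q[p][j] for p in range(k))
            # Post-ops applied before the result is stored: rescale, bias, ReLU.
            y = acc * a_scale * b_scale + bias[j]
            row.append(max(0.0, y))
        out.append(row)
    return out


a = [[1.0, -2.0], [0.5, 4.0]]
b = [[1.0, 0.0], [0.0, 1.0]]
a_q, a_scale = quantize_symmetric(a)
b_q, b_scale = quantize_symmetric(b)
result = gemm_s32_with_postops(a_q, b_q, a_scale, b_scale, bias=[0.0, 0.0])
```

Fusing the post-ops into the same loop that produces each output element is the point of the sketch: the intermediate integer accumulator is never written out as a separate matrix.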

Highlights of AOCL-DLP 5.3

  • Added new GEMM APIs for pure FP16, F32×S8, BF16×S8→BF16 (with on-the-fly quantization), and BF16×U4 asymmetric weight-only quantization
  • Delivered full JIT code generation for S8×S8 and U8×S8 GEMM/GEMV paths, including post-ops and column-major support
  • Optimized BF16 and F32 JIT generators with AVX-512 GEMV, RD/k=1 kernel frameworks, BF16×S4 WOQ JIT, and batch GEMM JIT for int8
  • Improved multi-threading with new thread-local/library-level APIs, smart factorization, PGO support, and small matrix optimizations
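
As background for the asymmetric weight-only quantization mentioned in the highlights, here is a minimal sketch of mapping floating-point weights to unsigned 4-bit values with a scale and a zero point. This is a generic illustration, not the scheme AOCL-DLP necessarily implements; `quantize_u4_asymmetric` and `dequantize_u4` are hypothetical names, and per-tensor (rather than per-group) parameters are an assumption.

```python
def quantize_u4_asymmetric(weights):
    """Map floats to integers in [0, 15] using a scale and a zero point."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / 15 if hi > lo else 1.0
    zero_point = max(0, min(15, round(-lo / scale)))
    q = [max(0, min(15, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize_u4(q, scale, zero_point):
    """Recover approximate float weights; activations stay in higher precision."""
    return [(v - zero_point) * scale for v in q]


weights = [-1.0, -0.5, 0.0, 0.5, 2.0]
q, scale, zero_point = quantize_u4_asymmetric(weights)
restored = dequantize_u4(q, scale, zero_point)
```

In a weight-only scheme, only the weight matrix is stored in this compact form; it is dequantized on the fly and multiplied against higher-precision activations such as BF16.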

The Downloads section provides packages containing the AOCL-DLP library binaries optimized for AMD processors, along with examples and documentation.

Documentation

Downloads

File Name | Version | Size | Launch Date | OS | Bitness | Description

Binary packages compiled with AOCC 5.2
aocl-dlp-linux-aocc-5.3.0.tar.gz | 5.3 | 11MB | 05/18/2026 | RHEL, Ubuntu, SLES | 64-bit | AOCC compiled AOCL-DLP library binary package. SHA-256 checksum: f736c3094edbfb25bff99649c526d8aec1a253b9dc18ba7608a5c7fe1443bfd5

Binary packages compiled with GCC 14.2.1
aocl-dlp-linux-gcc-5.3.0.tar.gz | 5.3 | 9.7MB | 05/18/2026 | RHEL, Ubuntu, SLES | 64-bit | GCC compiled AOCL-DLP library binary package. SHA-256 checksum: 2f6f5bce92095e78b4e8529c8e5804c3d519f071d253bcb39c8273c91249b133
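
A downloaded package can be checked against its published SHA-256 checksum before it is unpacked. The sketch below uses only Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_checksum` are hypothetical, and the file path in the usage note assumes the archive sits in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected_hex):
    """Compare the computed digest with the published one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_checksum("aocl-dlp-linux-gcc-5.3.0.tar.gz", "2f6f5bce92095e78b4e8529c8e5804c3d519f071d253bcb39c8273c91249b133")` should return `True` for an intact download of the GCC package.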