NEW! AOCL 5.3 is now available, May 18, 2026

AOCL is a set of numerical libraries optimized for AMD processors built on the AMD “Zen” architecture, spanning multiple generations. It supports AMD EPYC™, AMD Ryzen™, and AMD Ryzen™ Threadripper™ processor families. These highly tuned, industry-standard math libraries accelerate the development of scientific and high-performance computing applications.

What’s new

AOCL Build-It-Yourself

AOCL now lets you compile individual AOCL libraries and consolidate them into a single unified binary, simplifying integration and deployment.

All AOCL library sources are included as Git submodules within the ‘submodules’ branch. This approach supports offline development and guarantees consistent versioning across all components, making it easier to build and maintain a complete AOCL ecosystem without relying on external dependencies.

Key Highlights

  • Flexible Library Selection
    • Choose one or more AOCL libraries and merge them into a single library using configurable CMake options.
  • Unified Binary Output
    • Linux: libaocl.so / libaocl.a
    • Windows: aocl.dll / aocl.lib
  • Benefits
    • Eliminates dependency on library linking order
    • Prevents API duplication
    • Ensures smooth and efficient integration of multiple AOCL libraries

This enhancement streamlines your development workflow, making AOCL integration easier and more robust.

AOCL-BLAS
  • Performance improvements in S/D/ZGEMM on Zen3/4/5
  • SGEMM optimizations for tiny matrices
  • New thread control APIs with global and thread-local variants
  • Support for OpenMP 2.5 and earlier versions
  • Optional support for reproducibility using compiler options
AOCL-Compression
  • Refactored dynamic dispatch to improve runtime robustness across all x86 systems
  • Added a local thread API enabling applications to control library-internal threads
  • Introduced GNU Make build system support for Linux (with limited test coverage)
  • Added AOCL_LLC_PREFIX user option to prefix library symbols (zlib/deflate, lz4 and zstd) and prevent conflicts with other library implementations
AOCL-Cryptography
  • Added SHA-256 and HMAC-SHA-256 multi-buffer support
  • Added variable-length multi-buffer support for CBC and CFB
  • Performance improvements to ChaCha20, AES-XTS, AES-GCM, and Poly1305
  • Support for OpenSSL 3.5, with enhanced provider support for AES-CBC, CBC TLS unpadding, and RSA operations
  • Reliability, validation, and security fixes across cipher, provider, and RNG paths
AOCL-Data Analytics 
  • Python wheels are now available on PyPI for easy installation via pip
  • Trained models can now be saved in binary format and loaded later for inference
  • New APIs for approximate nearest neighbors, radius neighbors classification and radius neighbors regression
  • Performance improvements to support vector machines, DBSCAN, k-means clustering, and train_test_split
AOCL-DLP

AOCL-DLP (Deep Learning Primitives) is an AMD library that provides optimized low-precision GEMM (General Matrix Multiply) and batch GEMM operations for deep learning inference and training on AMD CPUs.

  • Purpose: Accelerates matrix multiplication, the core compute primitive in deep learning, by leveraging x86 SIMD instruction-set extensions (AVX2, AVX-512, AVX-512 VNNI, AVX-512 BF16, AVX-512 FP16)
  • Data types: Supports FP32, FP16, BF16, INT8, INT4, and various mixed-precision combinations (for example: BF16×S8, F32×S8 with on-the-fly quantization, BF16×S4/U4 weight-only quantization)
  • Pre/post-ops: Built-in support for zero-point compensation, scale factors, bias, activations (ReLU, PReLU, GELU, Swish, etc.), matrix add/multiply — fused directly into the GEMM kernel to avoid extra memory passes
  • JIT compilation: Runtime code generation using Xbyak to produce optimized kernels tailored to the specific problem size and hardware
  • Threading: OpenMP-based parallel execution with configurable thread-local and library-level threading
  • Quantization: Symmetric and asymmetric quantization support for efficient INT8/INT4 inference workflows
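
As a rough illustration of the symmetric and asymmetric schemes mentioned above, the following sketch quantizes a short list of floats to INT8 in plain Python. It does not call AOCL-DLP. The function names (`quantize_symmetric`, `quantize_asymmetric`, `dequantize`) are hypothetical, and the rounding and clamping conventions are common choices assumed here, not documented AOCL-DLP behavior.

```python
def quantize_symmetric(values, qmax=127):
    """Symmetric scheme: zero-point is fixed at 0, range is [-qmax, qmax]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale, 0


def quantize_asymmetric(values, qmin=-128, qmax=127):
    """Asymmetric scheme: a zero-point shifts the range to fit [min, max]."""
    lo = min(min(values), 0.0)  # include 0.0 so that zero is representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from quantized integers."""
    return [(x - zero_point) * scale for x in q]


if __name__ == "__main__":
    weights = [0.0, 0.5, 1.0, 2.0]  # illustrative data only
    for quantize in (quantize_symmetric, quantize_asymmetric):
        q, scale, zp = quantize(weights)
        print(quantize.__name__, q, scale, zp, dequantize(q, scale, zp))
```

For one-sided data such as the example weights, the asymmetric scheme spreads the values over all 256 levels, while the symmetric scheme leaves the negative half of the range unused but needs no zero-point compensation.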

Key Features

  • Added new GEMM APIs for pure FP16, F32×S8, BF16×S8→BF16 (with on-the-fly quantization), and BF16×U4 asymmetric weight-only quantization
  • Delivered full JIT code generation for S8×S8 and U8×S8 GEMM/GEMV paths, including post-ops and column-major support
  • Optimized BF16 and F32 JIT generators with AVX-512 GEMV, RD/k=1 kernel frameworks, BF16×S4 WOQ JIT, and batch GEMM JIT for int8
  • Improved multi-threading with new thread-local/library-level APIs, smart factorization, PGO support, and small matrix optimizations
AOCL-FFTW
  • There are no updates or modifications since version 5.1.
AOCL-FFTZ

AMD’s in-house Fast Fourier Transform (FFT) library is purpose-built and optimized for Zen-based processors, delivering improved computational performance and faster execution of FFT workloads on AMD hardware. The library also provides FFTW-compatible wrappers, enabling seamless integration with applications that already use FFTW APIs.

AOCL‑FFTZ is currently optimized specifically for Vienna Ab initio Simulation Package (VASP) and Quantum ESPRESSO (QE) workloads.

Release Highlights

  • Added New Complex Radix Kernels – Radix-20 & Radix-48
  • Introduced New Solvers:
    • Complex: Buffered, Split Radix, Batched CT One-level Direct
    • Real: N-Dim, Size-one
  • Enhanced dynamic dispatcher functionality across x86 architectures
  • Performance optimizations in Complex Radix-4 & Radix-12 Kernels
  • Introduced Fortran 2003 FFTW wrapper for application support
  • Added pkg-config and CMake modules for seamless integration into applications
  • Fixed bugs in multi-threaded (MT) batched FFT and a memory issue in the aoclfftz_execute_io API
AOCL-LAPACK

Performance Improvements

  • QR factorization, Singular Value Decomposition (DGELSS, DORGQR, SGESDD)
  • Matrix Inverse routine DPOTRI for medium sizes

Usability Improvements

  • All internal code logic updated to use 64-bit integers to extend the range of matrix sizes supported
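
To see why integer width limits matrix size, consider a derived quantity such as the element count of a matrix, which can exceed the 32-bit signed range even when each dimension fits comfortably. The sketch below is plain Python arithmetic, not AOCL-LAPACK code. The helper names and the choice of element count as the tracked quantity are illustrative assumptions.

```python
import math

INT32_MAX = 2**31 - 1  # largest value of a 32-bit signed integer
INT64_MAX = 2**63 - 1  # largest value of a 64-bit signed integer


def element_count(rows, cols):
    """Number of elements in a dense rows x cols matrix."""
    return rows * cols


def fits(value, limit):
    """True if a non-negative value is representable up to the given limit."""
    return 0 <= value <= limit


# Largest square matrix whose element count still fits in 32 bits.
largest_n_32 = math.isqrt(INT32_MAX)

n = 50_000  # each dimension is far below INT32_MAX
count = element_count(n, n)

if __name__ == "__main__":
    print("largest n with n*n <= INT32_MAX:", largest_n_32)
    print("elements in", n, "x", n, "matrix:", count)
    print("fits in 32-bit:", fits(count, INT32_MAX))
    print("fits in 64-bit:", fits(count, INT64_MAX))
```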

Test Suite Enhancements

  • Extended BRT test coverage to remaining APIs
  • Introduced separate functional and performance test modes
  • Added API‑specific YAML‑based ctests to improve test coverage
AOCL-LibM
  • Added new statistical functions
    • erfinv, erfcinv, cdfnorm, and cdfnorminv with scalar, vector (vrd2/vrd4/vrd8), and vector array (vrda) variants
  • Added round function support with full vector variant coverage
  • Performance improvements to log2f and round functions
  • Dynamic Dispatch feature update
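
The four new statistical functions are closely related mathematically. The sketch below shows those relationships using only the Python standard library. It assumes that cdfnorm denotes the standard normal CDF and cdfnorminv its inverse, as the names suggest. It is not the AOCL-LibM implementation and says nothing about that library's accuracy or signatures.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)
_SQRT2 = math.sqrt(2.0)


def cdfnorm(x):
    """Standard normal CDF: Phi(x) = erfc(-x / sqrt(2)) / 2."""
    return 0.5 * math.erfc(-x / _SQRT2)


def cdfnorminv(p):
    """Inverse of the standard normal CDF, defined for 0 < p < 1."""
    return _STD_NORMAL.inv_cdf(p)


def erfinv(y):
    """Inverse error function for -1 < y < 1.

    From erf(x) = 2*Phi(x*sqrt(2)) - 1 it follows that
    x = Phi^-1((y + 1) / 2) / sqrt(2).
    """
    return cdfnorminv((y + 1.0) / 2.0) / _SQRT2


def erfcinv(y):
    """Inverse complementary error function for 0 < y < 2.

    From erfc(x) = 2*Phi(-x*sqrt(2)) it follows that
    x = -Phi^-1(y / 2) / sqrt(2).
    """
    return -cdfnorminv(y / 2.0) / _SQRT2


if __name__ == "__main__":
    # Round trips: each printed pair should agree closely.
    print(0.5, math.erf(erfinv(0.5)))
    print(0.3, math.erfc(erfcinv(0.3)))
    print(0.975, cdfnorm(cdfnorminv(0.975)))
```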
AOCL-LibMem
  • Added support for new functions: strspn and strnlen
  • Optimized memory and string routines for Zen3/4/5 architectures
  • Introduced DCPerf Benchmark support
  • Integrated GoogleTest (GTest) based validation framework for finer-grained testing
  • Enhanced the validation and benchmarking framework infrastructure
  • Enabled support for microarchitecture-specific build option
AOCL-RNG
  • There are no updates or modifications since version 5.1.
AOCL-ScaLAPACK
  • Compatibility with GCC 15 and AOCC has been enhanced by updating outdated function declarations to modern prototypes
  • Memory allocation limits addressed in the symmetric eigenvalue and eigenvector drivers
  • Minor updates to reduce compiler warnings and promote greater type safety throughout the codebase 
AOCL-Sparse

New Features

  • CSC input support for SYPR, SYRK, SYPRD, and SYRKD
  • Optimal matrix format and ISA selection in SpMV
  • Level 1: Fused scatter operations with add and subtract

Performance Improvements

  • SpMV: Fixed AVX512 kernel regression via hidden visibility preset

Bug Fixes

  • LP64 integer overflow detection and error reporting across Level 2/3 kernels
  • Coverity high- and medium-severity fixes
  • GCC 15.2 compatibility: fixed undefined behavior on empty vectors and an SLP vectorizer bug in the CSRMM kernel

Documentation

  • Thread-safety notes for matrix modification and hint functions
  • Documented AOCL_ENABLE_INSTRUCTIONS runtime environment variable
AOCL-Utils

  • GCC-style CPUID detection

Download with End User License Agreement

File Name | Version | Release | OS | Bitness | Description | Checksum (sha256sum) | Size

AOCL 5.3 binary packages compiled with AOCC 5.2
aocl-linux-aocc-5.3.0.tar.gz | 5.3 | 05/18/2026 | RHEL, Ubuntu, SLES | 64-bit | AOCC compiled AOCL tar file containing all the library binaries. It includes install.sh file that extracts and installs the libraries. | e23282a94fbeded5bda38a099b61d044d082efc33a06f7b1ab835b36996c089c | 338MB
aocl-linux-aocc-5.3.0_1_amd64.deb | 5.3 | 05/18/2026 | Ubuntu | 64-bit | AOCC compiled Debian package | 54bf956cf4ccc8a32acf813e26d9cbfa0f9f06bae3415f14601c488f78577379 | 220MB
aocl-linux-aocc-5.3.0-1.x86_64.rpm | 5.3 | 05/18/2026 | RHEL, SLES | 64-bit | AOCC compiled RPM package | 5078091a8e7f2d92f337b7e1a44936eb5d860bc3e4a7fc1febb19a3553fb92bc | 261MB

AOCL 5.3 binary packages compiled with GCC 14.2.1
aocl-linux-gcc-5.3.0.tar.gz | 5.3 | 05/18/2026 | RHEL, Ubuntu, SLES | 64-bit | GCC compiled AOCL tar file containing all the library binaries. Includes install.sh file that extracts and installs the libraries. | 0f2454444b2c6f1adda84b4cd0c41eb9ce910a24d9dacbdc3a339d0b628f3b7a | 331MB
aocl-linux-gcc-5.3.0_1_amd64.deb | 5.3 | 05/18/2026 | Ubuntu | 64-bit | GCC compiled Debian package | 2923cc8e69b53996c48ccd87925a45d538c9b50824513ee7bc0110fa7590f1e3 | 233MB
aocl-linux-gcc-5.3.0-1.x86_64.rpm | 5.3 | 05/18/2026 | RHEL, SLES | 64-bit | GCC compiled RPM package | 13fe10e2ad53a5f4b648825dbd89192125eea0d7b32a054d7eb79e0c7c75a3ee | 283MB

Windows Installer Compiled with Clang 19
AOCL_Windows-setup-5.3.0-AMD.exe | 5.3 | 05/18/2026 | Windows 11, Windows 10 | 64-bit | Windows installer file containing all the AOCL library binaries compiled with Clang 19. | 021bfd69a439c3c2a72a6b5cf45d1de0e2f0deddb92ba19e73b9caeb259cf9c8 | 154MB
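
Before installing, a downloaded file can be compared against the SHA-256 value listed above. The following is a minimal sketch using only the Python standard library. The helper names (`sha256_of`, `verify_download`) are illustrative, and the file is assumed to sit in the current working directory.

```python
import hashlib
import os

# Published SHA-256 values copied from the download table above.
EXPECTED_SHA256 = {
    "aocl-linux-aocc-5.3.0.tar.gz": "e23282a94fbeded5bda38a099b61d044d082efc33a06f7b1ab835b36996c089c",
    "aocl-linux-aocc-5.3.0_1_amd64.deb": "54bf956cf4ccc8a32acf813e26d9cbfa0f9f06bae3415f14601c488f78577379",
    "aocl-linux-aocc-5.3.0-1.x86_64.rpm": "5078091a8e7f2d92f337b7e1a44936eb5d860bc3e4a7fc1febb19a3553fb92bc",
    "aocl-linux-gcc-5.3.0.tar.gz": "0f2454444b2c6f1adda84b4cd0c41eb9ce910a24d9dacbdc3a339d0b628f3b7a",
    "aocl-linux-gcc-5.3.0_1_amd64.deb": "2923cc8e69b53996c48ccd87925a45d538c9b50824513ee7bc0110fa7590f1e3",
    "aocl-linux-gcc-5.3.0-1.x86_64.rpm": "13fe10e2ad53a5f4b648825dbd89192125eea0d7b32a054d7eb79e0c7c75a3ee",
    "AOCL_Windows-setup-5.3.0-AMD.exe": "021bfd69a439c3c2a72a6b5cf45d1de0e2f0deddb92ba19e73b9caeb259cf9c8",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return True if the file's SHA-256 matches the published value."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    for name, expected in EXPECTED_SHA256.items():
        if os.path.exists(name):
            status = "OK" if verify_download(name, expected) else "MISMATCH"
            print(f"{name}: {status}")
```

A mismatch usually indicates an incomplete or corrupted download, so the file should be fetched again before running the installer.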

Resources and Technical Support

Documentation

Support

For support options, refer to Technical Support.

AMD Community

For moderated forums, refer to the AMD community.