NEW! AOCL 5.3 is now available, May 18, 2026
AOCL (AMD Optimizing CPU Libraries) is a set of numerical libraries optimized for AMD processors built on the AMD “Zen” architecture, spanning multiple generations. It supports AMD EPYC™, AMD Ryzen™, and AMD Ryzen™ Threadripper™ processor families. These highly tuned, industry-standard math libraries accelerate the development of scientific and high-performance computing applications.
Libraries
- AOCL-BLAS
- AOCL-LAPACK
- AOCL-Compression
- AOCL-Cryptography
- AOCL-Data Analytics
- AOCL-DLP
- AOCL-FFTW (Fastest Fourier Transform in the West)
- AOCL-FFTZ (Fast Fourier Transform for Zen)
- AOCL-LibM (AMD Math Library)
- AOCL-LibMem
- AOCL-RNG (AMD Random Number Generator Library)
- AOCL-SecureRNG (Secure RNG Library)
- AOCL-ScaLAPACK
- AOCL-Sparse
- AOCL-Utils
What’s new
AOCL Build-It-Yourself
AOCL now offers the capability to compile individual AOCL libraries and consolidate them into a single unified binary, simplifying integration and deployment.
All AOCL library sources are included as Git submodules within the ‘submodules’ branch. This approach supports offline development and guarantees consistent versioning across all components, making it easier to build and maintain a complete AOCL ecosystem without relying on external dependencies.
Key Highlights
- Flexible Library Selection
  - Choose one or more AOCL libraries and merge them into a single library using configurable CMake options.
- Unified Binary Output
  - Linux: libaocl.so / libaocl.a
  - Windows: aocl.dll / aocl.lib
- Benefits
  - Eliminates dependency on library linking order
  - Prevents API duplication
  - Ensures smooth and efficient integration of multiple AOCL libraries
This enhancement streamlines your development workflow, making AOCL integration easier and more robust.
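As a rough illustration of what a single unified binary means for a consumer, the sketch below uses Python's standard ctypes module to look for a library named "aocl" (libaocl.so on Linux, aocl.dll on Windows) and check whether it exports a given symbol. The helper names load_unified_aocl and exports are hypothetical, and the assumption that a unified build exposes the standard CBLAS entry point cblas_dgemm depends on which libraries were selected at build time.

```python
import ctypes
import ctypes.util


def load_unified_aocl(name="aocl"):
    """Try to load the unified AOCL library; return None if it cannot be found."""
    # find_library searches the platform's standard library locations.
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None


def exports(lib, symbol):
    """Report whether a loaded library exports the given symbol."""
    # ctypes raises AttributeError for a missing symbol, so hasattr is enough.
    return lib is not None and hasattr(lib, symbol)


if __name__ == "__main__":
    lib = load_unified_aocl()
    print("unified AOCL found:", lib is not None)
    # Assumption: AOCL-BLAS was part of the unified build.
    print("cblas_dgemm exported:", exports(lib, "cblas_dgemm"))
```

Because every selected component lives in one binary, a check like this needs only one library handle rather than one per component.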
AOCL-BLAS
- Performance improvements in S/D/ZGEMM on Zen3/4/5
- SGEMM optimizations for tiny matrices
- New thread control APIs with global and thread-local variants
- Support for OpenMP 2.5 and earlier versions
- Optional support for reproducibility using compiler options
AOCL-Compression
- Refactored dynamic dispatch to improve runtime robustness across all x86 systems
- Added a local thread API enabling applications to control library-internal threads
- Introduced GNU Make build system support for Linux (with limited test coverage)
- Added AOCL_LLC_PREFIX user option to prefix library symbols (zlib/deflate, lz4 and zstd) and prevent conflicts with other library implementations
AOCL-Cryptography
- Added SHA-256 and HMAC-SHA-256 multi-buffer support
- Added variable-length multi-buffer support for CBC and CFB
- Performance improvements to ChaCha20, AES-XTS, AES-GCM, and Poly1305
- Support for OpenSSL 3.5, with enhanced provider support for AES-CBC, CBC TLS unpadding, and RSA operations
- Reliability, validation, and security fixes across cipher, provider, and RNG paths
AOCL-Data Analytics
- Python wheels are now available on PyPI for easy installation via pip
- Trained models can now be saved in a binary format and loaded later for inference
- New APIs for approximate nearest neighbors, radius neighbors classification and radius neighbors regression
- Performance improvements to support vector machines, DBSCAN, k-means clustering, and train_test_split
AOCL-DLP
AOCL-DLP (Deep Learning Primitives) is an AMD library that provides optimized low-precision GEMM (General Matrix Multiply) and batch GEMM operations for deep learning inference and training on AMD CPUs.
- Purpose: Accelerates matrix multiplication, the core compute primitive in deep learning, by leveraging x86 SIMD instruction set extensions (AVX2, AVX-512, AVX-512 VNNI, AVX-512 BF16, AVX-512 FP16)
- Data types: Supports FP32, FP16, BF16, INT8, INT4, and various mixed-precision combinations (for example: BF16×S8, F32×S8 with on-the-fly quantization, BF16×S4/U4 weight-only quantization)
- Pre/post-ops: Built-in support for zero-point compensation, scale factors, bias, activations (such as ReLU, PReLU, GELU, and Swish), and matrix add/multiply, all fused directly into the GEMM kernel to avoid extra memory passes
- JIT compilation: Runtime code generation using Xbyak to produce optimized kernels tailored to the specific problem size and hardware
- Threading: OpenMP-based parallel execution with configurable thread-local and library-level threading
- Quantization: Symmetric and asymmetric quantization support for efficient INT8/INT4 inference workflows
Key Features
- Added new GEMM APIs for pure FP16, F32×S8, BF16×S8→BF16 (with on-the-fly quantization), and BF16×U4 asymmetric weight-only quantization
- Delivered full JIT code generation for S8×S8 and U8×S8 GEMM/GEMV paths, including post-ops and column-major support
- Optimized BF16 and F32 JIT generators with AVX-512 GEMV, RD/k=1 kernel frameworks, BF16×S4 WOQ JIT, and batch GEMM JIT for int8
- Improved multi-threading with new thread-local/library-level APIs, smart factorization, PGO support, and small matrix optimizations
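To make the symmetric and asymmetric quantization modes mentioned above concrete, here is a minimal pure-Python sketch of the usual textbook formulas. It does not call or mirror the AOCL-DLP API; the function names and the choice of signed INT8 for the symmetric case and unsigned 8-bit for the asymmetric case are assumptions made for illustration.

```python
def quantize_symmetric(values, bits=8):
    """Symmetric quantization: real 0.0 maps to integer 0, range is [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantize_asymmetric(values, bits=8):
    """Asymmetric quantization: an integer zero point shifts the range to [0, qmax]."""
    qmax = 2 ** bits - 1  # 255 for 8 bits
    # The represented range is widened to contain 0.0 so that zero stays exact.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point=0):
    """Map integers back to approximate real values."""
    return [(x - zero_point) * scale for x in q]
```

Symmetric quantization needs only a scale factor, which suits weights centered around zero. Asymmetric quantization also carries a zero point, which is why a GEMM on such data needs the zero-point compensation listed under pre/post-ops.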
AOCL-FFTW
- There are no updates or modifications since version 5.1.
AOCL-FFTZ
AMD’s in-house Fast Fourier Transform (FFT) library is purpose-built and optimized for Zen-based processors, delivering improved computational performance and faster execution of FFT workloads on AMD hardware. The library also provides FFTW-compatible wrappers, enabling seamless integration with applications that already use FFTW APIs.
AOCL‑FFTZ is currently optimized specifically for Vienna Ab initio Simulation Package (VASP) and Quantum ESPRESSO (QE) workloads.
Release Highlights:
- Added new complex radix kernels: Radix-20 and Radix-48
- Introduced new solvers:
  - Complex: Buffered, Split Radix, Batched CT One-level Direct
  - Real: N-Dim, Size-one
- Enhanced dynamic dispatcher functionality across x86 architectures
- Performance optimizations in the complex Radix-4 and Radix-12 kernels
- Introduced a Fortran 2003 FFTW wrapper for application support
- Added pkg-config and CMake modules for easier integration into applications
- Fixed bugs in multi-threaded (MT) batched FFT and a memory issue in the aoclfftz_execute_io API
AOCL-LAPACK
Performance Improvements
- QR factorization, Singular Value Decomposition (DGELSS, DORGQR, SGESDD)
- Matrix Inverse routine DPOTRI for medium sizes
Usability Improvements
- All internal code logic now uses 64-bit integers, extending the range of supported matrix sizes
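The arithmetic behind the move to 64-bit integers can be checked with a short sketch. It assumes, for illustration only, that the limiting quantity is the element count (or linear offset) of a dense matrix held in a signed integer; the release notes do not describe the internal computations of AOCL-LAPACK, and the helper names are hypothetical.

```python
import math

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


def largest_square_dim(index_max):
    """Largest n such that an n-by-n matrix has at most index_max elements."""
    return math.isqrt(index_max)


def element_count_fits(n_rows, n_cols, index_max):
    """True if n_rows * n_cols can be represented without overflow."""
    return n_rows * n_cols <= index_max


# With 32-bit signed integers the limit for a square matrix is n = 46340.
print(largest_square_dim(INT32_MAX))
# A 50000 x 50000 matrix overflows a 32-bit count but not a 64-bit one.
print(element_count_fits(50_000, 50_000, INT32_MAX))
print(element_count_fits(50_000, 50_000, INT64_MAX))
```

A 50000 x 50000 double-precision matrix occupies about 20 GB, which fits in memory on many servers, so a 32-bit intermediate product is a practical limit rather than a theoretical one.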
Test Suite Enhancements
- Extended BRT test coverage to remaining APIs
- Introduced separate functional and performance test modes
- Added API‑specific YAML‑based ctests to improve test coverage
AOCL-LibM
- Added new statistical functions
  - erfinv, erfcinv, cdfnorm, and cdfnorminv with scalar, vector (vrd2/vrd4/vrd8), and vector array (vrda) variants
- Added round function support with full vector variant coverage
- Performance improvements to log2f and round functions
- Dynamic Dispatch feature update
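For readers unfamiliar with the new statistical functions, the sketch below gives reference definitions using only the Python standard library. It assumes the names carry their usual mathematical meaning (erfinv and erfcinv as inverses of erf and erfc, cdfnorm as the standard normal cumulative distribution function, cdfnorminv as its inverse). These are not bindings to AOCL-LibM, and the library's actual signatures are not shown here.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)
_SQRT2 = math.sqrt(2.0)


def cdfnorm(x):
    """Standard normal CDF, written in terms of erfc."""
    return 0.5 * math.erfc(-x / _SQRT2)


def cdfnorminv(p):
    """Inverse of the standard normal CDF, defined for 0 < p < 1."""
    return _STD_NORMAL.inv_cdf(p)


def erfinv(y):
    """Inverse error function, defined for -1 < y < 1."""
    return cdfnorminv((y + 1.0) / 2.0) / _SQRT2


def erfcinv(y):
    """Inverse complementary error function, defined for 0 < y < 2."""
    return -cdfnorminv(y / 2.0) / _SQRT2
```

The identities used are erf(x) = 2·cdfnorm(x·√2) − 1 and erfc(x) = 2·cdfnorm(−x·√2), so all four functions reduce to the normal CDF and its inverse.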
AOCL-LibMem
- Added support for new functions: strspn and strnlen
- Optimized memory and string routines for Zen3/4/5 architectures
- Introduced DCPerf Benchmark support
- Integrated GoogleTest (GTest) based validation framework for finer-grained testing
- Enhanced the validation and benchmarking framework infrastructure
- Enabled support for a microarchitecture-specific build option
AOCL-RNG
- There are no updates or modifications since version 5.1.
AOCL-ScaLAPACK
- Compatibility with GCC 15 and AOCC has been enhanced by updating outdated function declarations to modern prototypes
- Memory allocation limits addressed in the symmetric eigenvalue and eigenvector drivers
- Minor updates to reduce compiler warnings and promote greater type safety throughout the codebase
AOCL-Sparse
New Features
- CSC input support for SYPR, SYRK, SYPRD, and SYRKD
- Optimal matrix format and ISA selection in SpMV
- Level 1: Fused scatter operations with add and subtract
Performance Improvements
- SpMV: Fixed AVX512 kernel regression via hidden visibility preset
Bug Fixes
- LP64 integer overflow detection and error reporting across Level 2/3 kernels
- Coverity high- and medium-severity fixes
- GCC 15.2 compatibility/fixes: Fixed undefined behavior on empty vectors and SLP vectorizer bug in CSRMM kernel
Documentation
- Thread-safety notes for matrix modification and hint functions
- Documented AOCL_ENABLE_INSTRUCTIONS runtime environment variable
AOCL-Utils
- GCC-style CPUID detection
Download with End User License Agreement
| File Name | Version | Release Date | OS | Bitness | Description | SHA-256 Checksum | Size |
| AOCL 5.3 binary packages compiled with AOCC 5.2 | | | | | | | |
| aocl-linux-aocc-5.3.0.tar.gz | 5.3 | 05/18/2026 | RHEL, Ubuntu, SLES | 64-bit | AOCC-compiled AOCL tar file containing all the library binaries. It includes an install.sh file that extracts and installs the libraries. | e23282a94fbeded5bda38a099b61d044d082efc33a06f7b1ab835b36996c089c | 338MB |
| aocl-linux-aocc-5.3.0_1_amd64.deb | 5.3 | 05/18/2026 | Ubuntu | 64-bit | AOCC compiled Debian package | 54bf956cf4ccc8a32acf813e26d9cbfa0f9f06bae3415f14601c488f78577379 | 220MB |
| aocl-linux-aocc-5.3.0-1.x86_64.rpm | 5.3 | 05/18/2026 | RHEL, SLES | 64-bit | AOCC compiled RPM package | 5078091a8e7f2d92f337b7e1a44936eb5d860bc3e4a7fc1febb19a3553fb92bc | 261MB |
| AOCL 5.3 binary packages compiled with GCC 14.2.1 | | | | | | | |
| aocl-linux-gcc-5.3.0.tar.gz | 5.3 | 05/18/2026 | RHEL, Ubuntu, SLES | 64-bit | GCC-compiled AOCL tar file containing all the library binaries. It includes an install.sh file that extracts and installs the libraries. | 0f2454444b2c6f1adda84b4cd0c41eb9ce910a24d9dacbdc3a339d0b628f3b7a | 331MB |
| aocl-linux-gcc-5.3.0_1_amd64.deb | 5.3 | 05/18/2026 | Ubuntu | 64-bit | GCC compiled Debian package | 2923cc8e69b53996c48ccd87925a45d538c9b50824513ee7bc0110fa7590f1e3 | 233MB |
| aocl-linux-gcc-5.3.0-1.x86_64.rpm | 5.3 | 05/18/2026 | RHEL, SLES | 64-bit | GCC compiled RPM package | 13fe10e2ad53a5f4b648825dbd89192125eea0d7b32a054d7eb79e0c7c75a3ee | 283MB |
| Windows installer compiled with Clang 19 | | | | | | | |
| AOCL_Windows-setup-5.3.0-AMD.exe | 5.3 | 05/18/2026 | Windows 11, Windows 10 | 64-bit | Windows installer file containing all the AOCL library binaries compiled with Clang 19. | 021bfd69a439c3c2a72a6b5cf45d1de0e2f0deddb92ba19e73b9caeb259cf9c8 | 154MB |
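After downloading a package, its integrity can be checked against the checksum column of the table. The following is a minimal sketch using Python's standard hashlib module; the function names are illustrative, and the file path is whatever location the package was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return True if the file's SHA-256 digest matches the published checksum."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example usage (file name and checksum taken from the GCC tar row of the table):
# verify_download(
#     "aocl-linux-gcc-5.3.0.tar.gz",
#     "0f2454444b2c6f1adda84b4cd0c41eb9ce910a24d9dacbdc3a339d0b628f3b7a",
# )
```

Reading in chunks keeps memory use flat for packages that are several hundred megabytes in size. On Linux, the sha256sum command-line tool produces the same digest.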
Resources and Technical Support
Documentation
- AOCL User Guide
- AOCL Release Notes
- AOCL API Guide
- AOCL Build-It-Yourself source code: GitHub
- Prior versions: AOCL Archive
Support
For support options, refer to Technical Support.
AMD Community
For moderated forums, refer to the AMD community.