NEW! AOCL 5.2 is now available, December 31, 2025

AOCL is a set of numerical libraries optimized for AMD processors built on the AMD “Zen” architecture, spanning multiple generations. It supports AMD EPYC™, AMD Ryzen™, and AMD Ryzen™ Threadripper™ processor families. These highly tuned, industry-standard math libraries accelerate the development of scientific and high-performance computing applications.

New Libraries Released in AOCL 5.2

Libraries

Build Utility

AOCL-BIY (Build-It-Yourself) – A utility to customize and build AOCL libraries based on user selection.

What’s new in AOCL 5.2
Introducing AOCL-FFTZ (Early Release)

AOCL-FFTZ is AMD's new in-house Fast Fourier Transform (FFT) library, optimized specifically for Zen-based processors to speed up FFT operations in applications running on AMD hardware.

In its initial release, AOCL-FFTZ is optimized for Vienna Ab initio Simulation Package (VASP) workloads only. Future releases will add further improvements and a fuller feature set, including FFTW wrapper functions that let developers use AOCL-FFTZ as a drop-in replacement for Fastest Fourier Transform in the West (FFTW) and improve application performance without modifying existing code.

With targeted optimizations for the Zen architecture, AOCL-FFTZ enhances performance for scientific computing, signal processing, and data analysis workloads. This initial release delivers significant performance gains for VASP, achieving industry-leading throughput on AMD Zen platforms.

Key Features

  • Single unified library supporting both single and double precision, with compatibility for LP64 and ILP64 data models
  • Supports real and complex forward and backward transforms
  • Optimized core compute kernels with vectorization (AVX128/256/512)
  • Supports multi-threading via OpenMP
  • Dynamic CPU feature detection and runtime dispatch
  • Supports FFTW wrapper (plugin) for seamless transition from FFTW to AOCL-FFTZ
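
To make "forward and backward transforms" concrete, the following is a minimal pure-Python sketch of a naive O(n^2) discrete Fourier transform. It is not the AOCL-FFTZ API and does not resemble its optimized kernels; the function name dft is hypothetical. It assumes the unnormalized convention used by FFTW, in which a forward transform followed by a backward transform scales the input by n.

```python
import cmath

def dft(x, sign=-1):
    """Naive O(n^2) DFT. sign=-1 is forward, sign=+1 is backward (unnormalized)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]

signal = [1.0, 2.0, 0.0, -1.0]            # real-valued input
spectrum = dft(signal, sign=-1)           # forward transform
# Backward transform, then divide by n to undo the unnormalized scaling.
roundtrip = [v / len(signal) for v in dft(spectrum, sign=+1)]
```

For real input the spectrum is conjugate-symmetric, which is why real-to-complex transforms only need to store about half of the output.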
Introducing AOCL-DLP

AOCL-DLP is a library of deep learning primitives, including low-precision GEMM (LPGEMM), BatchGEMM, and quantized GEMM APIs, along with fused element-wise operations such as GeLU and Sigmoid, tuned for performance on AMD EPYC™ CPUs.

AOCL-DLP supports a range of low-precision and mixed-precision data types, including FP32, BF16, and INT8, optimized for AMD x86 architectures. By leveraging advanced instruction set features such as AVX512_VNNI and AVX512_BF16, AOCL-DLP enables highly efficient deep learning computations on AMD EPYC™ CPUs.
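
As background on the BF16 data type mentioned above, here is a small sketch of how an FP32 value can be narrowed to BF16, which keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits. The helper names are hypothetical and this is not AOCL-DLP code; it assumes finite inputs and round-to-nearest-even.

```python
import struct

def fp32_to_bf16_bits(value):
    """Round a finite FP32 value to BF16 (nearest-even); return the 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # The rounding bias depends on the lowest bit that is kept (ties go to even).
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_fp32(bits16):
    """Widen a BF16 bit pattern back to FP32 by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", bits16 << 16))[0]

approx_pi = bf16_bits_to_fp32(fp32_to_bf16_bits(3.14159))  # 3.140625
```

BF16 keeps the full FP32 exponent range and gives up mantissa precision, so 3.14159 survives only to about three significant digits.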

Key Features

  • Supports GEMM, BatchGEMM APIs for F32, BF16, INT8 data types
  • Supports symmetric quantized INT8 APIs
  • All APIs support fused elementwise post-ops such as add-bias, ReLU, GeLU-ERF, GeLU-Tanh, Sigmoid, elementwise mat-add and mat-mul
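
The idea behind a symmetric quantized INT8 GEMM with fused post-ops can be sketched in plain Python as follows. This is an illustrative reference loop, not the AOCL-DLP API: the function names and the per-matrix scale choice are assumptions made for this example.

```python
def quantize_symmetric(matrix):
    """Symmetric INT8 quantization: zero point is 0, scale maps max |x| to 127."""
    max_abs = max(abs(v) for row in matrix for v in row)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in matrix]
    return q, scale

def gemm_int8_bias_relu(a, b, bias):
    """C = ReLU(dequant(Aq @ Bq) + bias), with post-ops applied per output element."""
    aq, a_scale = quantize_symmetric(a)
    bq, b_scale = quantize_symmetric(b)
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = sum(aq[i][k] * bq[k][j] for k in range(inner))  # integer accumulator
            value = acc * a_scale * b_scale + bias[j]  # dequantize, then add-bias
            c[i][j] = max(0.0, value)  # ReLU applied in the same pass
    return c

result = gemm_int8_bias_relu([[1.0, -2.0], [0.5, 4.0]],
                             [[1.0, 0.0], [0.0, 1.0]],
                             [0.0, 0.0])
```

Fusing the post-ops means each output element is biased and activated while it is still in a register, instead of making extra passes over the result matrix.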
AOCL Build-It-Yourself

AOCL now offers the capability to compile individual AOCL libraries and consolidate them into a single unified binary, simplifying integration and deployment.

All AOCL library sources are included as Git submodules within the ‘submodules’ branch. This approach supports offline development and guarantees consistent versioning across all components, making it easier to build and maintain a complete AOCL ecosystem without relying on external dependencies.

Key Highlights

  • Flexible Library Selection
    • Choose one or more AOCL libraries and merge them into a single library using configurable CMake options.
  • Unified Binary Output
    • Linux: libaocl.so / libaocl.a
    • Windows: aocl.dll / aocl.lib
  • Benefits
    • Eliminates dependency on library linking order
    • Prevents API duplication
    • Ensures smooth and efficient integration of multiple AOCL libraries

This enhancement streamlines your development workflow, making AOCL integration easier and more robust.

AOCL-BLAS

Performance Optimizations

  • GEMM: Significant tuning for Zen4/Zen5, optimized AVX512 kernels, improved edge-case handling, and smarter thread selection
  • GEMV: Introduced multithreaded DGEMV, exported AVX512 kernels, enhanced precision, and resolved key bugs
  • DCOPY: Architecture-specific tuning for Zen4/Zen5

New Features

  • Flexible build options to disable certain optimizations for testing and benchmarking
  • Full implementation of GEMMTR APIs for broader functionality
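
For readers unfamiliar with GEMMTR, the sketch below shows the operation in reference form, assuming AOCL-BLAS follows the semantics of the reference BLAS routine of the same name: C := alpha*A*B + beta*C, where only the upper or lower triangle of C is updated. The function name is hypothetical and transposition options are omitted for brevity.

```python
def gemmtr_reference(uplo, alpha, a, b, beta, c):
    """Update only the 'U'pper or 'L'ower triangle of square C with alpha*A*B + beta*C."""
    n, inner = len(c), len(b)
    for i in range(n):
        for j in range(n):
            in_triangle = j >= i if uplo == "U" else j <= i
            if not in_triangle:
                continue  # entries outside the chosen triangle are left untouched
            dot = sum(a[i][k] * b[k][j] for k in range(inner))
            c[i][j] = alpha * dot + beta * c[i][j]
    return c

c = gemmtr_reference("U", 1.0,
                     [[1.0, 2.0], [3.0, 4.0]],
                     [[1.0, 0.0], [0.0, 1.0]],
                     0.0,
                     [[9.0, 9.0], [9.0, 9.0]])
```

Skipping the other triangle roughly halves the work when the product is known to be symmetric.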

Reliability & Stability

  • Critical fixes addressing integer overflow, memory safety, and out-of-bounds access
  • Improved accuracy for TRSM operations and compatibility with GCC 12+

Quality & Security

  • Standardized codebase (kernel naming, Python 3 compliance)
  • Enhanced security via compiler flags
  • Build system improvements for Windows DLL handling

Testing & Validation

  • Strengthened BLAS2/BLAS3 test coverage and TRSM reference kernel derivation
AOCL-Compression
  • Introduced multi-threaded support for BZIP2 (compression and decompression) and LZMA (compression)
  • Introduced AOCL_COMPRESS_FAST mode in ZSTD for quicker compression at higher levels
  • Improved LZ4 and ZLIB multithreading performance
  • Improved single-threaded performance for ZLIB, BZIP2, ZSTD
AOCL-Cryptography
  • Performance improvements – SHA3, SHA256, and GCM AVX512
  • Multi-buffer support for CBC and CFB
  • In-place buffer support for Ciphers
  • Experimental support for OpenSSL 3.5
  • Clang 19 and GCC 14 support
AOCL-Data Analytics 
  • Python wheels for Windows
  • Python APIs now accept all array-like inputs including Pandas data frames
  • Addition of new APIs for train_test_split and pairwise distances
  • Addition of Ball tree and k-d tree options for DBSCAN clustering and k-nearest neighbors
  • Performance improvements to support vector machines, decision forests, DBSCAN, k-means clustering, linear models, and k-nearest neighbors
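
As a rough illustration of what a pairwise-distance routine and a train/test split compute, here is a pure-Python sketch. The function names, parameters, and defaults are hypothetical and are not the AOCL-Data Analytics API.

```python
import math
import random

def pairwise_euclidean(x, y):
    """Return the matrix of Euclidean distances between rows of x and rows of y."""
    return [[math.dist(p, q) for q in y] for p in x]

def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle row indices reproducibly, then cut off a test portion."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test

distances = pairwise_euclidean([[0.0, 0.0]], [[3.0, 4.0]])  # [[5.0]]
train, test = split_train_test(list(range(8)))
```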
AOCL-FFTW
  • There are no updates or modifications since version 5.1.
AOCL-LAPACK

Performance Improvements

  • LU, Cholesky and QR Factorizations (DGETRF, DPOTRF & DGEQRF)
  • Symmetric Eigen Decomposition (DSYEVD)
  • Matrix Inverse routines (DGETRI and DPOTRI) for small sizes
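
For context, DPOTRF computes the Cholesky factorization of a symmetric positive definite matrix. The unblocked textbook version below is only a reference sketch with a hypothetical function name; the tuned AOCL-LAPACK routine uses blocked, vectorized kernels that are not shown here.

```python
import math

def cholesky_lower(a):
    """Return lower-triangular L with A = L * L^T for symmetric positive definite A."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        diag = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if diag <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(diag)
        for i in range(j + 1, n):
            partial = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - partial) / l[j][j]
    return l

factor = cholesky_lower([[4.0, 2.0], [2.0, 3.0]])
```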

Build System Update

  • AOCL-LAPACK now supports only CMake-based builds; the autoconf-based build is no longer supported
  • Introduced two new modes, avx2-strict and avx512-strict, under the LF_ISA_CONFIG build flag to enforce a specific ISA during execution

Test Suite Framework Enhancements

  • Bit Reproducibility tests for test-suite supported LAPACK APIs
  • Introduced Benchmark Mode, which runs tests for a fixed duration and displays additional metrics
AOCL-LibM
  • Upgraded to CMake build system with comprehensive cross-platform support
  • Supports static and dynamic linking options in the CMake build system with static linking being the default option
  • Introduced erfc API with scalar and vector implementations for both single and double precision on Zen platforms
  • Expanded vector functionality for erf, asin, acos, cosh, tan, atan, and tanh functions
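
A dedicated erfc function matters because computing 1 - erf(x) loses all precision once erf(x) rounds to 1. The short sketch below uses Python's standard math module, not AOCL-LibM, to show the effect in double precision.

```python
import math

x = 10.0
naive = 1.0 - math.erf(x)   # erf(10) rounds to exactly 1.0, so the tail is lost
direct = math.erfc(x)       # computed directly, so the tiny tail value survives
```

Here naive is 0.0, while direct is a small positive number on the order of 1e-45.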
AOCL-LibMem
  • Enabled Zen4/Zen5 compilation with AVX-512 flags on GCC versions that do not natively support them
  • Introduced IFUNC-based CPU dispatcher for faster function resolution
  • Optimized memory and string functions for improved performance on Zen5 architecture
  • Enhanced benchmarking and test framework
AOCL-RNG
  • There are no updates or modifications since version 5.1.
AOCL-ScaLAPACK
  • Upgraded AOCL-ScaLAPACK specification to align with Netlib ScaLAPACK 2.2.2
  • Introduced a C wrapper interface for AOCL-ScaLAPACK computational and auxiliary APIs (SRC and TOOLS)
AOCL-Sparse

New Features

  • Supports sparse matrix creation in BSR format
  • Supports CSC and BSR storage formats within SpMV
  • Vectorized SpMV with BSR storage format
  • Supports C++ interfaces for a select list of APIs
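
To illustrate the BSR (block sparse row) layout used with SpMV, here is a reference sketch of y = A*x in pure Python. The argument names are hypothetical and do not mirror the AOCL-Sparse API; it assumes square dense blocks stored row-major, one per nonzero block.

```python
def bsr_spmv(block_dim, row_ptr, col_idx, blocks, x):
    """Multiply a BSR matrix by a dense vector x and return y."""
    n_block_rows = len(row_ptr) - 1
    y = [0.0] * (n_block_rows * block_dim)
    for bi in range(n_block_rows):
        # row_ptr[bi]..row_ptr[bi + 1] indexes the nonzero blocks of block row bi.
        for p in range(row_ptr[bi], row_ptr[bi + 1]):
            bj = col_idx[p]      # block column of this nonzero block
            block = blocks[p]    # dense block_dim x block_dim values
            for r in range(block_dim):
                for c in range(block_dim):
                    y[bi * block_dim + r] += block[r][c] * x[bj * block_dim + c]
    return y

# 4x4 matrix with two nonzero 2x2 blocks on the block diagonal.
y = bsr_spmv(2, [0, 1, 2], [0, 1],
             [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 0.0], [0.0, 6.0]]],
             [1.0, 1.0, 1.0, 1.0])
```

Because each block is small and dense, the two inner loops have fixed trip counts and contiguous data, which is what makes this format a natural fit for vectorization.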

Performance Improvements

  • Multi-threaded performance across APIs
  • Level 2: SpMV variants
  • Level 3: CSRMM and Sp2M variants
AOCL-Utils
  • Extended support added for PhoenixPoint CPU
  • Resolved bugs in isZenFamily and au_cpuid_arch_is_zen_family functions

Download with End User License Agreement

AOCL 5.2 binary packages compiled with AOCC 5.1

| File Name | Version | Release | OS | Bitness | Description | Checksum (sha256sum) | Size |
|---|---|---|---|---|---|---|---|
| aocl-linux-aocc-5.2.0.tar.gz | 5.2 | 12/31/2025 | RHEL, Ubuntu, SLES | 64-bit | AOCC compiled AOCL tar file containing all the library binaries. It includes install.sh file that extracts and installs the libraries. | 89084b49f7706ed8fa393bbaac501b2dfc74665f3a6ec6173aafe34a1bd23f32 | 164MB |
| aocl-linux-aocc-5.2.0_1_amd64.deb | 5.2 | 12/31/2025 | Ubuntu | 64-bit | AOCC compiled Debian package | 9c2712efa356377446131d0246f1d6a6183616429d5a6f4d972216cfc25b9139 | 96MB |
| aocl-linux-aocc-5.2.0-1.x86_64.rpm | 5.2 | 12/31/2025 | RHEL, SLES | 64-bit | AOCC compiled RPM package | c6fe0a671831da5f62edad914a775add8d745c5a1100c23907796dd5e271b585 | 115MB |

AOCL 5.2 binary packages compiled with GCC 14.2.1

| File Name | Version | Release | OS | Bitness | Description | Checksum (sha256sum) | Size |
|---|---|---|---|---|---|---|---|
| aocl-linux-gcc-5.2.0.tar.gz | 5.2 | 12/31/2025 | RHEL, Ubuntu, SLES | 64-bit | GCC compiled AOCL tar file containing all the library binaries. Includes install.sh file that extracts and installs the libraries. | fbad9a554f86130f6442205f9a8e4f3931b254fb2a27575275d3b00979c3804c | 171MB |
| aocl-linux-gcc-5.2.0_1_amd64.deb | 5.2 | 12/31/2025 | Ubuntu | 64-bit | GCC compiled Debian package | 7c64d0968e965271f67d77859fd79270aad430e127bf9bf9bf831e9a14e53099 | 105MB |
| aocl-linux-gcc-5.2.0-1.x86_64.rpm | 5.2 | 12/31/2025 | RHEL, SLES | 64-bit | GCC compiled RPM package | 97ef82f0b4ef6cd6951e5beebd6bba286cbc5481051be41798f6e9e409c14c20 | 125MB |

Windows Installer Compiled with Clang 18

| File Name | Version | Release | OS | Bitness | Description | Checksum (sha256sum) | Size |
|---|---|---|---|---|---|---|---|
| AOCL_Windows-setup-5.2.0-AMD.exe | 5.2 | 12/31/2025 | Windows 11, Windows 10 | 64-bit | Windows installer file containing all the AOCL library binaries compiled with Clang 18. | 0caf97420a5e1372aab6c70160f977f4a36c26cee89b88856145af82df2fe096 | 154MB |
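
A downloaded package can be checked against its published SHA-256 checksum before installation. The following is a minimal sketch using Python's standard hashlib module; the helper names are hypothetical, and the file is assumed to be in the current directory.

```python
import hashlib
import hmac

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large packages need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_checksum(path, expected_hex):
    """Compare the computed digest with the published one, ignoring case and spaces."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

For example, matches_published_checksum("aocl-linux-aocc-5.2.0.tar.gz", "89084b49f7706ed8fa393bbaac501b2dfc74665f3a6ec6173aafe34a1bd23f32") should return True for an intact download of the AOCC tar file listed above.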

Resources and Technical Support

Documentation

Support

For support options, refer to Technical Support.

AMD Community

For moderated forums, refer to the AMD community.