AMD Optimizing CPU Libraries (AOCL)

NEW! AOCL 5.3 is now available, May 18, 2026

AOCL is a set of numerical libraries optimized for AMD processors built on the AMD “Zen” architecture, spanning multiple generations. It supports AMD EPYC™, AMD Ryzen™, and AMD Ryzen™ Threadripper™ processor families. These highly tuned, industry-standard math libraries accelerate the development of scientific and high-performance computing applications.

What’s new

AOCL Build-It-Yourself

AOCL now offers the capability to compile individual AOCL libraries and consolidate them into a single unified binary, simplifying integration and deployment.

All AOCL library sources are included as Git submodules within the ‘submodules’ branch. This approach supports offline development and guarantees consistent versioning across all components, making it easier to build and maintain a complete AOCL ecosystem without relying on external dependencies.

Key Highlights

Flexible Library Selection
- Choose one or more AOCL libraries and merge them into a single library using configurable CMake options.
Unified Binary Output
- Linux: libaocl.so / libaocl.a
- Windows: aocl.dll / aocl.lib
Benefits
- Eliminates dependency on library linking order
- Prevents API duplication
- Ensures smooth and efficient integration of multiple AOCL libraries

This enhancement streamlines your development workflow, making AOCL integration easier and more robust.

GitHub Repo

AOCL-BLAS

Performance improvements in S/D/ZGEMM on Zen3/4/5
SGEMM optimizations for tiny matrices
New thread control APIs with global and thread-local variants
Support for OpenMP 2.5 and earlier versions
Optional support for reproducibility using compiler options
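In BLAS naming, the S/D/Z prefixes mean single-precision real, double-precision real, and double-precision complex, and every GEMM variant computes C ← αAB + βC (plus optional transposes). The naive loop below is a hypothetical reference sketch of that contract for SGEMM in row-major order with no transposes. It is not AOCL-BLAS code; `ref_sgemm` is an illustrative name. In an application you would call the optimized routine (for example through the standard CBLAS interface, `cblas_sgemm`) and use a loop like this only as a correctness baseline.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference SGEMM (not the AOCL-BLAS kernel):
 *   C = alpha * A * B + beta * C, row-major, no transposes.
 * A is m x k, B is k x n, C is m x n. */
void ref_sgemm(size_t m, size_t n, size_t k, float alpha,
               const float *A, const float *B, float beta, float *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            /* As in reference BLAS, beta == 0 means C is not read,
             * so uninitialized or NaN contents do not propagate. */
            C[i * n + j] = (beta == 0.0f)
                               ? alpha * acc
                               : alpha * acc + beta * C[i * n + j];
        }
    }
}
```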

AOCL-Compression

Refactored dynamic dispatch to improve runtime robustness across all x86 systems
Added a local thread API enabling applications to control library-internal threads
Introduced GNU Make build system support for Linux (with limited test coverage)
Added AOCL_LLC_PREFIX user option to prefix library symbols (zlib/deflate, lz4 and zstd) and prevent conflicts with other library implementations

AOCL-Cryptography

Added SHA-256 and HMAC-SHA-256 multi-buffer support
Added variable-length multi-buffer support for CBC and CFB
Performance improvements to ChaCha20, AES-XTS, AES-GCM, and Poly1305
Support for OpenSSL 3.5, with enhanced provider support for AES-CBC, CBC TLS unpadding, and RSA operations
Reliability, validation, and security fixes across cipher, provider, and RNG paths

AOCL-Data Analytics

Python wheels are now available on PyPI for easy installation via pip
Trained models can now be saved in binary and loaded later for inference
New APIs for approximate nearest neighbors, radius neighbors classification and radius neighbors regression
Performance improvements to support vector machines, DBSCAN, k-means clustering, and train_test_split

AOCL-DLP

AOCL-DLP (Deep Learning Primitives) is an AMD library that provides optimized low-precision GEMM (General Matrix Multiply) and batch GEMM operations for deep learning inference and training on AMD CPUs.

Purpose: Accelerates matrix multiplication, the core compute primitive in deep learning, by leveraging x86 SIMD instruction-set extensions (AVX2, AVX-512, AVX-512 VNNI, AVX-512 BF16, AVX-512 FP16)
Data types: Supports FP32, FP16, BF16, INT8, INT4, and various mixed-precision combinations (for example: BF16×S8, F32×S8 with on-the-fly quantization, BF16×S4/U4 weight-only quantization)
Pre/post-ops: Built-in support for zero-point compensation, scale factors, bias, activations (ReLU, PReLU, GELU, Swish, etc.), matrix add/multiply — fused directly into the GEMM kernel to avoid extra memory passes
JIT compilation: Runtime code generation using Xbyak to produce optimized kernels tailored to the specific problem size and hardware
Threading: OpenMP-based parallel execution with configurable thread-local and library-level threading
Quantization: Symmetric and asymmetric quantization support for efficient INT8/INT4 inference workflows
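The pre/post-op and quantization items above can be illustrated with a scalar reference for a u8×s8 GEMM. This sketch assumes A is asymmetric u8 with a single zero point `za`, B is symmetric s8, and the int32 result is dequantized with one combined scale before a per-column bias and ReLU are applied in the same pass. The function name and data layout are hypothetical; it shows the arithmetic a fused kernel performs, not the AOCL-DLP API. Zero-point compensation relies on the identity Σ(a − za)·b = Σa·b − za·Σb, so the column sums of B are computed once and reused for every row of A.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference (not the AOCL-DLP API):
 *   out[i][j] = relu(scale * sum_p (A[i][p] - za) * B[p][j] + bias[j])
 * A: m x k uint8, asymmetric with zero point za, row-major
 * B: k x n int8, symmetric, row-major
 * col_sum_b: caller-provided scratch of n int32 values */
void ref_u8s8_gemm_fused(size_t m, size_t n, size_t k,
                         const uint8_t *A, uint8_t za,
                         const int8_t *B, float scale,
                         const float *bias, int32_t *col_sum_b,
                         float *out)
{
    assert(A && B && bias && col_sum_b && out);

    /* Zero-point compensation term: column sums of B, computed once. */
    for (size_t j = 0; j < n; ++j) {
        int32_t s = 0;
        for (size_t p = 0; p < k; ++p)
            s += B[p * n + j];
        col_sum_b[j] = s;
    }

    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            int32_t acc = 0; /* raw u8*s8 dot product, int32 accumulator */
            for (size_t p = 0; p < k; ++p)
                acc += (int32_t)A[i * k + p] * (int32_t)B[p * n + j];
            acc -= (int32_t)za * col_sum_b[j]; /* now sum (a - za) * b */

            /* Fused post-ops in one pass: dequantize, add bias, ReLU. */
            float v = scale * (float)acc + bias[j];
            out[i * n + j] = v > 0.0f ? v : 0.0f;
        }
    }
}
```

Applying the post-ops while the accumulator is still in registers is what lets a fused kernel skip an extra read and write of the output matrix.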

Key Features

Added new GEMM APIs for pure FP16, F32×S8, BF16×S8→BF16 (with on-the-fly quantization), and BF16×U4 asymmetric weight-only quantization
Delivered full JIT code generation for S8×S8 and U8×S8 GEMM/GEMV paths, including post-ops and column-major support
Optimized BF16 and F32 JIT generators with AVX-512 GEMV, RD/k=1 kernel frameworks, BF16×S4 weight-only quantization (WOQ) JIT, and batch GEMM JIT for int8
Improved multi-threading with new thread-local/library-level APIs, smart factorization, PGO support, and small matrix optimizations

AOCL-FFTW

There are no updates or modifications since version 5.1.

AOCL-FFTZ

AOCL-FFTZ is AMD’s in-house Fast Fourier Transform (FFT) library, purpose-built and optimized for Zen-based processors to speed up FFT workloads on AMD hardware. The library also provides FFTW-compatible wrappers, enabling integration with applications that already use the FFTW APIs.

AOCL‑FFTZ is currently optimized specifically for Vienna Ab initio Simulation Package (VASP) and Quantum ESPRESSO (QE) workloads.

Release Highlights

Added new complex radix kernels: Radix-20 and Radix-48
Introduced New Solvers:
- Complex: Buffered, Split Radix, Batched CT One-level Direct
- Real: N-Dim, Size-one
Enhanced dynamic dispatcher functionality across x86 architectures
Performance optimizations in Complex Radix-4 & Radix-12 Kernels
Introduced a Fortran 2003 FFTW wrapper for application support
Added pkg-config and CMake modules for seamless integration into applications
Fixed bugs in MT batched FFT and a memory issue in the aoclfftz_execute_io API

AOCL-LAPACK

Performance Improvements

QR factorization and singular value decomposition routines (DGELSS, DORGQR, SGESDD)
Matrix inverse routine DPOTRI for medium-sized matrices

Usability Improvements

Internal code paths now use 64-bit integers, extending the range of supported matrix sizes
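A quick calculation shows why 32-bit indices become a limit: a square matrix with n = 50,000 has n² = 2.5 × 10⁹ elements, more than the largest 32-bit signed integer (2,147,483,647), so a linear element offset no longer fits in a 32-bit `int`. The helpers below are hypothetical, not AOCL-LAPACK functions; they compute the element count in 64-bit arithmetic and report whether every offset would still fit in a 32-bit index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not an AOCL-LAPACK API): element count of an
 * m x n matrix, computed in 64-bit arithmetic. For any m, n that fit in
 * int32_t the product is below 2^62, so it cannot overflow int64_t. */
int64_t elem_count64(int32_t m, int32_t n)
{
    assert(m >= 0 && n >= 0);
    return (int64_t)m * (int64_t)n;
}

/* True if every linear offset 0 .. m*n-1 fits in a 32-bit signed index. */
bool fits_int32_index(int32_t m, int32_t n)
{
    return elem_count64(m, n) - 1 <= INT32_MAX;
}
```

The boundary for square matrices falls between n = 46,340 (46,340² = 2,147,395,600 elements) and n = 46,341 (46,341² = 2,147,488,281 elements).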

Test Suite Enhancements

Extended BRT test coverage to remaining APIs
Introduced separate functional and performance test modes
Added API-specific, YAML-based CTest cases to improve test coverage

AOCL-LibM

Added new statistical functions
- erfinv, erfcinv, cdfnorm, and cdfnorminv with scalar, vector (vrd2/vrd4/vrd8), and vector array (vrda) variants
Added round function support with full vector variant coverage
Performance improvements to log2f and round functions
Dynamic Dispatch feature update

AOCL-LibMem

Added support for new functions: strspn and strnlen
Optimized memory and string routines for Zen3/4/5 architectures
Introduced DCPerf Benchmark support
Integrated GoogleTest (GTest) based validation framework for finer-grained testing
Enhanced the validation and benchmarking framework infrastructure
Added support for a microarchitecture-specific build option
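`strspn` (ISO C) returns the length of the initial segment of a string made up only of characters from an accept set, and `strnlen` (POSIX.1-2008) returns the string length but never examines more than `maxlen` bytes. Because AOCL-LibMem supplies optimized versions of standard libc string routines, calls look exactly like ordinary libc calls. The helper below is a hypothetical example, not part of AOCL-LibMem, that combines both to measure a leading run of digits in a buffer that may lack a terminator within its bounds.

```c
#define _POSIX_C_SOURCE 200809L /* expose strnlen in <string.h> */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: length of the leading run of ASCII digits in buf,
 * or -1 if buf has no NUL terminator within its first cap bytes.
 * strnlen bounds the scan to cap bytes; only after a terminator is
 * confirmed is it safe to call strspn, which reads up to the NUL. */
long leading_digits(const char *buf, size_t cap)
{
    assert(buf != NULL);
    if (strnlen(buf, cap) == cap)
        return -1; /* no terminator inside the buffer */
    return (long)strspn(buf, "0123456789");
}
```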

AOCL-RNG

There are no updates or modifications since version 5.1.

AOCL-ScaLAPACK

Improved compatibility with GCC 15 and AOCC by updating outdated function declarations to modern prototypes
Addressed memory allocation limits in the symmetric eigenvalue and eigenvector drivers
Made minor updates to reduce compiler warnings and improve type safety throughout the codebase

AOCL-Sparse

New Features

CSC input support for SYPR, SYRK, SYPRD, and SYRKD
Optimal matrix format and ISA selection in SpMV
Level 1: Fused scatter operations with add and subtract
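In sparse BLAS terminology, a Level 1 scatter writes a compressed vector x, with index array indx, into a dense vector y (y[indx[i]] = x[i]). The fused variants listed above presumably combine that write with an update of the existing dense entry. The sketch below shows reference semantics for scatter-add and scatter-subtract under that assumption; the function names and signatures are hypothetical and do not reflect the AOCL-Sparse API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference semantics (not the AOCL-Sparse API).
 * x holds nnz compressed values; indx gives their positions in dense y
 * of length n. Distinct indices are assumed, as a vectorized or threaded
 * kernel would require; this sequential loop would simply accumulate
 * duplicates. */
void ref_scatter_add(size_t nnz, const double *x, const size_t *indx,
                     double *y, size_t n)
{
    for (size_t i = 0; i < nnz; ++i) {
        assert(indx[i] < n);
        y[indx[i]] += x[i]; /* read-modify-write instead of a plain store */
    }
}

void ref_scatter_sub(size_t nnz, const double *x, const size_t *indx,
                     double *y, size_t n)
{
    for (size_t i = 0; i < nnz; ++i) {
        assert(indx[i] < n);
        y[indx[i]] -= x[i];
    }
}
```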

Performance Improvements

SpMV: Fixed an AVX-512 kernel regression via a hidden-visibility preset

Bug fixes

LP64 integer overflow detection and error reporting across Level 2/3 kernels
Coverity high- and medium-severity fixes
GCC 15.2 compatibility/fixes: Fixed undefined behavior on empty vectors and SLP vectorizer bug in CSRMM kernel

Documentation

Thread-safety notes for matrix modification and hint functions
Documented AOCL_ENABLE_INSTRUCTIONS runtime environment variable

AOCL-Utils

GCC-style CPUID detection
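GCC and Clang expose CPU feature detection through the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins, one common basis for the kind of runtime dynamic dispatch several libraries above mention. The sketch below selects a function pointer once at run time. It assumes an x86-64 target and a GCC or Clang compiler; the kernel names are hypothetical placeholders that compute the same result, and this is not the AOCL-Utils API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernels: a real library would build the AVX2 variant with
 * target-specific flags or __attribute__((target("avx2"))). Here both are
 * plain C so the example runs on any x86-64 machine. */
static double sum_generic(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += v[i];
    return s;
}

static double sum_avx2_placeholder(const double *v, size_t n)
{
    return sum_generic(v, n); /* stand-in for a vectorized kernel */
}

typedef double (*sum_fn)(const double *, size_t);

/* Pick an implementation once, based on the running CPU's features. */
sum_fn select_sum(void)
{
    /* Only required when called before constructors run; harmless here. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2_placeholder;
    return sum_generic;
}
```

Resolving the pointer once and caching it keeps the feature check off the hot path.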

Download with End User License Agreement

File Name	Version	Release	OS	Bitness	Description	SHA-256 checksum	Size
AOCL 5.3 binary packages compiled with AOCC 5.2
aocl-linux-aocc-5.3.0.tar.gz	5.3	05/18/2026	RHEL, Ubuntu, SLES	64-bit	AOCC compiled AOCL tar file containing all the library binaries. It includes an install.sh script that extracts and installs the libraries.	e23282a94fbeded5bda38a099b61d044d082efc33a06f7b1ab835b36996c089c	338MB
aocl-linux-aocc-5.3.0_1_amd64.deb	5.3	05/18/2026	Ubuntu	64-bit	AOCC compiled Debian package	54bf956cf4ccc8a32acf813e26d9cbfa0f9f06bae3415f14601c488f78577379	220MB
aocl-linux-aocc-5.3.0-1.x86_64.rpm	5.3	05/18/2026	RHEL, SLES	64-bit	AOCC compiled RPM package	5078091a8e7f2d92f337b7e1a44936eb5d860bc3e4a7fc1febb19a3553fb92bc	261MB
AOCL 5.3 binary packages compiled with GCC 14.2.1
aocl-linux-gcc-5.3.0.tar.gz	5.3	05/18/2026	RHEL, Ubuntu, SLES	64-bit	GCC compiled AOCL tar file containing all the library binaries. It includes an install.sh script that extracts and installs the libraries.	0f2454444b2c6f1adda84b4cd0c41eb9ce910a24d9dacbdc3a339d0b628f3b7a	331MB
aocl-linux-gcc-5.3.0_1_amd64.deb	5.3	05/18/2026	Ubuntu	64-bit	GCC compiled Debian package	2923cc8e69b53996c48ccd87925a45d538c9b50824513ee7bc0110fa7590f1e3	233MB
aocl-linux-gcc-5.3.0-1.x86_64.rpm	5.3	05/18/2026	RHEL, SLES	64-bit	GCC compiled RPM package	13fe10e2ad53a5f4b648825dbd89192125eea0d7b32a054d7eb79e0c7c75a3ee	283MB
Windows Installer Compiled with Clang 19
AOCL_Windows-setup-5.3.0-AMD.exe	5.3	05/18/2026	Windows 11, Windows 10	64-bit	Windows installer file containing all the AOCL library binaries compiled with Clang 19.	021bfd69a439c3c2a72a6b5cf45d1de0e2f0deddb92ba19e73b9caeb259cf9c8	154MB

Resources and Technical Support

Documentation

AOCL User Guide
AOCL Release Notes
AOCL API Guide
AOCL Build-It-Yourself Source code: GitHub
Prior versions: AOCL Archive.

Support

For support options, refer to Technical Support.

AMD Community

For moderated forums, refer to the AMD community.
