What’s New in AOCL 5.1: Faster, and More Scalable Math Libraries for AMD Platforms
May 21, 2025

When developing performance-critical software for AMD Zen-based CPUs, there is no better ally than the AMD Optimizing CPU Libraries (AOCL) suite. A high-performance numerical suite of libraries optimized for AMD Zen-based CPU processors, AOCL spans core domains like linear algebra, Fast Fourier Transforms, random number generation, cryptography, compression, and classical machine learning. Whether you're building HPC simulations, data analytics pipelines, financial or ML workloads, AOCL helps developers deliver faster, more efficient applications by providing finely tuned and scalable building blocks.


We're thrilled to announce the release of AOCL 5.1 that packs a punch by delivering performance optimizations across all libraries, new machine learning capabilities, improved threading and instruction tuning, and expanded support for modern hardware.
Let’s dig into the key highlights of this release.
Linear Algebra: Optimized, Parallelized, and More Scalable
From scientific simulations to AI model training, linear algebra remains the core of most high-performance applications. In AOCL 5.1, we've delivered substantial upgrades across AOCL-BLAS, AOCL-LAPACK, AOCL-Sparse, and AOCL-ScaLAPACK libraries to enhance both performance and scalability.

AOCL-BLAS brings further Zen 4/5-optimized kernels, new batch GEMM APIs in low precision format (LPGEMM), and support for BF16 on AVX2, along with expanded quantization and threading improvements.
AOCL-LAPACK upgrades to Netlib 3.12.0 specification, adds new APIs, improves LU, SVD, and eigenvalue routines, and enhances integration via CMake and pkg-config.
AOCL-Sparse introduces multithreading in CSRMM and SpMV, adds complex data support, and enables dynamic dispatch for better performance on modern cores.
AOCL-ScaLAPACK sees improved eigenvalue solver performance (PDSYEVD) and enhanced test coverage.
With these updates, AOCL offers a comprehensive, efficient toolkit for developers working with dense and sparse matrices—from shared-memory systems to distributed compute environments.
Faster Compression with Parallel Power
AOCL-Compression 5.1 delivers improved single-threaded performance for popular formats like Bzip2, Snappy, Gzip, ZSTD, and LZ4 — especially on Clang-based builds.
The newly introduced multithreaded APIs significantly boosts compression and decompression speeds for GZIP and Raw Deflate formats. Additionally, a new fast mode (AOCL_DECOMPRESS_FAST=3) is included in ZSTD.

Cryptography Tuned for Real-World Workloads
AOCL-Cryptography 5.1 strengthens performance in real benchmarks such as WRK and Apache Bench, particularly for AES-GCM. We’ve further optimized SHA3, improved OpenSSL provider support, added dispatch logic, and fixed key Coverity-detected defects to make this library both faster and more secure.
Runtime tuning via AOCL-Utils and debug logging support round out this release for production-grade cryptographic integration.

Data Analytics Meets Performance
AOCL-DA continues to grow as a drop-in, high-performance alternative for classical machine learning workloads. New features in 5.1 include:
DBSCAN clustering
New support vector machine implementation, with SVC, SVR, nuSVC, nuSVR and a variety of kernel functions
Elastic nets with unscaled step functions
Expanded distance metrics for k-nearest neighbors
Performance boosts in decision trees, random forest, and PCA pipelines
Python APIs, a C/C++ interface, and compatibility with scikit-learn enhances the adoptability of AOCL-DA while bringing in performance benefits of native code.
Additional Enhancements Across the Stack
AOCL-LibM: Better accuracy in special cases and new vector math support. Experimental CMake support to build AOCL-LibM added
AOCL-LibMem: Zen 5-optimized string and memory functions
AOCL-RNG: New AVX512-optimized kernel for double-precision MRG32K3A base generator
AOCL-Utils: Thread-safe logger, hardware detection, and enhanced CMake presets
What’s Next?
While several key improvements in AOCL 5.1 are discussed in this blog, there’s more to come. Over the coming weeks, we’ll be publishing a series of deep-dive technical blogs exploring individual libraries, performance tuning techniques, and integration best practices across AOCL components.
So, whether you’re solving large-scale simulations, training ML models, or optimizing real-time applications, AOCL provides the foundation to get the most out of AMD hardware.
Explore AOCL 5.1 now at the AOCL Developer portal, and stay tuned for detailed insights from our engineering team.
