What’s New in AOCL 5.1: Faster and More Scalable Math Libraries for AMD Platforms

May 21, 2025

When developing performance-critical software for AMD Zen-based CPUs, there is no better ally than the AMD Optimizing CPU Libraries (AOCL). AOCL is a suite of high-performance numerical libraries optimized for AMD Zen-based processors, spanning core domains such as linear algebra, Fast Fourier Transforms, random number generation, cryptography, compression, and classical machine learning. Whether you're building HPC simulations, data analytics pipelines, or financial and ML workloads, AOCL helps developers deliver faster, more efficient applications by providing finely tuned and scalable building blocks.

We're thrilled to announce the release of AOCL 5.1, which delivers performance optimizations across all libraries, new machine learning capabilities, improved threading and instruction tuning, and expanded support for modern hardware.

Let’s dig into the key highlights of this release.

Linear Algebra: Optimized, Parallelized, and More Scalable

From scientific simulations to AI model training, linear algebra remains the core of most high-performance applications. In AOCL 5.1, we've delivered substantial upgrades across AOCL-BLAS, AOCL-LAPACK, AOCL-Sparse, and AOCL-ScaLAPACK libraries to enhance both performance and scalability.

  • AOCL-BLAS brings further Zen 4/5-optimized kernels, new batch GEMM APIs for low-precision formats (LPGEMM), and support for BF16 on AVX2, along with expanded quantization and threading improvements.

  • AOCL-LAPACK upgrades to the Netlib 3.12.0 specification, adds new APIs, improves LU, SVD, and eigenvalue routines, and enhances integration via CMake and pkg-config.

  • AOCL-Sparse introduces multithreading in CSRMM and SpMV, adds complex data support, and enables dynamic dispatch for better performance on modern cores.

  • AOCL-ScaLAPACK sees improved eigenvalue solver performance (PDSYEVD) and enhanced test coverage.

With these updates, AOCL offers a comprehensive, efficient toolkit for developers working with dense and sparse matrices—from shared-memory systems to distributed compute environments.
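To see why SpMV lends itself to multithreading, note that in CSR storage each output row depends only on its own slice of the value and column-index arrays, so rows can be split across threads without any shared writes. The sketch below is plain Python for illustration only. It does not call AOCL-Sparse, and the function names (csr_spmv_rows, csr_spmv) are hypothetical. Python threads will not make this faster because of the interpreter lock; the point is the row-wise decomposition that a native multithreaded kernel can exploit.

```python
from concurrent.futures import ThreadPoolExecutor


def csr_spmv_rows(row_ptr, col_idx, values, x, start, stop):
    """Return y[start:stop] for y = A @ x, with A stored in CSR form."""
    out = []
    for i in range(start, stop):
        acc = 0.0
        # Row i owns the entries values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        out.append(acc)
    return out


def csr_spmv(row_ptr, col_idx, values, x, workers=2):
    """Split the rows into contiguous blocks and compute each block independently."""
    n_rows = len(row_ptr) - 1
    block = max(1, -(-n_rows // workers))  # ceiling division
    ranges = [(s, min(s + block, n_rows)) for s in range(0, n_rows, block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda r: csr_spmv_rows(row_ptr, col_idx, values, x, r[0], r[1]),
            ranges,
        )
        # pool.map preserves input order, so the blocks concatenate into y.
        return [v for part in parts for v in part]


# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
y = csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

Because no two blocks write to the same output element, the only tuning question for a real kernel is how to balance rows with very different numbers of nonzeros.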

Faster Compression with Parallel Power

AOCL-Compression 5.1 delivers improved single-threaded performance for popular formats like Bzip2, Snappy, Gzip, ZSTD, and LZ4 — especially on Clang-based builds. 

The newly introduced multithreaded APIs significantly boost compression and decompression speeds for GZIP and Raw Deflate formats. Additionally, a new fast mode (AOCL_DECOMPRESS_FAST=3) is included in ZSTD.
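One common way to parallelize GZIP compression, shown here as a general technique and not as a description of how AOCL-Compression implements it, is to split the input into chunks, compress each chunk as its own gzip member, and concatenate the members. The sketch uses only Python's standard zlib and gzip modules; the helper names (parallel_gzip, _gzip_member) and the default chunk size are assumptions made for illustration.

```python
import gzip
import zlib
from concurrent.futures import ThreadPoolExecutor


def _gzip_member(chunk: bytes, level: int) -> bytes:
    # wbits=31 asks zlib to wrap the deflate data in a gzip header and trailer.
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)
    return comp.compress(chunk) + comp.flush()


def parallel_gzip(data: bytes, chunk_size: int = 1 << 20,
                  workers: int = 4, level: int = 6) -> bytes:
    """Compress fixed-size chunks concurrently and join them into one stream."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        chunks = [b""]  # still emit one valid (empty) gzip member
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the chunks in order, which the decompressor relies on.
        members = pool.map(lambda c: _gzip_member(c, level), chunks)
        return b"".join(members)


payload = b"AOCL 5.1 " * 200_000
packed = parallel_gzip(payload, chunk_size=256 * 1024)
restored = gzip.decompress(packed)  # reads all concatenated members
```

The trade-off is that each chunk starts with an empty history window, so the compression ratio is typically a little worse than a single-stream encode; larger chunks reduce that cost at the expense of parallelism.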

Cryptography Tuned for Real-World Workloads

AOCL-Cryptography 5.1 strengthens performance in real benchmarks such as WRK and Apache Bench, particularly for AES-GCM. We’ve further optimized SHA3, improved OpenSSL provider support, added dispatch logic, and fixed key Coverity-detected defects to make this library both faster and more secure.

Runtime tuning via AOCL-Utils and debug logging support round out this release for production-grade cryptographic integration.

Data Analytics Meets Performance

AOCL-DA continues to grow as a drop-in, high-performance alternative for classical machine learning workloads. New features in 5.1 include:

  • DBSCAN clustering

  • New support vector machine implementation, with SVC, SVR, nuSVC, nuSVR and a variety of kernel functions

  • Elastic nets with unscaled step functions 

  • Expanded distance metrics for k-nearest neighbors

  • Performance boosts in decision trees, random forest, and PCA pipelines

Python APIs, a C/C++ interface, and compatibility with scikit-learn make AOCL-DA easier to adopt while bringing the performance benefits of native code.
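For readers new to DBSCAN, the following is a minimal reference sketch of the algorithm itself in plain Python. It is not the AOCL-DA implementation or API; the function name dbscan is hypothetical, the parameter names eps and min_samples simply follow scikit-learn's convention, and the brute-force neighbor search is chosen for clarity over speed.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    n = len(points)
    # Brute-force eps-neighborhoods; every point counts as its own neighbor.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                frontier.extend(neighbors[j])  # core point: keep growing
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
```

The quadratic neighbor search dominates the cost here, which is exactly the part an optimized native library has the most room to accelerate.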

Additional Enhancements Across the Stack

  • AOCL-LibM: Better accuracy in special cases, new vector math support, and experimental CMake support for building AOCL-LibM

  • AOCL-LibMem: Zen 5-optimized string and memory functions

  • AOCL-RNG: New AVX512-optimized kernel for double-precision MRG32K3A base generator

  • AOCL-Utils: Thread-safe logger, hardware detection, and enhanced CMake presets 
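As background on the MRG32K3A base generator mentioned above, here is a scalar reference sketch of L'Ecuyer's combined multiple recursive generator in plain Python. It illustrates the recurrence only and is not AOCL-RNG code: the class name MRG32k3a and the default seed are assumptions for the example, and the AVX512 kernel in the release would process many lanes at once instead of one value per call.

```python
M1 = 4294967087  # 2**32 - 209
M2 = 4294944443  # 2**32 - 22853
A12, A13N = 1403580, 810728
A21, A23N = 527612, 1370589
NORM = 1.0 / (M1 + 1)


class MRG32k3a:
    """Two order-3 linear recurrences combined into one uniform stream."""

    def __init__(self, seed=(12345,) * 6):
        s = [int(v) for v in seed]
        if len(s) != 6:
            raise ValueError("seed must contain six integers")
        if not all(0 <= v < M1 for v in s[:3]) or not any(s[:3]):
            raise ValueError("first three seeds must be in [0, M1) and not all zero")
        if not all(0 <= v < M2 for v in s[3:]) or not any(s[3:]):
            raise ValueError("last three seeds must be in [0, M2) and not all zero")
        self.s1, self.s2 = s[:3], s[3:]

    def random(self):
        """Return the next double in the open interval (0, 1)."""
        s10, s11, s12 = self.s1
        p1 = (A12 * s11 - A13N * s10) % M1  # first component
        self.s1 = [s11, s12, p1]
        s20, s21, s22 = self.s2
        p2 = (A21 * s22 - A23N * s20) % M2  # second component
        self.s2 = [s21, s22, p2]
        z = (p1 - p2) % M1
        if z == 0:
            z = M1  # avoid returning exactly 0.0
        return z * NORM


gen = MRG32k3a()
sample = [gen.random() for _ in range(5)]
```

Python's arbitrary-precision integers keep the products exact, so the sketch sidesteps the floating-point tricks that C implementations use to stay within 53 bits.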

What’s Next?

While several key improvements in AOCL 5.1 are discussed in this blog, there’s more to come. Over the coming weeks, we’ll be publishing a series of deep-dive technical blogs exploring individual libraries, performance tuning techniques, and integration best practices across AOCL components.

So, whether you’re solving large-scale simulations, training ML models, or optimizing real-time applications, AOCL provides the foundation to get the most out of AMD hardware.

Explore AOCL 5.1 now at the AOCL Developer portal, and stay tuned for detailed insights from our engineering team.

Article By


PMTS Software System Design Eng.
