What’s New in AOCL 5.1: Faster and More Scalable Math Libraries for AMD Platforms

May 21, 2025


When developing performance-critical software for AMD Zen-based CPUs, there is no better ally than the AMD Optimizing CPU Libraries (AOCL) suite. A suite of high-performance numerical libraries optimized for AMD Zen-based processors, AOCL spans core domains such as linear algebra, fast Fourier transforms, random number generation, cryptography, compression, and classical machine learning. Whether you're building HPC simulations, data analytics pipelines, or financial or ML workloads, AOCL helps developers deliver faster, more efficient applications by providing finely tuned and scalable building blocks.

We're thrilled to announce the release of AOCL 5.1, which delivers performance optimizations across all libraries, new machine learning capabilities, improved threading and instruction tuning, and expanded support for modern hardware.

Let’s dig into the key highlights of this release.

Linear Algebra: Optimized, Parallelized, and More Scalable

From scientific simulations to AI model training, linear algebra remains the core of most high-performance applications. In AOCL 5.1, we've delivered substantial upgrades across AOCL-BLAS, AOCL-LAPACK, AOCL-Sparse, and AOCL-ScaLAPACK libraries to enhance both performance and scalability.

AOCL-BLAS brings further Zen 4/5-optimized kernels, new batch GEMM APIs in low precision format (LPGEMM), and support for BF16 on AVX2, along with expanded quantization and threading improvements.

AOCL-LAPACK upgrades to the Netlib 3.12.0 specification, adds new APIs, improves LU, SVD, and eigenvalue routines, and enhances integration via CMake and pkg-config.

AOCL-Sparse introduces multithreading in CSRMM and SpMV, adds complex data support, and enables dynamic dispatch for better performance on modern cores.

AOCL-ScaLAPACK sees improved eigenvalue solver performance (PDSYEVD) and enhanced test coverage.

With these updates, AOCL offers a comprehensive, efficient toolkit for developers working with dense and sparse matrices—from shared-memory systems to distributed compute environments.
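To make the sparse operations mentioned above concrete, here is a minimal pure-Python sketch of what SpMV computes on a matrix in CSR (compressed sparse row) storage. It is a reference for the operation only, not the AOCL-Sparse API; the function name `csr_spmv` and the sample matrix are illustrative assumptions.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Return y = A @ x for a sparse matrix A stored in CSR form.

    values  : nonzero entries of A, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i owns entries row_ptr[i] .. row_ptr[i + 1] - 1
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = csr_spmv(values, col_idx, row_ptr, x)
print(y)  # [7.0, 6.0, 17.0]
```

Each output row depends only on its own slice of `values`, so rows can be handed to different threads without synchronization. That independence is one common reason multithreading pays off for SpMV and CSRMM.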

Faster Compression with Parallel Power

AOCL-Compression 5.1 delivers improved single-threaded performance for popular formats like Bzip2, Snappy, Gzip, ZSTD, and LZ4 — especially on Clang-based builds.

The newly introduced multithreaded APIs significantly boost compression and decompression speeds for the GZIP and raw Deflate formats. Additionally, ZSTD gains a new fast mode (AOCL_DECOMPRESS_FAST=3).
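As a rough illustration of how multithreaded GZIP compression can work in general, the sketch below splits the input into chunks, compresses each chunk as an independent gzip member on a thread pool, and concatenates the members. It uses only the Python standard library and is not the AOCL-Compression API; `parallel_gzip`, the chunk size, and the worker count are assumptions made for the example.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def parallel_gzip(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Compress `data` as a sequence of independent gzip members."""
    # Fall back to one empty chunk so empty input still yields a valid stream.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so members are joined in sequence.
        members = list(pool.map(gzip.compress, chunks))
    return b"".join(members)


payload = b"AOCL 5.1 " * 200_000
packed = parallel_gzip(payload, chunk_size=256 * 1024)

# A multi-member gzip stream decompresses to the concatenated original data.
restored = gzip.decompress(packed)
print(restored == payload, len(packed) < len(payload))
```

The trade-off in this approach is that each chunk is compressed without knowledge of its neighbors, so the ratio is usually slightly worse than a single-stream compression of the same data.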

Cryptography Tuned for Real-World Workloads

AOCL-Cryptography 5.1 strengthens performance in real benchmarks such as WRK and Apache Bench, particularly for AES-GCM. We’ve further optimized SHA3, improved OpenSSL provider support, added dispatch logic, and fixed key Coverity-detected defects to make this library both faster and more secure.

Runtime tuning via AOCL-Utils and debug logging support round out this release for production-grade cryptographic integration.

Data Analytics Meets Performance

AOCL-DA continues to grow as a drop-in, high-performance alternative for classical machine learning workloads. New features in 5.1 include:

DBSCAN clustering

New support vector machine implementation, with SVC, SVR, nuSVC, nuSVR and a variety of kernel functions

Elastic nets with unscaled step functions

Expanded distance metrics for k-nearest neighbors

Performance boosts in decision trees, random forest, and PCA pipelines

Python APIs, a C/C++ interface, and compatibility with scikit-learn make AOCL-DA easier to adopt while bringing the performance benefits of native code.
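For readers unfamiliar with DBSCAN, the first item in the feature list above, here is a minimal pure-Python sketch of the textbook algorithm. It is meant to show what the clustering does, not how AOCL-DA implements or exposes it; the function name, the brute-force neighbor search, and the sample points are assumptions for illustration. As in scikit-learn, `min_samples` counts the point itself.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force O(n) search; real libraries use faster structures.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:  # core point: keep growing the cluster
                queue.extend(nbrs)
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The two tight groups become clusters 0 and 1, and the isolated point at (50, 50) is reported as noise because it has too few neighbors within `eps`.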

Additional Enhancements Across the Stack

AOCL-LibM: Better accuracy in special cases, new vector math support, and experimental CMake support for building AOCL-LibM

AOCL-LibMem: Zen 5-optimized string and memory functions

AOCL-RNG: New AVX512-optimized kernel for double-precision MRG32K3A base generator

AOCL-Utils: Thread-safe logger, hardware detection, and enhanced CMake presets
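For context on the MRG32K3A base generator named in the AOCL-RNG item, the sketch below is a plain scalar reference of the published MRG32k3a recurrence (L'Ecuyer's combined multiple recursive generator). It is not the AOCL-RNG API and says nothing about how the AVX512 kernel is written; the class name, the single-integer seeding scheme, and the default seed are assumptions chosen for the example.

```python
M1 = 4294967087  # 2**32 - 209
M2 = 4294944443  # 2**32 - 22853
A12, A13N = 1403580, 810728
A21, A23N = 527612, 1370589
NORM = 1.0 / (M1 + 1)


class MRG32k3a:
    """Scalar reference of the MRG32k3a recurrence, returning doubles in (0, 1)."""

    def __init__(self, seed=12345):
        # Illustrative seeding: fill both component states with one value.
        if not 0 < seed < M2:
            raise ValueError("seed must satisfy 0 < seed < M2")
        self.s1 = [seed, seed, seed]
        self.s2 = [seed, seed, seed]

    def random(self):
        # First component: x_n = (a12 * x_{n-2} - a13n * x_{n-3}) mod m1
        p1 = (A12 * self.s1[1] - A13N * self.s1[0]) % M1
        self.s1 = [self.s1[1], self.s1[2], p1]
        # Second component: y_n = (a21 * y_{n-1} - a23n * y_{n-3}) mod m2
        p2 = (A21 * self.s2[2] - A23N * self.s2[0]) % M2
        self.s2 = [self.s2[1], self.s2[2], p2]
        # Combine the two components and scale into the open interval (0, 1).
        z = p1 - p2
        if z <= 0:
            z += M1
        return z * NORM


gen = MRG32k3a()
sample = [gen.random() for _ in range(5)]
print(sample)
```

Each output depends on the previous three states of both components, which is why vectorized implementations typically advance several independent streams or skip-ahead positions in parallel rather than vectorizing a single step.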

What’s Next?

While several key improvements in AOCL 5.1 are discussed in this blog, there’s more to come. Over the coming weeks, we’ll be publishing a series of deep-dive technical blogs exploring individual libraries, performance tuning techniques, and integration best practices across AOCL components.

So, whether you’re solving large-scale simulations, training ML models, or optimizing real-time applications, AOCL provides the foundation to get the most out of AMD hardware.

Explore AOCL 5.1 now at the AOCL Developer portal, and stay tuned for detailed insights from our engineering team.

Article By

Pradeep Rao

PMTS Software System Design Eng.
