AMD Optimizing CPU Libraries (AOCL)

Introduction

AOCL is a set of numerical libraries optimized for AMD processors based on the AMD “Zen” core architecture across its generations. Supported processor families are AMD EPYC™, AMD Ryzen™, and AMD Ryzen™ Threadripper™. These tuned implementations of industry-standard math libraries enable rapid development of scientific and high-performance computing applications.

Official Website: https://www.amd.com/en/developer/aocl.html

The following AOCL libraries are supported with Spack:

amdblis
amdlibflame
amdfftw
amdscalapack
amdlibm
aocl-sparse
aocl-utils
aocl-crypto
aocl-libmem
aocl-compression
aocl-da
aocl-dlp

Note: Users can install the above libraries individually, or as a bundle using the amd-aocl package.

Spack automatically resolves library dependencies when installing HPC applications, so it is not necessary to install the AMD libraries explicitly ahead of time. Instead, to ensure that your application is built with all supported AMD-optimized libraries, configure Spack to always prefer the AOCL packages.

Preferring AMD AOCL Packages

To make Spack select the AOCL packages, at a specified library version, as the default providers for linear algebra and other functions, edit the packages.yaml file.

For example, if you are using the latest version, 5.2, the contents of packages.yaml should include the following directives:

    packages:
      blas:
        require: amdblis@5.2
      flame:
        require: amdlibflame@5.2
      lapack:
        require: amdlibflame@5.2
      fftw-api:
        require: amdfftw@5.2
      scalapack:
        require: amdscalapack@5.2

To edit packages.yaml, run the command spack config edit packages. See the relevant section of the Spack documentation for further details.
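
As a minimal sketch, the shell snippet below writes the directives above into a packages.yaml file in a scratch directory, with the AOCL version as a parameter. The function name write_aocl_prefs and the scratch directory are assumptions made for illustration; they are not part of Spack or AOCL. A directory like this can be passed to Spack as an additional configuration scope with the -C (--config-scope) option instead of editing the user-level file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spack): write a packages.yaml that
# requires the AOCL providers at the given version.
write_aocl_prefs() {
    version="$1"    # AOCL version, e.g. 5.2
    scope_dir="$2"  # directory that will hold packages.yaml
    mkdir -p "$scope_dir"
    cat > "$scope_dir/packages.yaml" <<EOF
packages:
  blas:
    require: amdblis@${version}
  flame:
    require: amdlibflame@${version}
  lapack:
    require: amdlibflame@${version}
  fftw-api:
    require: amdfftw@${version}
  scalapack:
    require: amdscalapack@${version}
EOF
}

# Generate the file in a scratch directory and show the result.
scope_dir="$(mktemp -d)"
write_aocl_prefs 5.2 "$scope_dir"
cat "$scope_dir/packages.yaml"

# With Spack on PATH, the merged settings could then be inspected with:
#   spack -C "$scope_dir" config get packages
```

Keeping the preferences in a separate scope makes it easy to switch AOCL versions per project without touching the default configuration.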

AMD-AOCL

amd-aocl is a bundle package that installs the AOCL libraries listed above, including amdblis, amdlibflame, amdfftw, amdscalapack, amdlibm, aocl-sparse, aocl-libmem, aocl-crypto, aocl-compression, aocl-dlp, and aocl-da, in a single step.

Building AMD-AOCL

		$ spack install amd-aocl %aocc

The following is the list of variants available with AMD-AOCL:

Variant	Allowed Values	Description
openmp	on, off	Enable OpenMP support

AMD BLIS (AOCL-BLAS)

AOCL-BLAS is a high-performance implementation of the Basic Linear Algebra Subprograms (BLAS). BLAS was designed to provide the essential kernels of matrix and vector computation, which are the most commonly used and computationally intensive operations in dense numerical linear algebra. Select kernels have been optimized for AMD “Zen”-based processors, including AMD EPYC™, AMD Ryzen™, and AMD Ryzen™ Threadripper™ processors.

AMD offers an optimized version of BLIS (AOCL-BLAS) that supports C, Fortran, and C++ template interfaces to the BLAS functionality.

Official Website: https://www.amd.com/en/developer/aocl/dense.html

Building AMD BLIS (AOCL-BLAS)

		$ spack install amdblis %aocc

The following is the list of variants available with AMD BLIS:

Variant	Allowed Values	Description
blas	on, off	BLAS Compatibility
cblas	on, off	CBLAS Compatibility
ilp64	on, off	Build with ILP64 Support
library	shared, static	Build shared library, static library, or both
threads	pthreads, openmp, none	Multithreading support
aocl_gemm	on, off	AOCL-GEMM add-on module support
suphandling	on, off	Small Unpacked Kernel handling
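
In a Spack spec, a variant whose allowed values are on and off is normally written as +name (on) or ~name (off), while a variant with named values is written as name=value. The sketch below composes an install command for amdblis from the table above. The helper bool_variant is a hypothetical name introduced for illustration, and the chosen settings are only an example; spack info amdblis remains the authoritative list of variants for the installed Spack version.

```shell
#!/bin/sh
# Hypothetical helper: translate "name on|off" into Spack's +name / ~name form.
bool_variant() {
    case "$2" in
        on)  printf '+%s' "$1" ;;
        off) printf '~%s' "$1" ;;
        *)   echo "expected on or off, got: $2" >&2; return 1 ;;
    esac
}

# Example choice: ILP64 integers, no AOCL-GEMM add-on, OpenMP threading.
spec="amdblis $(bool_variant ilp64 on) $(bool_variant aocl_gemm off) threads=openmp %aocc"

# Print the command rather than running it, so the sketch works without Spack.
echo "spack install $spec"
# prints: spack install amdblis +ilp64 ~aocl_gemm threads=openmp %aocc
```

Running spack spec with the same spec string shows how Spack would concretize it before anything is built.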

AMD libFLAME (AOCL-LAPACK)

AOCL-LAPACK is a high-performance implementation of the Linear Algebra PACKage (LAPACK). LAPACK provides routines for solving systems of linear equations, least-squares problems, eigenvalue problems, singular value problems, and the associated matrix factorizations.

Starting with version 5.2, AMD libFLAME (AOCL-LAPACK) exclusively uses the CMake build system; Autoconf support has been discontinued.

Official Website: https://www.amd.com/en/developer/aocl/dense.html

Building AMD libFLAME (AOCL-LAPACK)

		$ spack install amdlibflame %aocc

The following is the list of variants available with AMD libFLAME:

Variant	Allowed Values	Description
ilp64	on, off	Build with ILP64 support
lapack2flame	on, off	Map legacy LAPACK routine invocations to their corresponding native C implementations in libflame
shared	on, off	Build shared library
static	on, off	Build static library
threads	pthreads, openmp, none	Multithreading support
vectorization	none, auto, avx2, avx512	Use hardware vectorization support

AMD FFTW

FFTW is a comprehensive collection of fast C routines for computing the Discrete Fourier Transform (DFT) and various special cases thereof, copyrighted by MIT and distributed under the GNU General Public License. AMD provides an optimized FFTW, derived from the community FFTW (fftw.org), that includes selected kernels and routines optimized for the AMD EPYC™, Ryzen™, and Ryzen™ Threadripper™ processor families.

Official Website: https://www.amd.com/en/developer/aocl/fftw.html

Building AMD FFTW

		$ spack install amdfftw %aocc

The following is the list of variants available with AMD FFTW:

Variant	Allowed Values	Description
amd-top-n-planner	on, off	Build with amd-top-n-planner support
amd-mpi-vader-limit	on, off	Build with amd-mpi-vader-limit support
amd-trans	on, off	Build with amd-trans support
amd-app-opt	on, off	Build with amd-app-opt support
amd-fast-planner	on, off	Reduce planning time with little loss in performance; supported for float and double precision
amd-dynamic-dispatcher	on, off	Single portable optimized library to execute on different x86 CPU architectures
mpi	on, off	Activate MPI support
openmp	on, off	Enable OpenMP support
precision	long_double, quad, float, double	Build the selected floating-point precision libraries
shared	on, off	Build shared library
static	on, off	Build static library
threads	on, off	Enable SMP threads support
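
The precision variant lists several allowed values. Multi-valued Spack variants are usually given as a comma-separated list, so the sketch below assumes that precision accepts that form. The helper join_csv is a hypothetical name used only for illustration, and the selected variants are an example rather than a recommended configuration.

```shell
#!/bin/sh
# Hypothetical helper: join arguments with commas,
# e.g. join_csv float double -> float,double
join_csv() {
    out=""
    for item in "$@"; do
        out="${out:+$out,}$item"
    done
    printf '%s' "$out"
}

# Example choice: single and double precision, OpenMP on, MPI off.
precisions="$(join_csv float double)"
spec="amdfftw precision=$precisions +openmp ~mpi %aocc"

# Print the command rather than running it, so the sketch works without Spack.
echo "spack install $spec"
# prints: spack install amdfftw precision=float,double +openmp ~mpi %aocc
```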

AMD ScaLAPACK

AOCL-ScaLAPACK is a library of high-performance linear algebra routines for parallel distributed-memory machines. It can be used to solve linear systems, least-squares problems, eigenvalue problems, and singular value problems. AOCL-ScaLAPACK is optimized for AMD “Zen”-based processors. It depends on external BLAS and LAPACK libraries, so the use of AOCL-BLAS (amdblis) and AOCL-LAPACK (amdlibflame) is recommended.

Official Website: https://www.amd.com/en/developer/aocl/scalapack.html

Building AMD ScaLAPACK

		$ spack install amdscalapack %aocc

The following is the list of variants available with AMD ScaLAPACK:

Variant	Allowed Values	Description
ilp64	on, off	Build with ILP64 support

AMD LibM (Math Library)

AOCL-LibM is a high-performance implementation of LibM, the standard C library for fundamental floating-point mathematical functions. It includes a broad set of functions from the C99 standard, offering both single and double precision variants optimized for accuracy and speed, along with select complex functions.

Additionally, AOCL-LibM provides several vectorized and fast scalar versions that trade a small degree of accuracy for significantly improved performance.

The build system has been upgraded to CMake, providing comprehensive cross-platform support from version 5.2 onward.

Official Website: https://www.amd.com/en/developer/aocl/libm.html

Building AMD LibM

		$ spack install amdlibm %aocc

AOCL-Sparse

AOCL-Sparse contains basic linear algebra subroutines for sparse matrices and vectors optimized for AMD EPYC™, Ryzen™, and Ryzen™ Threadripper™ processor families. It is designed to be used with C and C++. AOCL-Sparse includes sparse solver functions that perform matrix factorization and solution phases.

Official Website: https://www.amd.com/en/developer/aocl/sparse.html

Building AOCL-Sparse

		$ spack install aocl-sparse %aocc

The following is the list of variants available with AOCL-Sparse:

Variant	Allowed Values	Description
avx	on, off	Enable experimental AVX512
ilp64	on, off	Build with ILP64 support
benchmarks	on, off	Build benchmarks
examples	on, off	Build sparse examples
shared	on, off	Build shared library
unit_tests	on, off	Build sparse unit tests
openmp	on, off	Enable OpenMP support

AOCL-Utils

AOCL-Utils provides a uniform interface to all the AOCL libraries to access the CPU features for AMD CPUs. This library provides the following features:

Core details
Flags available/usable
ISA available/usable
Cache topology (L1/L2/L3)

AOCL-Utils is designed for integration with the other AOCL libraries. Each project has its own mechanism to identify the CPU and provide necessary features such as Dynamic Dispatch. The main purpose of this library is to provide a centralized mechanism to update/validate and provide information to the users.

Official Website: https://www.amd.com/en/developer/aocl/utils.html

Building AOCL-Utils

		$ spack install aocl-utils %aocc

The following is the list of variants available with AOCL-Utils:

Variant	Allowed Values	Description
doc	on, off	Enable documentation

AOCL-LibMem

AOCL-LibMem is a Linux library of data movement and manipulation functions (such as memcpy and strcpy) highly optimized for the AMD Zen microarchitecture.

The library provides multiple implementations of each function, which can be selected according to application requirements such as alignment, instruction choice, threshold values, and tunable parameters.

By default, this library will choose the best-fit implementation based on the underlying micro-architectural support for CPU features and instructions.

This release of the AOCL-LibMem library supports the standard C library memory handling functions.

Official Website: https://www.amd.com/en/developer/aocl/libmem.html

Building AOCL-LibMem

		$ spack install aocl-libmem %aocc

The following is the list of variants available with AOCL-LibMem:

Variant	Allowed Values	Description
vectorization	avx2, avx512, auto, none	Use hardware vectorization support
shared	on, off	Build shared library
tunables	on, off	Enable/Disable user input
logging	on, off	Enable/Disable logger
dynamic-dispatch	on, off	Single portable optimized library to execute on different x86 CPU architectures
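
The auto value leaves the choice of vector instructions to the build. To pin the vectorization variant explicitly instead, one option is to inspect the CPU flags first. The sketch below does this on Linux by reading the flags line of /proc/cpuinfo. The function names has_flag and pick_vectorization are hypothetical, and the mapping from the avx512f and avx2 CPU flags to the avx512 and avx2 variant values is an assumption made for illustration, not the library's own detection logic.

```shell
#!/bin/sh
# has_flag FLAG "FLAGS...": succeed if FLAG is one of the space-separated words.
has_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Map a CPU flags string to a value for the vectorization variant.
# Assumption: avx512f implies avx512, avx2 implies avx2, otherwise none.
pick_vectorization() {
    if has_flag avx512f "$1"; then
        echo avx512
    elif has_flag avx2 "$1"; then
        echo avx2
    else
        echo none
    fi
}

# On x86 Linux, the first "flags" line of /proc/cpuinfo lists ISA extensions.
if [ -r /proc/cpuinfo ]; then
    cpu_flags="$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
else
    cpu_flags=""
fi

echo "spack install aocl-libmem vectorization=$(pick_vectorization "$cpu_flags") %aocc"
```

Matching on whole words avoids treating avx2 as a match for a shorter flag name such as avx.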

AOCL-Crypto

AOCL-Crypto is a library consisting of basic cryptographic functions optimized and tuned for AMD Zen™ based microarchitecture.

This library provides a unified solution for cryptographic routines, including AES (Advanced Encryption Standard) encryption/decryption routines (CBC, CFB, OFB, CTR, GCM, XTS, CCM, SIV), SHA (Secure Hash Algorithm) routines (SHA2, SHA3, SHAKE), message authentication codes (CMAC, HMAC), and ECDH (Elliptic-curve Diffie–Hellman) and RSA (Rivest, Shamir, and Adleman) key generation functions. AOCL-Crypto supports a dynamic dispatcher that executes the best-suited function variant, implemented using function multi-versioning, thereby offering a single optimized library that is portable across different x86 CPU architectures.

Official Website: https://www.amd.com/en/developer/aocl/cryptography.html

Building AOCL-Crypto

		$ spack install aocl-crypto %aocc

The following is the list of variants available with AOCL-Crypto:

Variant	Allowed Values	Description
examples	on, off	Build examples
ipp	on, off	Build Intel IPP library

AOCL-Compression

AOCL-Compression is a software framework of various lossless compression and decompression methods tuned and optimized for AMD Zen based CPUs.

This framework offers a single set of unified APIs for all supported compression and decompression methods, which makes it easy for applications to integrate and use them.

AOCL-Compression supports lz4, zlib/deflate, lzma, zstd, bzip2, snappy, and lz4hc based compression and decompression methods along with their native APIs.

The library offers OpenMP-based multi-threaded implementations of the lz4, zlib, zstd, and snappy compression methods. It supports a dynamic dispatcher that executes the best-suited function variant, implemented using function multi-versioning, thereby offering a single optimized library that is portable across different x86 CPU architectures.

The AOCL-Compression framework is developed in C for UNIX® and Windows® based systems. A test suite is provided for the validation and performance benchmarking of the supported compression and decompression methods.

This suite also supports the benchmarking of IPP compression methods such as lz4, lz4hc, zlib, and bzip2. The library build framework offers CTest-based testing of the test cases implemented using GTest and the library test suite.

Official Website: https://www.amd.com/en/developer/aocl/compression.html

Building AOCL-Compression

		$ spack install aocl-compression %aocc

The following is the list of variants available with AOCL-Compression:

Variant	Allowed Values	Description
shared	on, off	Build shared library
openmp	on, off	Build with OpenMP-based multi-threaded compression and decompression
zlib/bzip2/snappy/zstd/lzma/lz4/lz4hc	on, off	By default, these libraries are built. Use off to disable any library
decompress_fast	"OFF", "1", "2"	Enable fast decompression modes
enable_fast_math	on, off	Enable fast-math optimizations

AOCL-DA

The AOCL Data Analytics Library (AOCL-DA) is a data analytics library providing optimized building blocks for data analysis. It is written with a C-compatible interface to make it as seamless as possible to integrate with the library from whichever programming language you are using. The intended workflow for using the library is as follows:

load data from memory by reading CSV files or using the in-built da_datastore object
preprocess the data by removing missing values, standardizing, and selecting certain subsets of the data, before extracting contiguous arrays of data from the da_datastore objects
process the data (e.g. principal component analysis, linear model fitting, etc.)

C++ example programs can be found in the examples folder of your installation.

Official Website: https://www.amd.com/en/developer/aocl/data-analytics.html

Building AOCL-DA

		$ spack install aocl-da %aocc

The following is the list of variants available with AOCL-DA:

Variant	Allowed Values	Description
openmp	on, off	Build using OpenMP and link to threaded BLAS and LAPACK
python	on, off	Build with Python bindings
ilp64	on, off	Build with ILP64 support
shared	on, off	Build shared library
examples	on, off	Build examples
gtest	on, off	Build and install Googletest

AOCL-DLP

AOCL-DLP (Deep Learning Primitives) is a high-performance library that provides optimized deep learning primitives for AMD processors. The library implements Low Precision GEMM (LPGEMM) operations for deep learning applications with support for multiple data types, post-operations, and quantization techniques. Select kernels have been optimized for AMD EPYC™ processors, leveraging AVX2, AVX512, AVX512_VNNI, and AVX512_BF16 instruction sets.

AOCL-DLP provides APIs for GEMM operations with various precision formats, comprehensive post-operations for fused computations, batch GEMM support, symmetric quantization routines, and parallel execution via OpenMP.

Official Website: https://www.amd.com/en/developer/aocl/dlp.html

Building AOCL-DLP

		$ spack install aocl-dlp %aocc

The following is the list of variants available with AOCL-DLP:

Variant	Allowed Values	Description
benchmarks	on, off	Build benchmarks
tests	on, off	Enable tests
ctest	on, off	Set DLP_CTEST_DISABLED=ON to skip CTest invocation
examples	on, off	Build examples
shared	on, off	Build shared library
threads	none, openmp, pthread	Select threading backend for AOCL-DLP
