Introduction

AOCL is a set of numerical libraries optimized for AMD processors based on the AMD “Zen” core architecture and its successive generations. Supported processor families include AMD EPYC™, AMD Ryzen™, and AMD Ryzen™ Threadripper™ processors. These tuned implementations of industry-standard math libraries enable rapid development of scientific and high-performance computing applications.

Official Website: https://www.amd.com/en/developer/aocl.html

The following AOCL libraries are supported with Spack:

  • amdblis
  • amdlibflame
  • amdfftw
  • amdscalapack
  • amdlibm
  • aocl-sparse
  • aocl-utils
  • aocl-crypto
  • aocl-libmem
  • aocl-compression
  • aocl-da
  • aocl-dlp

Note: Users can install the above libraries individually, or as a bundle using the amd-aocl package.

Spack automatically resolves library dependencies when it installs HPC applications, so you do not need to install the AMD libraries ahead of time. Instead, to ensure that your application is built with all supported AMD optimized libraries, configure Spack to always prefer the AMD AOCL libraries.

Preferring AMD AOCL Packages

To configure Spack to select the AOCL packages of a specific version by default for linear algebra, FFT, and related functionality, edit the packages.yaml file.

For example, if you are using the latest version, 5.2, the contents of packages.yaml should include the following directives:

        packages:
          blas:
            require: amdblis@5.2
          flame:
            require: amdlibflame@5.2
          lapack:
            require: amdlibflame@5.2
          fftw-api:
            require: amdfftw@5.2
          scalapack:
            require: amdscalapack@5.2

To edit packages.yaml, run spack config edit packages. For further details, see the package settings (packages.yaml) section of the Spack documentation.
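Before you build an application, you can check that Spack picks up these settings by inspecting the merged configuration and a concretized spec. The following is a sketch. hpl serves only as an example of a Spack package that depends on BLAS, and you can substitute any application package.

```shell
# Show the merged package settings; the require: entries added to
# packages.yaml should appear in the output
spack config get packages

# Concretize (without installing) an application that needs BLAS;
# the dependency tree should list amdblis@5.2 as the BLAS provider
spack spec hpl %aocc
```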

 

AMD-AOCL

AMD-AOCL is a bundle package that installs the AOCL libraries listed above in a single step, including amdblis, amdlibflame, amdfftw, amdscalapack, amdlibm, aocl-sparse, aocl-libmem, aocl-crypto, aocl-compression, aocl-dlp, and aocl-da.

Building AMD-AOCL

		$ spack install amd-aocl %aocc
	

The following is the list of variants available with AMD-AOCL:

Variant | Allowed Values | Description
openmp | on, off | Enable OpenMP support
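You set variants on the spack install command line. +name turns an on/off variant on, ~name turns it off, and name=value selects one of the allowed values. For example, here is a sketch that builds the bundle with OpenMP support enabled:

```shell
# Install the full AOCL bundle with the openmp variant turned on
spack install amd-aocl %aocc +openmp
```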

AMD BLIS (AOCL-BLAS)

AOCL-BLAS is a high-performance implementation of the Basic Linear Algebra Subprograms (BLAS). The BLAS provide the essential kernels of matrix and vector computation, which are the most commonly used and computationally intensive operations in dense numerical linear algebra. Selected kernels have been optimized for AMD “Zen”-based processors, including AMD EPYC™, AMD Ryzen™, and AMD Ryzen™ Threadripper™ processors.

AOCL-BLAS is AMD's optimized version of BLIS. It supports C, Fortran, and C++ template interfaces for the BLAS functionality.

Official Website: https://www.amd.com/en/developer/aocl/dense.html

 

Building AMD BLIS (AOCL-BLAS)

		$ spack install amdblis %aocc
	

The following is the list of variants available with AMD BLIS:

Variant | Allowed Values | Description
blas | on, off | BLAS compatibility
cblas | on, off | CBLAS compatibility
ilp64 | on, off | Build with ILP64 support
library | shared, static | Build the shared library, the static library, or both
threads | pthreads, openmp, none | Multithreading support
aocl_gemm | on, off | AOCL-GEMM add-on module support
suphandling | on, off | Small/unpacked (SUP) matrix handling
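For example, the following sketch builds an OpenMP-threaded AOCL-BLAS with the 64-bit integer (ILP64) interface. spack info lists the variants and their defaults for the package version your Spack checkout provides, and that list may differ from the table above.

```shell
# List the versions, variants, and variant defaults of amdblis
spack info amdblis

# OpenMP-threaded AOCL-BLAS with the ILP64 (64-bit integer) interface
spack install amdblis %aocc threads=openmp +ilp64
```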

AMD LibFLAME (AOCL-LAPACK)

AOCL-LAPACK is a high-performance implementation of the Linear Algebra PACKage (LAPACK). LAPACK provides routines for solving systems of linear equations, least-squares problems, eigenvalue problems, and singular value problems, along with the associated matrix factorizations.

Starting with version 5.2, AMD libFLAME (AOCL-LAPACK) exclusively uses the CMake build system; Autoconf support has been discontinued.

Official Website: https://www.amd.com/en/developer/aocl/dense.html

 

Building AMD libFLAME (AOCL-LAPACK)

		$ spack install amdlibflame %aocc
	

The following is the list of variants available with AMD libFLAME:

Variant | Allowed Values | Description
ilp64 | on, off | Build with ILP64 support
lapack2flame | on, off | Map legacy LAPACK routine invocations to their corresponding native C implementations in libflame
shared | on, off | Build shared library
static | on, off | Build static library
threads | pthreads, openmp, none | Multithreading support
vectorization | none, auto, avx2, avx512 | Use hardware vectorization support
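As an illustration, this sketch builds only the shared AOCL-LAPACK library, with OpenMP threading and ILP64 support. If you enable ilp64 here, build the BLAS library that your application links against with ILP64 as well.

```shell
# Shared-only (~static), OpenMP-threaded AOCL-LAPACK with ILP64 integers
spack install amdlibflame %aocc +shared ~static threads=openmp +ilp64
```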

AMD FFTW

FFTW is a comprehensive collection of fast C routines for computing the discrete Fourier transform (DFT) and various special cases thereof; it is copyrighted by MIT and distributed under the GNU General Public License. AMD FFTW is an AMD-optimized version of FFTW, derived from the community FFTW (fftw.org). It includes selected kernels and routines optimized for the AMD EPYC™, AMD Ryzen™, and AMD Ryzen™ Threadripper™ processor families.

Official Website: https://www.amd.com/en/developer/aocl/fftw.html

 

Building AMD FFTW

		$ spack install amdfftw %aocc
	

The following is the list of variants available with AMD FFTW:

Variant | Allowed Values | Description
amd-top-n-planner | on, off | Build with amd-top-n-planner support
amd-mpi-vader-limit | on, off | Build with amd-mpi-vader-limit support
amd-trans | on, off | Build with amd-trans support
amd-app-opt | on, off | Build with amd-app-opt support
amd-fast-planner | on, off | Reduce planning time with little trade-off in performance; supported for float and double precision
amd-dynamic-dispatcher | on, off | Build a single portable optimized library that runs on different x86 CPU architectures
mpi | on, off | Enable MPI support
openmp | on, off | Enable OpenMP support
precision | long_double, quad, float, double | Build libraries for the selected floating-point precisions
shared | on, off | Build shared library
static | on, off | Build static library
threads | on, off | Enable SMP threads support
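The table lists amd-fast-planner as supported for float and double precision, so a sketch that enables it restricts the precision variant to those two values. precision accepts a comma-separated list of values.

```shell
# OpenMP-enabled AMD FFTW with the fast planner, single and double precision
spack install amdfftw %aocc +openmp +amd-fast-planner precision=float,double
```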

AMD ScaLAPACK

AOCL-ScaLAPACK is a library of high-performance linear algebra routines for parallel distributed-memory machines. You can use it to solve linear systems, least-squares problems, eigenvalue problems, and singular value problems. AOCL-ScaLAPACK is optimized for AMD “Zen”-based processors. It depends on external BLAS and LAPACK libraries, and AMD recommends using AOCL-BLAS and AOCL-LAPACK for them.

Official Website: https://www.amd.com/en/developer/aocl/scalapack.html

 

Building AMD ScaLAPACK

		$ spack install amdscalapack %aocc
	

The following is the list of variants available with AMD ScaLAPACK:

Variant | Allowed Values | Description
ilp64 | on, off | Build with ILP64 support
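To follow the recommendation above, this sketch tells Spack to build AOCL-ScaLAPACK against AOCL-BLAS and AOCL-LAPACK by naming them as dependencies with ^. The first command leaves the MPI implementation to Spack's defaults. The second pins it, with openmpi chosen only as an example.

```shell
# ScaLAPACK on top of AOCL-BLAS (amdblis) and AOCL-LAPACK (amdlibflame)
spack install amdscalapack %aocc ^amdblis ^amdlibflame

# Same, but pinning the MPI provider explicitly (openmpi is just an example)
spack install amdscalapack %aocc ^amdblis ^amdlibflame ^openmpi
```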

AMD LibM (Math Library)

AOCL-LibM is a high-performance implementation of LibM, the standard C library for fundamental floating-point mathematical functions. It includes a broad set of functions from the C99 standard, offering both single and double precision variants optimized for accuracy and speed, along with select complex functions.

Additionally, AOCL-LibM provides several vectorized and fast scalar versions that trade a small degree of accuracy for significantly improved performance.

Starting with version 5.2, AOCL-LibM uses the CMake build system, which provides comprehensive cross-platform support.

Official Website: https://www.amd.com/en/developer/aocl/libm.html

 

Building AMD LibM

		$ spack install amdlibm %aocc
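AOCL-LibM implements the standard C math functions, so an existing program can typically use it by linking against it ahead of the system libm. This sketch makes several assumptions:

  • the library is installed as libalm in the package's lib directory (some installations use lib64)
  • AOCC's clang is on your PATH
  • myapp.c is a hypothetical program that calls functions such as exp and sin

```shell
# Installation prefix of the amdlibm package built above
ALM_PREFIX=$(spack location -i amdlibm)

# Link AOCL-LibM (-lalm) before the system math library (-lm) and embed an
# rpath so that the shared library is found at run time
clang -O2 myapp.c -o myapp \
    -L"$ALM_PREFIX/lib" -Wl,-rpath,"$ALM_PREFIX/lib" -lalm -lm
```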
	

 

AOCL-Sparse

AOCL-Sparse contains basic linear algebra subroutines for sparse matrices and vectors optimized for AMD EPYC™, Ryzen™, and Ryzen™ Threadripper™ processor families. It is designed to be used with C and C++. AOCL-Sparse includes sparse solver functions that perform matrix factorization and solution phases.

Official Website: https://www.amd.com/en/developer/aocl/sparse.html

 

Building AOCL-Sparse

		$ spack install aocl-sparse %aocc
	

The following is the list of variants available with AOCL-Sparse:

Variant | Allowed Values | Description
avx | on, off | Enable experimental AVX-512 support
ilp64 | on, off | Build with ILP64 support
benchmarks | on, off | Build benchmarks
examples | on, off | Build sparse examples
shared | on, off | Build shared library
unit_tests | on, off | Build sparse unit tests
openmp | on, off | Enable OpenMP support
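For example, here is a sketch of an OpenMP-enabled AOCL-Sparse build with ILP64 support that also builds the example programs:

```shell
# OpenMP-enabled AOCL-Sparse with 64-bit integers, plus the examples
spack install aocl-sparse %aocc +openmp +ilp64 +examples
```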

AOCL-Utils

AOCL-Utils gives all the AOCL libraries a uniform interface for accessing the CPU features of AMD processors. This library provides the following information:

  • Core details
  • Available/usable flags
  • Available/usable ISA
  • L1/L2/L3 cache topology

AOCL-Utils is designed for integration with the other AOCL libraries. Each library has its own mechanism for identifying the CPU and enabling features such as dynamic dispatch. The main purpose of AOCL-Utils is to provide a centralized mechanism for updating, validating, and reporting this information to users.

Official Website: https://www.amd.com/en/developer/aocl/utils.html

 

Building AOCL-Utils 

		$ spack install aocl-utils %aocc
	

The following is the list of variants available with AOCL-Utils:

Variant | Allowed Values | Description
doc | on, off | Enable documentation


AOCL-LibMem

AOCL-LibMem is a Linux library of data movement and manipulation functions (such as memcpy and strcpy) that are highly optimized for the AMD “Zen” microarchitecture.

The library provides multiple implementations of each function. You can choose among them according to application requirements such as alignment, instruction choice, threshold values, and tunable parameters.

By default, the library chooses the best-fit implementation based on the CPU features and instructions supported by the underlying microarchitecture.

This release of AOCL-LibMem supports the standard C library memory-handling functions.

Official Website: https://www.amd.com/en/developer/aocl/libmem.html

 

Building AOCL-LibMem

		$ spack install aocl-libmem %aocc
	

The following is the list of variants available with AOCL-LibMem:

Variant | Allowed Values | Description
vectorization | avx2, avx512, auto, none | Use hardware vectorization support
shared | on, off | Build shared library
tunables | on, off | Enable or disable user input of tunable parameters
logging | on, off | Enable or disable the logger
dynamic-dispatch | on, off | Build a single portable optimized library that runs on different x86 CPU architectures
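AOCL-LibMem provides the standard C memory functions, so one way to try it with an unmodified binary is to preload the shared library. This sketch assumes the shared library is named libaocl-libmem.so and sits in the package's lib directory. It also assumes ./myapp is a hypothetical dynamically linked program.

```shell
# Portable shared build that selects the best code path at run time
spack install aocl-libmem %aocc +shared +dynamic-dispatch

# Preload AOCL-LibMem so that calls such as memcpy and strcpy in ./myapp
# resolve to its implementations instead of the C library's
LIBMEM_PREFIX=$(spack location -i aocl-libmem)
LD_PRELOAD="$LIBMEM_PREFIX/lib/libaocl-libmem.so" ./myapp
```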


AOCL-Crypto

AOCL-Crypto is a library of basic cryptographic functions optimized and tuned for the AMD “Zen”-based microarchitecture.

This library provides a unified solution for common cryptographic routines:

  • AES (Advanced Encryption Standard) encryption/decryption routines (CBC, CFB, OFB, CTR, GCM, XTS, CCM, SIV)
  • SHA (Secure Hash Algorithm) routines (SHA2, SHA3, SHAKE)
  • Message authentication codes (CMAC, HMAC)
  • ECDH (Elliptic-curve Diffie–Hellman) and RSA (Rivest, Shamir, and Adleman) key generation functions

AOCL-Crypto supports a dynamic dispatcher that executes the best variant of each function, implemented using function multi-versioning. The result is a single optimized library that is portable across different x86 CPU architectures.

Official Website: https://www.amd.com/en/developer/aocl/cryptography.html

 

Building AOCL-Crypto

		$ spack install aocl-crypto %aocc
	

The following is the list of variants available with AOCL-Crypto:

Variant | Allowed Values | Description
examples | on, off | Build examples
ipp | on, off | Build Intel IPP library


AOCL-Compression

AOCL-Compression is a software framework of lossless compression and decompression methods tuned and optimized for AMD “Zen”-based CPUs.

The framework offers a single set of unified APIs for all supported compression and decompression methods, which makes them easy for applications to integrate and use.

AOCL-Compression supports lz4, zlib/deflate, lzma, zstd, bzip2, snappy, and lz4hc based compression and decompression methods along with their native APIs.

The library offers OpenMP-based multithreaded implementations of the lz4, zlib, zstd, and snappy compression methods. It supports a dynamic dispatcher that executes the best function variant, implemented using function multi-versioning. The result is a single optimized library that is portable across different x86 CPU architectures.

The AOCL-Compression framework is developed in C for UNIX® and Windows® based systems. A test suite is provided for validation and performance benchmarking of the supported compression and decompression methods.

This suite also supports benchmarking of the IPP compression methods, such as lz4, lz4hc, zlib, and bzip2. The library build framework offers CTest-based testing of the test cases implemented using GTest and of the library test suite.

Official Website: https://www.amd.com/en/developer/aocl/compression.html

 

Building AOCL-Compression

		$ spack install aocl-compression %aocc
	

The following is the list of variants available with AOCL-Compression:

Variant | Allowed Values | Description
shared | on, off | Build shared library
openmp | on, off | Build with OpenMP-based multithreaded compression and decompression
zlib/bzip2/snappy/zstd/lzma/lz4/lz4hc | on, off | Build the corresponding method; all are built by default, and setting a variant to off disables that method
decompress_fast | OFF, 1, 2 | Enable fast decompression modes
enable_fast_math | on, off | Enable fast-math optimizations
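Each compression method has its own on/off variant, so you can drop individual methods with ~. Here is a sketch of an OpenMP-threaded build without the bzip2 and snappy methods:

```shell
# Multithreaded AOCL-Compression without the bzip2 and snappy methods
spack install aocl-compression %aocc +openmp ~bzip2 ~snappy
```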

 

AOCL-DA

The AOCL Data Analytics Library (AOCL-DA) provides optimized building blocks for data analysis. It has a C-compatible interface, so it integrates as seamlessly as possible with whichever programming language you are using. The intended workflow for using the library is as follows:

  • load data, either by reading CSV files or by using the built-in da_datastore object
  • preprocess the data by removing missing values, standardizing, and selecting certain subsets of the data, before extracting contiguous arrays of data from the da_datastore objects
  • process the data (for example, principal component analysis or linear model fitting)

C++ example programs can be found in the examples folder of your installation.

Official Website: https://www.amd.com/en/developer/aocl/data-analytics.html

 

Building AOCL-DA

		$ spack install aocl-da %aocc
	

The following is the list of variants available with AOCL-DA:

Variant | Allowed Values | Description
openmp | on, off | Build using OpenMP and link to threaded BLAS and LAPACK
python | on, off | Build with Python bindings
ilp64 | on, off | Build with ILP64 support
shared | on, off | Build shared library
examples | on, off | Build examples
gtest | on, off | Build and install GoogleTest
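As an illustration, the following sketch builds AOCL-DA with OpenMP, the Python bindings, and the example programs:

```shell
# OpenMP-enabled AOCL-DA with Python bindings and the example programs
spack install aocl-da %aocc +openmp +python +examples
```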

AOCL-DLP

AOCL-DLP (Deep Learning Primitives) is a high-performance library that provides optimized deep learning primitives for AMD processors. The library implements Low Precision GEMM (LPGEMM) operations for deep learning applications with support for multiple data types, post-operations, and quantization techniques. Select kernels have been optimized for AMD EPYC™ processors, leveraging AVX2, AVX512, AVX512_VNNI, and AVX512_BF16 instruction sets.

AOCL-DLP provides APIs for GEMM operations with various precision formats, comprehensive post-operations for fused computations, batch GEMM support, symmetric quantization routines, and parallel execution via OpenMP.

Official Website: https://www.amd.com/en/developer/aocl/dlp.html

 

Building AOCL-DLP

		$ spack install aocl-dlp %aocc
	

The following is the list of variants available with AOCL-DLP:

Variant | Allowed Values | Description
benchmarks | on, off | Build benchmarks
tests | on, off | Enable tests
ctest | on, off | Set DLP_CTEST_DISABLED=ON to skip CTest invocation
examples | on, off | Build examples
shared | on, off | Build shared library
threads | none, openmp, pthread | Select the threading backend for AOCL-DLP
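For example, here is a sketch that selects the OpenMP threading backend and also builds the examples and tests:

```shell
# AOCL-DLP with the OpenMP backend, plus examples and tests
spack install aocl-dlp %aocc threads=openmp +examples +tests
```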