AMD ZenDNN

Overview

ZenDNN is a deep neural network inference acceleration library optimized for the AMD “Zen” CPU architecture. The library comprises a set of fundamental building blocks and APIs designed to enhance performance for AI inference applications, primarily targeting AMD EPYC™ server CPUs. ZenDNN plugs into mainstream AI frameworks, offering developers a seamless experience in developing cutting-edge AI applications. It continues to redefine deep learning performance on AMD EPYC™ CPUs, combining relentless optimization, innovative features, and leading-edge support for modern workloads.

ZenDNN at a Glance

  • Delivers high performance across diverse AI workloads such as LLMs, NLP, vision, and recommendation systems without significant engineering effort, offering ease of integration into existing x86 DL environments
  • Provides freedom of vendor choice by building upon open-source projects such as oneDNN. ZenDNN requires zero to minimal code modifications for existing x86 applications while also supporting additional APIs designed to deliver higher performance
  • Optimized to benefit from the higher core counts and large L3 caches of AMD EPYC CPUs, helping users derive TCO advantages

ZenDNN Provides:

  • Efficient multi-threading on large numbers of CPU cores
  • Enhanced microkernels for efficient low-level math operations
  • Optimized mempools
  • Comprehensive graph optimizations and kernel fusions
  • Broad framework support: PyTorch, TensorFlow, and integrated ONNX Runtime
  • Open-source code
ZenDNN Advantage on AMD EPYC™ Processors

Getting Started

Below is a comprehensive ZenDNN User Guide covering the release highlights and installation instructions for PyTorch and TensorFlow. Performance tuning enthusiasts can find extra tips and tricks in the Performance Tuning chapter. To read more about current and previous releases, check out the ZenDNN Release Blog tab.

Documentation

ZenDNN Library: https://github.com/amd/ZenDNN
ZenDNN Plugin for PyTorch: https://github.com/amd/ZenDNN-pytorch-plugin
ZenDNN Plugin for TensorFlow: https://github.com/amd/ZenDNN-tensorflow-plugin

Blogs and Media

AMD ZenDNN Explained: AI Inferencing Power You Didn't Know You Had
ZenDNN Blogs

Get started with ZenDNN to enhance AI performance on AMD EPYC™ server CPUs.

To read more about current and previous releases, see the AMD Technical Articles and Blogs.

What’s New 

5.2.1 Release Highlights

ZenDNN 5.2.1 is an incremental update built on the ZenDNN 5.2 runtime architecture, focusing on expanded LOWOHA and advanced quantization capabilities, along with performance improvements for matmul and GEMV workloads across multiple backends. 

This release strengthens production readiness through reduced reorder overhead, enhanced profiling and regression benchmarking, and richer BenchDNN test coverage.

Key enhancements include expanded WOQ/U4 quantization with new DLP APIs, deeper integration of dynamic and static quantization into matmul and reorder flows, optimized LOWOHA normalization with fused add + RMS norm and AVX-512 kernels, ISA-dependent FP16 matmul enablement via AOCL DLP and oneDNN, LIBXSMM BF16 BRGEMM improvements, and AutoTuner enhancements for improved kernel selection.

The underlying ZenDNN 5.2 platform remains unchanged, retaining its modular multi-backend architecture, AutoTuner-driven dispatch, unified caching, improved threading for key primitives, and low-overhead APIs for small GEMM and fused BF16/FP32/INT8 workloads.
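The fused add + RMS norm optimization mentioned above can be illustrated with a minimal, framework-free sketch. This is plain Python for illustration only, not the ZenDNN kernel itself; the actual implementation uses AVX-512 vector code and operates on tensors in place:

```python
import math

def fused_add_rmsnorm(x, residual, weight, eps=1e-6):
    """Sketch of a fused residual-add + RMSNorm step.

    Computes h = x + residual and RMSNorm(h) = h / sqrt(mean(h^2) + eps) * weight
    in one logical pass, rather than materializing the residual sum in a
    separate operator before normalizing.
    """
    h = [a + b for a, b in zip(x, residual)]              # residual add
    rms = math.sqrt(sum(v * v for v in h) / len(h) + eps)  # root-mean-square
    out = [v / rms * w for v, w in zip(h, weight)]         # normalize + scale
    return out, h  # h is also returned, as the next layer's residual input

# Toy usage with a 3-element hidden vector and unit weights
out, h = fused_add_rmsnorm([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0])
```

Fusing the add into the normalization saves one full read/write pass over the hidden state per layer, which matters for the memory-bound LLM decode phase this release targets.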

ZenDNN Plugin for PyTorch (zentorch)

Overview

  • Incremental release built on the zentorch 5.2.0 foundation.
  • Adds PyTorch 2.11.0 support, extends vLLM compatibility (0.15.0–0.18.0), and strengthens asymmetric WOQ and INT8 dynamic quantization capabilities.

Key Improvements

  • Framework Support
    • Support for PyTorch 2.11.0 (in addition to 2.10.0).
    • Extended vLLM compatibility: 0.16.0, 0.17.0, 0.17.1, and 0.18.0.
  • Performance & Quantization Enhancements
    • Optimized vLLM RMSNorm using C++ CPU kernels.
    • Asymmetric WOQ support via TorchAO: 
      • Int4 weight only quantization (Int4WeightOnlyOpaqueTensorConfig).
      • Bias support for asymmetric quantized ops.
      • Int4OpaqueTensor support using zentorch.
      • Fusion support for quantized linear operators.
    • INT8 dynamic quantization
      • Integrated dynamic qlinear operator in zentorch backend.
      • Enabled execution of dynamically quantized models via vLLM with zentorch.
    • Improved WOQ linear operator performance.
  • Infrastructure, Testing & Tooling
    • Expanded Hypothesis based testing for qlinear and BMM operations.
    • Accuracy benchmarking using the LM Eval framework.
    • New zentorch weekly PyPI package for development builds.
    • Recommended jemalloc via LD_PRELOAD for DLRMv2 quantized model execution.
  • Bug Fixes & Accuracy
    • Fixed unit tests for qlinear_mul_add, fused WOQ linear, and BMM hypothesis tests.

Compatibility & Known Issues

  • Validated only with torchao==0.16.0 for quantized models.
  • No breaking changes; fully backward compatible with zentorch 5.2.0.
  • Known issues: 
    • Possible GLIBCXX conflicts (may require LD_PRELOAD).
    • Experimental Python versions (3.13T, 3.14, 3.14T) are not supported.
ZenDNN Plugin for TensorFlow (zentf)

Overview

  • Minor incremental release on top of zentf 5.2.0.
  • Continues focus on inference performance for Recommender Systems and Large Language Models on AMD EPYC™ CPUs.

TensorFlow & Python Support

  • TensorFlow 2.21.0 as the primary supported version with optimal performance. 
    • Distributed via PyPI (Python wheel) and as a C++ package.
  • TensorFlow Java main (75402bef) supported via source build only.
  • Python 3.10–3.13 fully supported (Python 3.9 dropped).

Key Improvements

  • TF 2.21 Integration
    • Built for and validated against TensorFlow v2.21.0.
    • Upgraded build system from Bazel 7.4.1 to 7.7.0.
    • Aligned Python support with TensorFlow (3.10–3.13).
    • TensorFlow Java supported via main branch due to lack of official 2.21 release.
  • Backward Build Compatibility (TF 2.16–2.21)
    • Single unified codebase supports TensorFlow 2.16.0 through 2.21.0.
    • ./configure automatically detects TensorFlow version and applies the correct build setup.
    • Version-specific configs maintained under version_configs/ (TF 2.19–2.21); TF 2.16–2.18 reuse TF 2.19 settings.
    • Bazel and third-party dependencies (protobuf, abseil, rules_cc) auto-adjust per TensorFlow version.
    • Standard build workflow remains unchanged and fully transparent to users.

Binaries Download Links:

ZenDNN Plug-in for PyTorch (Built with PyTorch 2.11.0)

  • ZENTORCH_v5.2.1_Python_v3.10.zip: zentorch wheel file and environment setup scripts; compatible with Python 3.10. MD5SUM: 4aa21a5ff400ac6d41bc6cfe5149b31f
  • ZENTORCH_v5.2.1_Python_v3.11.zip: zentorch wheel file and environment setup scripts; compatible with Python 3.11. MD5SUM: 6aa94d2064b8ef10ff9b7232bd192247
  • ZENTORCH_v5.2.1_Python_v3.12.zip: zentorch wheel file and environment setup scripts; compatible with Python 3.12. MD5SUM: f2ae0bacf0c24dcd19df778184943df6
  • ZENTORCH_v5.2.1_Python_v3.13.zip: zentorch wheel file and environment setup scripts; compatible with Python 3.13. MD5SUM: b4a4926273950b731369d04a87a4d3ec

Note: The above packages can be used for non-LLM executions.
ZenDNN Plug-in for PyTorch (Built with PyTorch 2.10.0)

  • ZENTORCH_v5.2.1_Python_v3.10.zip: zentorch wheel file and environment setup scripts; compatible with Python 3.10. MD5SUM: 92fa8592cbaaa5314863ac2baa55fb12
  • ZENTORCH_v5.2.1_Python_v3.11.zip: zentorch wheel file and environment setup scripts; compatible with Python 3.11. MD5SUM: f5073c974aff541e93548421ae6da0a0
  • ZENTORCH_v5.2.1_Python_v3.12.zip: zentorch wheel file and environment setup scripts; compatible with Python 3.12. MD5SUM: d9d325c2fe558211fd0ff9794e579189
  • ZENTORCH_v5.2.1_Python_v3.13.zip: zentorch wheel file and environment setup scripts; compatible with Python 3.13. MD5SUM: 0a673ff2d9ed063da85b68589a0dfd9b

Note: The above packages can be used for LLM executions with vLLM as well as non-LLM executions.
ZenDNN Plug-in for TensorFlow (Built with TensorFlow 2.21.0)

  • ZENTF_v5.2.1_Python_v3.10.zip: zentf wheel file and environment setup scripts; compatible with Python 3.10. MD5SUM: 14085bffc7bff47f1d53090527847896
  • ZENTF_v5.2.1_Python_v3.11.zip: zentf wheel file and environment setup scripts; compatible with Python 3.11. MD5SUM: 1d0a7d4370fbc4249aa69620b46c6458
  • ZENTF_v5.2.1_Python_v3.12.zip: zentf wheel file and environment setup scripts; compatible with Python 3.12. MD5SUM: 4a1bef3cc78e5dac7312d162a543670d
  • ZENTF_v5.2.1_Python_v3.13.zip: zentf wheel file and environment setup scripts; compatible with Python 3.13. MD5SUM: cc1a34607e46e138d2c99beb78989354
  • ZENTF_v5.2.1_C++_API.zip: ZenDNN TensorFlow Plug-in with C++ APIs. MD5SUM: 6a06734604a3b29490a27c547faf0413
    Note: This C++ package cannot be used for TF Java executions.
ZenDNN Plug-in for TensorFlow (Built with TensorFlow 2.20.0)

  • ZENTF_v5.2.1_C++_API.zip: ZenDNN TensorFlow Plug-in with C++ APIs. MD5SUM: 69d24544b3a577ab37213a9bd4e4a773
    Note: This C++ package should be used for TensorFlow Java execution.

5.2 Release Highlights

ZenDNN Extension for PyTorch (zentorch):

PyTorch Version Support

  • PyTorch 2.10.0: Primary support with optimal performance (available via PyPI)
  • Python 3.10 - 3.13: Full compatibility with the supported Python versions of PyTorch

Improvements

1. vLLM Integration

  • vLLM-ZenTorch Plugin: Plug-and-play automatic acceleration for the vLLM V1 inference engine with zero code changes
  • vLLM Version Support: vLLM 0.12.0 to 0.15.1

2. Quantized Inference Support

  • LLM Quantization (Weight-Only Quantization) (Experimental): INT4 quantized inference functional support
  • RecSys Quantization (DLRM-v2):
    • Embedding tables: UINT4 asymmetric per-channel weight-only quantization
    • Linear layers: W8A8 quantization (INT8 symmetric per-channel for weights, UINT8 asymmetric per-tensor for activations)
    • PyTorch 2 Export (PT2E) quantization framework with performance optimizations
    • Custom EmbeddingBagUInt4Quantizer for embedding quantization
    • X86InductorQuantizer for linear layer quantization
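The W8A8 scheme described above (INT8 symmetric per-channel for weights, UINT8 asymmetric per-tensor for activations) can be sketched in plain Python. This is an illustrative reference only, not the zentorch or PT2E implementation, and the rounding/clamping details are assumptions:

```python
def quantize_weight_per_channel(w_rows):
    """INT8 symmetric quantization, one scale per output channel (row).

    Symmetric: zero maps to zero, values span [-127, 127] via scale only.
    """
    q_rows, scales = [], []
    for row in w_rows:
        scale = max(abs(v) for v in row) / 127.0 or 1.0  # avoid zero scale
        scales.append(scale)
        q_rows.append([max(-128, min(127, round(v / scale))) for v in row])
    return q_rows, scales

def quantize_activation_per_tensor(x):
    """UINT8 asymmetric quantization with a single scale + zero point.

    Asymmetric: the observed [min, max] range is mapped onto [0, 255],
    so a nonzero zero point represents real 0.0 exactly-ish.
    """
    lo, hi = min(x), max(x)
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in x]
    return q, scale, zero_point
```

Per-channel scales preserve accuracy for weights whose magnitude varies across output channels, while a single per-tensor scale keeps the activation quantization cheap at runtime.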

3. Performance Optimizations

  • Improved bfloat16 Performance: AMD EPYC™ specific enhancements for bfloat16 operations
  • Enhanced Operations with LOA: Low Overhead API optimizations for improved performance
  • Optimized Embedding Kernels: Enhanced embedding bag operations with group op support
  • Graph Optimizations: Advanced pattern identification and replacement, concat operation folding support

4. Infrastructure and Testing

  • Hypothesis Testing Framework: Expanded test coverage with property-based testing
  • NumPy 2.x Compatibility: Updated scripts for NumPy 2.x support
  • TORCH_COMPILE_DEBUG Support: Full compatibility with PyTorch debugging tools
  • Integrated with New ZenDNN Library: Updated to new ZenDNN library with self-managed dependency building

5. Documentation

  • Updated README: Comprehensive documentation updates including:
  • vLLM plugin usage instructions
  • Weight-only quantization guide
  • Profiler output interpretation
  • Updated examples and usage patterns
  • Example Scripts: Added DLRM-v2 quantization example scripts

ZenDNN Extension for TensorFlow (zentf):

TensorFlow Version Support

  • TensorFlow 2.20.0: Primary support with optimal performance (available via PyPI and CPP package)
  • TensorFlow-Java main(75402bef): Java User interface - Fully supported (available via source build only)
  • Python 3.9 - 3.13: Full compatibility with the supported Python versions of TensorFlow

Improvements

1. TensorFlow 2.20.0 Integration

  • zentf 5.2.0 is built for and validated against TensorFlow v2.20.0.
  • Bazel 7.4.1: Upgraded from Bazel 5.3-6.5 range to a single supported version (7.4.1).
  • Python 3.9 - 3.13: Extended Python version support to include Python 3.13.
  • As TF Java has no release corresponding to TensorFlow 2.20.0, zentf is supported with the main (75402bef) branch of TensorFlow-Java through source build only.

2. Migrate from legacy ZenDNN library to ZenDNNL

  • CMake-based ZenDNNL integration using rules_foreign_cc.
  • All operator kernels (MatMul, Conv2D, BatchMatMul, Softmax, Pooling) have been rewritten to use the ZenDNNL Low Overhead API (LOA), replacing the legacy ZenDNN primitives.
  • Old third-party dependencies on zen_dnn and amd_blis (BLIS) have been removed, replaced by ZenDNNL with integrated AOCL-DLP.

3. Removed Legacy Components

  • Mempool optimization has been completely removed and equivalent performance has been achieved using jemalloc as the memory allocator instead.
  • INT8 support has been removed.
  • Removal of non-performant ops: ZenTranspose, ZenReshape, Binary ops.

4. Performance Optimizations

  • Enhanced Operations with LOA: Low Overhead API optimizations for improved performance

Note: For further details on this release, please consult the User Guide.

5.1 Release Highlights

Framework Compatibility

  • PyTorch & TensorFlow: We've added full compatibility with PyTorch 2.7 and TensorFlow 2.19, ensuring seamless integration with the latest versions of these leading AI frameworks.
  • vLLM + zentorch Plugin: The new zentorch plugin for vLLM delivers a significant performance uplift of up to 21% on a variety of models compared to vLLM-IPEX.
  • Java® Integration: We've enabled support for PluggableDevice in TensorFlow-Java, a feature essential for zentf functionality. This feature has been officially contributed and upstreamed to the TensorFlow-Java repository, strengthening its core capabilities. For more details, please see the TensorFlow-Java integration Blog.

Performance Optimizations

  • Recommender Systems: We've introduced several key optimizations to boost the performance of recommender models, such as DLRMv2.
    • EmbeddingBag Improvements: New "out" variants of EmbeddingBag and related operators now write directly to a shared output buffer, eliminating the need for a separate concatenation operation and improving efficiency.
    • Concat Optimization: We've introduced a new optimization that fuses the concatenation operation after Bottom MLP and EmbeddingBag, for the DLRMv2 model.
  • New Operator Fusions: We've added new operator fusions to accelerate common computational patterns, resulting in a 25% performance uplift for the DIEN BF16 model.
    • MatMul + BiasAdd + Tanh
    • MatMul + BiasAdd + Sigmoid
  • Kernel Optimizations:
    • BF16/FP32 MatMul: A new kernel for BF16/FP32 matrix multiplication has been introduced that eliminates overheads in less compute-intensive GEMM operations, leading to improved performance of the DIEN model.
    • Ahead of Time (AOT) Reorder: We now support AOT reordering for MatMul kernels across INT8, BF16, and FP32 data types.
  • ZenDNN Enhancements: Added support for MatMul(+fused) Low Overhead API (LOA) to improve performance of small matrix shapes, further improving performance and efficiency.
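The MatMul + BiasAdd + Tanh pattern targeted by the fusions above can be written as a single-pass reference in plain Python. This is a conceptual sketch only; the actual fused kernels are vectorized native code, but the idea is the same: apply the bias and activation while the matmul result is still hot, instead of running three separate operators:

```python
import math

def matmul_bias_tanh(x, w, b):
    """Reference for the fused pattern out = tanh(x @ w + b).

    Each output element is accumulated, biased, and activated in one
    pass, avoiding two extra traversals of the intermediate matrix.
    """
    rows, inner, cols = len(x), len(w), len(w[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = b[j]                      # BiasAdd folded into the accumulator
            for k in range(inner):
                acc += x[i][k] * w[k][j]    # MatMul accumulation
            out[i][j] = math.tanh(acc)      # activation in the same pass
    return out

# Toy usage: 2x2 inputs, per-column bias
y = matmul_bias_tanh([[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]], [0.5, -0.5])
```

The MatMul + BiasAdd + Sigmoid fusion follows the identical structure with the activation swapped.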

Ecosystem Contribution

  • We are actively contributing our optimization work directly to the core PyTorch codebase, as well as the PluggableDevice feature to the TensorFlow-Java repository. These regular upstream contributions strengthen the native performance and capabilities of both frameworks, benefiting the entire community.

5.0.2 Release Highlights

  • Framework Compatibility: Fully compatible with PyTorch 2.6 and TensorFlow 2.18.
  • Java® Integration: Introduces a Java interface to the TensorFlow plugin (zentf) via TensorFlow Java.
  • Optimized Quantized Model Support: Enhanced performance for INT8/INT4-quantized DLRM models.

5.0.1 Release Highlights

  • Compatible with deep-learning frameworks: Aligned closely with PyTorch 2.5 and TensorFlow 2.18, helping ensure smooth upgrades and interoperability.
  • Efficient Model Execution: Added support for INT8/INT4-quantized DLRM models in zentorch, unlocking faster inference with lower memory usage compared to BF16-precision. This release supports the MLPerf® version of DLRMv2; support for generic models are planned for the next release.

5.0 Release Highlights

  • Support for 5th Gen AMD EPYC™ processors, formerly codenamed “Turin”
  • Framework Support: PyTorch 2.4.0, TensorFlow 2.17 and ONNXRT 1.19.2
  • New APIs in the ZenDNN Plugin for PyTorch (zentorch), such as zentorch.llm.optimize() and zentorch.load_woq_model(), for enhanced LLM performance
  • Enhanced matmul operators and fusions, plus a new BF16 auto-tuning algorithm targeted at generative LLMs
  • An optimized Scaled Dot Product Attention operator, including KV cache performance optimizations tailored to AMD EPYC™ cache architectures
  • Support for INT4 Weight-Only Quantization (WOQ)
  • Improved model support: Llama 3.1 and 3.2, Phi3, ChatGLM3, Qwen2, GPT-J
  • And more!
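To illustrate the INT4 WOQ idea mentioned above (weights compressed to 4-bit integers with shared scales, activations kept in full precision), here is a rough plain-Python sketch. The symmetric scheme and group size are assumptions for illustration, not the zentorch implementation:

```python
def woq_int4_quantize(weights, group_size=4):
    """Sketch of group-wise symmetric INT4 weight-only quantization.

    Weights are split into fixed-size groups; each group stores one FP
    scale plus 4-bit signed values in [-8, 7]. Only weights are
    quantized; activations remain in full precision at inference time.
    """
    qvals, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(v) for v in group) / 7.0 or 1.0  # avoid zero scale
        scales.append(scale)
        qvals.extend(max(-8, min(7, round(v / scale))) for v in group)
    return qvals, scales

def woq_int4_dequantize(qvals, scales, group_size=4):
    """Recover approximate FP weights just before the matmul."""
    return [q * scales[i // group_size] for i, q in enumerate(qvals)]

# Toy usage: one group of four weights
q, s = woq_int4_quantize([7.0, -3.5, 0.0, 1.75])
```

Storing 4-bit weights roughly quarters the weight memory footprint versus BF16, which is why WOQ helps memory-bandwidth-bound LLM inference.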

Please consult each plugin’s Release Highlight section in the ZenDNN User Guide for a comprehensive list of updates.  

Release Blog

Get Assistance for Current Projects

If you need technical support for ZenDNN, please file an issue ticket on the respective GitHub page.

Binaries are also available on the PyPI repository at the links below:
zentf: https://pypi.org/project/zentf/
zentorch: https://pypi.org/project/zentorch/
Refer to the user guide for more details.

Archive Access: For those requiring versions up to ZenDNN 5.1, our archives provide easy access to previous releases, ensuring you have the tools and resources you need for any project.

Sign Up for ZenDNN News

Keep up-to-date on the latest product releases, news, and tips.