AMD ZenDNN

Overview

ZenDNN is a deep neural network inference acceleration library optimized for the AMD “Zen” CPU architecture. The library comprises a set of fundamental building blocks and APIs designed to enhance performance for AI inference applications, primarily targeting AMD EPYC™ server CPUs. ZenDNN plugs into mainstream AI frameworks, offering developers a seamless experience in developing cutting-edge AI applications. It continues to redefine deep learning performance on AMD EPYC™ CPUs, combining relentless optimization, innovative features, and leading-edge support for modern workloads.

ZenDNN at a Glance

  • Delivers high performance across diverse AI workloads such as LLMs, NLP, vision, and recommendation systems without significant engineering effort, offering ease of integration into existing x86 DL environments
  • Provides freedom of vendor choice by building upon open-source projects such as oneDNN. ZenDNN requires zero to minimal code modifications for existing x86 applications while also supporting additional APIs designed to deliver higher performance
  • Optimized to benefit from the higher core counts and large L3 caches of AMD EPYC CPUs, helping users derive TCO advantages

ZenDNN Provides:

  • Efficient multi-threading on large numbers of CPU cores
  • Enhanced microkernels for efficient low-level math operations
  • Optimized mempools
  • Comprehensive graph optimizations and kernel fusions
  • Broad framework support: PyTorch, TensorFlow, and integrated ONNX Runtime
  • Open-source code
ZenDNN Advantage on AMD EPYC™ Processors

Getting Started

Below is a comprehensive ZenDNN User Guide covering the release highlights and installation instructions for PyTorch and TensorFlow. Performance tuning enthusiasts can find extra tips and tricks in the Performance Tuning chapter. To read more about current and previous releases, check out the ZenDNN Release Blog tab.

Documentation

ZenDNN Library: https://github.com/amd/ZenDNN
ZenDNN Plugin for PyTorch: https://github.com/amd/ZenDNN-pytorch-plugin
ZenDNN Plugin for TensorFlow: https://github.com/amd/ZenDNN-tensorflow-plugin

Blogs and Media

AMD ZenDNN Explained: AI Inferencing Power You Didn't Know You Had
ZenDNN Blogs

Get started with ZenDNN to enhance AI performance on AMD EPYC™ server CPUs.

To read more about current and previous releases, see the AMD Technical Articles and Blogs.

What’s New 

5.2.1 Release Highlights

ZenDNN 5.2.1 is an incremental update built on the ZenDNN 5.2 runtime architecture, focusing on expanded LOWOHA and advanced quantization capabilities, along with performance improvements for matmul and GEMV workloads across multiple backends. 

This release strengthens production readiness through reduced reorder overhead, enhanced profiling and regression benchmarking, and richer BenchDNN test coverage.

Key enhancements include expanded WOQ/U4 quantization with new DLP APIs, deeper integration of dynamic and static quantization into matmul and reorder flows, optimized LOWOHA normalization with fused add + RMS norm and AVX-512 kernels, ISA-dependent FP16 matmul enablement via AOCL DLP and oneDNN, LIBXSMM BF16 BRGEMM improvements, and AutoTuner enhancements for improved kernel selection.

The underlying ZenDNN 5.2 platform remains unchanged, retaining its modular multi-backend architecture, AutoTuner-driven dispatch, unified caching, improved threading for key primitives, and low-overhead APIs for small GEMM and fused BF16/FP32/INT8 workloads.
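The fused add + RMS norm optimization mentioned above can be illustrated with a minimal, framework-free sketch. This is plain Python for illustration only, not the ZenDNN kernel itself; the actual implementation uses AVX-512 vector code and operates on tensors in place:

```python
import math

def fused_add_rmsnorm(x, residual, weight, eps=1e-6):
    """Sketch of a fused residual-add + RMSNorm step.

    Computes h = x + residual and RMSNorm(h) = h / sqrt(mean(h^2) + eps) * weight
    in one logical pass, rather than materializing the residual sum in a
    separate operator before normalizing.
    """
    h = [a + b for a, b in zip(x, residual)]              # residual add
    rms = math.sqrt(sum(v * v for v in h) / len(h) + eps)  # root-mean-square
    out = [v / rms * w for v, w in zip(h, weight)]         # normalize + scale
    return out, h  # h is also returned, as the next layer's residual input

# Toy usage with a 3-element hidden vector and unit weights
out, h = fused_add_rmsnorm([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0])
```

Fusing the add into the normalization saves one full read/write pass over the hidden state per layer, which matters for the memory-bound LLM decode phase this release targets.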

ZenDNN Plugin for PyTorch (zentorch)

Overview

  • Incremental release built on the zentorch 5.2.0 foundation.
  • Adds PyTorch 2.11.0 support, extends vLLM compatibility (0.15.0–0.18.0), and strengthens asymmetric WOQ and INT8 dynamic quantization capabilities.

Key Improvements

  • Framework Support
    • Support for PyTorch 2.11.0 (in addition to 2.10.0).
    • Extended vLLM compatibility: 0.16.0, 0.17.0, 0.17.1, and 0.18.0.
  • Performance & Quantization Enhancements
    • Optimized vLLM RMSNorm using C++ CPU kernels.
    • Asymmetric WOQ support via TorchAO: 
      • Int4 weight only quantization (Int4WeightOnlyOpaqueTensorConfig).
      • Bias support for asymmetric quantized ops.
      • Int4OpaqueTensor support using zentorch.
      • Fusion support for quantized linear operators.
    • INT8 dynamic quantization
      • Integrated dynamic qlinear operator in zentorch backend.
      • Enabled execution of dynamically quantized models via vLLM with zentorch.
    • Improved WOQ linear operator performance.
  • Infrastructure, Testing & Tooling
    • Expanded Hypothesis based testing for qlinear and BMM operations.
    • Accuracy benchmarking using the LM Eval framework.
    • New zentorch weekly PyPI package for development builds.
    • Recommended jemalloc via LD_PRELOAD for DLRMv2 quantized model execution.
  • Bug Fixes & Accuracy
    • Fixed unit tests for qlinear_mul_add, fused WOQ linear, and BMM hypothesis tests.

Compatibility & Known Issues

  • Validated only with torchao==0.16.0 for quantized models.
  • No breaking changes; fully backward compatible with zentorch 5.2.0.
  • Known issues: 
    • Possible GLIBCXX conflicts (may require LD_PRELOAD).
    • Experimental Python versions (3.13T, 3.14, 3.14T) are not supported.
ZenDNN Plugin for TensorFlow (zentf)

Overview

  • Minor incremental release on top of zentf 5.2.0.
  • Continues focus on inference performance for Recommender Systems and Large Language Models on AMD EPYC™ CPUs.

TensorFlow & Python Support

  • TensorFlow 2.21.0 as the primary supported version with optimal performance. 
    • Distributed via PyPI (Python wheel) and as a C++ package.
  • TensorFlow Java main (75402bef) supported via source build only.
  • Python 3.10–3.13 fully supported (Python 3.9 dropped).

Key Improvements

  • TF 2.21 Integration
    • Built for and validated against TensorFlow v2.21.0.
    • Upgraded build system from Bazel 7.4.1 to 7.7.0.
    • Aligned Python support with TensorFlow (3.10–3.13).
    • TensorFlow Java supported via main branch due to lack of official 2.21 release.
  • Backward Build Compatibility (TF 2.16–2.21)
    • Single unified codebase supports TensorFlow 2.16.0 through 2.21.0.
    • ./configure automatically detects TensorFlow version and applies the correct build setup.
    • Version-specific configs maintained under version_configs/ (TF 2.19–2.21); TF 2.16–2.18 reuse TF 2.19 settings.
    • Bazel and third-party dependencies (protobuf, abseil, rules_cc) auto-adjust per TensorFlow version.
    • Standard build workflow remains unchanged and fully transparent to users.

Binaries Download Links:

ZenDNN Plug-in for PyTorch (Built with PyTorch 2.11.0)

  • ZENTORCH_v5.2.1_Python_v3.10.zip: zentorch wheel file and environment setup scripts; compatible with Python 3.10. MD5SUM: 4aa21a5ff400ac6d41bc6cfe5149b31f
  • ZENTORCH_v5.2.1_Python_v3.11.zip: zentorch wheel file and environment setup scripts; compatible with Python 3.11. MD5SUM: 6aa94d2064b8ef10ff9b7232bd192247
  • ZENTORCH_v5.2.1_Python_v3.12.zip: zentorch wheel file and environment setup scripts; compatible with Python 3.12. MD5SUM: f2ae0bacf0c24dcd19df778184943df6
  • ZENTORCH_v5.2.1_Python_v3.13.zip: zentorch wheel file and environment setup scripts; compatible with Python 3.13. MD5SUM: b4a4926273950b731369d04a87a4d3ec

Note: The above packages can be used for non-LLM executions.
ZenDNN Plug-in for PyTorch (Built with PyTorch 2.10.0)

  • ZENTORCH_v5.2.1_Python_v3.10.zip: zentorch wheel file and environment setup scripts; compatible with Python 3.10. MD5SUM: 92fa8592cbaaa5314863ac2baa55fb12
  • ZENTORCH_v5.2.1_Python_v3.11.zip: zentorch wheel file and environment setup scripts; compatible with Python 3.11. MD5SUM: f5073c974aff541e93548421ae6da0a0
  • ZENTORCH_v5.2.1_Python_v3.12.zip: zentorch wheel file and environment setup scripts; compatible with Python 3.12. MD5SUM: d9d325c2fe558211fd0ff9794e579189
  • ZENTORCH_v5.2.1_Python_v3.13.zip: zentorch wheel file and environment setup scripts; compatible with Python 3.13. MD5SUM: 0a673ff2d9ed063da85b68589a0dfd9b

Note: The above packages can be used for LLM executions with vLLM as well as non-LLM executions.
ZenDNN Plug-in for TensorFlow (Built with TensorFlow 2.21.0)

  • ZENTF_v5.2.1_Python_v3.10.zip: zentf wheel file and environment setup scripts; compatible with Python 3.10. MD5SUM: 14085bffc7bff47f1d53090527847896
  • ZENTF_v5.2.1_Python_v3.11.zip: zentf wheel file and environment setup scripts; compatible with Python 3.11. MD5SUM: 1d0a7d4370fbc4249aa69620b46c6458
  • ZENTF_v5.2.1_Python_v3.12.zip: zentf wheel file and environment setup scripts; compatible with Python 3.12. MD5SUM: 4a1bef3cc78e5dac7312d162a543670d
  • ZENTF_v5.2.1_Python_v3.13.zip: zentf wheel file and environment setup scripts; compatible with Python 3.13. MD5SUM: cc1a34607e46e138d2c99beb78989354
  • ZENTF_v5.2.1_C++_API.zip: ZenDNN TensorFlow Plug-in with C++ APIs. MD5SUM: 6a06734604a3b29490a27c547faf0413
    Note: This C++ package cannot be used for TF Java executions.
ZenDNN Plug-in for TensorFlow (Built with TensorFlow 2.20.0)

  • ZENTF_v5.2.1_C++_API.zip: ZenDNN TensorFlow Plug-in with C++ APIs. MD5SUM: 69d24544b3a577ab37213a9bd4e4a773
    Note: This C++ package should be used for TensorFlow Java execution.

5.2 Release Highlights

ZenDNN Extension for PyTorch (zentorch):

PyTorch Version Support

  • PyTorch 2.10.0: Primary support with optimal performance (available via PyPI)
  • Python 3.10 - 3.13: Full compatibility with the supported Python versions of PyTorch

Improvements

1. vLLM Integration

  • vLLM-ZenTorch Plugin: Plug-and-play automatic acceleration for the vLLM V1 inference engine with zero code changes
  • vLLM Version Support: vLLM 0.12.0 to 0.15.1

2. Quantized Inference Support

  • LLM Quantization (Weight-Only Quantization) (Experimental): INT4 quantized inference functional support
  • RecSys Quantization (DLRM-v2):
    • Embedding tables: UINT4 asymmetric per-channel weight-only quantization
    • Linear layers: W8A8 quantization (INT8 symmetric per-channel for weights, UINT8 asymmetric per-tensor for activations)
    • PyTorch 2 Export (PT2E) quantization framework with performance optimizations
    • Custom EmbeddingBagUInt4Quantizer for embedding quantization
    • X86InductorQuantizer for linear layer quantization
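The W8A8 scheme described above (INT8 symmetric per-channel for weights, UINT8 asymmetric per-tensor for activations) can be sketched in plain Python. This is an illustrative reference only, not the zentorch or PT2E implementation, and the rounding/clamping details are assumptions:

```python
def quantize_weight_per_channel(w_rows):
    """INT8 symmetric quantization, one scale per output channel (row).

    Symmetric: zero maps to zero, values span [-127, 127] via scale only.
    """
    q_rows, scales = [], []
    for row in w_rows:
        scale = max(abs(v) for v in row) / 127.0 or 1.0  # avoid zero scale
        scales.append(scale)
        q_rows.append([max(-128, min(127, round(v / scale))) for v in row])
    return q_rows, scales

def quantize_activation_per_tensor(x):
    """UINT8 asymmetric quantization with a single scale + zero point.

    Asymmetric: the observed [min, max] range is mapped onto [0, 255],
    so a nonzero zero point represents real 0.0 exactly-ish.
    """
    lo, hi = min(x), max(x)
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in x]
    return q, scale, zero_point
```

Per-channel scales preserve accuracy for weights whose magnitude varies across output channels, while a single per-tensor scale keeps the activation quantization cheap at runtime.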

3. Performance Optimizations

  • Improved bfloat16 Performance: AMD EPYC™ specific enhancements for bfloat16 operations
  • Enhanced Operations with LOA: Low Overhead API optimizations for improved performance
  • Optimized Embedding Kernels: Enhanced embedding bag operations with group op support
  • Graph Optimizations: Advanced pattern identification and replacement, concat operation folding support

4. Infrastructure and Testing

  • Hypothesis Testing Framework: Expanded test coverage with property-based testing
  • NumPy 2.x Compatibility: Updated scripts for NumPy 2.x support
  • TORCH_COMPILE_DEBUG Support: Full compatibility with PyTorch debugging tools
  • Integrated with New ZenDNN Library: Updated to new ZenDNN library with self-managed dependency building

5. Documentation

  • Updated README: Comprehensive documentation updates including:
  • vLLM plugin usage instructions
  • Weight-only quantization guide
  • Profiler output interpretation
  • Updated examples and usage patterns
  • Example Scripts: Added DLRM-v2 quantization example scripts

ZenDNN Extension for TensorFlow (zentf):

TensorFlow Version Support

  • TensorFlow 2.20.0: Primary support with optimal performance (available via PyPI and CPP package)
  • TensorFlow-Java main(75402bef): Java User interface - Fully supported (available via source build only)
  • Python 3.9 - 3.13: Full compatibility with the supported Python versions of TensorFlow

Improvements

1. TensorFlow 2.20.0 Integration

  • zentf 5.2.0 is built for and validated against TensorFlow v2.20.0.
  • Bazel 7.4.1: Upgraded from Bazel 5.3-6.5 range to a single supported version (7.4.1).
  • Python 3.9 - 3.13: Extended Python version support to include Python 3.13.
  • As TF Java has no release corresponding to TensorFlow 2.20.0, zentf is supported with the main (75402bef) branch of TensorFlow-Java through source build only.

2. Migrate from legacy ZenDNN library to ZenDNNL

  • CMake-based ZenDNNL integration using rules_foreign_cc.
  • All operator kernels (MatMul, Conv2D, BatchMatMul, Softmax, Pooling) have been rewritten to use the ZenDNNL Low Overhead API (LOA), replacing the legacy ZenDNN primitives.
  • Old third-party dependencies on zen_dnn and amd_blis (BLIS) have been removed, replaced by ZenDNNL with integrated AOCL-DLP.

3. Removed Legacy Components

  • Mempool optimization has been completely removed and equivalent performance has been achieved using jemalloc as the memory allocator instead.
  • INT8 support has been removed.
  • Removal of non-performant ops: ZenTranspose, ZenReshape, Binary ops.

4. Performance Optimizations

  • Enhanced Operations with LOA: Low Overhead API optimizations for improved performance

Note: For further details on this release, please consult the User Guide.

5.1 Release Highlights

Framework Compatibility

  • PyTorch & TensorFlow: We've added full compatibility with PyTorch 2.7 and TensorFlow 2.19, ensuring seamless integration with the latest versions of these leading AI frameworks.
  • vLLM + zentorch Plugin: The new zentorch plugin for vLLM delivers a significant performance uplift of up to 21% on a variety of models compared to vLLM-IPEX.
  • Java® Integration: We've enabled support for PluggableDevice in TensorFlow-Java, a feature essential for zentf functionality. This feature has been officially contributed and upstreamed to the TensorFlow-Java repository, strengthening its core capabilities. For more details, please see the TensorFlow-Java integration Blog.

Performance Optimizations

  • Recommender Systems: We've introduced several key optimizations to boost the performance of recommender models, such as DLRMv2.
    • EmbeddingBag Improvements: New "out" variants of EmbeddingBag and related operators now write directly to a shared output buffer, eliminating the need for a separate concatenation operation and improving efficiency.
    • Concat Optimization: We've introduced a new optimization that fuses the concatenation operation after Bottom MLP and EmbeddingBag, for the DLRMv2 model.
  • New Operator Fusions: We've added new operator fusions to accelerate common computational patterns, resulting in a 25% performance uplift for the DIEN BF16 model.
    • MatMul + BiasAdd + Tanh
    • MatMul + BiasAdd + Sigmoid
  • Kernel Optimizations:
    • BF16/FP32 MatMul: A new kernel for BF16/FP32 matrix multiplication has been introduced that eliminates overheads in less compute-intensive GEMM operations, leading to improved performance of the DIEN model.
    • Ahead of Time (AOT) Reorder: We now support AOT reordering for MatMul kernels across INT8, BF16, and FP32 data types.
  • ZenDNN Enhancements: Added support for MatMul(+fused) Low Overhead API (LOA) to improve performance of small matrix shapes, further improving performance and efficiency.
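The MatMul + BiasAdd + Tanh pattern targeted by the fusions above can be written as a single-pass reference in plain Python. This is a conceptual sketch only; the actual fused kernels are vectorized native code, but the idea is the same: apply the bias and activation while the matmul result is still hot, instead of running three separate operators:

```python
import math

def matmul_bias_tanh(x, w, b):
    """Reference for the fused pattern out = tanh(x @ w + b).

    Each output element is accumulated, biased, and activated in one
    pass, avoiding two extra traversals of the intermediate matrix.
    """
    rows, inner, cols = len(x), len(w), len(w[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = b[j]                      # BiasAdd folded into the accumulator
            for k in range(inner):
                acc += x[i][k] * w[k][j]    # MatMul accumulation
            out[i][j] = math.tanh(acc)      # activation in the same pass
    return out

# Toy usage: 2x2 inputs, per-column bias
y = matmul_bias_tanh([[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]], [0.5, -0.5])
```

The MatMul + BiasAdd + Sigmoid fusion follows the identical structure with the activation swapped.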

Ecosystem Contribution

  • We are actively contributing our optimization work directly to the core PyTorch codebase, as well as the PluggableDevice feature to the TensorFlow-Java repository. These regular upstream contributions strengthen the native performance and capabilities of both frameworks, benefiting the entire community.

5.0.2 Release Highlights

  • Framework Compatibility: Fully compatible with PyTorch 2.6 and TensorFlow 2.18.
  • Java® Integration: Introduces a Java interface to the TensorFlow plugin (zentf) via TensorFlow Java.
  • Optimized Quantized Model Support: Enhanced performance for INT8/INT4-quantized DLRM models.

5.0.1 Release Highlights

  • Compatible with deep-learning frameworks: Aligned closely with PyTorch 2.5 and TensorFlow 2.18, helping ensure smooth upgrades and interoperability.
  • Efficient Model Execution: Added support for INT8/INT4-quantized DLRM models in zentorch, unlocking faster inference with lower memory usage compared to BF16-precision. This release supports the MLPerf® version of DLRMv2; support for generic models are planned for the next release.

5.0 Release Highlights

  • Support for 5th Gen AMD EPYC™ processors, formerly codenamed “Turin”
  • Framework Support: PyTorch 2.4.0, TensorFlow 2.17 and ONNXRT 1.19.2
  • New APIs in the ZenDNN Plugin for PyTorch (zentorch), such as zentorch.llm.optimize() and zentorch.load_woq_model(), for enhanced LLM performance
  • Enhanced matmul operators and fusions, plus a new BF16 auto-tuning algorithm targeted at generative LLMs
  • An optimized Scaled Dot Product Attention operator, including KV cache performance optimizations tailored to AMD EPYC™ cache architectures
  • Support for INT4 Weight-Only Quantization (WOQ)
  • Improved model support: Llama 3.1 and 3.2, Phi3, ChatGLM3, Qwen2, GPT-J
  • And more!
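To illustrate the INT4 WOQ idea mentioned above (weights compressed to 4-bit integers with shared scales, activations kept in full precision), here is a rough plain-Python sketch. The symmetric scheme and group size are assumptions for illustration, not the zentorch implementation:

```python
def woq_int4_quantize(weights, group_size=4):
    """Sketch of group-wise symmetric INT4 weight-only quantization.

    Weights are split into fixed-size groups; each group stores one FP
    scale plus 4-bit signed values in [-8, 7]. Only weights are
    quantized; activations remain in full precision at inference time.
    """
    qvals, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(v) for v in group) / 7.0 or 1.0  # avoid zero scale
        scales.append(scale)
        qvals.extend(max(-8, min(7, round(v / scale))) for v in group)
    return qvals, scales

def woq_int4_dequantize(qvals, scales, group_size=4):
    """Recover approximate FP weights just before the matmul."""
    return [q * scales[i // group_size] for i, q in enumerate(qvals)]

# Toy usage: one group of four weights
q, s = woq_int4_quantize([7.0, -3.5, 0.0, 1.75])
```

Storing 4-bit weights roughly quarters the weight memory footprint versus BF16, which is why WOQ helps memory-bandwidth-bound LLM inference.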

Please consult each plugin’s Release Highlight section in the ZenDNN User Guide for a comprehensive list of updates.  

Release Blog

Get Assistance for Current Projects

If you need technical support for ZenDNN, please file an issue ticket on the respective GitHub page.

Binaries are also available on the PyPI repository at the links below:
zentf: https://pypi.org/project/zentf/
zentorch: https://pypi.org/project/zentorch/
Refer to the user guide for more details.

Archive Access: For those requiring versions up to ZenDNN 5.1, our archives provide easy access to previous releases, ensuring you have the tools and resources you need for any project.

Sign Up for ZenDNN News

Keep up-to-date on the latest product releases, news, and tips.