Accelerating Applications with AOCL Cryptography

Jun 02, 2026

Optimizing a cryptographic library in isolation is one thing. Making it faster inside a live application, with all the surrounding overhead of query engines, protocol state machines, and storage I/O, is a different challenge. The real measure of a cryptography library is whether it moves the needle on the practical workloads users actually run.

We took AOCL-Cryptography, the AMD Zen™ optimized crypto library, and integrated it into three widely used, performance-critical applications: ClickHouse, strongSwan, and RocksDB. Each one exercises cryptography in a fundamentally different way. ClickHouse runs millions of AES encrypt/decrypt operations inline during SQL query processing. strongSwan negotiates IKE sessions and encrypts IPsec ESP packets at line rate. RocksDB encrypts every SST block written to persistent storage.

This blog covers the integration approach, the performance results, and some of the surprises we encountered along the way.

ClickHouse: Column-Level Encryption at Analytical Scale

ClickHouse, the popular columnar OLAP database, exposes encrypt() and decrypt() as built-in SQL functions backed by OpenSSL. A query like:

SELECT encrypt('aes-256-ctr', payload, key, iv) FROM events LIMIT 1000000

executes a million AES key-schedule and cipher operations inline during query evaluation. At this scale, the crypto backend becomes the dominant cost in the query plan.

We integrated AOCL-Cryptography as a ClickHouse contrib library with an internal interface implementation. A single build flag activates the integration. When enabled, ClickHouse transparently routes supported AES modes (GCM, CBC, CTR, CFB, OFB) through AOCL-optimized codepaths, while unsupported modes fall back to OpenSSL automatically. No changes to SQL syntax or behavior are required. The integration also introduces three new SQL functions for hardware-accelerated hashing: AOCL_SHA256(), AOCL_SHA512(), and AOCL_SHA3_256().
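The routing described above can be pictured as a small dispatch step. The following is only a sketch under stated assumptions, not the ClickHouse integration code: the function name `pick_backend`, the backend labels, and the way the mode string is split are all hypothetical, chosen to illustrate "supported modes go to AOCL, everything else falls back to OpenSSL".

```python
# Hypothetical sketch of per-mode backend routing; not ClickHouse source code.
# The set mirrors the modes the post lists as routed through AOCL.
AOCL_MODES = {"gcm", "cbc", "ctr", "cfb", "ofb"}


def pick_backend(mode: str, aocl_enabled: bool = True) -> str:
    """Return 'aocl' or 'openssl' for a mode string such as 'aes-256-ctr'."""
    # Assumption: the last dash-separated token names the block mode.
    block_mode = mode.lower().rsplit("-", 1)[-1]
    if aocl_enabled and block_mode in AOCL_MODES:
        return "aocl"
    # Unsupported mode, or the build flag is off: fall back to OpenSSL.
    return "openssl"


if __name__ == "__main__":
    for m in ("aes-256-ctr", "aes-256-gcm", "aes-256-ecb"):
        print(m, "->", pick_backend(m))
```

Because the fallback is the default branch, a mode that AOCL does not cover behaves exactly as it would in an unmodified build, which is consistent with the claim that SQL behavior does not change.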

For benchmarking, we ran clickhouse local --time on 1 million rows. Each row used SHA256-derived keys and initialization vectors (IVs), and each query performed a full encrypt + decrypt round-trip. Each configuration was executed three times, and we report the median timing. The baseline is the same ClickHouse build with AOCL-Cryptography disabled, backed by OpenSSL 3.5.0.
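As a rough illustration of how a speedup figure can be derived from three timed runs per configuration, the sketch below takes the median of each set and divides baseline time by candidate time. The formula is an assumption (the post does not spell it out), and the timings are placeholders invented for the example, not measured data.

```python
from statistics import median


def speedup(baseline_runs, candidate_runs):
    """Median baseline time divided by median candidate time.

    A value above 1.0 means the candidate finished faster than the baseline.
    """
    return median(baseline_runs) / median(candidate_runs)


# Placeholder wall-clock timings in seconds, invented for illustration only.
openssl_runs = [4.0, 4.4, 4.2]
aocl_runs = [2.0, 2.1, 2.2]

if __name__ == "__main__":
    print(f"{speedup(openssl_runs, aocl_runs):.2f}x")
```

Using the median rather than the mean keeps a single slow run (for example, a cold cache on the first iteration) from skewing the reported ratio.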

AOCL-Crypto Speedup over OpenSSL (see endnote i)

Test Platforms:

  • AMD: EPYC™ 9755, 128-Core (Zen 5, Turin), Ubuntu 24.04.1, Linux 6.8.0-94-generic, Clang 19
  • Intel®: Xeon® 6980P, 128 cores/socket x 2, 3.9 GHz, RHEL 9, Clang 19
  • ClickHouse version: v25.11.8.25

| Mode | Payload | AMD EPYC™ Turin | Intel® Xeon® 6980P |
|---|---|---|---|
| AES-256-CTR | Small (64B-2KB) | 2.77x | 1.78x |
| AES-256-CTR | 16KB | 1.57x | 1.40x |
| AES-256-CFB | Small (64B-2KB) | 1.84x | 2.25x |
| AES-256-CFB | 16KB | 2.47x | 2.15x |
| AES-256-GCM | Small (64B-2KB) | 1.83x | 1.54x |
| AES-256-GCM | 16KB | 1.26x | 1.00x |
| AES-256-CBC | Small (64B-2KB) | 1.33x | 1.25x |
| AES-256-CBC | 16KB | 0.96x | 0.92x |
| AES-256-OFB | Small (64B-2KB) | 1.37x | 1.29x |
| AES-256-OFB | 16KB | 1.12x | 1.12x |

Figure 1: ClickHouse Speedup using AOCL

AOCL-Crypto accelerates every parallelizable AES mode on both platforms, with AMD EPYC Turin amplifying those gains through Zen 5's wider VAES pipelines. CTR leads at 2.77x on Turin versus 1.78x on Intel for small payloads. On Turin, CFB gains increase with payload size, reaching 2.47x at 16KB, because AOCL's pipelined chaining compounds its advantage where OpenSSL's EVP overhead is more exposed. GCM benefits at small payloads (1.83x Turin, 1.54x Intel), but the advantage narrows at 16KB (1.26x Turin, 1.00x Intel) as AES-NI throughput saturates. CBC at 16KB is slightly below parity on both platforms (0.96x Turin, 0.92x Intel) because the block-chaining dependency in CBC encryption cannot be parallelized by any library.
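To make the parallelism point concrete, here is a toy counter-mode construction. It is a sketch only: SHA-256 stands in for the AES block function so the example runs with the standard library, the 32-byte block size follows from that stand-in, and none of the names come from AOCL or OpenSSL. What it shows is that each CTR keystream block depends only on the key, nonce, and block index, so blocks can be processed in any order. CBC encryption, by contrast, computes each ciphertext block from the previous one and has no such freedom.

```python
import hashlib

BLOCK = 32  # bytes per keystream block in this toy (SHA-256 digest size)


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Toy PRF standing in for AES: output depends only on (key, nonce, counter).
    return hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()


def ctr_xor(key: bytes, nonce: bytes, data: bytes, order=None) -> bytes:
    """XOR data with the CTR keystream, visiting blocks in the given order."""
    chunks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    out = [b""] * len(chunks)
    indices = order if order is not None else range(len(chunks))
    for i in indices:
        ks = keystream_block(key, nonce, i)
        # zip() truncates the keystream for a short final chunk.
        out[i] = bytes(a ^ b for a, b in zip(chunks[i], ks))
    return b"".join(out)


if __name__ == "__main__":
    key, nonce, data = b"k" * 32, b"n" * 8, bytes(range(100))
    forward = ctr_xor(key, nonce, data)
    shuffled = ctr_xor(key, nonce, data, order=[3, 1, 0, 2])
    print(forward == shuffled)                   # block order does not matter
    print(ctr_xor(key, nonce, forward) == data)  # applying it twice decrypts
```

This toy is not secure and is not AES. It only mirrors the data-flow property that lets a vectorized implementation keep several blocks in flight at once.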

strongSwan: Accelerating IPsec on AMD and Intel

strongSwan is one of the most widely used open-source IPsec implementations, handling VPN tunnels, site-to-site links, and enterprise network policy enforcement. Every IKE negotiation and ESP-encrypted packet passes through the crypto layer, making it a natural target for optimization.

We developed an AOCL-Crypto plugin for strongSwan that provides hardware-accelerated implementations of AES-GCM, AES-CBC, SHA-2, SHA-3, HMAC, SHAKE (XOF), and X25519 key exchange. The plugin registers alongside strongSwan's default OpenSSL plugin. At startup, strongSwan benchmarks all registered plugins using internal test vectors and selects the fastest one for each algorithm. The throughput scores reported below come from this built-in benchmark (higher is better, measured in operations per second on fixed test data).
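The per-algorithm selection step can be sketched as a simple "highest score wins" pass over the registered plugins. This is an illustration under assumptions, not strongSwan code: the function name, the dictionary layout, and the scores are all hypothetical placeholders.

```python
def select_fastest(scores):
    """Map each algorithm to the plugin with the highest benchmark score.

    scores has the shape {plugin_name: {algorithm: points}}.
    """
    best = {}
    for plugin, per_alg in scores.items():
        for alg, points in per_alg.items():
            # Keep the current winner unless this plugin scores strictly higher.
            if alg not in best or points > scores[best[alg]][alg]:
                best[alg] = plugin
    return best


# Placeholder scores invented for illustration; these are not measured data.
example_scores = {
    "openssl": {"aes-gcm": 100, "sha2-512": 100},
    "aocl": {"aes-gcm": 150, "sha2-512": 90},
}

if __name__ == "__main__":
    print(select_fastest(example_scores))
```

A consequence of selecting per algorithm is that one plugin does not have to win everywhere: where it scores lower on a given algorithm, the other registered implementation is used for that algorithm.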

To demonstrate cross-platform behavior, we ran the same AOCL-enabled strongSwan 6.0.2 build on two very different server platforms.

AOCL-Crypto Gain over OpenSSL (see endnote ii)

Test Platforms:

  • AMD EPYC (Turin): EPYC 9755, 128-Core, 4.06 GHz (Zen 5), Ubuntu 24.04.1, Linux 6.8.0-94-generic
  • Intel: Xeon 6980P, 128 cores/socket x 2, 3.9 GHz, RHEL 9
  • strongSwan Version: 6.0.2
  • Compiler: GCC-14.2

| Algorithm | AMD EPYC Turin | Intel Xeon 6980P |
|---|---|---|
| AES-GCM-16 (128-bit) | 1.89x | 1.48x |
| AES-GCM-16 (256-bit) | 1.72x | 1.43x |
| AES-CBC (128-bit) | 1.24x | 1.19x |
| AES-CBC (256-bit) | 1.23x | 1.16x |
| SHA2-256 | 1.23x | 1.22x |
| SHA2-512 | 0.96x | 1.07x |
| SHA3-256 | 1.14x | 1.22x |
| Curve25519 | 1.33x | 1.73x |

Figure 2: strongSwan Speedup using AOCL

The standout is AES-GCM (128-bit key) at 1.89x on Turin, which directly reflects Zen 5's wide VAES pipelines and hardware VPCLMULQDQ throughput. HMAC-SHA2-256 gains 1.34x, which matters for IKE authentication where HMAC is computed on every exchange. SHA2-512 through AOCL is marginally slower (0.96x) on Turin, suggesting room for further per-platform tuning.

AOCL-Cryptography is not limited to AMD hardware. On Intel, AES-GCM improves 1.48x for 128-bit keys and 1.43x for 256-bit keys, SHA2-256 and SHA3-256 both gain 1.22x, and Curve25519 jumps 1.73x. The GCM gains are somewhat smaller than on Turin (1.48x on Intel versus 1.89x on Turin) because AOCL's GCM path benefits from Zen-specific instruction scheduling, but the underlying AES-NI and AVX optimizations still outperform OpenSSL's defaults on both platforms.

Note: These are raw crypto throughput scores, not end-to-end tunnel benchmarks. Actual IPsec packet-per-second performance depends on additional factors including packet sizes, number of active security associations, and kernel network stack overhead.

RocksDB: Encryption-at-Rest Without the Performance Penalty

RocksDB, the embedded key-value engine behind workloads at Meta, CockroachDB, TiKV, and many other systems, supports pluggable encryption-at-rest. We built an AOCL-Crypto encryption plugin for RocksDB that provides both AES-CTR and AES-XTS modes through RocksDB's standard encryption provider interface.

AES-CTR works with RocksDB's built-in encryption framework with no modifications to RocksDB itself. AES-XTS, however, requires block-aligned I/O reads that RocksDB does not natively support. To address this, we prepared a patch for RocksDB that adds the necessary block-aligned read support.
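The alignment requirement can be illustrated with a little arithmetic: a read of an arbitrary byte range has to be widened to whole encryption units before decryption, and the caller then discards the extra leading bytes. The helper below is a hypothetical sketch, not the RocksDB patch, and the 4096-byte unit size is an assumed value chosen only for the example.

```python
def align_read(offset: int, length: int, unit: int = 4096):
    """Widen [offset, offset + length) to unit boundaries.

    Returns (aligned_offset, aligned_length, skip), where skip is the number
    of leading bytes in the decrypted buffer that precede the requested data.
    The default unit of 4096 bytes is an assumption for illustration.
    """
    if offset < 0 or length <= 0 or unit <= 0:
        raise ValueError("offset must be >= 0; length and unit must be > 0")
    start = (offset // unit) * unit                # round down to a boundary
    end = -(-(offset + length) // unit) * unit     # round up (ceiling division)
    return start, end - start, offset - start


if __name__ == "__main__":
    # A 100-byte read at offset 5000 becomes one full unit starting at 4096.
    print(align_read(5000, 100))
```

CTR mode does not need this step because its keystream can be generated for any byte offset, which is consistent with AES-CTR working against RocksDB's existing framework unmodified.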

To compare crypto providers on equal footing, we created bench-rocksdb, a benchmarking harness built on top of RocksDB's db_bench. It runs the same workload (fillrandom, fillseq, readrandom, readseq with 1 million operations) against AOCL-Crypto, Intel IPP-Crypto (ippcp), and an EncFS AES-256-CTR configuration backed by OpenSSL, plus an unencrypted baseline. All entries go through the same RocksDB encryption interface, so the comparison isolates crypto backend performance.

AOCL AES-CTR Speedup over Competing Providers (see endnote iii)

Test Platforms:

  • AMD: EPYC Turin 9755, 128-Core, 4.06 GHz (Zen 5), Ubuntu 24.04.1, Linux 6.8.0-94-generic
  • Intel: Xeon 6980P, 128 cores/socket x 2, 3.9 GHz, RHEL 9

| Comparison | AMD EPYC Turin | Intel Xeon 6980P |
|---|---|---|
| AOCL vs Unencrypted (fillseq) | 1.02x | 1.00x |
| AOCL vs EncFS/OpenSSL (fillseq) | 1.68x | 1.63x |
| AOCL vs IPP-Crypto (fillseq) | 1.19x | 1.21x |
| AOCL vs EncFS/OpenSSL (fillrandom) | 1.41x | 1.31x |
| AOCL vs IPP-Crypto (fillrandom) | 1.12x | 1.09x |
| AOCL vs EncFS/OpenSSL (readseq) | 1.10x | 1.25x |
| AOCL vs IPP-Crypto (readseq) | 0.97x | 1.29x |

Figure 3: RocksDB Speedup using AOCL

AOCL AES-CTR encryption is effectively overhead-free for writes: sequential throughput stays within 2% of the unencrypted baseline on both platforms. Against OpenSSL-backed EncFS, AOCL-CTR delivers 1.68x higher sequential writes on Turin and 1.63x on Intel. Against Intel IPP-Crypto, AOCL-CTR is 1.19x faster on Turin and 1.21x faster on Intel for sequential writes. On Intel, AOCL's read advantage over IPP-Crypto is even larger at 1.29x for sequential reads and 1.36x for random reads.

Note: The Intel AOCL readseq number exceeds the unencrypted baseline. This is a db_bench artifact: the encryption wrapper alters RocksDB's I/O path in ways that change read-ahead behavior and should not be interpreted as "encryption makes reads faster."

The Payoff

Across three very different application domains, AOCL-Cryptography delivers consistent, measurable improvements:

  • ClickHouse on AMD EPYC Turin: up to 2.77x faster AES-256-CTR, 2.47x faster AES-256-CFB, 1.83x faster AES-256-GCM
  • ClickHouse on Intel Xeon 6980P: up to 2.25x faster AES-256-CFB, 1.78x faster AES-256-CTR, 1.54x faster AES-256-GCM
  • strongSwan on AMD EPYC Turin: up to 1.89x faster AES-GCM, 1.33x faster Curve25519
  • strongSwan on Intel Xeon 6980P: up to 1.48x faster AES-GCM, 1.73x faster Curve25519
  • RocksDB on AMD EPYC Turin: AOCL AES-CTR write throughput within 2% of unencrypted baseline, 19% faster sequential writes than IPP-Crypto
  • RocksDB on Intel Xeon 6980P: AOCL AES-CTR outperforms Intel IPP-Crypto by up to 36%, and matches unencrypted write throughput

These gains come from leveraging hardware crypto acceleration (VAES, VPCLMULQDQ, SHA extensions) and architecture-aware instruction scheduling that generic implementations leave on the table.

Note: The ClickHouse, strongSwan, and RocksDB integrations discussed here are AMD-developed custom patches, not yet upstreamed. Upstreaming is planned to ensure long-term maintainability and broader community adoption. To benefit from AOCL-Cryptography, users can either implement a direct interface to AOCL APIs from their application's crypto hooks or use the OpenSSL provider route for transparent integration via the EVP interface.

What's Next

We're broadening the ClickHouse integration to include more hash and cipher types and building IPsec tunnel benchmarks for strongSwan. We also aim to upstream our application-specific integration patches and interfaces, so that the broader community gets seamless adoption and long-term maintainability.

To explore AOCL-Cryptography and the full AOCL suite, visit the AMD AOCL Developer Portal. The AOCL-Cryptography source code is available on GitHub, and we welcome integration feedback from the community.

We'd love to hear about your performance gains — open an issue or start a discussion on our GitHub pages!

Endnotes


i. ClickHouse benchmark:

AMD Internal Testing, 04/15/2026.

  • Systems: 2P AMD EPYC™ 9755 (Zen 5 "Turin", 128C/socket), 1536 GB DDR5-6400, Ubuntu 24.04.1 LTS, kernel 6.8.0-94-generic, SMT=on, Mitigations=off, Power Determinism, BIOS AMI. // 2P Intel® Xeon® 6980P (128C/socket, 3.9 GHz), RHEL 9.
  • Software: ClickHouse v25.11.8.25-stable; baseline OpenSSL 3.5.0; AOCL-Cryptography 5. build via ENABLE_AOCL_CRYPTO=ON; Clang 19.
  • Workload: clickhouse local --time, 1,000,000 rows/query, per-row SHA-256-derived keys and IVs, encrypt + decrypt round-trip; AES-256 modes CTR/CFB/GCM/CBC/OFB at payloads 64B–2KB, 4KB, 16KB; 3 runs, median reported.

ii. strongSwan benchmark:

AMD Internal Testing, 04/18/2026.

  • Systems: 2P AMD EPYC™ 9755 (Zen 5 "Turin", 4.06 GHz), 1536 GB DDR5-6400, Ubuntu 24.04.1 LTS, kernel 6.8.0-94-generic, SMT=on, Mitigations=off, Power Determinism. // 2P Intel® Xeon® 6980P (128C/socket, 3.9 GHz), RHEL 9, kernel 5.14.0.
  • Software: strongSwan 6.0.2 (validated also on 6.0.4); baseline OpenSSL 3.5.0 plugin; AOCL-Cryptography 5.2 loaded as a custom AMD plugin alongside the OpenSSL plugin; GCC 14.2.
  • Workload: strongSwan built-in crypto benchmark on internal test vectors; throughput in "points"; algorithms: AES-GCM-16/12/8 (128/192/256-bit), AES-CBC (128/192/256-bit), SHA2-224/256/384/512, SHA3-224/256/384/512, HMAC-SHA2, PRF_HMAC_SHA2, SHAKE128/256, Curve25519 (X25519). Raw crypto throughput only — not end-to-end IPsec tunnel performance.

iii. RocksDB benchmark:

AMD Internal Testing, 04/20/2026.

  • Systems: 2P AMD EPYC™ 9755 (Zen 5 "Turin", 4.06 GHz), 1536 GB DDR5-6400, Ubuntu 24.04.1 LTS, kernel 6.8.0-94-generic, SMT=on, Mitigations=off, Power Determinism. // 2P Intel® Xeon® 6980P (128C/socket, 3.9 GHz), RHEL 9.
  • Software: RocksDB built from upstream source with custom AMD AOCL-Cryptography encryption provider (AES-XTS additionally requires a custom block-aligned-read RocksDB patch); AOCL-Cryptography 5.2; Intel IPP-Crypto (latest at test date); EncFS over OpenSSL 3.5.0; harness bench-rocksdb (wrapper over db_bench).
  • Workload: db_bench operations fillseq, fillrandom, readseq, readrandom, 1,000,000 ops each. Five configurations via RocksDB's pluggable encryption interface: base (unencrypted), alcp-ctr (AOCL AES-256-CTR), alcp-xts (AOCL AES-256-XTS), ippcp (Intel IPP AES-256-CTR), encfs-aes256ctr (EncFS/OpenSSL AES-256-CTR).

General Disclaimer:

Testing conducted by AMD Performance Labs. Performance results are based on internal AMD testing using the configurations listed above and are provided for informational purposes only. The ClickHouse, strongSwan, and RocksDB integrations referenced are custom interfaces developed by AMD on top of the corresponding upstream application repositories and are not currently upstreamed; equivalent uplift can be obtained by users by implementing their own thin interface to AOCL-Cryptography APIs or by routing through the OpenSSL provider path. Actual performance may vary based on system configuration, software versions, workload, BIOS settings, kernel version, compiler version, and other factors. AMD, the AMD Arrow logo, EPYC, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. Intel and Xeon are trademarks of Intel Corporation or its subsidiaries. ClickHouse is a trademark of ClickHouse, Inc. RocksDB is a trademark of Meta Platforms, Inc. strongSwan is a trademark of the strongSwan Project.

© 2026 Advanced Micro Devices, Inc. All rights reserved.
