Efficient LLM Serving at Scale with Unified Caching

Abstract

This hands-on workshop for advanced users shows how TensorMesh and AMD enable efficient LLM serving through a unified caching layer. You will learn how tiered KV cache management brings out the benefits of cache-aware inference: improving throughput under interactive latency SLAs, reducing time to first token (TTFT) through KV cache reuse and offload, and enabling production-style distributed inference on AMD Instinct GPUs.

July 23, 2026 13:00 - 13:45

Speakers


Presented By