VAST Technologies: Persistent KV Cache for Continuous Inference on AMD

Abstract

Modern agentic AI systems require persistent context and high-throughput inference infrastructure that scales efficiently. This session explores the role of the KV cache in inference workloads on AMD Instinct GPUs, highlighting the advantages of the AMD memory architecture for long-context and continuous inference systems. Learn how the VAST AI OS enables a persistent KV cache and context-aware inference pipelines, reducing recomputation while improving performance, efficiency, and scalability.

July 22, 2026 4:00 PM - 4:25 PM PDT

Speakers