Persistent KV Cache for Continuous Inference with VAST Data

Start: 2026-07-22T15:00:00-07:00
End: 2026-07-22T15:25:00-07:00

Modern agentic AI systems require persistent context and high-throughput inference infrastructure that scales efficiently. This session explores the role of the KV cache in inference workloads on AMD Instinct GPUs, highlighting the advantages of the AMD memory architecture for long-context and continuous inference systems. Learn how the VAST AI OS enables persistent KV cache and context-aware inference pipelines, reducing recomputation while improving performance, efficiency, and scalability.

July 22, 2026 3:00 PM - 3:25 PM PDT

Co-Founder and CTO | VAST Data

Topic: AI Training & Inference; Agentic & Generative AI

Session Type: Meet the Experts

Related Sessions

Unlock the Power of AI with AI Producer Studio

AI Producer Studio is optimized for AMD Ryzen AI processors to automate multi-camera meetings, livestreams, and recordings. See how AI detects presenters, follows conversations, and automatically manages camera switching, framing, and layouts in real time. Learn how organizations can simplify professional video production and enhance hybrid collaboration with AMD.

July 23, 2026

Domain-Specific AI at Scale: Open Models, Post-Training, and AI Infrastructure

Learn how domain-specific AI moves beyond generic models using post-training, domain evals, and scalable open infrastructure. Using Open Telco Models as a case study, this session covers curated data, reward loops, unified training and serving, and AMD Instinct/ROCm-based stacks for building specialized AI systems at enterprise scale.

July 23, 2026

From Models to Production: A Blueprint for AI at Scale

Moving AI from training to production takes more than GPUs. Hear how Microsoft and Chai AI built scalable AI infrastructure on Vultr using AMD Instinct GPUs and ROCm. Learn best practices for data locality, secure networking, Kubernetes orchestration, benchmarking, cost optimization, and scale-out operations. Leave with a practical blueprint for deploying fast, portable, production-ready AI workloads.

July 23, 2026

Zyphra: Large-Model Training Lessons on AMD

Learn what it took to train ZAYA1-74B, a 74B-parameter mixture-of-experts model, end-to-end on AMD Instinct MI300X. This session shares key engineering lessons from designing an efficient training stack, optimizing long-context performance, and building a reinforcement learning pipeline for math, code, and agentic AI workloads. Discover practical insights for training and deploying large AI models on AMD infrastructure.

July 23, 2026

Training at Scale with AMD Primus

Primus makes large-scale training on Instinct reliable, debuggable, and highly performant. It supports the latest OSS training frameworks and models, and is expanding support to new, cutting-edge model architectures, training techniques, and datatypes. State-of-the-art pre- and post-training performance with Primus, proven at scales of thousands of GPUs, positions AMD Instinct GPUs as a competitive solution for model development at frontier labs, enterprises, and AI startups.

July 23, 2026

From Tokens to Outcomes: Driving AI ROI with Lenovo Hybrid Infrastructure

AI success is increasingly measured by business outcomes, not model size. As agentic AI accelerates inference demand, organizations must improve token efficiency, infrastructure utilization, and energy consumption to maximize ROI. Learn how Lenovo Hybrid AI Factories, powered by AMD, help enterprises deploy AI from personal systems to rack-scale environments while reducing token costs, increasing control and utilization, and supporting more sustainable AI growth.

July 23, 2026

HP & AMD Co-engineering Local Compute for Agentic AI

As AI workloads explode, a cloud-only approach creates friction in data movement, latency, cost, and security. This session explores how keeping compute close to data unlocks measurable advantages for AI workflows. Learn how HP and AMD are co-engineering AI workstations with high-performance local compute and the memory capacity to support larger models and datasets. The future is hybrid: running AI in the right place, at the right time, across environments.

July 23, 2026

Agentic AI Needs Open Infrastructure: Here's How to Build It

The agentic AI stack has evolved to include fast multi-model orchestration, tool-augmented reasoning, and long-running inference chains. The hardware conversation hasn't kept up, and many teams default to one GPU vendor without evaluating alternatives. This interactive session shows builders what they're missing. We'll review head-to-head benchmark data from third-party testing, discuss production-ready serving stacks on ROCm, and break down TCO for teams running multi-step agents at scale.

July 23, 2026
