
Transforming Ethernet from Underdog to Champion for AI Inference


Abstract

Modern AI inference is increasingly constrained by KV cache movement across the prefill, decode, and storage tiers rather than by GPU compute. This session demonstrates how a lightweight software layer delivers RDMA-class performance on standard Ethernet networks without application changes, supporting high-performance disaggregated vLLM inference. Live benchmarks show the resulting improvements in time-to-first-token (TTFT) and inter-token latency (ITL).

July 21, 2026 16:50 - 17:10
