Building Next-Gen AI Infrastructure: Scaling Enterprise LLM Serving with RadixArk

Start: 2026-07-22T14:30:00-07:00
End: 2026-07-22T14:50:00-07:00

Deploying generative AI at scale requires robust and highly optimized infrastructure. Join Ying Sheng, Co-creator of SGLang and Founder of RadixArk, to learn how innovations such as structured generation and advanced scheduling enable efficient, enterprise-ready AI deployments. Discover techniques for maximizing utilization, reducing latency, and scaling AI applications across diverse environments.

July 22, 2026 2:30 PM - 2:50 PM PDT

Ying Sheng, Co-Founder & CEO | RadixArk

Topic

AI Training & Inference

Developer Platforms & Open Ecosystems

Session Type

Tech Talk

Domain-Specific AI at Scale: Open Models, Post-Training, and AI Infrastructure

Learn how domain-specific AI moves beyond generic models using post-training, domain evals, and scalable open infrastructure. Using Open Telco Models as a case study, this session covers curated data, reward loops, unified training and serving, and AMD Instinct/ROCm-based stacks for building specialized AI systems at enterprise scale.

July 23, 2026
From Models to Production—A Blueprint for AI at Scale

Moving AI from training to production takes more than GPUs. Hear how Microsoft and Chai AI built scalable AI infrastructure on Vultr using AMD Instinct GPUs and ROCm. Learn best practices for data locality, secure networking, Kubernetes orchestration, benchmarking, cost optimization, and scale-out operations. Leave with a practical blueprint for deploying fast, portable, production-ready AI workloads.

July 23, 2026
Accelerating LLM Inference on AMD GPUs with AMD ATOM

This advanced hands-on workshop introduces AMD ATOM, an open-source optimized LLM inference backend for ROCm. Learn to serve LLMs with popular workflows using AMD-optimized attention and inference kernels. The workshop introduces out-of-tree plugins for existing vLLM and SGLang users and demonstrates how ATOM preserves the familiarity of these frameworks while accelerating model execution and boosting inference performance, bridging open-source frameworks with the AMD high-performance inference stack.

July 23, 2026
Zyphra: Large-Model Training Lessons on AMD

Learn what it took to train ZAYA1-74B, a 74B-parameter mixture-of-experts model, end-to-end on AMD Instinct MI300X. This session shares key engineering lessons from designing an efficient training stack, optimizing long-context performance, and building a reinforcement learning pipeline for math, code, and agentic AI workloads. Discover practical insights for training and deploying large AI models on AMD infrastructure.

July 23, 2026
Training at Scale with AMD Primus

Primus makes large-scale training on Instinct reliable, debuggable, and highly performant. It supports the latest OSS training frameworks and models, and is expanding support to new, cutting-edge model architectures, training techniques, and datatypes. SOTA pre- and post-training performance with Primus, proven at scales of thousands of GPUs, positions AMD Instinct GPUs as a competitive solution for model development at frontier labs, enterprises, and AI startups.

July 23, 2026
Agentic Kernel Performance Tuning with AMD ROCm

This session introduces an agentic kernel development workflow for optimizing AI and HPC workloads on AMD ROCm. Learn how a self-directing optimization loop can profile, analyze, optimize, validate, and generate production-ready kernel improvements with minimal manual tuning. The talk highlights how AMD is accelerating kernel engineering by condensing weeks of performance optimization effort into an automated, scalable workflow for developers and performance engineers.

July 23, 2026
Redefining Scalable AI Performance: OCI Supercomputing in the Cloud

Organizations building frontier AI models need infrastructure designed for performance at scale. This session shows how OCI combines AMD Instinct, AMD EPYC, and Pensando in Oracle Acceleron to enable ultra-low-latency networking for high-throughput distributed workloads, with practical guidance for designing infrastructure for large language, multimodal, and scientific AI models.

July 23, 2026
Accelerating Inference at Scale: Crusoe's Experience with AMD

As a customer and operator of AMD technology, Crusoe’s Managed Inference team has built a production inference stack designed for speed, efficiency, and scale. This session will show how AMD Instinct, including MI355X, helped shape its serverless inference offering and what teams can apply when building production AI services that balance performance, memory bandwidth, and cost.

July 23, 2026
