Unlocking LLM Inference Performance with ROCm FlyDSL

Abstract

This advanced hands-on workshop introduces ROCm FlyDSL, a Python-based domain-specific language (DSL) for writing high-performance GPU kernels with low-level control on AMD GPUs. Attendees will get a concise introduction to FlyDSL and learn how to implement such kernels with it. The workshop will also showcase practical techniques for improving end-to-end serving performance of the Kimi K2.5 model using optimized FlyDSL Mixture-of-Experts (MoE) kernels.

July 22, 2026 16:30 - 17:15
