OpenJarvis: Personal AI, on Personal Devices
Abstract
OpenJarvis is an open-source framework for building personal AI agents that run entirely on local devices. Motivated by the Intelligence Per Watt finding that local models already handle most real-world queries, OpenJarvis provides the missing software stack to make local-first personal AI practical. Its core ideas are shared primitives, efficiency-first evaluation, and continual self-improvement from local trace data. It supports multiple inference backends and includes an energy leaderboard.
July 22, 2026 13:30 - 13:50
Speakers
Presented By
PhD Candidate, Scaling Intelligence Lab and Hazy Research, Stanford University | OpenJarvis
PhD Candidate, Hazy Research, Stanford University | OpenJarvis
Session Type
Tech Talk
Related Sessions
-
Introduction to Robotics with ROS2 on AMD
This intermediate, interactive session focuses on deploying an end-to-end Vision-Language-Action (VLA) pipeline using ROS 2 and multimodal models to control a robotic arm. Participants will work through the steps to build and deploy a system that responds to physical inputs with real-world actions. Prior familiarity with ROS 2 and VLA concepts is recommended.
July 23, 2026
-
Accelerating vLLM Inference on AMD Instinct GPUs with AMD ATOM
This advanced hands-on workshop introduces AMD ATOM, an open-source, optimized LLM inference backend for ROCm. Learn to serve LLMs with popular workflows using AMD-optimized attention and inference kernels. The workshop introduces out-of-tree plugins for existing vLLM and SGLang users and aims to demonstrate how ATOM preserves the familiarity of these frameworks while accelerating model execution and boosting inference performance, bridging open-source frameworks with the AMD high-performance inference stack.
July 23, 2026
-
Training at Scale with AMD Primus
Primus makes large-scale training on Instinct reliable, debuggable, and highly performant. It supports the latest open-source training frameworks and models, and is expanding support to new, cutting-edge model architectures, training techniques, and datatypes. Primus's state-of-the-art pre- and post-training performance, proven at scales of thousands of GPUs, positions Instinct as a competitive solution for model development at frontier labs, enterprises, and AI startups.
July 23, 2026
-
Benchmarking AI Systems: from Model Metrics to Real-World Performance
The agentic AI stack has evolved toward fast multi-model orchestration, tool-augmented reasoning, and long-running inference chains. The hardware conversation hasn't kept up, and many teams default to one GPU vendor without evaluating alternatives. This interactive session is for builders to learn what they're missing. We'll review head-to-head benchmark data from third-party testing, discuss production-ready serving stacks on ROCm, and break down total cost of ownership (TCO) for teams running multi-step agents at scale.
July 23, 2026
-
Agentic Kernel Performance Tuning with AMD ROCm
This session introduces an agentic kernel development workflow for optimizing AI and HPC workloads on AMD ROCm. Learn how a self-directing optimization loop can profile, analyze, optimize, validate, and generate production-ready kernel improvements with minimal manual tuning. The talk highlights how AMD is accelerating kernel engineering by condensing weeks of performance-optimization effort into an automated, scalable workflow for developers and performance engineers.
July 23, 2026
-
Efficient LLM Serving at Scale with Unified Caching
This advanced hands-on workshop shows how TensorMesh and AMD enable efficient LLM serving through a unified caching layer. You will learn how tiered KV cache management brings out the benefits of cache-aware inference: improving throughput under interactive latency SLAs, reducing time to first token (TTFT) through KV cache reuse and offload, and enabling production-style distributed inference on Instinct GPUs.
July 23, 2026
-
Unlocking LLM Inference Performance with ROCm FlyDSL
This advanced hands-on workshop introduces ROCm FlyDSL, a Python-based domain-specific language (DSL) for developing high-performance GPU kernels with low-level control on AMD GPUs. Attendees will receive a concise introduction to FlyDSL and learn how to implement high-performance kernels using the library. The workshop will also showcase practical optimization techniques for improving end-to-end serving performance of the Kimi K2.5 model using optimized FlyDSL Mixture-of-Experts (MoE) kernels.
July 23, 2026
-
Power is Your Biggest Hidden Cost: How AMD Can Help
Power is the AI infrastructure cost nobody budgets for until it breaks the business case. In this interactive technical session, an expert from 5C joins AMD to unpack how power consumption impacts total cost of ownership across inference and training deployments. We'll discuss how intelligent power management, real-world thermal constraints, and silicon-level efficiency shape what your AI infrastructure can sustain, with practical insight for architects and operators making deployment decisions today.
July 23, 2026