Liquid AI & AMD Show the Future of On-Device AI With Local Private Meeting Summarization
Jan 05, 2026
Liquid AI and AMD are showcasing the next era of GenAI: AI everywhere, powered by high-quality, application-specific, efficient models that can run on a broad spectrum of personal devices. Liquid AI’s Liquid Foundation Models (LFMs) use an efficiency-first architecture designed to minimize memory use, reduce activation overhead, and support rapid task-specific fine-tuning. The AMD Ryzen™ AI platform enables these models to be deployed optimally within standard consumer hardware limits. Together, this unlocks a world where highly specialized models run privately, securely, and cost-efficiently anywhere businesses and individuals need them, not just in the cloud.
To showcase this potential, we fine-tuned a Liquid Foundation Model (LFM) for a particular meeting transcript summarization demo and deployed it directly on an AMD Ryzen™ AI 400 Series processor. This demonstrates that LFMs (fine-tuned by Liquid AI and accelerated across the full AMD AI PC hardware stack) deliver production-grade quality typical of much larger cloud-based models, while running entirely on the edge—even on standard 16GB RAM systems.
The result? Fast, reliable, private, and protected AI without compromising accuracy.
This project went from zero to deployed in under two weeks, showcasing how LFMs are powering the “AI everywhere” movement and how the AMD AI PC platform stands alone as the first to run them end-to-end across CPU, GPU, and NPU.
“AI Everywhere” Requires Solving On-Device AI’s Hard Problems
Running high-quality AI on consumer hardware isn’t just a matter of speed; it’s a matter of physics. On-device deployment faces two fundamental bottlenecks that most transformer-based LLMs simply weren’t designed to overcome:
- RAM is the real bottleneck on-device
  - Even powerful laptops and PCs often have 16–64GB of total system memory, a stark contrast to the abundant High Bandwidth Memory (HBM) on data-center GPUs.
  - But today’s transformer-based open-source models—like OpenAI’s GPT-OSS-20B and others—are memory-hungry by design. Their attention layers scale quadratically with sequence length, and their activation footprints balloon at runtime.
  - This makes them expensive, slow, and often outright impossible to deploy natively on typical consumer hardware.
- Smaller general-purpose models are lower quality
  - To fit on-device, models must be much smaller. However, smaller general-purpose models are inevitably of lower quality due to information-compression limits, and users immediately notice the drop in quality.
  - Speed or memory efficiency doesn’t matter if the model can’t deliver the quality people expect from cloud-scale AI.
Unlocking the true “AI everywhere” future requires cloud-quality intelligence to become genuinely deployable on the hardware people already own.
The only path forward is clear. On-device models must be:
- Extremely RAM efficient,
- Small enough to run locally, and
- Specialized so that small models deliver big-model quality.
Liquid AI’s Technology Advantage: Efficiency by Design
Liquid AI's approach to foundation models is fundamentally different: efficiency by design, not compression.
Open-source LLMs (like OpenAI’s GPT-OSS-20B) are often optimized for data center GPUs with abundant memory and power. To achieve quality on the edge, they typically require significant sacrifices in storage and memory.
Liquid AI’s LFMs take a different path. Rather than building large cloud models and squeezing their size down through quantization and other methods, LFMs are architected from the ground up to be lean, fast, and hardware aware. They optimize internal memory layout, attention mechanisms, and parameter allocation specifically for hardware constraints and low-latency needs.
Liquid AI’s latest architecture, LFM2 [see LFM2 Technical Report on arXiv], demonstrates this design philosophy. It’s a hybrid model that relies only ~20% on attention, with most computation handled by fast, RAM-friendly 1D short convolutions—dramatically reducing memory footprint and boosting speed, without sacrificing capability.
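As a rough illustration of this hybrid idea (not the actual LFM2 implementation), the sketch below stacks gated 1D short-convolution mixers with an attention layer in roughly one out of every five positions; the dimensions, layer counts, and gating details are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ShortConvBlock(nn.Module):
    """Gated 1D short-convolution mixer: fixed-size state, O(seq_len) compute."""
    def __init__(self, dim: int, kernel_size: int = 4):
        super().__init__()
        self.in_proj = nn.Linear(dim, 2 * dim)
        # Depthwise causal conv: left padding so position t only sees positions <= t.
        self.conv = nn.Conv1d(dim, dim, kernel_size, groups=dim,
                              padding=kernel_size - 1)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                                  # x: (batch, seq, dim)
        gate, h = self.in_proj(x).chunk(2, dim=-1)
        h = self.conv(h.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return x + self.out_proj(torch.sigmoid(gate) * h)

class AttnBlock(nn.Module):
    """Self-attention mixer: cached KV state grows with context length
    (causal masking omitted here for brevity)."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return x + out

def build_hybrid_stack(dim: int = 1024, n_layers: int = 20, attention_every: int = 5):
    """~1 in 5 layers uses attention (~20%); the rest are cheap short convolutions."""
    return nn.Sequential(*[
        AttnBlock(dim) if (i + 1) % attention_every == 0 else ShortConvBlock(dim)
        for i in range(n_layers)
    ])
```

Because the convolutional mixers carry only a small fixed state per layer, the memory cost of most of the stack does not grow with transcript length; only the minority of attention layers pay the usual context-length penalty.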
In addition to being efficient to run, on-device AI requires small models that behave similarly to large ones for their specific tasks. Thus, LFM2 was also designed to be extremely efficient to personalize, unlocking high-quality on-device GenAI via application-specific specialization:
- 300% more GPU-efficient fine-tuning vs. LFM1
- Specialization that completes in hours, not days
- Application-specific performance that rivals cloud-scale models
The LFM2 portfolio includes:
- Text Models: 350M to 2.6B parameters, plus an 8B MoE (Mixture of Experts) with only 1B active parameters.
- Multimodal Models: Vision and audio capabilities
- Nano Models: Ultra-compact fine-tuned models for constrained environments
This architectural approach makes LFMs uniquely suited for the emerging paradigm of on-device AI, delivering cloud-quality intelligence on the hardware people already own.
AMD & Liquid AI: Unlocking the Edge
To demonstrate what is possible, AMD and Liquid AI collaborated to fine-tune and deploy a small 2.6B-parameter model (LFM2-2.6B) directly on AMD Ryzen AI hardware.
Leveraging the flexibility of the LFM2 backbone, along with several iterations of data curation, fine-tuning, and evaluation, the team developed a custom model for the AMD GAIA meeting transcript summarization application in under two weeks. Benchmarking shows the model outperforming GPT-OSS-20B and approaching the performance of Qwen3-30B and Claude Sonnet on short (1K-token) transcripts. On long (10K-token) transcripts, the LFM2 model still outperforms GPT-OSS-20B, though the gap to the significantly larger Qwen3-30B and the cloud models widens somewhat compared with the short transcripts.
Table 1 – GAIA LLM Judge Scores for Meeting Transcript Summarization Task¹
| Model | Model Size | Accuracy Rating, Short (1K tokens) | Accuracy Rating, Long (10K tokens) |
| Claude Sonnet 4 | Large Cloud Model | 90% | 93% |
| Qwen3-30B-A3B-Instruct-2507 (Q4_0) | 30B | 88% | 92% |
| LFM2-2.6B-Transcript (Q4_K_M) | 2.6B | 86% | 77% |
| GPT-OSS-20B (Q4_K_M) | 20B | 83% | 71% |
| Qwen3-8B (Q4_1) | 8B | 65% | 72% |
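Footnote 1 describes how these scores were produced: synthetic transcripts across eight meeting types, judged by Claude Sonnet 4 within the AMD GAIA Eval-Judge framework. As a rough illustration of the general LLM-as-judge pattern (not GAIA’s actual interfaces or prompts), a scoring loop can look like the following sketch; the prompt wording, scoring scale, and `ask_judge` callable are assumptions.

```python
import json
import re
from typing import Callable

# Illustrative judge prompt (assumed), asking for a 0-100 accuracy score as JSON.
JUDGE_PROMPT = """You are grading a meeting summary.
Transcript:
{transcript}

Candidate summary:
{summary}

Rate the summary's accuracy from 0 to 100, considering factual faithfulness,
coverage of decisions and action items, and absence of invented content.
Reply with JSON only: {{"score": <int>, "reason": "<one sentence>"}}"""

def judge_summary(transcript: str, summary: str,
                  ask_judge: Callable[[str], str]) -> dict:
    """Score one candidate summary; `ask_judge` sends a prompt to the judge model
    (e.g. Claude Sonnet 4) and returns its text reply."""
    reply = ask_judge(JUDGE_PROMPT.format(transcript=transcript, summary=summary))
    match = re.search(r"\{.*\}", reply, re.DOTALL)   # tolerate extra prose
    return json.loads(match.group(0)) if match else {"score": None, "reason": reply}

def average_score(pairs: list[tuple[str, str]],
                  ask_judge: Callable[[str], str]) -> float:
    """Mean judge score over (transcript, candidate summary) pairs."""
    scores = [judge_summary(t, s, ask_judge)["score"] for t, s in pairs]
    return sum(scores) / len(scores)
```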
Crucially, this demonstration also proves that the fine-tuned LFM2-2.6B runs efficiently¹ across all three compute engines inside the AMD Ryzen™ AI PC. This makes AMD the first and only AI PC platform to offer full tri-engine inference support for LFMs, functionally verified to run on CPU, GPU, and NPU.²
Today, LFM2s deployed on AMD Ryzen AI PCs deliver fast, protected, high-quality performance while giving system designers maximum flexibility to balance latency, battery life, and responsiveness.
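For developers who want to experiment with a similar setup, here is a minimal sketch of running a 4-bit GGUF build through llama-cpp-python. The model filename, context size, and prompts are assumptions for illustration; AMD’s GAIA application and the NPU execution path use their own runtimes, so treat this only as a generic llama.cpp-based example.

```python
from llama_cpp import Llama

# Load a 4-bit quantized transcript-summarization build (filename assumed).
llm = Llama(
    model_path="lfm2-2.6b-transcript-q4_k_m.gguf",
    n_ctx=12288,       # room for a ~10K-token transcript plus the summary
    n_gpu_layers=0,    # 0 = CPU only; raise to offload layers to the GPU
)

with open("meeting_transcript.txt") as f:
    transcript = f.read()

response = llm.create_chat_completion(
    messages=[
        {"role": "system",
         "content": "Summarize the meeting transcript into key decisions, "
                    "action items with owners, and open questions."},
        {"role": "user", "content": transcript},
    ],
    max_tokens=1000,   # roughly the 1K-token summaries used in the benchmarks
    temperature=0.1,
)
print(response["choices"][0]["message"]["content"])
```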
Technical Results: Cloud-Quality Summaries in <3GB RAM
AMD approached Liquid AI with a clear goal: to power high-quality meeting transcript summarization fully on-device in an AI PC, without relying on the cloud. Because the target workflow—the AMD meeting summarization use case—has a well-defined input format, output format, domain, and length, the teams could design a tailor-made model rather than a one-size-fits-all general LLM. That narrow, application-specific scope is what makes it possible for a 2.6B-parameter LFM2 model to deliver large-model quality while staying within the tight memory and power budgets of a mainstream 16GB AI PC.
To formalize the problem, AMD and Liquid aligned on three things up front:
- Task specification: a stable system prompt, structured transcript input, up to 10K tokens per meeting, and consistent summary output (a prompt contract along these lines is sketched after this list).
- Quality definition: the AMD GAIA Eval-Judge framework, which uses generated meeting transcripts to avoid overfitting and provides tools for testing and comparing AI model performance across deployment scenarios, including synthetic test-data generation, creation of evaluation standards, and automated performance comparison using a cloud-based model as a judge. (The results presented earlier used several samples of generated data for each of eight different meeting types.)
- Deployment constraints: come close to Claude Sonnet / Qwen3-30B-A3B quality while fitting comfortably under 4GB of RAM, so the model can run on mass-market 16GB RAM AMD AI PCs—not just 32GB+ developer machines.
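A minimal sketch of such a task contract in code is shown below; the system prompt wording, transcript markup, and output sections are illustrative assumptions rather than the exact prompt used in the AMD GAIA application.

```python
# Illustrative task contract for the summarization workload (all wording assumed).
SYSTEM_PROMPT = (
    "You are a meeting summarization assistant. Read the transcript and produce "
    "a concise summary with three sections: Key Decisions, Action Items "
    "(with owners), and Open Questions. Do not invent content."
)

def format_transcript(turns: list[tuple[str, str]]) -> str:
    """Render (speaker, utterance) pairs in one consistent structured format."""
    return "\n".join(f"[{speaker}]: {text}" for speaker, text in turns)

def build_messages(turns: list[tuple[str, str]]) -> list[dict]:
    """Chat messages with a fixed system prompt and the structured transcript."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": format_transcript(turns)},
    ]
```

Freezing the prompt and input/output structure like this is what lets a small specialized model be fine-tuned and evaluated against one stable target instead of an open-ended chat distribution.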
Model Selection and Memory Efficiency
Given those constraints, Liquid selected LFM2-2.6B and quantized it to Q4_K_M, taking advantage of LFM2’s hybrid architecture (only ~20% attention) to keep memory usage low even at long context. In this configuration, the specialized model can process 10K tokens (~60-minute meeting transcript) in 2.7GB of RAM, cleanly fitting into typical 16GB RAM systems where only ~4GB is available for AI workloads. This is much less than the many gigabytes required for quality-comparable transformer models. The table below summarizes RAM usage at equal context length:
Table 2 – RAM Required to Summarize a 60-Minute Meeting (10K Tokens) on CPU²
| Model | RAM (GB) | % Larger vs LFM2-2.6B |
| Qwen3-8B (Q4_1) | 6.2 | 133% larger |
| GPT-OSS-20B (Q4_K_M) | 9.7 | 266% larger |
| Qwen3-30B-A3B-Instruct-2507 (Q4_0) | 15.2 | 476% larger |
This gap is what makes full on-device deployment on 16GB AI PCs practical for LFM2—but effectively out of reach for many traditional transformer models.
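A back-of-the-envelope KV-cache calculation shows why attention-heavy models hit this wall at long context while a mostly-convolutional hybrid does not. The layer counts, KV-head counts, and head dimension below are illustrative assumptions, not the published configurations of LFM2 or the baseline models, and model weights and activations add to these numbers.

```python
def kv_cache_bytes(n_attn_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Keys + values cached per attention layer, per token (fp16 = 2 bytes)."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

SEQ = 10_000  # ~60-minute meeting transcript

# Dense transformer: every layer is attention (illustrative 32 layers, 8 KV heads).
dense = kv_cache_bytes(n_attn_layers=32, n_kv_heads=8, head_dim=128, seq_len=SEQ)

# Hybrid: only ~20% of layers are attention; conv layers keep a small fixed state.
hybrid = kv_cache_bytes(n_attn_layers=6, n_kv_heads=8, head_dim=128, seq_len=SEQ)

print(f"dense KV cache:  {dense / 2**30:.2f} GiB")   # ~1.22 GiB at 10K tokens
print(f"hybrid KV cache: {hybrid / 2**30:.2f} GiB")  # ~0.23 GiB at 10K tokens
```

The cache gap widens linearly with context length, which is why the architectural difference matters most for hour-long meetings rather than short snippets.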
Specialization Speed and Quality
LFM2 was also designed to be fast to specialize. Using iterative cycles of data curation, fine-tuning, and AMD GAIA-based evaluation, Liquid delivered the production-ready LFM2-2.6B model in under two weeks. The specialized model was tuned specifically to:
- Surpass GPT-OSS-20B and Qwen3-8B on AMD GAIA meeting summarization tasks.
- Come close to Qwen3-30B-A3B and Claude Sonnet quality for this specific AMD application.
In other words, a 2.6B-parameter LFM—architected for efficiency and then specialized to a single workflow—can reach cloud-model quality where it matters, while staying within a fraction of the memory and compute footprint.
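For a feel of what the specialization step involves mechanically, here is a minimal supervised fine-tuning sketch on transcript-to-summary pairs using Hugging Face TRL. The dataset path and format, hyperparameters, and base-checkpoint identifier are illustrative assumptions; Liquid AI’s actual data mix and training recipe are not described in this post.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Each JSONL line holds one chat-formatted example (path and format assumed):
# {"messages": [{"role": "system", "content": "..."},
#               {"role": "user", "content": "<transcript>"},
#               {"role": "assistant", "content": "<reference summary>"}]}
dataset = load_dataset("json", data_files="transcript_summary_pairs.jsonl",
                       split="train")

trainer = SFTTrainer(
    model="LiquidAI/LFM2-2.6B",          # base checkpoint identifier, assumed
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="lfm2-2.6b-transcript",
        per_device_train_batch_size=1,   # long transcripts are memory-heavy
        gradient_accumulation_steps=8,
        num_train_epochs=2,
        learning_rate=1e-5,
        bf16=True,
    ),
)
trainer.train()
trainer.save_model()  # export, then quantize (e.g. to GGUF Q4_K_M) for deployment
```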
Results: Speed, Energy, and Tri-Engine Inference
On the performance side, current profiling using llama-bench on an AMD Ryzen™ AI Max+ 395 processor shows that the same LFM2-2.6B Q4_K_M model can summarize a 60-minute, 10K-token meeting into a 1K-token summary in 16 seconds, fast enough for interactive, near-real-time workflows rather than “batch overnight jobs.” Benchmarks against larger baselines show how much faster LFM2-2.6B is:
Table 3 – Time to Summarize a 60-Minute Meeting on AMD Ryzen AI Max+ 395²
| Model | Time (s) | LFM2-2.6B % Faster |
| Qwen3-8B (Q4_1) | 39 | 59% faster |
| GPT-OSS-20B (Q4_K_M) | 23 | 30% faster |
| Qwen3-30B-A3B-Instruct-2507 (Q4_0) | 27 | 42% faster |
Table 4 – Time to Summarize a 60-Minute Meeting on AMD Ryzen AI 400 Series³
| Model | Time (s) | LFM2-2.6B % Faster |
| Qwen3-8B (Q4_1) | 112 | 63% |
| GPT-OSS-20B (Q4_K_M) | 67 | 37% |
| Qwen3-30B-A3B-Instruct-2507 (Q4_0) | 82 | 49% |
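To make the percentage columns in Tables 3 and 4 easy to read, here is the relative-speedup arithmetic checked against the Table 3 numbers, using the 16-second LFM2-2.6B result quoted above; small deviations from the published figures come from rounding of the underlying measurements.

```python
def percent_faster(lfm2_seconds: float, baseline_seconds: float) -> float:
    """How much faster LFM2-2.6B is, as a share of the baseline's time."""
    return 100 * (baseline_seconds - lfm2_seconds) / baseline_seconds

lfm2 = 16  # seconds for a 10K-token transcript on Ryzen AI Max+ 395 (Table 3)
for name, t in [("Qwen3-8B", 39), ("GPT-OSS-20B", 23), ("Qwen3-30B-A3B", 27)]:
    print(f"{name}: LFM2-2.6B is {percent_faster(lfm2, t):.0f}% faster")
# -> 59%, 30%, 41% (Table 3 rounds the last to 42%)
```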
Taken together, these results show that a specialized LFM2-2.6B model on AMD hardware can deliver:
- Cloud-grade summarization quality defined by GAIA,
- Sub-3GB RAM usage at long context, and
- Significantly lower latency and energy than larger transformer baselines—all while running fully on-device on a mainstream 16GB RAM AMD AI PC.
For Your Business: Production-Quality AI Without Cloud Dependency
This demonstration proves that specialized models, running on AMD Ryzen AI hardware, deliver production-quality solutions entirely on-device with true privacy, low latency, and zero cloud costs.
When models are purpose-built for specific workflows and optimized for hardware, everything improves:
- Quality: Expert models outperform generalists
- Efficiency: Run on mainstream 16GB RAM laptops, not 64GB workstations
- Privacy: Your data never leaves your device
- Reliability: No internet required, no cloud outages
- Cost: No API fees or cloud infrastructure bills
- Speed: Real-time responses with zero network latency
This isn't just better technology—it's a fundamental shift in how enterprises deploy AI.
The Future Is Efficient, Specialized, and On-Device
The future of GenAI isn't just one giant model—it's thousands of tailored ones, each optimized for specific scenarios and devices. Fast, private, efficient, specialized—it's here today, with Liquid AI and AMD.
Build with AMD hardware — AMD Ryzen AI 400 Series PCs optimized for on-device AI workloads.
Build with Liquid's LFMs—scenario-specific models delivering cloud-quality intelligence at a fraction of the cost and energy.
Footnotes
- The data in Table 1 was generated using the GAIA Eval-Judge framework. We used 24 synthetic 1K transcripts and 32 synthetic 10K transcripts distributed across 8 different meeting types. We used the Claude Sonnet 4 model for both content generation and judging.
- The data in Tables 2 and 3 was generated using llama-bench.exe b7250 on an HP Z2 Mini G1a Next Gen AI Desktop Workstation with an AMD Ryzen AI Max+ PRO 395 processor. We compute peak memory used during CPU inference by measuring peak memory usage of the llama-bench.exe process executing the command:
`llama-bench -m <MODEL> -p 10000 -n 1000 -t 8 -r 3 -ngl 0`
The llama-bench executable outputs the average inference times for prompt processing and token generation. The reported inference times are for the GPU, enabled using the `-ngl 99` flag.
- The data in Table 4 was generated in the same way as Table 3 using a development platform with an AMD Ryzen AI 9 HX 470 processor.