Liquid AI & AMD Show the Future of On-Device AI With Local Private Meeting Summarization
Jan 05, 2026
Liquid AI and AMD are showcasing the next era of GenAI: AI everywhere, powered by high-quality, application-specific, efficient models that can run on a broad spectrum of personal devices. Liquid AI’s Liquid Foundation Models (LFMs) use an efficiency-first architecture designed to minimize memory use, reduce activation overhead, and support rapid task-specific fine-tuning. The AMD Ryzen™ AI platform enables these models to be deployed optimally within standard consumer hardware limits. Together, this unlocks a world where highly specialized models run privately, securely, and cost-efficiently anywhere businesses and individuals need them, not just in the cloud.
To showcase this potential, we fine-tuned a Liquid Foundation Model (LFM) for a particular meeting transcript summarization demo and deployed it directly on an AMD Ryzen™ AI 400 Series processor. This demonstrates that LFMs (fine-tuned by Liquid AI and accelerated across the full AMD AI PC hardware stack) deliver production-grade quality typical of much larger cloud-based models, while running entirely on the edge—even on standard 16GB RAM systems.
The result? Fast, reliable, private, and protected AI without compromising accuracy.
This project went from zero to deployed in under two weeks, showcasing how LFMs are powering the “AI everywhere” movement and how the AMD AI PC platform stands alone as the first to run them end-to-end across CPU, GPU, and NPU.
“AI Everywhere” Requires Solving On-Device AI’s Hard Problems
Running high-quality AI on consumer hardware isn’t just a matter of speed; it’s a matter of physics. On-device deployment faces two fundamental bottlenecks that most transformer-based LLMs simply weren’t designed to overcome:
- RAM is the real bottleneck on-device
  - Even powerful laptops and PCs often have 16–64GB of total system memory, a stark contrast to the abundant High Bandwidth Memory (HBM) on data-center GPUs.
  - But today’s transformer-based open-source models—like OpenAI’s GPT-OSS-20B and others—are memory-hungry by design. Their attention layers scale quadratically with sequence length, and their activation footprints balloon at runtime.
  - This makes them expensive, slow, and often outright impossible to deploy natively on typical consumer hardware.
- Smaller general-purpose models are lower quality
  - To fit on-device, models must be much smaller. However, smaller general-purpose models are inevitably of lower quality due to information-compression limits, and users immediately notice the drop in quality.
  - Speed or memory efficiency doesn’t matter if the model can’t deliver the quality people expect from cloud-scale AI.
Unlocking the true “AI everywhere” future requires cloud-quality intelligence to become genuinely deployable on the hardware people already own.
The only path forward is clear. On-device models must be:
- Extremely RAM efficient,
- Small enough to run locally, and
- Specialized so that small models deliver big-model quality.
Liquid AI’s Technology Advantage: Efficiency by Design
Liquid AI's approach to foundation models is fundamentally different: efficiency by design, not compression.
Open-source LLMs (like OpenAI’s GPT-OSS-20B) are often optimized for data center GPUs with abundant memory and power. To achieve quality on the edge, they typically require significant sacrifices in storage and memory.
Liquid AI’s LFMs take a different path. Rather than building large cloud models and squeezing their size down through quantization and other methods, LFMs are architected from the ground up to be lean, fast, and hardware aware. They optimize internal memory layout, attention mechanisms, and parameter allocation specifically for hardware constraints and low-latency needs.
Liquid AI’s latest architecture, LFM2 [see LFM2 Technical Report on arXiv], demonstrates this design philosophy. It’s a hybrid model that relies only ~20% on attention, with most computation handled by fast, RAM-friendly 1D short convolutions—dramatically reducing memory footprint and boosting speed, without sacrificing capability.
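As a rough illustration of this hybrid idea (not the actual LFM2 implementation), the sketch below stacks gated 1D short-convolution mixers with an attention layer in roughly one out of every five positions; the dimensions, layer counts, and gating details are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ShortConvBlock(nn.Module):
    """Gated 1D short-convolution mixer: fixed-size state, O(seq_len) compute."""
    def __init__(self, dim: int, kernel_size: int = 4):
        super().__init__()
        self.in_proj = nn.Linear(dim, 2 * dim)
        # Depthwise causal conv: left padding so position t only sees positions <= t.
        self.conv = nn.Conv1d(dim, dim, kernel_size, groups=dim,
                              padding=kernel_size - 1)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                                  # x: (batch, seq, dim)
        gate, h = self.in_proj(x).chunk(2, dim=-1)
        h = self.conv(h.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return x + self.out_proj(torch.sigmoid(gate) * h)

class AttnBlock(nn.Module):
    """Self-attention mixer: cached KV state grows with context length
    (causal masking omitted here for brevity)."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return x + out

def build_hybrid_stack(dim: int = 1024, n_layers: int = 20, attention_every: int = 5):
    """~1 in 5 layers uses attention (~20%); the rest are cheap short convolutions."""
    return nn.Sequential(*[
        AttnBlock(dim) if (i + 1) % attention_every == 0 else ShortConvBlock(dim)
        for i in range(n_layers)
    ])
```

Because the convolutional mixers carry only a small fixed state per layer, the memory cost of most of the stack does not grow with transcript length; only the minority of attention layers pay the usual context-length penalty.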
In addition to being efficient to run, on-device AI requires small models that behave similarly to large ones for their specific tasks. Thus, LFM2 was also designed to be extremely efficient to personalize, unlocking high-quality on-device GenAI via application-specific specialization:
- 300% more GPU-efficient fine-tuning vs. LFM1
- Specialization that completes in hours, not days
- Application-specific performance that rivals cloud-scale models
The LFM2 portfolio includes:
- Text Models: 350M to 2.6B parameters, plus an 8B MoE (Mixture of Experts) with only 1B active parameters.
- Multimodal Models: Vision and audio capabilities
- Nano Models: Ultra-compact fine-tuned models for constrained environments
This architectural approach makes LFMs uniquely suited for the emerging paradigm of on-device AI, delivering cloud-quality intelligence on the hardware people already own.
AMD & Liquid AI: Unlocking the Edge
To demonstrate what is possible, AMD and Liquid AI collaborated to fine-tune and deploy a small 2.6B-parameter model (LFM2-2.6B) directly on AMD Ryzen AI hardware.
Leveraging the flexibility of the LFM2 backbone, along with several iterations of data curation, fine-tuning, and evaluation, the team developed a custom model for the AMD GAIA meeting transcript summarization application in under two weeks. Benchmarking shows the model outperforming GPT-OSS-20B and approaching the performance of Qwen3-30B and Claude Sonnet on short (1K-token) transcripts. On long (10K-token) transcripts, the LFM2 model still outperforms GPT-OSS-20B, though the gap to the significantly larger Qwen3-30B and the cloud models widens somewhat compared with the short transcripts.
Table 1 – GAIA LLM Judge Scores for Meeting Transcript Summarization Task¹
| Model | Model Size | Accuracy Rating, Short (1K tokens) | Accuracy Rating, Long (10K tokens) |
| Claude Sonnet 4 | Large Cloud Model | 90% | 93% |
| Qwen3-30B-A3B-Instruct-2507 (Q4_0) | 30B | 88% | 92% |
| LFM2-2.6B-Transcript (Q4_K_M) | 2.6B | 86% | 77% |
| GPT-OSS-20B (Q4_K_M) | 20B | 83% | 71% |
| Qwen3-8B (Q4_1) | 8B | 65% | 72% |
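Footnote 1 describes how these scores were produced: synthetic transcripts across eight meeting types, judged by Claude Sonnet 4 within the AMD GAIA Eval-Judge framework. As a rough illustration of the general LLM-as-judge pattern (not GAIA’s actual interfaces or prompts), a scoring loop can look like the following sketch; the prompt wording, scoring scale, and `ask_judge` callable are assumptions.

```python
import json
import re
from typing import Callable

# Illustrative judge prompt (assumed), asking for a 0-100 accuracy score as JSON.
JUDGE_PROMPT = """You are grading a meeting summary.
Transcript:
{transcript}

Candidate summary:
{summary}

Rate the summary's accuracy from 0 to 100, considering factual faithfulness,
coverage of decisions and action items, and absence of invented content.
Reply with JSON only: {{"score": <int>, "reason": "<one sentence>"}}"""

def judge_summary(transcript: str, summary: str,
                  ask_judge: Callable[[str], str]) -> dict:
    """Score one candidate summary; `ask_judge` sends a prompt to the judge model
    (e.g. Claude Sonnet 4) and returns its text reply."""
    reply = ask_judge(JUDGE_PROMPT.format(transcript=transcript, summary=summary))
    match = re.search(r"\{.*\}", reply, re.DOTALL)   # tolerate extra prose
    return json.loads(match.group(0)) if match else {"score": None, "reason": reply}

def average_score(pairs: list[tuple[str, str]],
                  ask_judge: Callable[[str], str]) -> float:
    """Mean judge score over (transcript, candidate summary) pairs."""
    scores = [judge_summary(t, s, ask_judge)["score"] for t, s in pairs]
    return sum(scores) / len(scores)
```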
Crucially, this demonstration also proves that the fine-tuned LFM2-2.6B runs efficiently¹ across all three compute engines inside the AMD Ryzen™ AI PC. This makes AMD the first and only AI PC platform to offer full tri-engine inference support for LFMs, functionally verified to run on CPU, GPU, and NPU.²
Today, LFM2s deployed on AMD Ryzen AI PCs deliver fast, protected, high-quality performance while giving system designers maximum flexibility to balance latency, battery life, and responsiveness.
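For developers who want to experiment with a similar setup, here is a minimal sketch of running a 4-bit GGUF build through llama-cpp-python. The model filename, context size, and prompts are assumptions for illustration; AMD’s GAIA application and the NPU execution path use their own runtimes, so treat this only as a generic llama.cpp-based example.

```python
from llama_cpp import Llama

# Load a 4-bit quantized transcript-summarization build (filename assumed).
llm = Llama(
    model_path="lfm2-2.6b-transcript-q4_k_m.gguf",
    n_ctx=12288,       # room for a ~10K-token transcript plus the summary
    n_gpu_layers=0,    # 0 = CPU only; raise to offload layers to the GPU
)

with open("meeting_transcript.txt") as f:
    transcript = f.read()

response = llm.create_chat_completion(
    messages=[
        {"role": "system",
         "content": "Summarize the meeting transcript into key decisions, "
                    "action items with owners, and open questions."},
        {"role": "user", "content": transcript},
    ],
    max_tokens=1000,   # roughly the 1K-token summaries used in the benchmarks
    temperature=0.1,
)
print(response["choices"][0]["message"]["content"])
```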
Technical Results: Cloud-Quality Summaries in <3GB RAM
AMD approached Liquid AI with a clear goal: to power high-quality meeting transcript summarization fully on-device in an AI PC, without relying on the cloud. Because the target workflow—the AMD meeting summarization use case—has a well-defined input format, output format, domain, and length, the teams could design a tailor-made model rather than a one-size-fits-all general LLM. That narrow, application-specific scope is what makes it possible for a 2.6B-parameter LFM2 model to deliver large-model quality while staying within the tight memory and power budgets of a mainstream 16GB AI PC.
To formalize the problem, AMD and Liquid aligned on three things up front:
- Task specification: a stable system prompt, structured transcript input, up to 10K tokens per meeting, and consistent summary output (a prompt contract along these lines is sketched after this list).
- Quality definition: the AMD GAIA Eval-Judge framework, which uses generated meeting transcripts to avoid overfitting and provides tools for testing and comparing AI model performance across deployment scenarios, including synthetic test-data generation, creation of evaluation standards, and automated performance comparison using a cloud-based model as a judge. (The results presented earlier used several samples of generated data for each of eight different meeting types.)
- Deployment constraints: come close to Claude Sonnet / Qwen3-30B-A3B quality while fitting comfortably under 4GB of RAM, so the model can run on mass-market 16GB RAM AMD AI PCs—not just 32GB+ developer machines.
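A minimal sketch of such a task contract in code is shown below; the system prompt wording, transcript markup, and output sections are illustrative assumptions rather than the exact prompt used in the AMD GAIA application.

```python
# Illustrative task contract for the summarization workload (all wording assumed).
SYSTEM_PROMPT = (
    "You are a meeting summarization assistant. Read the transcript and produce "
    "a concise summary with three sections: Key Decisions, Action Items "
    "(with owners), and Open Questions. Do not invent content."
)

def format_transcript(turns: list[tuple[str, str]]) -> str:
    """Render (speaker, utterance) pairs in one consistent structured format."""
    return "\n".join(f"[{speaker}]: {text}" for speaker, text in turns)

def build_messages(turns: list[tuple[str, str]]) -> list[dict]:
    """Chat messages with a fixed system prompt and the structured transcript."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": format_transcript(turns)},
    ]
```

Freezing the prompt and input/output structure like this is what lets a small specialized model be fine-tuned and evaluated against one stable target instead of an open-ended chat distribution.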
Model Selection and Memory Efficiency
Given those constraints, Liquid selected LFM2-2.6B and quantized it to Q4_K_M, taking advantage of LFM2’s hybrid architecture (only ~20% attention) to keep memory usage low even at long context. In this configuration, the specialized model can process 10K tokens (~60-minute meeting transcript) in 2.7GB of RAM, cleanly fitting into typical 16GB RAM systems where only ~4GB is available for AI workloads. This is much less than the many gigabytes required for quality-comparable transformer models. The table below summarizes RAM usage at equal context length:
Table 2 – RAM Required to Summarize a 60-Minute Meeting (10K Tokens) on CPU²
| Model | RAM (GB) | % Larger vs LFM2-2.6B |
| Qwen3-8B (Q4_1) | 6.2 | 133% larger |
| GPT-OSS-20B (Q4_K_M) | 9.7 | 266% larger |
| Qwen3-30B-A3B-Instruct-2507 (Q4_0) | 15.2 | 476% larger |
This gap is what makes full on-device deployment on 16GB AI PCs practical for LFM2—but effectively out of reach for many traditional transformer models.
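A back-of-the-envelope KV-cache calculation shows why attention-heavy models hit this wall at long context while a mostly-convolutional hybrid does not. The layer counts, KV-head counts, and head dimension below are illustrative assumptions, not the published configurations of LFM2 or the baseline models, and model weights and activations add to these numbers.

```python
def kv_cache_bytes(n_attn_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Keys + values cached per attention layer, per token (fp16 = 2 bytes)."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

SEQ = 10_000  # ~60-minute meeting transcript

# Dense transformer: every layer is attention (illustrative 32 layers, 8 KV heads).
dense = kv_cache_bytes(n_attn_layers=32, n_kv_heads=8, head_dim=128, seq_len=SEQ)

# Hybrid: only ~20% of layers are attention; conv layers keep a small fixed state.
hybrid = kv_cache_bytes(n_attn_layers=6, n_kv_heads=8, head_dim=128, seq_len=SEQ)

print(f"dense KV cache:  {dense / 2**30:.2f} GiB")   # ~1.22 GiB at 10K tokens
print(f"hybrid KV cache: {hybrid / 2**30:.2f} GiB")  # ~0.23 GiB at 10K tokens
```

The cache gap widens linearly with context length, which is why the architectural difference matters most for hour-long meetings rather than short snippets.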
Specialization Speed and Quality
LFM2 was also designed to be fast to specialize. Using iterative cycles of data curation, fine-tuning, and AMD GAIA-based evaluation, Liquid delivered the production-ready LFM2-2.6B model in under two weeks. The specialized model was tuned specifically to:
- Surpass GPT-OSS-20B and Qwen3-8B on AMD GAIA meeting summarization tasks.
- Come close to Qwen3-30B-A3B and Claude Sonnet quality for this specific AMD application.
In other words, a 2.6B-parameter LFM—architected for efficiency and then specialized to a single workflow—can reach cloud-model quality where it matters, while staying within a fraction of the memory and compute footprint.
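For a feel of what the specialization step involves mechanically, here is a minimal supervised fine-tuning sketch on transcript-to-summary pairs using Hugging Face TRL. The dataset path and format, hyperparameters, and base-checkpoint identifier are illustrative assumptions; Liquid AI’s actual data mix and training recipe are not described in this post.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Each JSONL line holds one chat-formatted example (path and format assumed):
# {"messages": [{"role": "system", "content": "..."},
#               {"role": "user", "content": "<transcript>"},
#               {"role": "assistant", "content": "<reference summary>"}]}
dataset = load_dataset("json", data_files="transcript_summary_pairs.jsonl",
                       split="train")

trainer = SFTTrainer(
    model="LiquidAI/LFM2-2.6B",          # base checkpoint identifier, assumed
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="lfm2-2.6b-transcript",
        per_device_train_batch_size=1,   # long transcripts are memory-heavy
        gradient_accumulation_steps=8,
        num_train_epochs=2,
        learning_rate=1e-5,
        bf16=True,
    ),
)
trainer.train()
trainer.save_model()  # export, then quantize (e.g. to GGUF Q4_K_M) for deployment
```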
Results: Speed, Energy, and Tri-Engine Inference
On the performance side, current profiling using llama-bench on an AMD Ryzen™ AI Max+ 395 processor shows that the same LFM2-2.6B Q4_K_M model can summarize a 60-minute, 10K-token meeting into a 1K-token summary in 16 seconds, fast enough for interactive, near-real-time workflows rather than “batch overnight jobs.” Benchmarks against larger baselines show how much faster LFM2-2.6B is:
Table 3 – Time to Summarize a 60-Minute Meeting on AMD Ryzen AI Max+ 395²
| Model | Time (s) | LFM2-2.6B % Faster |
| Qwen3-8B (Q4_1) | 39 | 59% faster |
| GPT-OSS-20B (Q4_K_M) | 23 | 30% faster |
| Qwen3-30B-A3B-Instruct-2507 (Q4_0) | 27 | 42% faster |
Table 4 – Time to Summarize a 60-Minute Meeting on AMD Ryzen AI 400 Series³
| Model | Time (s) | LFM2-2.6B % Faster |
| Qwen3-8B (Q4_1) | 112 | 63% |
| GPT-OSS-20B (Q4_K_M) | 67 | 37% |
| Qwen3-30B-A3B-Instruct-2507 (Q4_0) | 82 | 49% |
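To make the percentage columns in Tables 3 and 4 easy to read, here is the relative-speedup arithmetic checked against the Table 3 numbers, using the 16-second LFM2-2.6B result quoted above; small deviations from the published figures come from rounding of the underlying measurements.

```python
def percent_faster(lfm2_seconds: float, baseline_seconds: float) -> float:
    """How much faster LFM2-2.6B is, as a share of the baseline's time."""
    return 100 * (baseline_seconds - lfm2_seconds) / baseline_seconds

lfm2 = 16  # seconds for a 10K-token transcript on Ryzen AI Max+ 395 (Table 3)
for name, t in [("Qwen3-8B", 39), ("GPT-OSS-20B", 23), ("Qwen3-30B-A3B", 27)]:
    print(f"{name}: LFM2-2.6B is {percent_faster(lfm2, t):.0f}% faster")
# -> 59%, 30%, 41% (Table 3 rounds the last to 42%)
```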
Taken together, these results show that a specialized LFM2-2.6B model on AMD hardware can deliver:
- Cloud-grade summarization quality defined by GAIA,
- Sub-3GB RAM usage at long context, and
- Significantly lower latency and energy than larger transformer baselines—all while running fully on-device on a mainstream 16GB RAM AMD AI PC.
For Your Business: Production-Quality AI Without Cloud Dependency
This demonstration proves that specialized models, running on AMD Ryzen AI hardware, deliver production-quality solutions entirely on-device with true privacy, low latency, and zero cloud costs.
When models are purpose-built for specific workflows and optimized for hardware, everything improves:
- Quality: Expert models outperform generalists
- Efficiency: Run on mainstream 16GB RAM laptops, not 64GB workstations
- Privacy: Your data never leaves your device
- Reliability: No internet required, no cloud outages
- Cost: No API fees or cloud infrastructure bills
- Speed: Real-time responses with zero network latency
This isn't just better technology—it's a fundamental shift in how enterprises deploy AI.
The Future Is Efficient, Specialized, and On-Device
The future of GenAI isn't just one giant model—it's thousands of tailored ones, each optimized for specific scenarios and devices. Fast, private, efficient, specialized—it's here today, with Liquid AI and AMD.
Build with AMD hardware — AMD Ryzen AI 400 Series PCs optimized for on-device AI workloads.
Build with Liquid's LFMs—scenario-specific models delivering cloud-quality intelligence at a fraction of the cost and energy.
Footnotes
- The data in Table 1 was generated using the GAIA Eval-Judge framework. We used 24 synthetic 1K transcripts and 32 synthetic 10K transcripts distributed across 8 different meeting types. We used the Claude Sonnet 4 model for both content generation and judging.
- The data in Tables 2 and 3 was generated using llama-bench.exe b7250 on an HP Z2 Mini G1a Next Gen AI Desktop Workstation with an AMD Ryzen AI Max+ PRO 395 processor. We compute peak memory used during CPU inference by measuring peak memory usage of the llama-bench.exe process executing the command:
`llama-bench -m <MODEL> -p 10000 -n 1000 -t 8 -r 3 -ngl 0`
The llama-bench executable outputs the average inference times for prompt processing and token generation. The reported inference times are for the GPU, enabled using the `-ngl 99` flag.
- The data in Table 4 was generated in the same way as Table 3 using a development platform with an AMD Ryzen AI 9 HX 470 processor.