Local Tiny Agents: MCP Agents on Ryzen AI with Lemonade Server

Jun 10, 2025

We’re excited to announce that the power of Model Context Protocol (MCP) has been unlocked on AMD Ryzen AI™ PCs. You can get started today by installing AMD Lemonade Server and connecting it to projects like Hugging Face’s Tiny Agents via streaming tool calls.

If you’re new to Lemonade, it's a lightweight open-source local LLM server designed to show the capabilities of AMD AI PCs. Think of it as a docking station for LLMs, letting you plug powerful models directly into apps like Open WebUI and run them locally, with no cloud needed. Developers can also use Lemonade to integrate with modern projects that use the advanced features from the OpenAI standard. This includes awesome projects like Tiny Agents, created by Hugging Face, which makes local tool calling remarkably smooth. 🚀

What Are Tiny Agents?

With MCP (Model Context Protocol), you can connect tools like web search, memory, and filesystem access directly to LLMs. Building on this, Tiny Agents makes it easy to create lightweight autonomous agents using these MCP-connected tools. The LLM simply loops between conversation and tool use—autonomously—until the task is complete.

With Lemonade's latest update, you can now enable these tool interactions using your favorite LLM. That means low-friction access to external tools, and more dynamic, context-aware behavior for your application—with minimal setup.

QuickStart: Run a Lemonade-Powered Tiny Agent

First, download and install Lemonade Server 7.0.2 or above using the installer and launch the server by double clicking the Desktop icon or running the command below.

    lemonade-server serve

Install Node.js (download here) and the latest huggingface_hub with MCP:

    pip install "huggingface_hub[mcp]>=0.32.4"

Save this sample database into your current folder. Then, save the agent config below to a new file named agent.json:

    {
  "model": "Qwen3-8B-GGUF",
  "endpointUrl": "http://localhost:8000/api/",
  "servers": [
    {
      "type": "stdio",
      "config": {
        "command": "C:\\Program Files\\nodejs\\npx.cmd",
        "args": [
          "-y",
          "mcp-server-sqlite-npx",
          "test.db"
        ]
      }
    }
  ]
}

Run your agent with the Command Line Interface (CLI):

    tiny-agents run ./agent.json

This sample SQLite tool allows LLMs to reason about and edit SQLite databases. It’s automatically discovered by the MCP server and connected to the agent. Once connected, the LLM can use it during inference—looping through tool calls as needed. You can try this out with prompts like “Which tables are available?” and “What is the most expensive product?”.

In this example, the Qwen3 model is accelerated using Vulkan, a low-overhead, cross-platform API designed for high-performance graphics and compute tasks, making it ideal for running local LLMs efficiently. With Vulkan, the Qwen3 model delivers fast, responsive inference by accelerating on your AMD Radeon™ GPU or integrated GPU.

Beyond Basic Examples: Stepping up your MCP Game

There are hundreds of MCP servers available: from basic calculators to image generation and editing frameworks. This flexibility makes it easy to find (or build) an MCP setup that fits your workflow. Some good resources to get started with MCP servers are awesome-mcp-servers and mcpservers.org.

If your application is designed for concise context usage and you're interested in exploring NPU + iGPU acceleration, check out the Hybrid models available on Lemonade for the AMD Ryzen AI 300 series PCs. This includes models like Llama-xLAM-2-8b-fc-r-Hybrid, which have been fine-tuned for tool-calling and deliver snappy responses!

Why This Matters

We’re entering a new chapter where AI agents don’t just respond: they act. Running Tiny Agents locally with Lemonade Server makes it easier to build practical, tool-using LLM applications without relying on the cloud.

You get a setup that’s:

Private: Everything stays on your machine, which can be important when working with sensitive data.
Free: Since everything happens locally, you’re not paying per API call or usage-minute. This makes it more sustainable for experimentation, development, or long-running tasks.
Powerful: MCP tools for many tasks and services are broadly available, enabling you to extend the capabilities of your LLM with ease.

Shoutout to the Hugging Face team for pioneering Tiny Agents and pushing open tooling forward. Lemonade is proud to build on top of this amazing foundation.

Try It Yourself

Install Lemonade Server, configure a Tiny Agent, and see what you can build. If you run into something interesting, or want to contribute, check out our GitHub, open an issue or send us an email at lemonade@amd.com. We’d love to hear what you're working on.

Try it out, tweak it, or build your own. Your next tiny agent is just a loop away!
👉 github.com/lemonade-sdk/lemonade

Article By

Daniel Holanda Noronha

Jeremy Fowers

Krishna Sivakumar

Victoria Godsoe

AI Developer Enablement Manager

white pearl gradient medium color divider

Related Blogs

View All Blogs

資料中心

商用系統

個人與遊戲

嵌入式產品

資源

加速器

自適應加速器

DPU 加速器

乙太網配接器

工作站

桌上型電腦

筆記型電腦

資源

FPGA 與自適應 SoC

系統模組 (SOM)

技術

開發者資源

評估板與套件

處理器工具

顯示卡工具與應用程式

FPGA 與自適應 SoC 工具

IP 與應用

GPU 加速器工具與應用程式

概述

適用於資料中心與雲端

適用於邊緣與端點-

適用於開發者

行業

行業

行業

行業

Industrias

工作負載

遊戲

系統

技術

資源

EPYC 處理器

Radeon 顯示卡與 AMD 晶片組

FPGA 與自適應 SoC

Alveo 加速器與 Kria SOM

Ryzen 處理器

乙太網配接器

概述

EPYC 處理器

加速器

自適應 SoC、FPGA 和 SOM

顯示卡

概述

依產品排序資源

依類型排序資源

關於我們的合作夥伴

AMD 全球支援

處理器與顯示卡

加速器

FPGA 與自適應 SoC

遊戲與個人運算

自適應和嵌入式運算

Get AMD Fan Gear

Buy Direct From AMD

Buy Direct From AMD

Buy Direct From AMD

Buy Direct From AMD

Buy Direct From AMD

Local Tiny Agents: MCP Agents on Ryzen AI with Lemonade Server

What Are Tiny Agents?

QuickStart: Run a Lemonade-Powered Tiny Agent

Beyond Basic Examples: Stepping up your MCP Game

Why This Matters

Try It Yourself

Article By

Related Blogs