Harnessing Dify + Local LLMs on AMD Ryzen AI PCs for Private Workflows

Oct 06, 2025

Dify Chatbot Workflow

Introduction

Large language models (LLMs) are increasingly part of our everyday work. With tools like Dify and Lemonade, developers can quickly build and deploy AI applications using an intuitive, node-based interface. Lemonade Server enables efficient execution of LLMs locally on AMD Ryzen™ AI PCs, taking advantage of built-in acceleration for faster performance. Paired with Dify, you can visually orchestrate AI workflows without needing deep machine learning expertise.

In this blog, we’ll show the following example using Dify:

  • Ask My Docs: Upload reference materials like FAQs or README files in markdown format, and Dify will automatically index them. The indexed content becomes contextual input for the language model, enabling document-aware responses.

This example highlights how Dify, when paired with Ryzen AI PCs running Lemonade Server, enables powerful local LLM applications. This is a simple example to get started with, but Dify can be extended to support more complex and varied AI applications.

What Is Dify?

Dify is an open-source platform designed to make building AI applications powered by large language models easy. It lets you design workflows visually with nodes (inputs, retrieval, agents, tools) and swap models without rewriting code.

Key features:

  • Visual Workflow Builder: Drag-and-drop interface for designing AI pipelines.
  • Knowledge Base Integration: Ingest and index documents to provide contextual grounding for LLMs.
  • Built-in Connectors & API Access: Easily integrate external tools and services.
  • Flexible Deployment: Supports both self-hosted and remote LLM endpoints.

What is Lemonade?

Lemonade is a client-side inference framework for Windows and Linux that simplifies LLM deployment using NPU and GPU acceleration. It supports models like Qwen, Llama, and DeepSeek and includes support for different hardware backends. By running models locally, Lemonade enhances data privacy and security, keeping sensitive information on your device while leveraging hardware acceleration for high performance. Lemonade Server offers a local runtime with an API compatible with OpenAI, making it easy to integrate local LLMs into existing applications.

Dify integrates with Lemonade Server to enable LLM inference, text embedding, and reranking, making it easy to build private, high-performance AI workflows.
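Because Lemonade Server speaks an OpenAI-compatible API, any OpenAI-style client can talk to it directly. The standard-library-only sketch below shows the general shape of a chat-completion call; the `/api/v1` base path and the model name are assumptions based on Lemonade Server's defaults, so adjust them to match your install.

```python
import json
import urllib.request

# Assumption: Lemonade Server's OpenAI-compatible API is served at
# http://localhost:8000/api/v1 by default. Adjust if your setup differs.
LEMONADE_BASE_URL = "http://localhost:8000/api/v1"


def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,
    }


def chat(model: str, user_message: str) -> str:
    """POST the payload to the local server and return the reply text."""
    payload = json.dumps(build_chat_request(model, user_message)).encode("utf-8")
    req = urllib.request.Request(
        f"{LEMONADE_BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires Lemonade Server running with the model downloaded):
#   chat("Llama-3.2-3B-Instruct-Hybrid", "What is an NPU?")
```

Since no cloud key is involved, the only credential-like detail is the local port, which is one of the reasons this pairing stays private by default.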

Getting Started with Dify

The following instructions are a simplified version of the official guide for enabling Lemonade as a custom model provider in Dify. For full setup details, refer to the official Lemonade documentation.

Prerequisites:

First, make sure you have Docker Desktop installed — it includes both Docker and Docker Compose, which are required to run Dify.

Next, download and install Lemonade Server, which will serve as your local model provider.

Install Dify

1. Clone the Dify repository:

    git clone https://github.com/langgenius/dify.git

2. Navigate to the Docker setup folder:

    cd dify/docker

3. Copy the environment configuration file:

    cp .env.example .env

4. Start the services using Docker Compose:

    docker compose up -d

Launch Dify

Open your preferred browser and navigate to: http://localhost/plugins?category=discover

From there, search for and install Lemonade as a model provider.

Add Models to Lemonade in Dify

  1. Go to: Settings > Model Providers > Lemonade > Add a Model.
  2. Fill in the following fields:

  • Model Name: e.g., Llama-3.2-3B-Instruct-Hybrid. (To find more models, use Lemonade Server’s Model Manager. Note: models must be downloaded before adding them to Dify.)
  • Model Type: LLM
  • Authorization Name: (leave blank)
  • API endpoint URL: http://host.docker.internal:8000
  • Model context size: 2048 (increase if you need a larger context window)
  • Agent Thought: select "Support" if your model supports reasoning chains. (Look for models labeled “Reasoning” in the Lemonade Model Manager.)
  • Vision Support: select "Support" if your model supports image understanding. (Look for models labeled “Vision” in the Lemonade Model Manager.)

Repeat this process for each model you want to add to Dify.
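Note that Dify runs inside Docker, so it reaches Lemonade Server on the host via host.docker.internal, while tools running in your own shell use localhost. The sketch below is a quick sanity check that the server is reachable and which models it currently exposes before you add them in Dify; the `/api/v1/models` route is an assumption based on the OpenAI-compatible API, so verify it against your Lemonade version.

```python
import json
import urllib.request


def models_url(from_docker: bool = False) -> str:
    """Pick the right host for the models endpoint.

    Inside Dify's containers the host machine is host.docker.internal;
    from your own shell it is localhost. The /api/v1/models path is an
    assumption based on Lemonade Server's OpenAI-compatible API.
    """
    host = "host.docker.internal" if from_docker else "localhost"
    return f"http://{host}:8000/api/v1/models"


def list_model_ids(url: str) -> list[str]:
    """Return the ids of the models the server currently exposes."""
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return [m["id"] for m in data.get("data", [])]


# Example (requires Lemonade Server running):
#   list_model_ids(models_url())
```

If a model you expect is missing from the list, download it first in Lemonade Server’s Model Manager, since Dify can only use models that are already present locally.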

Example Workflow: Ask My Docs

We created a workflow that allows querying documentation — such as guides and FAQs — in Markdown format. Dify supports a wide range of file types including .txt, .md, .pdf, .html, .xlsx, .docx, .csv, and more.

Here’s how it works:

1. Upload or sync your documents: Add your reference files as a Dify Knowledge source. For this example, we used the Readme.md and FAQ.md from the Lemonade-SDK repository. These documents serve as the context for the LLM.

Steps to create a Dify Knowledge Source

2. Create a Chatbot Workflow: Create a chatbot workflow using the built-in Chatflow type. This is the workflow we will connect to our knowledge source and local LLM.

Create a Dify Workflow

3. Add a Knowledge Retrieval node: In your workflow, insert a Knowledge Retrieval node and link it to the dataset created in step 1.

Insert the Knowledge Retrieval Node

4. Choose your local model: Choose a locally hosted model — for instance, an ONNX-converted model running on an AMD GPU/NPU. We used Qwen2.5-7B-Instruct-Hybrid, which optimizes performance by using the NPU for the prefill phase and the GPU for token generation.

5. Configure the System Prompt: Provide a system prompt to guide the model’s behavior. Example: “Only use the provided information {context}. Be clear, concise, and friendly. Do not fabricate answers or respond to questions outside the scope of the context.”

Select the model to use

6. Map the Workflow: Connect the nodes: Input → Retrieval → LLM → Output. This ensures that queries return accurate answers with citations to the source documents.

Dify Chatbot Workflow

Because the model runs entirely on your local machine, no data leaves your environment. You can ask questions about Lemonade, and the LLM will respond with answers and cite the specific document used for context.
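Conceptually, the wiring above boils down to: retrieve the relevant chunks, splice them into the system prompt, and hand the result to the LLM. The sketch below imitates that flow, with a naive keyword-overlap retriever standing in for Dify's real vector search; it illustrates the pattern, not Dify's implementation.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by keyword overlap with the query (stand-in for
    Dify's Knowledge Retrieval node, which uses vector search)."""
    q = _tokens(query)
    scored = sorted(chunks, key=lambda c: -len(q & _tokens(c)))
    return scored[:top_k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Assemble the system prompt with retrieved context, as in step 5."""
    context = "\n---\n".join(context_chunks)
    return (
        "Only use the provided information below. Be clear, concise, and "
        "friendly. Do not fabricate answers or respond to questions outside "
        f"the scope of the context.\n\nContext:\n{context}\n\nQuestion: {query}"
    )


# Toy corpus standing in for the indexed Readme.md / FAQ.md chunks.
docs = [
    "Lemonade Server runs LLMs locally on AMD Ryzen AI PCs.",
    "Dify is a visual workflow builder for AI applications.",
]
prompt = build_prompt("What is Lemonade Server?",
                      retrieve("What is Lemonade Server?", docs, top_k=1))
```

In the real workflow, `prompt` is what the LLM node receives, which is why the model can answer with citations: the retrieved chunks carry their source documents along.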

Try out the chatbot with a real prompt

Putting It All Together

With Dify, you don’t need to be a machine learning expert to build powerful local AI workflows. The process is simple:

  1. Select your model endpoint: Point to a locally hosted model via Lemonade Server.
  2. Create datasets: Upload documents, chat logs, or other reference materials.
  3. Build your flow: Connect Input → Retrieval → LLM → Output using Dify’s visual workflow editor.
  4. Automate updates: Use the REST API from Dify to keep datasets in sync with live data sources.
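Step 4 can be scripted against Dify's knowledge REST API. The sketch below assumes the create-by-text document endpoint and a dataset API key; exact routes and payload fields can differ between Dify versions, so check the API reference exposed by your own instance. `DIFY_BASE`, the dataset id, and the key are placeholders.

```python
import json
import urllib.request

# Assumption: a self-hosted Dify instance serving its API at this base URL.
DIFY_BASE = "http://localhost/v1"


def build_sync_request(dataset_id: str, doc_name: str, text: str) -> tuple[str, dict]:
    """Build the URL and payload for adding a text document to a dataset.

    Endpoint path and fields follow Dify's knowledge API ("create a
    document from text"); verify against your Dify version.
    """
    url = f"{DIFY_BASE}/datasets/{dataset_id}/document/create-by-text"
    payload = {
        "name": doc_name,
        "text": text,
        "indexing_technique": "high_quality",
        "process_rule": {"mode": "automatic"},
    }
    return url, payload


def sync_document(api_key: str, dataset_id: str, doc_name: str, text: str) -> dict:
    """POST the document to Dify so the dataset stays in sync."""
    url, payload = build_sync_request(dataset_id, doc_name, text)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example (requires a running Dify instance and a dataset API key):
#   sync_document("dataset-...", "<dataset-id>", "FAQ.md", open("FAQ.md").read())
```

Run something like this on a schedule (or from CI) and the knowledge base re-indexes new content automatically, with no manual uploads.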

Everything runs privately on your AMD-powered PC, ensuring full control over performance and data security.

Closing Thoughts

By combining Dify’s node-based orchestration with the local acceleration of AMD Ryzen AI and Radeon GPUs, you unlock a secure, high-performance path to generative AI inside your organization. Whether you're:

  • Searching internal knowledge bases
  • Summarizing chat conversations
  • Automating customer support

…you can build it visually, run it locally, and keep your data entirely under your control.

Thanks for learning about Dify and Lemonade Server. If you have any questions or feedback, you can reach out to us at lemonade@amd.com.
