[How-To] Automatic1111 Stable Diffusion WebUI with DirectML Extension on AMD GPUs

Nov 30, 2023


Prepared by Hisham Chowdhury (AMD), Sonbol Yazdanbakhsh (AMD), Justin Stoecker (Microsoft), and Anirban Roy (Microsoft)

Microsoft and AMD continue to collaborate on enabling and accelerating AI workloads across AMD GPUs on Windows platforms. We published an earlier article about accelerating Stable Diffusion on AMD GPUs using the Automatic1111 DirectML fork.

Now we are happy to share that, with the ‘Automatic1111 DirectML extension’ preview from Microsoft, you can run Stable Diffusion 1.5 with the base Automatic1111 WebUI and see a similar upside across the AMD GPUs mentioned in our previous post.


Fig 1: Up to 12x faster inference on AMD Radeon™ RX 7900 XTX GPUs compared to the default (non-ONNX Runtime) Automatic1111 path

Microsoft and AMD engineering teams worked closely to optimize Stable Diffusion to run on AMD GPUs, accelerated via the Microsoft DirectML platform API and AMD device drivers. The ML acceleration layers resident in the AMD device driver utilize AMD Matrix Processing Cores via Wave MMA intrinsics to accelerate DirectML-based ML workloads, including Stable Diffusion, Llama 2, and others.


Fig 2: ONNX Runtime DirectML on AMD GPUs

1. Prerequisites

2. Overview of Microsoft Olive

Olive is a Python tool that can be used to convert, optimize, quantize, and auto-tune models for optimal inference performance with ONNX Runtime execution providers like DirectML. Olive greatly simplifies model processing by providing a single toolchain to compose optimization techniques, which is especially important with more complex models like Stable Diffusion that are sensitive to the ordering of optimization techniques. The DirectML sample for Stable Diffusion applies the following techniques:

  • Model conversion: translates the base models from PyTorch to ONNX.
  • Transformer graph optimization: fuses subgraphs into multi-head attention operators and eliminates inefficiencies introduced by the conversion.
  • Quantization: converts most layers from FP32 to FP16 to reduce the model's GPU memory footprint and improve performance.

Combined, the above optimizations enable DirectML to leverage AMD GPUs for greatly improved performance when performing inference with transformer models like Stable Diffusion.
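The memory saving from the FP16 quantization step can be illustrated with a small NumPy sketch (illustrative only; Olive performs the actual conversion on the ONNX graph initializers, not on NumPy arrays, and the tensor shape below is an arbitrary example):

```python
import numpy as np

# A hypothetical 4096x4096 weight tensor stored in FP32 vs FP16.
w_fp32 = np.zeros((4096, 4096), dtype=np.float32)
w_fp16 = w_fp32.astype(np.float16)  # same values, half the bytes per element

print(w_fp32.nbytes // (1024 * 1024))  # 64 (MiB)
print(w_fp16.nbytes // (1024 * 1024))  # 32 (MiB) -- half the memory footprint
```

Halving the bytes per weight both shrinks GPU memory usage and reduces memory bandwidth pressure, which is a large part of the inference speedup on bandwidth-bound transformer workloads.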

3. Automatic1111 WebUI DirectML Extension (Preview)

Follow these steps to enable DirectML extension on Automatic1111 WebUI and run with Olive optimized models on your AMD GPUs:

**Only Stable Diffusion 1.5 is currently supported with this extension

**Generate Olive-optimized models (per our previous post or the Microsoft Olive instructions) before using the DirectML extension

**Not tested with multiple extensions enabled at the same time

Open the Anaconda Terminal

Open the Extensions tab

Copy the UNet model optimized by Olive to the models\Unet-dml folder

  • Example: \models\optimized\runwayml\stable-diffusion-v1-5\unet\model.onnx → stable-diffusion-webui\models\Unet-dml\model.onnx

Return to the Settings tab of the WebUI interface

  • Under Settings → User Interface → Quick Settings List, add sd_unet
  • Apply settings, then Reload UI

Navigate to the "Txt2img" tab of the WebUI Interface

  • Select the DML Unet model from the sd_unet dropdown

Run your inference!


The result is up to 12x faster inference on AMD Radeon™ RX 7900 XTX GPUs compared to the default Automatic1111 path without Olive and ONNX Runtime.
