Unlock the Power of the IBM Granite 4.0 Family of Models with AMD Instinct GPUs: A Developer’s Day 0 guide
Oct 02, 2025

AMD is excited to announce Day 0 support for IBM's next-generation Granite 4.0 language models on AMD Instinct™ MI300 Series GPUs (MI300X, MI325X) and MI350 Series GPUs (MI350X, MI355X) using vLLM.
This blog covers the architecture highlights, the collaboration between AMD and IBM, and the prerequisites, and provides a quick start so you can run IBM Granite 4.0 models on AMD GPUs.
Brief Introduction to Granite 4.0 Language Models
Granite 4.0 models utilize a new hybrid Mamba-2/Transformer architecture, marrying the speed and efficiency of Mamba with the precision of transformer-based self-attention.
Many of the innovations informing the Granite 4.0 architecture arose from IBM Research's collaboration with the original Mamba creators on Bamba.
The Granite 4.0 Mixture of Experts (MoE) architecture employs 9 Mamba blocks for every 1 transformer block.
The Mamba blocks capture global context, which is then passed to transformer blocks that enable a more nuanced parsing of local context. The result is a dramatic reduction in memory usage and latency with no apparent tradeoff in performance.
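To see why this hybrid layout cuts memory use, note that a transformer layer's KV cache grows linearly with sequence length, while a Mamba-2 layer keeps a fixed-size state regardless of context length. The back-of-envelope sketch below illustrates the effect; all dimensions (head counts, state sizes, layer counts) are hypothetical placeholders, not Granite 4.0's actual configuration:

```python
def kv_cache_bytes(seq_len, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    # Keys + values per attention layer: 2 * seq_len * n_kv_heads * head_dim
    return 2 * seq_len * n_kv_heads * head_dim * dtype_bytes

def mamba_state_bytes(d_state=128, d_inner=4096, dtype_bytes=2):
    # A Mamba-2 layer's SSM state is fixed-size, independent of sequence length
    return d_state * d_inner * dtype_bytes

def hybrid_cache_bytes(seq_len, n_layers=40, attn_every=10):
    # In a 9:1 hybrid, only 1 in 10 layers keeps a growing KV cache
    n_attn = n_layers // attn_every
    n_mamba = n_layers - n_attn
    return n_attn * kv_cache_bytes(seq_len) + n_mamba * mamba_state_bytes()

def full_transformer_cache_bytes(seq_len, n_layers=40):
    # Every layer of a pure transformer pays the full KV-cache cost
    return n_layers * kv_cache_bytes(seq_len)

if __name__ == "__main__":
    for seq in (1_000, 32_000, 128_000):
        full = full_transformer_cache_bytes(seq) / 2**20
        hyb = hybrid_cache_bytes(seq) / 2**20
        print(f"{seq:>7} tokens: full transformer {full:9.1f} MiB vs hybrid {hyb:9.1f} MiB")
```

At long contexts the hybrid's cache is dominated by the handful of attention layers, so the gap over a pure transformer widens roughly tenfold under these assumptions.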
AMD and IBM Collaboration: Day 0 Support and Beyond
AMD has longstanding collaborations with IBM and Red Hat. Together we continue to push the boundaries of AI performance. Thanks to this close relationship, Granite 4.0 can run seamlessly on AMD Instinct GPUs from Day 0, using PyTorch and vLLM. Our collaboration paves the way for even more groundbreaking innovations, ensuring that AI performance continues to evolve and meet the increasing demands of modern computing.
Running Granite 4.0 on AMD Instinct GPUs
Prerequisites:
- An AMD Instinct MI300X or newer GPU
- AMD ROCm™ drivers installed
This section provides step-by-step instructions for running Granite 4.0 with our custom prebuilt Docker image. To run on bare metal, build vLLM from the tip of tree of its GitHub repository, as Granite 4.0 support has been fully upstreamed.
Step 1: Get the Granite 4.0 Docker image
We have created a public preview Docker image for Granite 4.0, which you can pull as follows:
docker pull rocm/vllm-dev:granite_4_preview
Step 2: Download Granite 4.0
Download a Granite 4.0 model through Hugging Face: Granite models
Step 3: Launch the Docker container
docker run \
--rm \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
--memory $(python3 -c "import os; mlim = int(0.8 * os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES') / 10**9); print(f'{mlim}G')") \
--cap-add=SYS_PTRACE --security-opt seccomp=unconfined --privileged \
--shm-size=16g \
--ulimit core=0:0 \
-e "TERM=xterm-256color" \
--name "granite_4_vllm_rocm" \
-it rocm/vllm-dev:granite_4_preview \
/bin/bash
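Besides the offline script shown in the next section, the container also includes vLLM's OpenAI-compatible server (started inside the container with `vllm serve ibm-granite/granite-4.0-micro`). A minimal sketch of querying such a server from Python, assuming vLLM's default port 8000 and the standard /v1/completions endpoint:

```python
import json
import urllib.request

def build_completion_request(prompt, model="ibm-granite/granite-4.0-micro", max_tokens=64):
    """Build the JSON body for the OpenAI-compatible /v1/completions endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.8,
        "top_p": 0.95,
    }

def complete(prompt, url="http://localhost:8000/v1/completions"):
    """POST a completion request and return the generated text."""
    body = json.dumps(build_completion_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

# Example usage (requires the server to be running inside the container):
#   print(complete("The capital of France is"))
```

The request is plain JSON over HTTP, so any OpenAI-compatible client library works equally well in place of the hand-rolled helper above.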
Demo Granite 4.0 on AMD Instinct GPUs
With the container running and a Granite 4.0 model downloaded, you can run simple prompts against the model:
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)


def main():
    # Create an LLM.
    llm = LLM(model="ibm-granite/granite-4.0-micro")
    # Generate texts from the prompts.
    # The output is a list of RequestOutput objects
    # that contain the prompt, generated text, and other information.
    outputs = llm.generate(prompts, sampling_params)
    # Print the outputs.
    print("\nGenerated Outputs:\n" + "-" * 60)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}")
        print(f"Output: {generated_text!r}")
        print("-" * 60)


if __name__ == "__main__":
    main()
Before running this script, configure your Hugging Face access token following this tutorial and export it as an environment variable:
export HF_TOKEN=[your huggingface access token here]
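A quick sanity check that the token is actually visible to your Python process can save a confusing download failure later. The helper below is a hypothetical convenience, not part of vLLM or Hugging Face tooling:

```python
import os

def check_hf_token(env=None):
    """Return True if a non-empty HF_TOKEN is present in the environment mapping."""
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN"))

if __name__ == "__main__":
    if check_hf_token():
        print("HF_TOKEN detected; gated model downloads should authenticate.")
    else:
        print("HF_TOKEN not set; downloads of gated checkpoints may fail.")
```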
Sample output from ibm-granite/granite-4.0-micro:
Generated Outputs:
------------------------------------------------------------
Prompt: 'Hello, my name is'
Output: ' Helen and I am from Boston. I am a senior manager at a technology firm'
------------------------------------------------------------
Prompt: 'The president of the United States is'
Output: ' an interesting case in point. He is the head of the executive branch, which'
------------------------------------------------------------
Prompt: 'The capital of France is'
Output: ' Paris.'
------------------------------------------------------------
Prompt: 'The future of AI is'
Output: ' promising and will bring many changes to the world. As AI continues to develop,'
------------------------------------------------------------
Summary
This blog provided a step-by-step Day 0 guide to running IBM Granite 4.0 models on AMD Instinct MI300 and MI350 Series GPUs. With Granite 4.0 running seamlessly on AMD Instinct GPUs, developers can immediately build and scale AI applications such as document summarization and analysis, RAG, and AI agents, while maintaining transparency, safety, and security with an ISO 42001-certified LLM. This milestone is part of our broader mission to drive innovation and provide the AI community with open, high-performance tools.
Acknowledgements
AMD team members who contributed to this effort: Aleksandr Malyshev, Gregory Shtrasberg, and Matthew Wong.
This work would not have been possible without the close collaboration and support of the various organizations inside IBM. There are too many folks involved to name them all, but special thanks to Raghu Ganti for his leadership on this collaboration.
