Larger-Than-Ever Single-GPU Quantum Simulation

May 12, 2026


As the race toward practical quantum advantage accelerates, the bridge between classical high-performance computing (HPC) and quantum algorithms has never been more critical. Large‑scale quantum simulation, algorithm validation, and hardware characterization all depend on massive memory bandwidth, capacity, and sustained performance. At AMD, we believe that the next generation of quantum breakthroughs will be powered by open collaboration and massive computational horsepower.

In this blog post, we highlight our ongoing collaboration with Qibo, an end-to-end open-source framework for quantum computing, and present the largest single‑GPU quantum simulation achieved on AMD hardware to date: an exact 35‑qubit state‑vector simulation running on Qibo on a single AMD Instinct™ MI355X GPU, a new milestone for quantum workflows.

What is Qibo?

Qibo is a versatile, full-stack open-source middleware for quantum computing that spans the complete workflow, from high-performance circuit simulation to direct control of experimental quantum hardware. Unlike frameworks that address only one layer of the stack, Qibo provides a unified environment for developing algorithms, benchmarking performance, and validating hardware behavior.

The project is a global collaborative effort, developed and used by leading research institutions including the Technology Innovation Institute (TII), Singapore’s National Quantum Computing Hub (NQCH), the Barcelona Supercomputing Center (BSC), and Italy’s National Quantum Science and Technology Institute (NQSTI) and INFN. Its modular design allows researchers to easily switch between different backends, making it a preferred tool for developing new quantum algorithms and characterizing quantum hardware.

Unleashing Qibojit on AMD Instinct™ GPUs

Simulating quantum circuits is an incredibly memory-intensive task: the state vector doubles in size with every additional qubit. As the number of qubits moves beyond 30, performance is dictated less by raw compute and more by HBM capacity, bandwidth, and sustained access efficiency. This is where the AMD Instinct™ MI300X and MI350X Series GPUs become a game-changer.
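To make the doubling concrete, here is a minimal sketch of the memory arithmetic. It assumes a dense state vector with one complex amplitude per basis state (8 bytes for complex64, 16 bytes for complex128), treats the capacities quoted in this post as decimal gigabytes, and ignores any workspace the simulator needs on top of the state itself. The function names are illustrative and are not part of Qibo.

```python
# Rough memory arithmetic for dense state-vector simulation.
# Assumptions: one complex amplitude per basis state, decimal gigabytes,
# and no extra workspace beyond the state vector itself.

BYTES_PER_AMPLITUDE = {"complex64": 8, "complex128": 16}


def statevector_bytes(n_qubits, bytes_per_amplitude):
    """Bytes needed to store all 2**n_qubits amplitudes."""
    return (1 << n_qubits) * bytes_per_amplitude


def max_qubits(capacity_bytes, bytes_per_amplitude):
    """Largest qubit count whose state vector fits in the given capacity."""
    n = 0
    while statevector_bytes(n + 1, bytes_per_amplitude) <= capacity_bytes:
        n += 1
    return n


# HBM capacities quoted in this post, read as decimal gigabytes.
CAPACITY_BYTES = {"MI300X": 192 * 10**9, "MI355X": 288 * 10**9}

for gpu, capacity in CAPACITY_BYTES.items():
    for dtype, width in BYTES_PER_AMPLITUDE.items():
        print(f"{gpu} {dtype}: up to {max_qubits(capacity, width)} qubits")
```

Under these assumptions, the state vector alone allows 34 qubits on 192 GB and 35 qubits on 288 GB at complex64, and one qubit fewer in each case at complex128.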

AMD Instinct GPUs are designed to address precisely these memory‑intensive workloads, making them a natural fit for large‑scale quantum state‑vector simulation. Their high‑bandwidth memory architecture and strong software support through the ROCm™ software stack enable efficient execution of bandwidth‑dominated kernels common in quantum circuit simulation.
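To see why these kernels are bandwidth-dominated, consider a deliberately simple, pure-Python sketch of a single-qubit gate applied to a state vector. It is not Qibojit's implementation, and the function name and qubit ordering (qubit 0 as the most significant bit) are assumptions made for illustration. The point is the access pattern: every amplitude is read and written once per gate, with only a handful of arithmetic operations per amplitude.

```python
import math


def apply_single_qubit_gate(state, gate, target, n_qubits):
    """Apply a 2x2 gate to qubit `target` in place.

    Illustrative convention: qubit 0 is the most significant bit of the index.
    """
    stride = 1 << (n_qubits - 1 - target)
    for base in range(0, len(state), 2 * stride):
        for offset in range(stride):
            i0 = base + offset   # index with the target bit equal to 0
            i1 = i0 + stride     # same index with the target bit equal to 1
            a0, a1 = state[i0], state[i1]
            state[i0] = gate[0][0] * a0 + gate[0][1] * a1
            state[i1] = gate[1][0] * a0 + gate[1][1] * a1
    return state


# Hadamard gate as a plain nested list.
s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]

# Start in |000> and apply a Hadamard to every qubit.
n = 3
state = [0j] * (1 << n)
state[0] = 1 + 0j
for qubit in range(n):
    apply_single_qubit_gate(state, H, qubit, n)

print(state)  # eight equal amplitudes of magnitude 1/sqrt(8)
```

Each call sweeps the whole vector, so at large qubit counts the cost of a gate is set mostly by how fast memory can be streamed.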

MI300X: Establishing the Baseline

The Instinct MI300X GPU, with 192 GB of HBM3 capacity and a peak memory bandwidth of 5.3 TB/s, enabled single-GPU state-vector simulations of up to 34 qubits, as demonstrated in our previous work [link]. It also served as the primary platform for early qibojit optimizations on AMD GPUs, validating portability, numerical stability, and performance of the Qibo simulation stack.

MI355X: Pushing the Single‑GPU Simulation Boundary Further

The Instinct MI355X GPU pushes single‑GPU quantum simulation further. With its industry-leading 288 GB HBM3E capacity and up to 8 TB/s peak memory bandwidth, it delivers the memory capacity, throughput, and FP64/complex arithmetic performance required for high‑fidelity, large‑scale state‑vector simulation. But hardware is only half the equation: software optimization is key.

This brings us to Qibojit, Qibo’s high-performance simulation backend. As shown in Figure 1, Qibo maps quantum circuits to optimized GPU execution on AMD Instinct GPUs via Qibojit and ROCm software, while bridging simulation and real-hardware control through Qibolab and QICK.


Figure 1. Qibo Software Stack on AMD Instinct GPUs.

In our joint work with Qibo, these capabilities translate into:

  • Lower end‑to‑end simulation time compared to the MI300X GPUs for large state‑vector circuits

  • Exact simulation of up to 35 qubits on a single GPU, reducing reliance on multi‑GPU partitioning

  • Stable performance across single‑ and double‑precision, allowing accuracy without a significant performance trade‑off

These gains stem from higher effective memory throughput and tight hardware–software co‑design. Qibojit leverages JIT compilation and optimized kernels, and it has supported AMD GPUs since 2022 via cupy-rocm, ensuring seamless integration with the ROCm™ open software platform. This portability allows the same Qibo workflows to run across the MI300X and the MI355X GPUs with minimal changes, immediately benefiting from generational performance improvements.

Scaling Performance: Quantum Fourier Transform (QFT)

To evaluate scaling behavior, we benchmarked a Quantum Fourier Transform (QFT) circuit using Qibo with the Qibojit backend as shown in Figure 2. QFT is a canonical workload that stresses both memory bandwidth and kernel efficiency as qubit counts increase.
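As a rough guide to how the workload grows, the sketch below counts gates in the textbook QFT decomposition: one Hadamard per qubit, one controlled-phase gate per qubit pair, and a final layer of swaps. This is an assumption about the circuit structure for illustration; a simulator may fuse or reorder gates, so the counts are not a statement about what Qibojit executes internally.

```python
def qft_gate_counts(n_qubits):
    """Gate counts for the textbook QFT decomposition (illustrative only)."""
    return {
        "h": n_qubits,                             # one Hadamard per qubit
        "cphase": n_qubits * (n_qubits - 1) // 2,  # one per ordered qubit pair
        "swap": n_qubits // 2,                     # final bit-reversal layer
    }


for n in (25, 30, 35):
    counts = qft_gate_counts(n)
    total = sum(counts.values())
    print(f"{n} qubits: {counts}, total {total} gates")
```

The gate count grows quadratically with the number of qubits while the state vector grows exponentially, so at large sizes the run time is governed mainly by how many times the full state has to be streamed through memory.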

The results show consistent scaling as we push toward higher qubit counts, leveraging the memory capacity and bandwidth of the Instinct MI300X and MI355X GPUs and reaching 35 qubits on a single MI355X GPU. Compared to the MI300X, the MI355X consistently reduces total simulation time across the full qubit range, with the largest gains appearing at higher qubit counts where memory pressure is most severe. Notably:

  • The performance gap between single‑precision (complex64) and double‑precision (complex128) remains small

  • The dominant bottleneck shifts toward memory access rather than floating‑point throughput

  • Researchers can therefore prioritize numerical accuracy without sacrificing performance


Figure 2: Quantum circuit simulation times for a Quantum Fourier Transform using Qibojit on AMD Instinct GPUs. In the large-qubit regime (25 qubits and above), single-precision simulations achieve up to a 1.8x speedup on the MI300X and a 2x speedup on the MI355X when compared to double-precision simulations. For double-precision simulations, the MI355X outperforms the MI300X by up to 2.4x. For comparisons with other GPU models, please refer to Figure 5 of the Qibojit paper [1].
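A simple back-of-the-envelope model helps interpret these numbers. The sketch below assumes that each gate reads and writes every amplitude exactly once and that memory runs at the peak bandwidth quoted earlier in this post (5.3 TB/s and 8 TB/s). Both are idealizations, and the function name is illustrative, so the result is a lower bound on the time per gate rather than a prediction of measured performance.

```python
# Peak memory bandwidths quoted in this post, in bytes per second.
PEAK_BANDWIDTH = {"MI300X": 5.3e12, "MI355X": 8.0e12}


def min_gate_sweep_seconds(n_qubits, bytes_per_amplitude, bandwidth):
    """Lower bound on one gate, assuming one full read and one full write."""
    state_bytes = (1 << n_qubits) * bytes_per_amplitude
    return 2 * state_bytes / bandwidth


n = 30
for gpu, bandwidth in PEAK_BANDWIDTH.items():
    single = min_gate_sweep_seconds(n, 8, bandwidth)    # complex64
    double = min_gate_sweep_seconds(n, 16, bandwidth)   # complex128
    print(f"{gpu}: complex64 {single * 1e3:.2f} ms, "
          f"complex128 {double * 1e3:.2f} ms, ratio {double / single:.1f}x")

# Ratio of peak bandwidths between the two GPUs.
bandwidth_ratio = PEAK_BANDWIDTH["MI355X"] / PEAK_BANDWIDTH["MI300X"]
print(f"Peak bandwidth ratio: {bandwidth_ratio:.2f}x")
```

In this model, halving the bytes per amplitude halves the time per gate, which matches the upper end of the single-precision speedups reported in Figure 2. The peak bandwidth ratio alone is about 1.5x, so the model does not account for the full double-precision difference between the two GPUs.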

This established track record means that researchers can hit the ground running. Qibojit’s efficient kernels, combined with the sheer throughput of AMD Instinct GPUs, effectively turn the MI300X and MI355X into a powerful "virtual quantum computer" capable of simulating quantum systems at a scale that was previously impractical on a single accelerator.

Try Qibo on AMD Developer Cloud now

To unlock the potential of large-scale quantum simulation with Qibo powered by AMD Instinct™ GPU infrastructure, we use the AMD Developer Cloud on DigitalOcean. With on‑demand access to MI300X and MI350X GPUs, the AMD Developer Cloud lets you quickly deploy Qibo, harness massive HBM capacity and bandwidth, and run memory‑intensive quantum workloads using the ROCm™ software stack—all through a familiar, cloud‑native experience.

Create a GPU Droplet

For those familiar with DigitalOcean, getting started is simple. After registration, a GPU Droplet can be launched using the Create option in the top menu.


Figure 3. Selection Menu to create GPU Droplets

AMD Developer Cloud currently offers MI300X and MI350X in two flavors: 1xGPU and 8xGPUs. In this example, we select an MI300X, as shown in Figure 4.


Figure 4. Selection of GPU Droplet type (MI300X or MI350X) with 1GPU or 8GPUs.

We must also select the OS image to boot up the VM. Among the options, we will use the plain AMD ROCm software image, as shown in Figure 5.  


Figure 5. ROCm Software Image.

We also need to add an SSH key for remote terminal sessions. We then create the GPU Droplet, which may take a few minutes.

Once the GPU droplet is created, you will see a control panel similar to Figure 6. 


Figure 6. GPU Droplet Control Panel.

Finally, we integrate Qibo with ROCm to run quantum simulations on AMD Instinct GPUs. For simplicity, we configure the environment through the Web Console, which provides a browser‑based terminal (Figure 7). Through this interface, we set up Qibo with ROCm using cupy‑rocm, enabling accelerated quantum state‑vector simulations on AMD Instinct GPUs.

Figure 7. Web-Console based Terminal.

Integrating Qibo for AMD GPUs

With the environment prepared, we now set up Qibo using a cupy‑rocm branch currently under review. The steps below build CuPy with ROCm support and install Qibojit on the AMD Developer Cloud.

# set environment variables (gfx942 is the MI300X architecture target)
export CUPY_INSTALL_USE_HIP=1
export ROCM_HOME=/opt/rocm
export HCC_AMDGPU_TARGET=gfx942

# create and activate a Python virtual environment
sudo apt install python3.12-venv
python3.12 -m venv env
source env/bin/activate

# pull the repo and switch to the branch under review
git clone --recursive https://github.com/ROCm/cupy.git
cd cupy
git switch fix_large_workspace_crashes_ip

# install cupy from source
python -m pip install .

# move out of the build directory
cd ~/

# install qibojit
python -m pip install qibojit

# set the C/C++ include paths
export C_INCLUDE_PATH=/usr/include:/usr/lib/gcc/x86_64-linux-gnu/13/include${C_INCLUDE_PATH:+:$C_INCLUDE_PATH}
export CPLUS_INCLUDE_PATH=/usr/include/c++/13:/usr/include/x86_64-linux-gnu/c++/13${CPLUS_INCLUDE_PATH:+:$CPLUS_INCLUDE_PATH}
export CPATH=/usr/include:/usr/lib/gcc/x86_64-linux-gnu/13/include${CPATH:+:$CPATH}

# run a sample simulation
$ python
>>> from qibo import models, gates
>>> circuit = models.QFT(30)
>>> results = circuit()
[Qibo 0.3.2|INFO|2026-04-17 21:03:55]: Using qibojit (cupy) backend on /GPU:0
>>> print(results.state())
[3.05175781e-05+0.j 3.05175781e-05+0.j 3.05175781e-05+0.j ...
 3.05175781e-05+0.j 3.05175781e-05+0.j 3.05175781e-05+0.j]
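The printed amplitudes can be sanity-checked by hand. A QFT applied to the all-zeros state produces a uniform superposition, so every amplitude should equal 1/sqrt(2^n). The short sketch below, which uses only the Python standard library and an illustrative helper name, confirms this on a small case by evaluating the discrete Fourier transform definition directly, then computes the expected value for 30 qubits.

```python
import cmath
import math


def qft_of_zero_state(n_qubits):
    """Amplitudes of the QFT applied to |0...0>, from the DFT definition."""
    dim = 1 << n_qubits
    norm = 1 / math.sqrt(dim)
    k = 0  # index of the input basis state |0...0>
    return [norm * cmath.exp(2j * math.pi * j * k / dim) for j in range(dim)]


# Small case: all 16 amplitudes of a 4-qubit QFT on |0000> equal 1/4.
small = qft_of_zero_state(4)
print(small[0], small[-1])

# Expected amplitude for 30 qubits: 1/sqrt(2**30) = 2**-15.
expected_30 = 2.0 ** -15
print(f"{expected_30:.8e}")  # 3.05175781e-05
```

The value matches the amplitudes printed by the sample simulation above.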

	


Bridging the Gap to Real Hardware: Qibosoq and QICK

AMD’s role in quantum computing goes beyond GPUs. The entry point to a quantum computer is the quantum controller, a piece of hardware that bridges the digital world of classical computers and the analog world of qubits. Most quantum computers nowadays implement the quantum control unit with AMD FPGAs. One example is QICK (Quantum Instrumentation Control Kit), an open-source readout and control platform developed by Fermilab and supported on AMD Zynq™ UltraScale+™ RFSoC boards such as the RFSoC4x2, ZCU216, and ZCU111.

Simulating quantum circuits today prepares applications for a quantum future. But how can we guarantee that programmers will not have to rewrite their code once they get access to real quantum systems? Qibo’s capabilities extend beyond simulation into real-time hardware control, a critical frontier for experimental physics. Its Qibolab module transforms quantum circuit representations into pulses and driver instructions for control electronics across multiple vendors, so the same circuits created for the simulator can be used to target a real system.

The Qibolab framework includes Qibosoq, a specialized server component that integrates with QICK out of the box. Through Qibosoq, Qibo can communicate directly with AMD FPGA-based controllers to generate pulses and read out qubit states.


Figure 8: Schematic overview of Qibo quantum circuit execution orchestrated on quantum hardware through Qibolab and Qibosoq. The original quantum circuit defined by the user is first transpiled and converted into a pulse sequence. Subsequently, the pulse sequence is sent from the Qibosoq client driver in Qibolab to the Qibosoq server running on an AMD Xilinx FPGA RFSoC evaluation board via TCP. Finally, the Qibosoq server prepares and submits the instructions using the QICK firmware. The results are then sent back to the user.

This integration empowers researchers to use a single, unified framework, Qibo, to simulate on an AMD Instinct MI300X GPU and then deploy those same instructions to control physical qubits via Qibosoq and AMD FPGAs.

The Future is Open

The collaboration between AMD and Qibo demonstrates the power of open ecosystems: advanced accelerators paired with flexible, community‑driven software and hardware. With the AMD Instinct MI355X GPUs, researchers can now push exact quantum simulation further—both faster and at larger scales—while maintaining full software portability across AMD platforms such as the MI300X GPU. With Qibo, we guarantee portability between current simulators and current and future systems running on AMD FPGAs. Whether you are studying error‑correction codes, benchmarking quantum algorithms, or bridging simulation with real hardware control, Qibo on the Instinct GPUs provides a robust, open foundation for the next wave of quantum innovation.

To learn more, visit the Qibo project at qibo.science, explore the specifications of the AMD Instinct GPUs, and stay current with AMD’s quantum computing roadmap and updates at Quantum@AMD. We invite you to explore the quantum ecosystem and contribute to the development and deployment of Qibo for applications involving quantum simulation, quantum control, and calibration.
