Unlocking High-Performance Document Parsing of PaddleOCR-VL-1.5 on AMD GPUs with ROCm Software

Jan 29, 2026

We’re excited to share that AMD has achieved Day 0 support for Baidu’s latest PaddleOCR-VL-1.5 model, successfully running it on AMD Instinct™ MI Series GPUs using the AMD ROCm™ software release 7.0. This milestone delivers Day 0 model services for enterprises and developers worldwide, unlocking high-performance document parsing by combining PaddleOCR-VL-1.5’s advanced features with ROCm’s optimized acceleration.

With this release, users can deploy PaddleOCR-VL-1.5 on AMD GPUs from day one. This provides a low-latency, high-throughput document parsing solution for mission-critical workflows, while enabling developers to quickly integrate state-of-the-art OCR capabilities into their applications without lengthy adaptation cycles, significantly reducing development and deployment costs.

In this blog, we present a comprehensive guide to help users seamlessly adopt PaddleOCR-VL-1.5 on AMD GPUs. We will first introduce the core strengths of PaddleOCR-VL-1.5, including its SOTA performance and innovative features. Then, we provide two deployment options: an easy one-click Jupyter Notebook for quick trials, and a step-by-step manual setup via Docker for customized workflows. We also detail the implementation of two inference backends (native PaddlePaddle and vLLM) to cater to different production needs, followed by performance comparisons and detailed configuration references.

Brief Introduction to PaddleOCR-VL-1.5

PaddleOCR-VL-1.5 is the latest iteration of the PaddleOCR-VL series. Building on the fully optimized core capabilities of version 1.0, it achieves 94.5% accuracy on OmniDocBench v1.5, a leading document parsing benchmark, outperforming top global general-purpose LLMs and specialized document parsing models. As the world’s first document parsing model supporting irregular box localization, it delivers superior performance in real-world scenarios such as scanning, tilting, folding, screen capture, and complex lighting, achieving overall SOTA. Additionally, it integrates seal recognition, text detection, and text recognition tasks, with key metrics consistently leading mainstream models. For details on the model architecture, refer to the model card linked at the end of this blog.

Key capabilities of PaddleOCR-VL-1.5, as shown in Figure 1, include:

  1. With only 0.9B parameters, it achieves 94.5% accuracy on OmniDocBench v1.5, surpassing the previous SOTA model PaddleOCR-VL. Its table, formula, and text recognition capabilities are significantly enhanced.
  2. The world’s first document parsing model supporting irregular box localization, which can accurately return polygonal detection boxes in tilted and folded scenarios. It outperforms current mainstream open-source and closed-source models in five scenarios: scanning, folding, tilting, screen capture, and lighting changes.
  3. Newly added text line localization/recognition and seal recognition capabilities, with all technical indicators refreshing the SOTA in the field.
  4. Optimized recognition of rare characters, ancient books, multilingual tables, underlines, and checkboxes, and extended support for Tibetan and Bengali recognition.
  5. Supports automatic cross-page table merging and cross-page paragraph title recognition, solving the discontinuity problem in long document parsing.
Figure 1: Leading in OmniDocBench v1.5 and self-built Real5-OmniDocBench

Now, let's start the journey on AMD GPUs. Come and give it a try!

Quickstart: One-click Jupyter Notebook

We have prepared an easy-to-use notebook to get you started with the PaddleOCR-VL-1.5 model on AMD GPUs. Just click quick start.

If you would like to build the environment and work through the running steps hands-on, the sections below are all you need.

Environment Setup 

Hardware Requirement 

  • AMD Instinct MI300X GPU (ROCm 7.0 compatible)

Docker Container Deployment 

Start with the pre-configured Docker container that includes all necessary dependencies:


```
docker run -it \
  --device=/dev/kfd \
  --device=/dev/dri \
  --security-opt seccomp=unconfined \
  --network=host \
  --cap-add=SYS_PTRACE \
  --group-add video \
  --shm-size 32g \
  --ipc=host \
  -v $PWD:/workspace \
  ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddlex-paddle-vllm-amd-gpu:3.4.0-0.14.0rc2 \
  /bin/bash
```

The container comes pre-installed with:

  • PaddlePaddle (compiled with ROCm support)
  • vLLM (ROCm-optimized version)
  • PaddleX (for OCR pipeline management)
  • PaddleOCR-VL-1.5 model dependencies

Verify the environment after launching the container:

```
# Check ROCm version
cat /opt/rocm/.info/version

# Verify PaddlePaddle ROCm support
python -c "import paddle; print('Paddle ROCm compiled:', paddle.is_compiled_with_rocm())"

# Check GPU detection
rocm-smi
```

Get Started hands-on

Running PaddleOCR-VL with Native Backend on AMD GPU

The native backend uses PaddlePaddle’s built-in inference engine for straightforward deployment.

1. Run inference on your document image:

Download the test image:

```
wget -q https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png -O /tmp/test_ocr.png
```

2. Change to the PaddleX directory where models are located

```
cd /opt/PaddleX
```

3. Create the inference.yml file required for native backend

```
cat > checkpoint-5000/inference.yml << 'EOF'
Global:
  model_name: PaddleOCR-VL-1.5-0.9B
EOF
```

4. Create the native backend pipeline configuration

```
cat > PaddleOCR-VL-native.yaml << 'EOF'
pipeline_name: PaddleOCR-VL

batch_size: 64
use_queues: True
use_doc_preprocessor: False
use_layout_detection: True
use_chart_recognition: False
format_block_content: False
merge_layout_blocks: True

SubModules:
  LayoutDetection:
    module_name: layout_detection
    model_name: PP-DocLayoutV3
    model_dir: ./layout_0116
    batch_size: 8
    threshold: 0.3
    layout_nms: True

  VLRecognition:
    module_name: vl_recognition
    model_name: PaddleOCR-VL-1.5-0.9B
    model_dir: ./checkpoint-5000
    batch_size: 4096
    genai_config:
      backend: native
EOF
```

5. Then run inference with the native backend

```
paddlex --pipeline PaddleOCR-VL-native.yaml --input /tmp/test_ocr.png
```
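
The same pipeline can also be driven from Python instead of the CLI, which is convenient for batch scripts. Below is a minimal sketch assuming the container environment set up above; the result-object methods such as `save_to_json` follow PaddleX conventions and may vary by version:

```python
# Requires the PaddleX container environment set up earlier in this blog
from paddlex import create_pipeline

# Build the pipeline from the same YAML used with the CLI
pipeline = create_pipeline(pipeline="PaddleOCR-VL-native.yaml")

# predict() yields one result object per input image/page
for res in pipeline.predict("/tmp/test_ocr.png"):
    res.print()                    # print recognized content to stdout
    res.save_to_json("./output/")  # persist structured results as JSON
```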
	

Running PaddleOCR-VL with vLLM Backend on AMD GPU

The vLLM backend provides significant performance improvements through optimized batching and inference acceleration, and is recommended for production environments.
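
The pipeline expects a vLLM model server to already be listening (on port 8118 in this walkthrough); depending on the container image, it may need to be launched manually. The sketch below shows roughly what that launch looks like, using the flags from the vLLM Server Configuration reference later in this blog. The `paddlex_genai_server` command name and the exact invocation are assumptions; verify them against your PaddleX installation:

```shell
# Hypothetical model-server launch; check the exact command name and
# flags against your PaddleX installation before relying on this.
paddlex_genai_server \
  --model_name PaddleOCR-VL-1.5-0.9B \
  --model_dir ./checkpoint-5000 \
  --backend vllm \
  --host 0.0.0.0 \
  --port 8118
```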

1. Verify the server status:

```
curl http://localhost:8118/v1/models
```

Expected response:

```
{"object":"list","data":[{"id":"PaddleOCR-VL-1.5-0.9B","object":"model"}]}
```
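
Before launching a large batch job, it can be worth checking this response programmatically rather than by eye. Here is a minimal sketch using only the Python standard library; in practice you would fetch the JSON with `urllib.request.urlopen("http://localhost:8118/v1/models")`, but the response string from above is inlined so the snippet is self-contained:

```python
import json

# The /v1/models response shown above, as returned by the vLLM server
response = '{"object":"list","data":[{"id":"PaddleOCR-VL-1.5-0.9B","object":"model"}]}'

payload = json.loads(response)
served_models = [m["id"] for m in payload["data"]]

# The pipeline's VLRecognition module must reference one of these ids
assert "PaddleOCR-VL-1.5-0.9B" in served_models
print("Serving:", served_models)
```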
	

2. Execute inference with the vLLM backend

Download the test image:

```
wget -q https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png -O /tmp/test_ocr.png
```

3. Change to the PaddleX directory where models and config are located

```
cd /opt/PaddleX
```

4. Fix the pipeline_name in the existing vLLM YAML (required for CLI compatibility)

```
sed -i 's/pipeline_name: PaddleOCR-VL-1.5/pipeline_name: PaddleOCR-VL/' PaddleOCR-VL-vllm.yaml
```

5. Then run inference with the vLLM backend!

```
paddlex --pipeline PaddleOCR-VL-vllm.yaml --input /tmp/test_ocr.png
```

Performance Benchmarking

The vLLM backend is highly recommended for production deployments due to its superior performance, especially when handling large volumes of documents. Both backends maintain PaddleOCR-VL-1.5’s SOTA accuracy while leveraging AMD GPU hardware acceleration.

| Backend | Typical Latency per Image | Key Advantages |
| --- | --- | --- |
| Native (Paddle) | ~2-5 seconds | Simple deployment, no additional server setup |
| vLLM | ~0.5-1 second | Optimized throughput, lower latency for batch processing |
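
To put those latency ranges in perspective, here is a quick back-of-the-envelope calculation using the midpoints of the ranges above. This assumes single-stream, sequential processing; real throughput also depends on batch size and document complexity:

```python
# Midpoints of the latency ranges from the table above (seconds per image)
native_latency = (2.0 + 5.0) / 2   # 3.5 s
vllm_latency = (0.5 + 1.0) / 2     # 0.75 s

# Images processed per hour, one image at a time
native_per_hour = 3600 / native_latency
vllm_per_hour = 3600 / vllm_latency

print(f"Native backend: ~{native_per_hour:.0f} images/hour")
print(f"vLLM backend:   ~{vllm_per_hour:.0f} images/hour")
print(f"Speedup:        ~{native_latency / vllm_latency:.1f}x")
```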

Configuration Reference

This section explains the key configuration options in `PaddleOCR-VL-vllm.yaml`, the YAML file passed to the pipeline, in case you are curious about what you just ran.

Key Pipeline Configuration

| Parameter | Description | Default Value |
| --- | --- | --- |
| pipeline_name | Unique identifier for the pipeline | PaddleOCR-VL-1.5 |
| use_layout_detection | Enable document layout analysis | True |
| use_doc_preprocessor | Enable preprocessing (e.g., image enhancement) | False |
| use_chart_recognition | Enable chart element recognition | False |
| merge_layout_blocks | Merge adjacent layout elements | True |
| batch_size | Number of documents processed in parallel | 64 |

VLRecognition Module Options

| Parameter | Description |
| --- | --- |
| model_name | Model identifier (PaddleOCR-VL-1.5-0.9B) |
| model_dir | Path to pre-trained model weights |
| genai_config.backend | Inference backend (native or vllm-server) |
| genai_config.server_url | vLLM server endpoint (for the vllm-server backend) |
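Putting `genai_config.backend` and `genai_config.server_url` together, a VLRecognition module pointed at a running vLLM server would look roughly like the fragment below. This is a sketch: the `server_url` value assumes the server from the earlier steps on port 8118, and the exact URL form may vary by PaddleX version.

```
SubModules:
  VLRecognition:
    module_name: vl_recognition
    model_name: PaddleOCR-VL-1.5-0.9B
    model_dir: ./checkpoint-5000
    genai_config:
      backend: vllm-server
      server_url: http://localhost:8118
```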

vLLM Server Configuration

| Parameter | Description | Default Value |
| --- | --- | --- |
| --model_name | Name of the model to serve | Required |
| --model_dir | Path to model checkpoint directory | Required |
| --backend | Inference backend (vllm, fastdeploy, sglang) | vllm |
| --host | Server host address | localhost |
| --port | Server listening port | 8000 |

With this setup in `PaddleOCR-VL-vllm.yaml`, users can now leverage PaddleOCR-VL-1.5’s powerful document parsing capabilities with the ROCm 7 software on AMD GPUs, enabling efficient and accurate processing of diverse document types across industries.

Model Download Link: https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5

Conclusion

With Day 0 support for PaddleOCR-VL-1.5 on AMD GPUs, AMD continues to strengthen its commitment to delivering cutting-edge AI capabilities and optimizing the ROCm software ecosystem for developers and enterprises. This integration combines PaddleOCR-VL-1.5’s SOTA document parsing capabilities, from irregular box localization to multilingual support, with AMD Instinct MI300X’s powerful hardware and ROCm 7 optimized acceleration, creating a robust, efficient solution for real-world document processing workflows. Whether for rapid prototyping via the one-click Jupyter Notebook or large-scale production deployment with the vLLM backend, users can fully unlock the potential of both technologies without compromising on accuracy or performance. Moving forward, AMD will remain focused on expanding Day 0 support for leading AI models, empowering users across industries to build innovative, high-performance AI applications with ease.

Related Blogs