



# AMD Versal™ AI Engines for DSP End-to-End Design Flow

# Agenda

---

1. Creating a Vivado Extensible Platform
2. Creating an AI Engine Component in Vitis™ Unified IDE
3. Reviewing the Source Files
4. Configuring DSP Library Parameters Using .csv File
5. Running AI Engine Compiler and Simulator, Vitis Analyzer to Measure Latency and Throughput
6. Running Export to Vivado™ Tool Flow

# Creating a Vivado Extensible Platform

- Click the Vivado Design Suite icon on the taskbar
- Click **Create Project**, click **Next** in the New Project window, enter the project details, and click **Next**



# Creating a Vivado Extensible Platform

- Select the required **Project Type** and click **Next**
- Add sources and constraints, if required
- Choose the target board for the project and click **Next**
- Review the project summary and click **Finish**



# Agenda

---

1. Creating a Vivado Extensible Platform
2. **Creating an AI Engine Component in Vitis™ Unified IDE**
3. Reviewing the Source Files
4. Configuring DSP Library Parameters Using .csv File
5. Running AI Engine Compiler and Simulator, Vitis Analyzer to Measure Latency and Throughput
6. Running Export to Vivado™ Tool Flow

# Creating an AMD Vitis Unified IDE Project

- Click the Vitis™ Unified IDE icon on the taskbar
- Click **Set Workspace** to choose a workspace directory
- Click **Open** to proceed



# Creating an AI Engine Component

- Select **File > New Component > AI Engine**
- Enter a component name (e.g., filter\_design\_acc)
- Choose the component location (the default is the workspace)

- Click **Add Folders** under **Import Sources**
- Select the data and src directories



# Creating an AI Engine Component

- Choose the required platform (XSA)
- Click **Next**
- Review the summary
- Click **Finish**
- The component appears in the **Components** window



# Agenda

---

1. Creating a Vivado Extensible Platform
2. Creating an AI Engine Component in Vitis™ Unified IDE
3. **Reviewing the Source Files**
4. Configuring DSP Library Parameters Using .csv File
5. Running AI Engine Compiler and Simulator, Vitis Analyzer to Measure Latency and Throughput
6. Running Export to Vivado™ Tool Flow

# Reviewing the Source Files

- Open the graph code
  - Path: *[AI Engine] > Sources > src > fir1\_graph.h*
- For the FIR filter, the following parameters are declared:
  - TT\_DATA: Data type
  - TT\_COEFF: Coefficient type
  - TP\_FIR\_LEN: Length of the filter (overall length, including zero-valued taps)
  - TP\_SHIFT: Number of bits by which the accumulator is shifted down just before the result is sent to the output
  - TP\_RND: Rounding mode applied with the shift (e.g., truncation, rounding, rounding up)
  - TP\_INPUT\_WINDOW\_VSIZE: Size of the input window (buffer) in samples
  - TP\_CASC\_LEN: Number of cascaded kernels to use for the FIR
  - TP\_DUAL\_IP: Set to 1 to use dual input ports (stream API only)
  - TP\_USE\_COEFF\_RELOAD: Set to 1 to enable run-time coefficient reload; 0 means no reload
  - TP\_NUM\_OUTPUTS: Number of output ports; a second port carries a copy of the output
  - TP\_API: Selects window (I/O buffer) or stream data APIs
  - TP\_SSR: Scales throughput using parallelization (super sample rate)

```

#pragma once

#include <adf.h>
#include <fir_sr_sym_graph.hpp>
#include <fft_ifft_dit_1ch_graph.hpp>
#include "fir1_coeff.h"

using namespace adf;

using namespace xf::dsp::aie::fir::sr_sym;
using namespace xf::dsp::aie::fft::dit_1ch;

// -----
// FIR Namespace
// -----
```

**FIR filter parameters**

```

namespace fir1 {
    static constexpr int WIN_SIZE = 2048;
    typedef cint16 TT_DATA;
    typedef int16 TT_COEFF;
    static constexpr int TP_FIR_LEN = 2048; // Total number of taps
    static constexpr int TP_SHIFT = 0; // Depends on FXP properties
    static constexpr int TP_RND = 0; // Rounding mode applied after TP_SHIFT
    static constexpr int TP_INPUT_WINDOW_VSIZE = 2048;
    static constexpr int TP_CASC_LEN = 1; // # of tiles in a cascade
    static constexpr int TP_DUAL_IP = 0; // Only used in stream mode (TP_API = 1)
    static constexpr int TP_USE_COEFF_RELOAD = 0; // Set to 1 to use coefficient reload feature
    static constexpr int TP_NUM_OUTPUTS = 1; // Add optional 2nd output port for windows
    static constexpr int TP_API = 0; // Set to 1 for stream mode (windows otherwise)
    static constexpr int TP_SSR = 1; // Scale throughput using parallelization
}
```
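To make TP\_FIR\_LEN and TP\_SHIFT concrete, here is a minimal host-side golden model of a single-rate symmetric FIR in plain C++ (no ADF headers). It is a sketch under stated assumptions, not the library's implementation: samples are real-valued integers rather than cint16, the delay line starts at zero, the coefficient vector holds only the first (TP\_FIR\_LEN + 1) / 2 taps (one common way to exploit symmetry; check the DSP Library documentation for what fir\_sr\_sym\_graph expects), and a floor shift stands in for TP\_RND. The names fir\_sr\_sym\_ref and shift\_floor are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Divide by 2^shift, rounding toward minus infinity (an assumed stand-in for TP_RND).
std::int64_t shift_floor(std::int64_t v, int shift) {
    const std::int64_t d = std::int64_t{1} << shift;
    std::int64_t q = v / d;          // C++ integer division truncates toward zero
    if (v % d != 0 && v < 0) --q;    // move inexact negative results down by one
    return q;
}

// Hypothetical golden model of a real-valued single-rate symmetric FIR.
// half_taps holds h[0] .. h[(fir_len + 1) / 2 - 1]; h[k] == h[fir_len - 1 - k].
std::vector<std::int64_t> fir_sr_sym_ref(const std::vector<std::int32_t>& x,
                                         const std::vector<std::int32_t>& half_taps,
                                         int fir_len, int shift) {
    assert(fir_len > 0 && static_cast<int>(half_taps.size()) == (fir_len + 1) / 2);
    // Samples before the start of x are zero (empty delay line).
    auto at = [&x](long i) -> std::int64_t {
        return (i >= 0 && i < static_cast<long>(x.size())) ? x[static_cast<std::size_t>(i)] : 0;
    };
    std::vector<std::int64_t> y(x.size());
    for (long n = 0; n < static_cast<long>(x.size()); ++n) {
        std::int64_t acc = 0;
        // Pre-add the two samples that share a coefficient, then multiply once.
        for (int k = 0; k < fir_len / 2; ++k)
            acc += std::int64_t{half_taps[k]} * (at(n - k) + at(n - (fir_len - 1 - k)));
        if (fir_len % 2 != 0)        // odd length: the centre tap has no partner
            acc += std::int64_t{half_taps[fir_len / 2]} * at(n - fir_len / 2);
        y[static_cast<std::size_t>(n)] = shift_floor(acc, shift);
    }
    return y;
}
```

Because each stored coefficient multiplies a pre-added pair of samples, the symmetric form needs roughly half the multiplications of a general FIR of the same length.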

**FFT design parameters**

```

// -----
// FFT Namespace
// -----
```

```

namespace fft1 {
    typedef cint16 TT_TYPE;
    typedef cint16 TT_TWIDDLE;
    static constexpr int TP_POINT_SIZE = 1;
    static constexpr int TP_FFT_NIFFT = 1;
    static constexpr int TP_SHIFT = 0; // Excludes twiddle shift
    static constexpr int TP_CASC_LEN = 1;
    static constexpr int TP_DYN_PT_SIZE = 1;
    static constexpr int TP_WINDOW_SIZE = 1;
    static constexpr int TP_API = 0; // Windows
    static constexpr int TP_PARALLEL_POWER = 1;
}
```
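As a companion golden model for fft\_dut, the sketch below is a textbook recursive radix-2 decimation-in-time transform on std::complex<double>. The mapping to the library is an assumption, not taken from its source: TP\_FFT\_NIFFT = 1 selects the forward transform (negative exponent) and 0 the inverse without 1/N normalization, and TP\_SHIFT is modeled as a final division by 2^shift with no rounding or saturation. The names dit\_radix2 and fft\_ref are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Recursive radix-2 decimation-in-time transform (unscaled).
// forward == true uses exp(-j*2*pi*k/N); false uses exp(+j*2*pi*k/N).
std::vector<cd> dit_radix2(const std::vector<cd>& x, bool forward) {
    const std::size_t n = x.size();
    if (n == 1) return x;
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = x[2 * i];
        odd[i] = x[2 * i + 1];
    }
    const std::vector<cd> e = dit_radix2(even, forward);
    const std::vector<cd> o = dit_radix2(odd, forward);
    const double pi = std::acos(-1.0);
    const double sign = forward ? -1.0 : 1.0;
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const cd tw = std::polar(1.0, sign * 2.0 * pi * static_cast<double>(k) / static_cast<double>(n));
        out[k] = e[k] + tw * o[k];          // butterfly, upper output
        out[k + n / 2] = e[k] - tw * o[k];  // butterfly, lower output
    }
    return out;
}

// Hypothetical model of one frame: fft_nifft = 1 -> FFT, 0 -> IFFT; then scale by 2^-shift.
std::vector<cd> fft_ref(const std::vector<cd>& frame, int fft_nifft, int shift) {
    const std::size_t n = frame.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // point size must be a power of two
    std::vector<cd> y = dit_radix2(frame, fft_nifft == 1);
    const double scale = std::ldexp(1.0, -shift);  // 2^-shift
    for (cd& v : y) v *= scale;
    return y;
}
```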

# Reviewing the Source Files

For the FFT, the following parameters are declared:

- TT\_TYPE: Type of the individual data samples input to and output from the transform function
- TT\_TWIDDLE: Type of the twiddle factors of the transform
- TP\_POINT\_SIZE: Number of samples processed by one FFT (the point size)
- TP\_FFT\_NIFFT: Selects the transform to perform (0 for IFFT, 1 for FFT)
- TP\_SHIFT: Number of bits to shift the accumulator down by before output
- TP\_CASC\_LEN: Number of kernels across which the FFT is split in a cascade
- TP\_DYN\_PT\_SIZE: Set to 1 to enable run-time (dynamic) point-size selection; 0 for a fixed point size
- TP\_WINDOW\_SIZE: Number of samples in the input window
- TP\_API: Selects window (I/O buffer) or stream data APIs
- TP\_PARALLEL\_POWER: Selects the parallelism factor as a power of 2; values range from 0 to 4
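Several of these parameters are flags or bounded values, so a mistyped number is easy to miss until compilation or simulation fails. The sketch below encodes the ranges stated in the list as constexpr checks; the "window holds whole frames" rule is an added assumption, and is\_pow2 and fft\_params\_ok are hypothetical helpers, not library functions. The DSP Library's own checks remain authoritative.

```cpp
#include <cstdint>

// Hypothetical compile-time checks that mirror the FFT parameter descriptions above.
constexpr bool is_pow2(std::uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

constexpr bool fft_params_ok(std::uint32_t point_size, int fft_nifft, int dyn_pt_size,
                             std::uint32_t window_size, int api, int parallel_power) {
    return is_pow2(point_size)
        && (fft_nifft == 0 || fft_nifft == 1)      // 0 = IFFT, 1 = FFT
        && (dyn_pt_size == 0 || dyn_pt_size == 1)  // dynamic point size off/on
        && window_size % point_size == 0           // whole frames per window (assumed rule)
        && (api == 0 || api == 1)                  // 0 = windows, 1 = streams
        && parallel_power >= 0 && parallel_power <= 4;
}
```

Placing `static_assert(fft_params_ok(fft1::TP_POINT_SIZE, fft1::TP_FFT_NIFFT, fft1::TP_DYN_PT_SIZE, fft1::TP_WINDOW_SIZE, fft1::TP_API, fft1::TP_PARALLEL_POWER), "fft1 parameters");` after the fft1 namespace would flag, for example, a point size typed where a 0/1 flag belongs.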

```

namespace fir1 {
    static constexpr int WIN_SIZE = 2048;
    typedef cint16 TT_DATA;
    typedef int16 TT_COEFF;
    static constexpr int TP_FIR_LEN = 2048; // Total number of taps
    static constexpr int TP_SHIFT = 10; // Depends on FXP properties
    static constexpr int TP_RND = 0; // Rounding mode applied after TP_SHIFT
    static constexpr int TP_INPUT_WINDOW_VSIZE = 1024; // Input window size in samples
    static constexpr int TP_CASC_LEN = 1024; // # of tiles in a cascade
    static constexpr int TP_DUAL_IP = 0; // Only used in stream mode (TP_API = 1)
    static constexpr int TP_USE_COEFF_RELOAD = 0; // Set to 1 to use coefficient reload feature
    static constexpr int TP_NUM_OUTPUTS = 1; // Add optional 2nd output port for windows
    static constexpr int TP_API = 1; // Set to 1 for stream mode (windows otherwise)
    static constexpr int TP_SSR = 1; // Scale throughput using parallelization
}
```

```

namespace fft1 {
    typedef cint16 TT_TYPE;
    typedef cint16 TT_TWIDDLE;
    static constexpr int TP_POINT_SIZE = 1024;
    static constexpr int TP_FFT_NIFFT = 0; // 0 for IFFT, 1 for FFT
    static constexpr int TP_SHIFT = 10; // Excludes twiddle shift
    static constexpr int TP_CASC_LEN = 1024;
    static constexpr int TP_DYN_PT_SIZE = 1024;
    static constexpr int TP_WINDOW_SIZE = 1024;
    static constexpr int TP_API = 1; // Streams
    static constexpr int TP_PARALLEL_POWER = 4;
}
```

# Reviewing the Source Files

- Review the other source files in the component
- Update the kernel connections in fir1\_graph.h to match your design:
  - `filt_i.out[0]` → `fir_dut.in[0]`
  - `fir_dut.out[0]` → `fft_dut.in[0]`
  - `fir_dut.out[0]` → `filt_o.in[0]`
  - `fft_dut.out[0]` → `fft_o.in[0]`

```
class fir1_graph : public graph {
private:
    // Filter taps:
    std::vector<fir1::TT_COEFF> m_taps = std::vector<fir1::TT_COEFF>(FIR1_COEFF);

    // Filter class:
    using TT_FIR = fir_sr_sym_graph<fir1::TT_DATA,fir1::TT_COEFF,fir1::TP_FIR_LEN,fir1::TP_SHIFT,
                           fir1::TP_RND,fir1::TP_INPUT_WINDOW_VSIZE,fir1::TP_CASC_LEN,
                           fir1::TP_DUAL_IP,fir1::TP_USE_COEFF_RELOAD,fir1::TP_NUM_OUTPUTS,
                           fir1::TP_API,fir1::TP_SSR>;

    // FFT class:
    using TT_FFT = fft_ifft_dit_1ch_graph<fft1::TT_TYPE,fft1::TT_TWIDDLE,fft1::TP_POINT_SIZE,
                           fft1::TP_FFT_NIFFT,fft1::TP_SHIFT,fft1::TP_CASC_LEN,
                           fft1::TP_DYN_PT_SIZE,fft1::TP_WINDOW_SIZE,fft1::TP_API,
                           fft1::TP_PARALLEL_POWER>;

public:
    input_plio filt_i; // top level input
    output_plio filt_o; // top level output
    output_plio fft_o; // fft output

    TT_FIR fir_dut; // for fir function
    TT_FFT fft_dut; // for fft function

    fir1_graph(void) : fir_dut( m_taps ), fft_dut()
    {
        filt_i = input_plio::create("PLIO_fir_i",plio_64_bits,"data/sig_i.txt");
        filt_o = output_plio::create("PLIO_fir_o",plio_64_bits,"data/fir_o.txt");
        fft_o = output_plio::create("PLIO_fft_o",plio_64_bits,"data/fft_o.txt");
        connect<>(filt_i.out[0], fir_dut.in[0]);   // top level input to fir input
        connect<>(fir_dut.out[0], fft_dut.in[0]);  // fir output to fft input
        connect<>(fir_dut.out[0], filt_o.in[0]);   // fir output to top level fir output
        connect<>(fft_dut.out[0], fft_o.in[0]);    // fft output to top level fft output
    }
};
```
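The four connections form a small dataflow graph in which `fir_dut.out[0]` fans out to two destinations. The plain C++ sketch below (no ADF headers; Edge, single\_driver\_per\_sink, and fan\_out are hypothetical names) records those edges and checks an assumed rule of thumb: a source port may broadcast to several sinks, but each sink should have exactly one driver.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side model of the connect<>() calls:
// each edge is (source port, destination port), using the port names from the graph.
using Edge = std::pair<std::string, std::string>;

// True if every destination port is driven by exactly one source.
bool single_driver_per_sink(const std::vector<Edge>& edges) {
    std::map<std::string, int> drivers;
    for (const Edge& e : edges) ++drivers[e.second];
    for (const auto& kv : drivers)
        if (kv.second != 1) return false;
    return true;
}

// Number of destinations fed by one source port (broadcast fan-out).
int fan_out(const std::vector<Edge>& edges, const std::string& source) {
    int count = 0;
    for (const Edge& e : edges)
        if (e.first == source) ++count;
    return count;
}
```

Listing the edges this way before editing the `connect<>()` calls is a quick way to spot a missing or doubled connection.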

# Agenda

---

1. Creating a Vivado Extensible Platform
2. Creating an AI Engine Component in Vitis™ Unified IDE
3. Reviewing the Source Files
4. **Configuring DSP Library Parameters Using .csv File**
5. Running AI Engine Compiler and Simulator, Vitis Analyzer to Measure Latency and Throughput
6. Running Export to Vivado™ Tool Flow

# How to Download / Import Libraries from GitHub

## AMD Vitis™ DSP Library

[https://github.com/Xilinx/Vitis_Libraries](https://github.com/Xilinx/Vitis_Libraries)



# Configuration Parameters

- Each library function has its own set of configuration parameters
- Listed in the *AMD Vitis™ Libraries Configuration Parameters* documentation



*Screenshot: Vitis Libraries documentation page for the FFT Window Configuration Parameters (searchable sidebar with the parameter list; parameter table in the main area).*

**FFT Window Configuration Parameters**

For the FFT Window library element, use the following list of configurable parameters and default values.

**Table 97 FFT Window Configuration Parameters**

| Name          | Type     | Default | Description                                                       |
|---------------|----------|---------|-------------------------------------------------------------------|
| DATA_TYPE     | typename | cint16  | Data Type.                                                        |
| COEFF_TYPE    | typename | cint16  | Coeff Type.                                                       |
| POINT_SIZE    | unsigned | 1024    | FFT point size.                                                   |
| SHIFT         | unsigned | 17      | See Common Configuration Parameters.                              |
| WINDOW_VSIZE  | unsigned | 1024    | Input/Output window size.<br>By default, set to: `${POINT_SIZE}`. |
| DYN_PT_SIZE   | unsigned | 0       | Enable (1) Dynamic Point size feature.                            |
| API_IO        | unsigned | 0       | Graph's port API.<br>0: window<br>1: stream                       |
| WINDOW_CHOICE | unsigned | 0       | Supported types:<br>0: Hamming<br>1: Hann<br>2: Blackman          |
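For reference, the three WINDOW\_CHOICE options correspond to the standard Hamming, Hann, and Blackman windows. The sketch below generates their symmetric textbook forms (denominator N − 1) plus an optional Q15 quantization. Whether the library uses the symmetric or periodic form, and how it scales coefficients into COEFF\_TYPE, is not stated here, so make\_window and to\_q15 should be read as hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class WindowChoice { Hamming = 0, Hann = 1, Blackman = 2 };  // mirrors WINDOW_CHOICE

// Symmetric window of length n (n >= 2) with textbook coefficients.
std::vector<double> make_window(WindowChoice choice, int n) {
    assert(n >= 2);
    const double pi = std::acos(-1.0);
    std::vector<double> w(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        const double phase = 2.0 * pi * i / (n - 1);
        switch (choice) {
            case WindowChoice::Hamming:  w[i] = 0.54 - 0.46 * std::cos(phase); break;
            case WindowChoice::Hann:     w[i] = 0.5 - 0.5 * std::cos(phase); break;
            case WindowChoice::Blackman:
                w[i] = 0.42 - 0.5 * std::cos(phase) + 0.08 * std::cos(2.0 * phase); break;
        }
    }
    return w;
}

// Quantize to Q15 (int16) with round-to-nearest and saturation to [-32768, 32767].
std::vector<std::int16_t> to_q15(const std::vector<double>& w) {
    std::vector<std::int16_t> q(w.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
        long v = std::lround(w[i] * 32768.0);
        if (v > 32767) v = 32767;
        if (v < -32768) v = -32768;
        q[i] = static_cast<std::int16_t>(v);
    }
    return q;
}
```

Saturation matters for the Hann window: its peak of 1.0 maps to 32768, one step past the largest int16 value.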

# Configuration Databases

## AMD Vitis™ DSP Library – Configuration Database

- Gives an early estimate of power, latency, throughput, and resource utilization
- Benchmarks cover different combinations of library parameters, such as data types and AI Engine variants
- Contains roughly 3K to 5K test results
- Is available as per-function CSV files (e.g., fft\_window\_benchmark.csv) located at:  
[Vitis_Libraries/dsp/docs/src/csv_data_files/L2 at 2025.1 · Xilinx/Vitis_Libraries · GitHub](https://github.com/Xilinx/Vitis_Libraries/tree/2025.1/dsp/docs/src/csv_data_files/L2)



# Filtering and Sorting CSV table

## FFT Window Function – Filtering CSV Values in Desktop Excel

**On GitHub:**

- Open `Vitis_Libraries / dsp / docs / src / csv_data_files / L2 / fft_window_benchmark.csv`
- The preview shows columns such as Library, AIE\_VARIANT, TT\_DATA, TT\_COEFF, TP\_POINT\_SIZE, TP\_WINDOW\_VSIZE, TP\_DYN\_PT\_SIZE, TP\_SSR, TI…
- Click **Raw** in the preview toolbar to get the plain CSV file

**In Microsoft Excel:**

- Open the CSV file and select the **Data** tab
- Click **Filter** in the **Sort & Filter** group
- Click the dropdown on the **TP\_POINT\_SIZE** column header and choose **Number Filters**
- Pick a condition such as **Equals...**, **Does Not Equal...**, or **Greater Than...**
- The column contains point sizes such as 16, 32, 64, 128, 256, 512, 1024, and 4096
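The Excel filter-and-sort steps can also be scripted so they are repeatable across library releases. The following sketch assumes a simple comma-separated file with a header row and no quoted fields containing commas; the column names come from the fft\_window\_benchmark.csv preview, while read\_csv, filter\_equals\_sorted, and the other helpers are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

using Row = std::vector<std::string>;
struct Table { Row header; std::vector<Row> rows; };

// Split one line on commas (assumes no quoted fields containing commas).
Row split_line(const std::string& line) {
    Row fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) fields.push_back(field);
    return fields;
}

// First non-empty line is the header; '\r' from Windows line endings is stripped.
Table read_csv(std::istream& in) {
    Table t;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;
        if (t.header.empty()) t.header = split_line(line);
        else t.rows.push_back(split_line(line));
    }
    return t;
}

// Index of a named column; throws if the header does not contain it.
std::size_t column(const Table& t, const std::string& name) {
    const auto it = std::find(t.header.begin(), t.header.end(), name);
    if (it == t.header.end()) throw std::out_of_range("missing column: " + name);
    return static_cast<std::size_t>(it - t.header.begin());
}

// Like an Excel "Equals..." filter on one column, then an ascending numeric sort
// on another column. std::stod throws if a sort cell is not numeric.
std::vector<Row> filter_equals_sorted(const Table& t, const std::string& filter_col,
                                      const std::string& value, const std::string& sort_col) {
    const std::size_t f = column(t, filter_col), s = column(t, sort_col);
    std::vector<Row> out;
    for (const Row& r : t.rows)
        if (r.size() > f && r.size() > s && r[f] == value) out.push_back(r);
    std::stable_sort(out.begin(), out.end(), [s](const Row& a, const Row& b) {
        return std::stod(a[s]) < std::stod(b[s]);
    });
    return out;
}
```

For example, opening the downloaded file with `std::ifstream` and calling `filter_equals_sorted(read_csv(f), "TP_POINT_SIZE", "1024", "TP_SSR")` mirrors filtering TP\_POINT\_SIZE to 1024 in Excel and sorting by TP\_SSR.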

# Agenda

1. Creating a Vivado Extensible Platform
2. Creating an AI Engine Component in Vitis™ Unified IDE
3. Reviewing the Source Files
4. Configuring DSP Library Parameters Using .csv File
5. **Running AI Engine Compiler and Simulator, Vitis Analyzer to Measure Latency and Throughput**
6. Running Export to Vivado™ Tool Flow

# Building and Simulating the Project

- Click **Build** under **AIE Simulator/Hardware** to build the component
- Add a launch configuration:
  - Path: **Settings > launch.json**
  - Select the filter\_design\_acc\_aiesim\_1 configuration
- Click **Run** under **AIE Simulator/Hardware**
- Monitor progress in the **TASK** window



# Latency and Throughput Estimates

- The average throughput is computed and displayed at the end of the AI Engine simulation
- A VCD file dumped during simulation is used to generate latency and throughput estimates:

```
aiesimulator -pkg-dir=./Work -dump-vcd foo -options-file=aiesim-options.txt
```

- **Continuous latency:** measured between specific input and output ports
- **Continuous throughput:** measured at a specific input or output port
- To view them, go to **Vitis™ Unified IDE > AIE Simulation Reports > Trace**, which opens the analysis view

---

**Command line:** `vitis_analyzer aie/aiesimulator_output/default.aierun_summary`

# Latency & Throughput Table



| First Latency                                    | Last Latency                                   | Average Latency                                                       |
|--------------------------------------------------|------------------------------------------------|-----------------------------------------------------------------------|
| Latency time between first input to first output | Latency time between last input to last output | Difference between avg. output sample time and avg. input sample time |
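The three definitions in this table translate directly into arithmetic on per-sample timestamps. The sketch assumes you already have input and output timestamps in picoseconds, in sample order (for example, extracted from the simulation trace); latency\_stats is a hypothetical helper, and Vitis Analyzer's own computation may differ in how it pairs samples.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

struct LatencyStats {
    double first_ps;    // first output time minus first input time
    double last_ps;     // last output time minus last input time
    double average_ps;  // mean output time minus mean input time
};

// Hypothetical helper: timestamps in picoseconds, one entry per sample, in order.
LatencyStats latency_stats(const std::vector<double>& in_ps, const std::vector<double>& out_ps) {
    assert(!in_ps.empty() && !out_ps.empty());
    auto mean = [](const std::vector<double>& v) {
        return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
    };
    return {out_ps.front() - in_ps.front(),
            out_ps.back() - in_ps.back(),
            mean(out_ps) - mean(in_ps)};
}
```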

## Plot or Export Continuous Latency and Throughput

| NAME                                   | FIRST LATENCY (ps) | LAST LATENCY (ps) | AVERAGE LATENCY (ps) |
|----------------------------------------|--------------------|-------------------|----------------------|
| Output: PLIO_Out0 (./data/soutput.txt) | 20549600           | 50151200          | 39885777             |
| Input: PLIO_In0 (./data/Sinput.txt)    | 20549600           | 50151200          | 39885777             |

| NAME                                       | TYPE | DATA WIDTH | FREQUENCY (MHz) | THROUGHPUT (MB/s)     | BUFFERS | CONNECTED PORTS | COLUMN | CHANNEL ID | LOCATION CONSTRAINT | PACKET IDS |
|--------------------------------------------|------|------------|-----------------|-----------------------|---------|-----------------|--------|------------|---------------------|------------|
| fft_design (2)                             |      |            |                 |                       |         |                 |        |            |                     |            |
| Input: In (data/data_fft2048/input.txt)    | PLIO | 32         | 312.5           | 1250.076299           | 2       | 1               | 25     | 0          |                     |            |
| Output: Out (data/data_fft2048/output.txt) | PLIO | 32         | 312.5           | 1250.152607           | 2       | 1               | 25     | 0          |                     |            |
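The THROUGHPUT column can be sanity-checked against the PLIO's theoretical peak. A 32-bit PLIO moves 4 bytes per PL clock cycle, so at 312.5 MHz the ceiling is 4 × 312.5 = 1250 MB/s, consistent with the roughly 1250 MB/s reported for both ports. The helpers below are a hypothetical sketch of that arithmetic plus an average-throughput estimate over a time window; they assume 1 MB = 10^6 bytes.

```cpp
#include <cassert>

// Peak PLIO bandwidth in MB/s: bytes per transfer times PL clock in MHz.
constexpr double plio_peak_mbytes_per_s(unsigned width_bits, double freq_mhz) {
    return (width_bits / 8.0) * freq_mhz;
}

// Average throughput in MB/s for `bytes` transferred between two timestamps in ps.
double avg_throughput_mbytes_per_s(double bytes, double start_ps, double end_ps) {
    assert(end_ps > start_ps);
    const double seconds = (end_ps - start_ps) * 1e-12;
    return bytes / seconds / 1e6;
}
```

The same helper shows why the design's 64-bit PLIOs (`plio_64_bits`) double the ceiling at a given clock.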

# Agenda

---

1. Creating a Vivado Extensible Platform
2. Creating an AI Engine Component in Vitis™ Unified IDE
3. Reviewing the Source Files
4. Configuring DSP Library Parameters Using .csv File
5. Running AI Engine Compiler and Simulator, Vitis Analyzer to Measure Latency and Throughput
6. **Running Export to Vivado™ Tool Flow**

# AMD Vitis Export to Vivado Flow

- Enables bi-directional hardware hand-offs between the Vivado™ Design Suite and Vitis™ tools, using `v++ --link --export_archive`
- Create a custom Vivado platform, either flat or using a block design container (BDC), from RTL, HLS, or the IP catalog
- The flow can be repeated for any number of design iterations
- An XSA file can be exported from the Vivado IDE and passed back to Vitis tools
- An updated VMA file can be reimported into the Vivado IDE
- The `v++` compiler operates on a Vivado project that has been encapsulated in an extensible XSA
- Modifications to the Vivado project are supported as long as they do not invalidate the contract between the imported design and the XCLBIN

# Vitis Export Flow Implementation

1. Import the XSA in Vitis™ to compile and link:
   - AI Engine graph (libadf.a)
   - PL kernels (.xo)
   - Update system.cfg and run the Vitis linker
2. Export the Vitis Metadata Archive (VMA):
   `v++ --link --export_archive --platform <xsa> --config system.cfg <xo> libadf.a -o <vma>.vma`



# AMD Vitis Export Flow Implementation

3. Import the VMA into Vivado™ tools:
   - `vitis::import_archive ./<vma>.vma`
   - Creates a Vitis region block design (read-only Vitis hierarchy)
4. Modify the design in Vivado as needed:
   - For PL/AIE updates → remove the VMA (`vitis::remove_archive`)
   - Re-export the XSA and repeat the Vitis™ tool flow if required
5. After implementation, generate a fixed XSA:
   - `write_hw_platform -fixed ./<fixed_xsa>.xsa`
6. Use the fixed XSA for:
   - Yocto™ / Vitis embedded apps
   - Bare-metal or hardware validation
7. For emulation:
   - Generate a sim-included XSA (`include_sim_content`)
8. Package and create a deployable .xclbin:
   - `v++ --package -t <hw|hw_emu> --xsa <fixed_xsa>`



**In this demo, we'll walk you through the complete end-to-end flow of developing a Vitis Subsystem (VSS) and integrating it with a custom Vivado extensible platform, targeting the AMD Versal VCK190 evaluation board.**

**Watch the video on YouTube - AMD Versal™ AI Engines for DSP End-to-End Design Flow**

<https://youtu.be/J6NIZHCKqlg>

# Summary

1. Source the AMD Vitis™ tools and configure the Linux® sysroot
2. Use Makefiles to compile the AI Engine, HLS, and RTL sources and validate functionality
3. Configure the core blocks and export the XSA for Vitis integration
4. Link the components in Vitis, generate the VMA, and import it back into Vivado™ for the final design

# General Disclaimer and Attribution Statement

The information contained herein is for informational purposes only and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18u.

©2025 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, Versal, Vitis, Vivado, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries. Yocto Project is a trademark of The Linux Foundation. Other product names used in this publication are for identification purposes only and may be trademarks of their respective owners. Certain AMD technologies may require third-party enablement or activation. Supported features may vary by operating system. Please confirm with the system manufacturer for specific features. No technology or product can be completely secure.
