AMDuProfPcm provides basic roofline modeling, which relates application performance to memory traffic and to the system's floating-point computational peaks. This visual performance model offers insights into improving parallel software that performs floating-point operations. It helps characterize an application and identify whether it (or a benchmark) is memory bound or compute bound.
While the profiled application runs, the tool monitors memory traffic and floating-point operations. It also computes the Arithmetic Intensity (AI), that is, the number of floating-point operations per byte of DRAM traffic [FLOPS/byte]. The roofline chart is plotted as follows:
X-axis: Arithmetic Intensity (AI) in FLOPS/byte, on a logarithmic scale
Y-axis: Throughput in GFLOPS/sec, on a logarithmic scale
Horizontal line: Peak theoretical floating-point performance of the system (HW Limit).
Diagonal line: Peak memory performance, that is, Peak theoretical Memory Bandwidth * AI. Together, the two lines form the roofline, whose height at a given AI is Throughput = min(Peak theoretical GFLOPS/sec, Peak theoretical Memory Bandwidth * AI).
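The roofline arithmetic can be sketched in a few lines of shell. This is an illustration, not part of AMD uProf: the function names are hypothetical, and the peak values (2000 GFLOPS/sec and 200 GB/s) are made-up numbers for a fictional system. The ridge point, where the diagonal meets the horizontal roof, is the AI that separates memory-bound from compute-bound code.

```shell
#!/bin/sh
# Minimal roofline arithmetic (illustrative sketch, not part of AMD uProf).

# Arithmetic intensity = floating-point operations / bytes of DRAM traffic.
# Usage: arithmetic_intensity <flops> <dram_bytes>
arithmetic_intensity() {
    awk -v f="$1" -v b="$2" 'BEGIN { printf "%.4f\n", f / b }'
}

# Attainable throughput (GFLOPS/sec) at a given AI:
#   min(peak GFLOPS/sec, peak memory bandwidth in GB/s * AI)
# Usage: attainable_gflops <peak_gflops> <peak_bw_gbs> <ai>
attainable_gflops() {
    awk -v p="$1" -v bw="$2" -v ai="$3" 'BEGIN {
        mem = bw * ai                  # height of the diagonal (memory) roof at this AI
        if (mem < p) r = mem; else r = p
        printf "%.2f\n", r
    }'
}

# Ridge point: the AI at which the diagonal meets the horizontal roof.
# Below it a kernel is memory bound; above it, compute bound.
# Usage: ridge_point <peak_gflops> <peak_bw_gbs>
ridge_point() {
    awk -v p="$1" -v bw="$2" 'BEGIN { printf "%.4f\n", p / bw }'
}

# Hypothetical system: 2000 GFLOPS/sec peak, 200 GB/s peak bandwidth.
ai=$(arithmetic_intensity 1000000000 4000000000)
echo "AI=$ai attainable=$(attainable_gflops 2000 200 "$ai") ridge=$(ridge_point 2000 200)"
# prints: AI=0.2500 attainable=50.00 ridge=10.0000
```

With an AI of 0.25 FLOPS/byte, well below the ridge point of 10, this hypothetical kernel sits on the diagonal and is memory bound.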
By default, the tool plots horizontal rooflines for:
Single Precision Floating Point Peak (SP FP Peak)
Double Precision Floating Point Peak (DP FP Peak)
The following additional horizontal (computational) peak rooflines can also be plotted:
Single precision noSIMD and noFMA
Double precision noSIMD and noFMA
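The horizontal roofs are theoretical peaks. The sketch below shows the general formula usually used to derive such a peak; it is not necessarily the exact computation AMDuProfPcm performs. The function name and all inputs (core count, clock, FLOPs per cycle) are hypothetical placeholders; take real values from your processor's documentation. Because FLOPs per cycle scales with SIMD width and an FMA counts as two operations, a noSIMD/noFMA roof sits far below the full peak.

```shell
#!/bin/sh
# Theoretical peak FLOPS (illustrative sketch; all inputs are hypothetical).
# peak GFLOPS/sec = sockets * cores per socket * clock (GHz) * FLOPs per cycle per core
# Usage: peak_gflops <sockets> <cores_per_socket> <freq_ghz> <flops_per_cycle>
peak_gflops() {
    awk -v s="$1" -v c="$2" -v f="$3" -v fpc="$4" \
        'BEGIN { printf "%.1f\n", s * c * f * fpc }'
}

# FLOPs per cycle grows with SIMD width and doubles with FMA (one FMA = 2 FLOPs),
# so a noSIMD/noFMA roof uses a much smaller per-cycle value.
echo "SIMD+FMA roof:     $(peak_gflops 1 8 3.0 32) GFLOPS/sec"   # hypothetical 32 FLOPs/cycle
echo "noSIMD/noFMA roof: $(peak_gflops 1 8 3.0 2) GFLOPS/sec"    # hypothetical 2 FLOPs/cycle
```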
Generating the roofline chart of an application:
Collect the data and generate the roofline HTML plot using AMDuProfPcm:
$ AMDuProfPcm roofline -O /tmp -- /tmp/myapp.exe
An output directory named AMDuProfPcm-Roofline-<date>-<time> is created in the specified directory (/tmp). It contains the HTML report (report.html). Open the HTML report and view the roofline graph in the “” tab.
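Because each run creates a new timestamped directory, it can help to locate the newest report automatically. The helper below is a sketch, not part of AMD uProf: the function name latest_roofline_report is hypothetical, and it assumes the AMDuProfPcm-Roofline-* naming and report.html file described above, picking the most recently modified matching directory.

```shell
#!/bin/sh
# Print the report.html path of the most recently modified
# AMDuProfPcm-Roofline-* directory under the given output directory.
# Usage: latest_roofline_report <output_dir>
latest_roofline_report() {
    # -d lists the directories themselves; -t sorts newest first.
    latest=$(ls -1dt "$1"/AMDuProfPcm-Roofline-* 2>/dev/null | head -n 1)
    if [ -z "$latest" ] || [ ! -f "$latest/report.html" ]; then
        echo "no roofline report found under $1" >&2
        return 1
    fi
    printf '%s\n' "$latest/report.html"
}
```

For example, on a Linux desktop, xdg-open "$(latest_roofline_report /tmp)" opens the newest report in the default browser.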
On AMD Zen 4 and later processors, if the Linux kernel version does not support access to the DF (Data Fabric) counters, run the following command with root privileges:
$ AMDuProfPcm roofline --msr -O /tmp/ -- /tmp/myapp.exe
To generate the roofline chart as a PDF, run the following command (this method is to be deprecated):
$ AMDuProfModelling.py -i /tmp/myapp-roofline.csv -o /tmp/ --memspeed 3200 -a myapp
The roofline chart is saved in the file /tmp/AMDuProf_roofline-yyyy-mm-dd-hhmmss.pdf.
A few pointers for generating the roofline chart:
Use the --read-smbios option to read the memory speed and the number of memory channels from the SMBIOS table. This requires root privileges.
If AMDuProfPcm is launched without root privileges while collecting the data, specify the DRAM speed using the --memspeed option of the AMDuProfModelling.py script. You can use the dmidecode or lshw command to get the memory speed.
To plot additional horizontal computational peak lines, use the following options with the AMDuProfModelling.py script:
--sp-roofs: Plot maximum peak roof for single-precision noSIMD and noFMA
--dp-roofs: Plot maximum peak roof for double-precision noSIMD and noFMA
Example
$ AMDuProfModelling.py -i /tmp/myapp-roofline.csv -o /tmp/ --memspeed 3200 -a myapp --dp-roofs
Use the -a <appname> option with the AMDuProfModelling.py script to specify the application name printed on the chart.
Because this tool uses the theoretical maximum peaks for memory bandwidth and floating-point performance, you can also use measured scores from benchmarks such as STREAM (peak memory bandwidth) and HPL or GEMM (peak FLOPS) to plot the roofline chart. Use the following options with the AMDuProfModelling.py script:
--stream <STREAM score>
--hpl <HPL score>
--gemm <SGEMM | DGEMM score>
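When running without root privileges, the memory speed passed to --memspeed has to come from somewhere, and the resulting theoretical bandwidth is the diagonal roof. The sketch below is not part of AMD uProf; the function names are hypothetical. It assumes dmidecode output contains a "Configured Memory Speed" line (older dmidecode versions print "Configured Clock Speed"), and it assumes 64-bit (8-byte) wide memory channels, with the channel count supplied by you.

```shell
#!/bin/sh
# Extract the first configured DRAM speed (MT/s) from `dmidecode -t memory`
# output read on stdin. Empty slots reporting "Unknown" are skipped.
memspeed_from_dmidecode() {
    awk -F': *' '
        /Configured (Memory|Clock) Speed: [0-9]/ { split($2, a, " "); print a[1]; exit }
    '
}

# Theoretical peak DRAM bandwidth in GB/s, assuming 8 bytes per transfer per channel:
#   speed (MT/s) * 8 bytes * channels / 1000
# Usage: peak_dram_bw_gbs <speed_mts> <channels>
peak_dram_bw_gbs() {
    awk -v mts="$1" -v ch="$2" 'BEGIN { printf "%.1f\n", mts * 8 * ch / 1000 }'
}
```

For example, 3200 MT/s memory on 8 channels gives a theoretical peak of 204.8 GB/s; a measured STREAM score is typically lower. The extracted speed could then be passed to the script:
$ speed=$(sudo dmidecode -t memory | memspeed_from_dmidecode)
$ AMDuProfModelling.py -i /tmp/myapp-roofline.csv -o /tmp/ --memspeed "$speed" -a myapp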
Sample Roofline Chart
Figure 5.1 Sample Roofline Chart