13. Performance Debug using AMDuProfSys

13.1. Overview

AMDuProfSys is a system debugging and analysis tool for AMD processors. It collects hardware events and evaluates either simple counter values or complex recipes (derived metrics) computed from the collected raw events. The performance metrics are based on profile data collected using Core, L3, DF, and UMC PMCs (Performance Monitoring Counters) or HSMP messaging (via the SMN interface).

This tool provides the overall performance details of the hardware blocks used in the system.

13.2. Supported Platforms

AMDuProfSys supports AMD EPYC™ 7002, 7003, and 9000 Series processors with the following variants:

13.3. Supported Hardware Counters

The supported hardware counters are:

- CORE PMC
- DF PMC
- L3 PMC
- UMC PMC
- HSMP (via SMN interface)

13.4. Supported Operating Systems

The supported operating systems are:

13.4.1. Prerequisites

13.4.1.1. Windows

To profile L3 and DF counters while Hyper-V is enabled, switch the system to system mode using the following command. After execution, reboot the system.

13.4.2. Set up

Execute the OS-specific installation steps provided in this section.

13.4.2.1. Linux

If you installed from the tarball, install the uProf driver manually.

To install the AMDuProfSys driver:

#  ./AMDPowerProfilerDriver.sh install

Alternatively, Linux Perf can also be used if the user space perf tool is installed in the system. Make sure that Perf tools support the required PMC event monitoring for the platform used. To use Linux Perf for collection instead of the uProf driver, include the --use-linux-perf flag in the command.

To install user space perf tool:

# apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`

Disable NMI watchdog; this requires root privileges:

# echo 0 > /proc/sys/kernel/nmi_watchdog

Set perf_event_paranoid to -1 if system-wide profile data or DF and L3 metrics must be collected:

# sh -c 'echo -1 >/proc/sys/kernel/perf_event_paranoid'

To collect using Linux Perf, check if the amd_uncore module is loaded using the following command:

$ lsmod | grep amd_uncore

If not, run the following command:

# modprobe amd_uncore

After successful installation, you can access AMDuProfSys from <Installed Directory>/bin/AMDPerf/AMDuProfSys.
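The Linux Perf prerequisites above can be verified before a run. The following is a minimal sketch, not part of AMDuProfSys: the helper names check_value and check_module are hypothetical, and the script only reads the same /proc files that the commands above write to. It assumes amd_uncore is built as a loadable module (a built-in driver does not appear in /proc/modules).

```shell
#!/bin/sh
# Pre-flight check for the Linux Perf collection path (illustrative sketch).

# check_value FILE EXPECTED LABEL: compare the first line of FILE with EXPECTED.
check_value() {
    if [ ! -r "$1" ]; then
        echo "SKIP $3: $1 is not readable"
        return 0
    fi
    cv_actual=$(head -n 1 "$1")
    if [ "$cv_actual" = "$2" ]; then
        echo "OK   $3 = $cv_actual"
    else
        echo "WARN $3 = $cv_actual (expected $2)"
        return 1
    fi
}

# check_module NAME [LIST]: succeed if NAME is listed in LIST, a file in
# /proc/modules format (this is the data that lsmod prints).
check_module() {
    cm_list=${2:-/proc/modules}
    [ -r "$cm_list" ] && grep -q "^$1 " "$cm_list"
}

check_value /proc/sys/kernel/nmi_watchdog 0 nmi_watchdog
check_value /proc/sys/kernel/perf_event_paranoid -1 perf_event_paranoid
if check_module amd_uncore; then
    echo "OK   amd_uncore is loaded"
else
    echo "WARN amd_uncore is not loaded (run: modprobe amd_uncore)"
fi
```

The script changes nothing on the system; apply the fixes with the root commands listed above.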

13.4.2.2. Windows

The Setup file installs all the necessary components required to run AMDuProfSys. Once the installation is complete, AMDuProfSys can be accessed at:

<Installed Directory>/bin/AMDPerf/AMDuProfSys.exe

13.5. Options

13.5.1. Synopsis

AMDuProfSys [-h] [--help-all] [--version] <COMMAND> <options> <WORKLOAD> <workload-specific-args>

where:

13.5.2. Generic

The following table lists the generic options.

Table 13.1 AMDuProfSys Generic Options

| Option | Description |
|---|---|
| --enable-irperf | Enable irperf. Note: it is available only on Linux and requires root privilege. |
| --system-info | Display system information. |
| -h, --help, --help-all | Display the usage. |
| -v, --version | Print the version. |

13.5.3. Collect Command

The collect command configures profiling and monitors raw events. Collection runs until the launched program exits, the configured duration expires, or Ctrl+C is pressed.

The options listed in the following table can be preceded by a “collect” keyword for collection of profile data. To collect and generate metrics reports in a single step, the same options can be used with the collect keyword omitted. In this case, report command options are also supported.

The following table lists the collect command options.

Table 13.2 AMDuProfSys Collect Command Options

| Option | Description |
|---|---|
| --affinity <CPUs> | Comma-separated list of CPUs. The workload runs on the configured CPUs. |
| --export-session | Create a compressed archive of the required session files, which can be used on another system for analysis. |
| --force | Ignore errors raised to ensure optimal collection conditions and proceed with profiling. Note: a prompt to use this option appears when certain errors occur. |
| --mux-interval <MUX_INTERVAL (milliseconds)> | Set the multiplexing interval for collection in milliseconds (requires root access with the Linux Perf collector). |
| --obfuscation | Collect a large set of AMD-internal events using an obfuscated raw events file. A usable report of this data can only be generated internally at AMD, not with the distributed public builds. Note: the -r option is scheduled for deprecation and will be removed in an upcoming release. |
| --reset-hw-counters | Reset the hardware counters before starting the collection. |
| --use-linux-perf | On Linux, collect the profile data using Linux Perf instead of the AMDuProf driver. |
| --wait-for-signal | Wait for SIGUSR1 (Linux) or SIGBREAK (Windows) to start the collection, and SIGINT to stop it. |
| -a, --all-cpus | Collect from all the cores. Note: options -C and -a cannot be used together. |
| -C, --cpu <CPUs> | List of CPUs to monitor. Multiple CPUs can be given as a comma-separated list with no spaces, for example 0,1. Ranges are allowed: 0,5-10 monitors CPUs 0,5,6,7,8,9,10. |
| -D, --delay <DELAY (milliseconds)> | Delay profile data collection, in milliseconds. Note: if a duration is set, profiling starts after the delay has elapsed. |
| -d, --duration <DURATION (seconds)> | Profile duration in seconds. Note: this option does not work if a workload is passed as an argument. |
| -g, --gen-raw-events | Generate a raw config file from the provided INI file, which can subsequently be used with the obfuscation feature. |
| -i, --input-file <INPUT_FILE> | Path to the INI or YAML config file. |
| -I, --interval <INTERVAL (milliseconds)> | Interval in milliseconds at which raw event count deltas are stored in the file. |
| -m, --metrics | Collect user-defined custom metrics specified on the command line. |
| -o, --output-file <OUTPUT_FILE> | Name of the output directory that AMDuProfSys creates to store raw files and reports. |
| -p, --pid <pid> | Monitor events on existing process(es). Multiple PIDs can be given as a comma-separated list. Note: it is available only on Linux. |
| -s, --config | Collect data using predefined config files, INI files, or an obfuscated file. |
| -t, --tid <tid> | Monitor events on existing thread(s). Multiple TIDs can be given as a comma-separated list. |
| -V, --verbose | Enable verbose mode to get detailed output. |
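The CPU list syntax accepted by -C mixes single CPUs and ranges. As an illustration only, the following sketch expands such a list the way the table describes it; expand_cpu_list is a hypothetical helper, not an AMDuProfSys feature, and it assumes a well-formed list (no spaces, ascending ranges).

```shell
#!/bin/sh
# expand_cpu_list LIST: print every CPU named by a -C style list, one per line.
expand_cpu_list() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        hi=${hi:-$lo}            # a single CPU is a range of length one
        while [ "$lo" -le "$hi" ]; do
            echo "$lo"
            lo=$((lo + 1))
        done
    done
}

# Prints 0, 5, 6, 7, 8, 9, 10 on separate lines.
expand_cpu_list 0,5-10
```

Expanding the list first makes it easy to confirm that a --affinity CPU is among the monitored CPUs before launching a workload.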

13.5.4. Report Command

The report command generates a profile report with computed metrics. The preceding collect command produces a profile session file with the .ses extension and a raw counter data file for each metric type profiled.

To generate the report, the path to the session file needs to be provided as input (with -i option) as shown in the following command options:

Table 13.3 AMDuProfSys Report Command Options

| Option | Description |
|---|---|
| --export-session | Create a compressed (zipped) archive of the session files with a detailed log file. |
| --ini <INI> | Path to the .ini file used to collect the events. |
| --mux-interval <MUX_INTERVAL (milliseconds)> | Set the multiplexing interval for collection in milliseconds (requires root access with the Linux Perf collector). |
| --set-precision <n> | Set the floating-point precision for reported metrics. The default value is 3. |
| -f, --format | Output file format: .csv (default) or .xls. |
| -G, --group-by <category> | Aggregate results by the selected grouping category: system, package, numa, or ccx. |
| -i, --input-file <file> | The session file generated by the collect command. |
| -o, --output-file <file> | Output file name, in .csv or .xls format as configured. |
| -T, --time-series | Generate a per-core time series report with instance-wise MIN, MAX, and AVG summaries. Only the .csv format is supported. |
| -V, --verbose | Enable verbose output. |
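The collect and report commands are linked by the session file. The sketch below prints (rather than runs) a two-step command pair. The helper session_file is hypothetical, and the <output dir>/<output dir name>.ses naming is an assumption inferred from the examples in this chapter (temp/temp.ses, sci_perf/sci_perf.ses), not a documented guarantee.

```shell
#!/bin/sh
# session_file OUTPUT_DIR: path of the session file expected after
# "collect -o OUTPUT_DIR" (naming inferred from the chapter's examples).
session_file() {
    sf_dir=${1%/}                         # drop one trailing slash, if any
    echo "$sf_dir/$(basename "$sf_dir").ses"
}

out=sci_perf
# Dry run: print the two commands instead of executing them.
echo "./AMDuProfSys collect --config core -a -d 10 -o $out"
echo "./AMDuProfSys report -i $(session_file "$out") -G ccx --set-precision 2"
```

Printing the commands first is a cheap way to review the option set before starting a long collection.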

13.6. Configuration Option

13.6.1. Default configuration

When no config option is given, a minimal set of metrics (based on core, l3, and df) is collected, which is useful for a first-pass system analysis. The following are some of the metrics data that can be collected.

Example command: AMDuProfSys -o /tmp/default -d 30

This will generate the default directory containing default_core.csv, default_l3.csv, and default_df.csv.

13.6.2. Predefined Configurations

AMDuProfSys bundles a set of predefined config files for core, l3, df, umc, and hsmp. These configuration files are in JSON and YAML formats and include a detailed list of metrics for each metric type.

Example command: AMDuProfSys --config core,l3,df -a -d 5 -o /tmp/all

This will generate the all directory containing all_core.csv, all_l3.csv, and all_df.csv files.
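The report names in the two examples above follow one pattern: <output dir name>_<metric type>.csv inside the output directory. The sketch below lists the files to expect for a given -o value and --config list and reports which of them exist. The helper expected_reports is hypothetical, and the pattern is inferred from these two examples only.

```shell
#!/bin/sh
# expected_reports OUTPUT_DIR TYPES: list the report files for a comma-separated
# list of metric types, following the naming seen in the examples above.
expected_reports() {
    er_dir=${1%/}
    er_base=$(basename "$er_dir")
    printf '%s\n' "$2" | tr ',' '\n' | while read -r er_type; do
        echo "$er_dir/${er_base}_${er_type}.csv"
    done
}

# After "AMDuProfSys --config core,l3,df -a -d 5 -o /tmp/all":
expected_reports /tmp/all core,l3,df | while read -r f; do
    if [ -f "$f" ]; then echo "found   $f"; else echo "missing $f"; fi
done
```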

13.6.3. Obfuscation Support

AMDuProfSys allows obfuscated events to be collected on user systems without exposing internal event properties. Users can only collect the profile data; to generate reports, they must send the collected profile data to AMD.

13.6.3.1. Feature Workflow

  1. Collect obfuscated profile data: Users can collect raw config data using the raw_config file provided with the AMDuProfSys package.

    Example command

    AMDuProfSys collect --config core --obfuscation -o temp -d 30
    

    This generates a session folder containing a session file (.ses) and a raw file.

  2. Send data to AMD: Users must send the entire session folder (containing all files) to AMD.

  3. Report generation: Profile reports can only be generated using the internal version of AMDuProfSys by AMD. The session file (.ses) is used as the input to generate the report.

    Example command

    AMDuProfSys report -i ./temp/temp.ses
    

13.6.4. Custom Metrics

Use this option to write your own custom metrics and profile only for those metrics. This option helps exclude all unwanted metrics from the options mentioned earlier to minimize the overhead of multiplexing. However, this option requires knowledge of PMC encoding and knowledge of metrics that you want to profile for. Here is an example of metrics along with an AMDuProfSys command to get the profile data.

TotalRetiredBranceNotTaken = total retired branches - retired taken branches
                           = (PMCx0C2, umask 0x0) - (PMCx0C4, umask 0x0)
                           = 0x4300C2 - 0x4300C4
RetiredBranceNotTaken      = (TotalRetiredBranceNotTaken * 1000) / total retired instructions
                           = (TotalRetiredBranceNotTaken * 1000) / 0x4300C0

Example command

AMDuProfSys --metrics core/TotalRetiredBranceNotTaken="0x4300C2-0x4300C4",core/RetiredBranceNotTaken="(TotalRetiredBranceNotTaken*1000)/0x4300C0"

Output

Figure 13.1 Custom Metrics
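The raw codes in the custom metric example (0x4300C2, 0x4300C4, 0x4300C0) combine an event select and a unit mask. The sketch below rebuilds them and evaluates the two formulas for made-up counter values. The helper raw_event is hypothetical. The bit layout is an assumption based on the AMD core PERF_CTL register format (event select in bits 7:0, unit mask in bits 15:8, and 0x43 in bits 23:16 setting the user, OS, and enable bits); events with a select value above 0xFF are not handled.

```shell
#!/bin/sh
# raw_event EVENT UMASK: build a raw core event code such as 0x4300C2.
raw_event() {
    printf '0x%X\n' $(( 0x430000 | ($2 << 8) | $1 ))
}

raw_event 0xC2 0x0    # retired branches        -> 0x4300C2
raw_event 0xC4 0x0    # retired taken branches  -> 0x4300C4
raw_event 0xC0 0x0    # retired instructions    -> 0x4300C0

# Evaluate both metrics for hypothetical counter values (not measured data).
branches=5000 taken=3200 instructions=40000
not_taken=$(( branches - taken ))
echo "TotalRetiredBranceNotTaken = $not_taken"                               # 1800
echo "RetiredBranceNotTaken      = $(( not_taken * 1000 / instructions ))"   # 45
```

The second metric is a per-thousand-instructions rate; shell arithmetic truncates to an integer, whereas the tool's report precision is controlled by --set-precision.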

Examples

Note

The following example commands assume that the AMDuProfSys binary is in the current working directory (CWD), <Installed Directory>/bin/AMDPerf. The commands are written for Linux. On Windows, use the appropriate executable path (.\AMDuProfSys.exe if it is in the CWD). Adjust the paths in these examples to match your installation.

Table 13.4 Custom Metrics Examples

| Task Description | Command |
|---|---|
| Collect user-defined custom metrics from the command line. | ./AMDuProfSys --metrics core/BrMisPredExTime="(0x4300C3)/(0x4300C2)",core/ratio="((BrMisPredExTime * 0x430076)/0x4300C0)" -d 20 |
| Collect core events with the multiplexing interval set to 32 ms. | ./AMDuProfSys --config core -a --mux-interval 32 -d 10 |
| Collect multiple metric types using predefined config files and generate a report in two steps. Step 1: collect core, l3, and df data for CPUs 0-10. Note: the -C / -a option is applicable only to core counters. | ./AMDuProfSys collect --config core,l3,df -C 0-10 -o sci_perf taskset -c 0 scimark2 |
| Step 2: generate a report file sci_perf.csv containing computed metrics. | ./AMDuProfSys report -i sci_perf/sci_perf.ses -o all_events |
| Collect using custom config files and generate a report in two steps. Step 1: collect the data. | ./AMDuProfSys collect --config data/0x17_0x3/configs/core/core_config.yaml -C 0 -o sci_perf taskset -c 0 scimark2 |
| Step 2: generate a report file sci_perf.csv containing computed metrics. | ./AMDuProfSys report -i sci_perf/sci_perf.ses -o all_events |
| Collect the default metrics (core, L3, and DF) system-wide for 100 s. | ./AMDuProfSys -o default -a -d 100 |
| Display all help. | ./AMDuProfSys --help-all |
| Launch the workload /tmp/scimark2 with core affinity set to core 0, monitor that core, and generate a profile report. | ./AMDuProfSys --config core -C 0 --affinity 0 /tmp/scimark2 |
| Monitor the entire system to collect the core events defined in the config files for 50 seconds and generate the profiled metrics report. | ./AMDuProfSys --config core -a -d 50 |
| Monitor the entire system to collect raw data over 20 s and generate metric reports in a new folder /tmp/all for the core, L3, DF, UMC, and HSMP profile types. | ./AMDuProfSys --config core,l3,df,umc,hsmp -o /tmp/all -a -d 20 |
| Collect time series profile data for core metrics (cores 0-5) using Linux Perf with a logging interval of 1000 ms, with the affinity of the workload (/tmp/scimark2) set to core 0. | ./AMDuProfSys --config core -C 0-5 -I 1000 --use-linux-perf -T -o output --affinity 0 /tmp/scimark2 |
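The custom-metrics example in Table 13.4 packs several definitions into a single --metrics value, where a stray space is enough to break the command. As a sketch only, a small helper can assemble that value from separate NAME=EXPR arguments. metrics_arg is hypothetical; it emits no inner quotes because the shell already removes the quotes shown in the table before AMDuProfSys sees the argument.

```shell
#!/bin/sh
# metrics_arg TYPE NAME=EXPR...: join metric definitions into one --metrics value,
# prefixing each with the metric type and separating them with commas.
metrics_arg() {
    ma_type=$1; shift
    ma_out=
    for ma_def in "$@"; do
        ma_out="${ma_out:+$ma_out,}$ma_type/$ma_def"
    done
    printf '%s\n' "$ma_out"
}

value=$(metrics_arg core \
    'BrMisPredExTime=(0x4300C3)/(0x4300C2)' \
    'ratio=((BrMisPredExTime*0x430076)/0x4300C0)')

# Dry run: print the command; quote the value when running it for real.
echo "./AMDuProfSys --metrics '$value' -d 20"
```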

13.7. Limitations

The number of events that can be collected at a time is restricted by the number of hardware PMCs available on a platform. AMDuProfSys divides the list of events into groups, each containing at most as many events as there are PMCs. These event groups are multiplexed during a run, so each group is counted for only part of the run. As a result, the reported data may not be 100% accurate, especially if the profiling run is very short.
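A rough feel for the cost of multiplexing comes from simple arithmetic. The sketch below assumes round-robin scheduling with equal time per group, which this chapter does not state, and uses hypothetical numbers (14 events, 6 counters); event_groups is an illustrative helper.

```shell
#!/bin/sh
# event_groups EVENTS PMCS: number of groups needed to schedule EVENTS events
# when PMCS hardware counters are available (ceiling division).
event_groups() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

events=14 pmcs=6      # hypothetical: 14 requested events, 6 counters
groups=$(event_groups "$events" "$pmcs")
echo "groups: $groups"                                               # 3
echo "each group counts for about $(( 100 / groups ))% of the run"   # 33

# With --mux-interval 32 and -d 10, each group gets this many 32 ms slices:
echo "slices per group: $(( 10 * 1000 / (32 * groups) ))"            # 104
```

Fewer events per run (for example, through custom metrics) or a longer duration both increase the time each group is actually counted.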

Note

Family 1A Hardware Limitation: Metrics reporting remote and local outbound traffic/xGMI traffic could be incorrect.

13.8. Virtualization Support

Table 13.5 Virtualization Support Examples

| Hypervisor | PMUs Supported |
|---|---|
| Amazon AWS - Bare metal | core, l3, df |
| ESXi - Ubuntu 20.04 (guest) | core |
| ESXi - Red Hat Enterprise Linux release 9.0 (Plow) | core |
| KVM-Linux - Ubuntu 22.04.2 LTS (guest) | core |
| KVM-Linux - Ubuntu 22.04.2 LTS (host) | core, l3, df, umc |
| Windows Hyper-V (guest) | core |
| Windows Hyper-V (host) | core, l3, df, umc |
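Table 13.5 can be used to trim a --config list before collecting in a virtualized environment. The sketch below encodes the table as a lookup. The environment keys (aws-metal, esxi, kvm-guest, and so on) are hypothetical labels chosen for this example, and the data covers only the configurations listed in the table.

```shell
#!/bin/sh
# supported_pmus ENV: PMU types listed in Table 13.5 for an environment key.
supported_pmus() {
    case $1 in
        aws-metal)                   echo "core l3 df" ;;
        esxi|kvm-guest|hyperv-guest) echo "core" ;;
        kvm-host|hyperv-host)        echo "core l3 df umc" ;;
        *) return 1 ;;               # environment not listed in the table
    esac
}

# filter_config ENV TYPES: keep only the requested metric types that the
# table lists for ENV; TYPES is a comma-separated --config style list.
filter_config() {
    fc_ok=$(supported_pmus "$1") || return 1
    fc_out=
    for fc_type in $(printf '%s' "$2" | tr ',' ' '); do
        case " $fc_ok " in
            *" $fc_type "*) fc_out="${fc_out:+$fc_out,}$fc_type" ;;
        esac
    done
    echo "$fc_out"
}

filter_config kvm-guest core,l3,df           # prints: core
filter_config kvm-host core,l3,df,umc,hsmp   # prints: core,l3,df,umc
```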