AMDuProfSys is a system debugging and analysis tool for AMD processors. It collects hardware events and evaluates either simple counter values or complex recipes built from the collected raw events. The performance metrics are based on profile data collected using Core, L3, DF, and UMC PMCs (Performance Monitoring Counters) or HSMP messaging (via the SMN interface).
This tool provides the overall performance details of the hardware blocks used in the system.
AMDuProfSys supports AMD EPYC™ 7002, 7003, and 9000 Series processors with the following variants:
Family 17, model 0x30 – 0x3F
Family 17, model 0x60 - 0x6F
Family 19, model 0x0 - 0xF
Family 19, model 0x10 - 0x1F
Family 19, model 0x20 - 0x2F
Family 19, model 0x60 – 0x6F
Family 19, model 0x70 – 0x7F
Family 19, model 0x90 - 0x9F
Family 19, model 0xA0 – 0xAF
Family 1A, model 0x0 – 0xF
Family 1A, model 0x10 – 0x1F
Family 1A, model 0x20 – 0x2F
Family 1A, model 0x40 – 0x4F
Family 1A, model 0x60 – 0x63
Family 1A, model 0x70
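To check a Linux machine against this list, the family and model can be read from /proc/cpuinfo, which reports them in decimal, and printed in the hexadecimal notation used above. The following is a minimal sketch; the helper name `fmt_family_model` is hypothetical and not part of AMDuProfSys.

```shell
# Hypothetical helper (not part of AMDuProfSys): print a decimal CPU family
# and model in the hexadecimal notation used in the list above.
fmt_family_model() {
    printf 'Family %X, model 0x%X\n' "$1" "$2"
}

# On x86 Linux, /proc/cpuinfo reports "cpu family" and "model" in decimal.
if [ -r /proc/cpuinfo ]; then
    fam=$(awk -F: '/^cpu family/ { gsub(/[ \t]/, "", $2); print $2; exit }' /proc/cpuinfo)
    model=$(awk -F: '$1 ~ /^model[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2; exit }' /proc/cpuinfo)
    # Only print when both fields were found (they are absent on non-x86 systems).
    if [ -n "$fam" ] && [ -n "$model" ]; then
        fmt_family_model "$fam" "$model"
    fi
fi
```

For example, a decimal family of 25 and model of 17 correspond to Family 19, model 0x11, which falls in the 0x10 - 0x1F range.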
The supported hardware counters are: CORE PMC, DF PMC, L3 PMC, UMC PMC, and HSMP (via SMN interface).
The supported operating systems are:
Linux
Windows
To profile L3 and DF counters while Hyper-V is enabled, set the hypervisor performance monitoring option (hypervisorperfmon) to system mode using the following command, then reboot the system.
To switch to system mode:
bcdedit /set hypervisorperfmon system
To switch back to default:
bcdedit /deletevalue hypervisorperfmon
Execute the OS-specific installation steps provided in this section.
If you installed from the tarball, install the uProf driver manually.
To install the AMDuProfSys driver:
# ./AMDPowerProfilerDriver.sh install
Alternatively, Linux Perf can also be used if the user space perf tool is installed in the system. Make sure that Perf tools support the required PMC event monitoring for the platform used. To use Linux Perf for collection instead of the uProf driver, include the --use-linux-perf flag in the command.
To install user space perf tool:
# apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`
Disable NMI watchdog; this requires root privileges:
# echo 0 > /proc/sys/kernel/nmi_watchdog
Set the perf_event_paranoid parameter to -1 if system-wide profile data or DF and L3 metrics must be collected:
# sh -c 'echo -1 >/proc/sys/kernel/perf_event_paranoid'
To collect using Linux Perf, check if the amd_uncore module is loaded using the following command:
$ lsmod | grep amd_uncore
If not, run the following command:
# modprobe amd_uncore
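The Linux Perf prerequisites above can be verified before a collection run. The following is a minimal sketch of such a preflight check; `check_param` is a hypothetical helper, not part of AMDuProfSys, and the amd_uncore test only looks at the `lsmod` output, so it does not detect a driver that is built into the kernel.

```shell
# Hypothetical preflight helper (not part of AMDuProfSys): compare the content
# of a kernel parameter file with the value expected for profiling.
check_param() {
    # $1 = path to the parameter file, $2 = expected value
    if [ ! -r "$1" ]; then
        echo "skip: $1 not readable"
        return 0
    fi
    actual=$(cat "$1")
    if [ "$actual" = "$2" ]; then
        echo "ok: $1 = $actual"
    else
        echo "mismatch: $1 = $actual (expected $2)"
    fi
}

check_param /proc/sys/kernel/nmi_watchdog 0
check_param /proc/sys/kernel/perf_event_paranoid -1

# amd_uncore is only needed when collecting with Linux Perf.
if command -v lsmod >/dev/null 2>&1 && lsmod | grep -q '^amd_uncore'; then
    echo "ok: amd_uncore is loaded"
else
    echo "note: amd_uncore not listed by lsmod"
fi
```

The script only reads the current values; changing them still requires the root commands shown above.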
After successful installation, you can access AMDuProfSys from <Installed Directory>/bin/AMDPerf/AMDuProfSys.
The Setup file installs all the necessary components required to run AMDuProfSys. Once the installation is complete, AMDuProfSys can be accessed at:
<Installed Directory>/bin/AMDPerf/AMDuProfSys.exe
AMDuProfSys [-h] [--help-all] [--version] <COMMAND> <options> <WORKLOAD> <workload-specific-args>
where:
<COMMAND> — To collect, generate report, or get help for this tool
<options> — Command-specific options; detailed in the following tables
<WORKLOAD> — Denotes a launch application to be profiled
<workload-specific-args> — Denotes the list of arguments for the application to be launched using AMDuProfSys
The following table lists the generic options.
Option |
Description |
|---|---|
|
Enable irperf Note It is available only on Linux and requires root privilege. |
|
System information |
|
Display the usage |
|
Print the version |
The collect command configures profiling and monitors raw events until the launched program finishes executing, the set duration expires, or Ctrl+C is pressed.
The options listed in the following table can be preceded by a “collect” keyword for collection of profile data. To collect and generate metrics reports in a single step, the same options can be used with the collect keyword omitted. In this case, report command options are also supported.
The following table lists the collect command options.
Option |
Description |
|---|---|
|
Comma separated list of CPUs. Workload is run on the configured CPUs. |
|
Create a compressed archive of the required session files that can be used on another system for analysis. |
|
Predefined sampling configuration to be used to collect samples. Ignore errors raised to ensure optimal collection conditions and proceed with profiling. Note A prompt to use this option appears when certain errors occur. |
|
Set the multiplexing interval for collection in milliseconds (requires root access with Linux Perf collector) |
|
Collect a large set of AMD-internal events using obfuscated raw events file. A usable report of this data can only be generated internally at AMD, not using distributed Public builds. Note The |
|
Resets the hardware counters before starting the collection. |
|
This option can be used in Linux to collect the profile data using Linux Perf instead of AMDuProf driver. |
|
Wait for |
|
Collect from all the cores. Note Options |
|
List of CPUs to monitor. Multiple CPUs can be provided as a comma separated list with no space: 0,1. Example of CPUs range: 0,5-10 (CPUs 0,5,6,7,8,9,10 are monitored) |
|
Delay profile data collection in milliseconds. Note If duration is set, profiling will start after the delay duration has elapsed. |
|
Profile duration in seconds. Note This option will not work if a workload is passed as argument. |
|
Generate a raw config file using the provided INI file, which can subsequently be utilized for the Obfuscation feature. |
|
Path to INI file / YAML config file. |
|
Interval in milliseconds at which raw event count deltas will be stored in the file. |
|
Collect user defined custom metrics through command line. |
|
Name of the output directory to be created by AMDuProfSys in which raw files and reports will be stored. |
|
Monitor events on existing process(es). Multiple PIDs can be provided as comma separated list. Note It is available only on Linux. |
|
Collect data using predefined config files, ini files, or obfuscated file. |
|
Monitor events on existing thread(s). Multiple TIDs can be provided as a comma separated list. |
|
Enable verbose mode to get detailed output. |
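The CPU list syntax described in the table above mixes single CPUs and ranges (for example, 0,5-10). As an illustration of how such a list expands, the following sketch uses a hypothetical helper, `expand_cpu_list`, which is not part of AMDuProfSys and only mirrors the documented syntax.

```shell
# Hypothetical helper (not part of AMDuProfSys): expand a CPU list such as
# "0,5-10" into the individual CPU numbers it denotes.
expand_cpu_list() {
    out=""
    # Split the argument on commas into positional parameters.
    old_ifs=$IFS
    IFS=,
    set -- $1
    IFS=$old_ifs
    for part in "$@"; do
        case $part in
            *-*) lo=${part%-*}; hi=${part#*-} ;;   # range "lo-hi"
            *)   lo=$part; hi=$part ;;             # single CPU
        esac
        i=$lo
        while [ "$i" -le "$hi" ]; do
            out="$out${out:+ }$i"
            i=$((i + 1))
        done
    done
    printf '%s\n' "$out"
}

expand_cpu_list 0,5-10   # prints: 0 5 6 7 8 9 10
```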
The report command generates a profile report with computed metrics. The collect command run before report generation produces a profile session file with the .ses extension and a raw counter data file for each metric type profiled.
To generate the report, the path to the session file needs to be provided as input (with -i option) as shown in the following command options:
Option |
Description |
|---|---|
|
Create a compressed (zipped) archive of the session files along with a detailed log file. |
|
Path to .ini file used to collect the events. |
|
Set the multiplexing interval for collection in milliseconds (requires root access with Linux Perf collector) |
|
Set floating point precision for reported metrics, the default value is 3. |
|
Output file format in .csv (default) or .xls. |
|
Aggregate result based on the selected grouping-category. |
|
Input the session file generated by collect command. |
|
Output file name in .csv or .xls format as configured. |
|
Generate per-core time series report with instance-wise MIN, MAX, AVG summaries. Only .csv format is supported. |
|
Print verbose output. |
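The two-step workflow (collect, then report with -i) can be sketched as follows. The session file location is an assumption: it follows the `<dir>/<dir-name>.ses` pattern suggested by the `-o temp` and `./temp/temp.ses` pairing used elsewhere in this section. `session_file` is a hypothetical helper, and the commands are only echoed so that the sketch runs without the tool installed.

```shell
# Hypothetical helper (not part of AMDuProfSys): derive the session file path
# from the output directory, ASSUMING the <dir>/<dir-name>.ses pattern.
session_file() {
    dir=${1%/}
    printf '%s/%s.ses\n' "$dir" "$(basename "$dir")"
}

out=/tmp/core_run

# Step 1: collect core metrics system-wide for 30 seconds.
echo AMDuProfSys collect --config core -a -d 30 -o "$out"

# Step 2: generate the report from the session file.
echo AMDuProfSys report -i "$(session_file "$out")"
```

Remove the leading `echo` to run the commands; check the output directory for the actual .ses file name if it differs from the assumed pattern.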
Without any config option, a minimal set of metrics (based on core, L3, and DF) is collected. This can provide a first-pass analysis of the system. The following are some of the metrics that can be collected.
Example command: AMDuProfSys -o /tmp/default -d 30
This will generate the default directory containing default_core.csv, default_l3.csv, and default_df.csv.
AMDuProfSys bundles a set of predefined config files for core, l3, df, umc, and hsmp. These configuration files are in JSON and YAML formats and include a detailed list of metrics for each metric type.
Example command: AMDuProfSys --config core,l3,df -a -d 5 -o /tmp/all
This will generate the all directory containing all_core.csv, all_l3.csv, and all_df.csv files.
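The naming in the two examples above suggests that each report file is named after the output directory followed by the metric type. The sketch below lists the files one would expect under that assumption; `expected_reports` is a hypothetical helper, not part of AMDuProfSys, and the pattern is inferred only from these two examples.

```shell
# Hypothetical helper (not part of AMDuProfSys): list the report files expected
# for an output directory and a comma-separated list of metric types, ASSUMING
# the <dir>/<dir-name>_<type>.csv pattern seen in the examples above.
expected_reports() {
    dir=${1%/}
    name=$(basename "$dir")
    # Split the metric types on commas into positional parameters.
    old_ifs=$IFS
    IFS=,
    set -- $2
    IFS=$old_ifs
    for t in "$@"; do
        printf '%s/%s_%s.csv\n' "$dir" "$name" "$t"
    done
}

expected_reports /tmp/all core,l3,df
```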
AMDuProfSys allows obfuscated events to be collected on user systems without needing to expose internal events properties. Users are allowed only to collect the profile data. For generating reports, users must send the collected profile data to AMD.
Collect obfuscated profile data: Users can collect raw config data using raw_config file provided with the AMDuProfSys package.
Example command
AMDuProfSys collect --config core --obfuscation -o temp -d 30
This generates a session folder containing a session file (.ses) and a raw file.
Send data to AMD: Users must send the entire session folder (containing all files) to AMD.
Report generation: Profile reports can only be generated using the internal version of AMDuProfSys by AMD. The session file (.ses) is used as the input to generate the report.
Example command
AMDuProfSys report -i ./temp/temp.ses
Use this option to define your own custom metrics and profile only those metrics. It excludes the unwanted metrics collected by the options mentioned earlier, which minimizes the overhead of multiplexing. However, it requires knowledge of PMC encoding and of the metrics you want to profile. Here is an example of metrics along with an AMDuProfSys command to get the profile data.
TotalRetiredBranceNotTaken = total retired branches - retired branches taken
= PMCx0C2 (umask 0x0) - PMCx0C4 (umask 0x0)
= 0x4300C2 - 0x4300C4
RetiredBranceNotTaken = (TotalRetiredBranceNotTaken * 1000) / total retired instructions
= (TotalRetiredBranceNotTaken * 1000) / 0x4300C0
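The raw values in this derivation, such as 0x4300C2, pack the event select and unit mask together with control bits. The following sketch splits a raw value into those fields, assuming the standard AMD core PERF_CTL bit layout (event select in bits 7:0, unit mask in bits 15:8, USR in bit 16, OS in bit 17, enable in bit 22). `decode_raw_event` is a hypothetical helper, not part of AMDuProfSys, and it ignores the extended event-select bits.

```shell
# Hypothetical helper (not part of AMDuProfSys): split a raw core PMC value
# into its fields, ASSUMING the standard AMD core PERF_CTL bit layout.
decode_raw_event() {
    v=$(($1))
    printf 'event=0x%02X umask=0x%02X usr=%d os=%d en=%d\n' \
        $(( v & 0xFF )) $(( (v >> 8) & 0xFF )) \
        $(( (v >> 16) & 1 )) $(( (v >> 17) & 1 )) $(( (v >> 22) & 1 ))
}

decode_raw_event 0x4300C2   # retired branches
decode_raw_event 0x4300C4   # retired branches taken
decode_raw_event 0x4300C0   # retired instructions
```

Under that layout, the 0x43 prefix means the counter is enabled and counts in both user and kernel mode, while the low byte carries the PMC event number.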
Example command
AMDuProfSys --metrics core/TotalRetiredBranceNotTaken="0x4300C2-0x4300C4",core/RetiredBranceNotTaken="(TotalRetiredBranceNotTaken*1000)/0x4300C0"
Output
Figure 13.1 Custom Metrics
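To make the arithmetic behind this custom metric concrete, the sketch below applies the same formula to three counter values. The helper `branch_not_taken_pti` and the input counts are hypothetical and chosen only for illustration; they are not output from AMDuProfSys.

```shell
# Hypothetical helper (not part of AMDuProfSys): apply the custom metric
# formula ((branches - taken) * 1000) / instructions to raw counts.
branch_not_taken_pti() {
    # $1 = retired branches        (0x4300C2)
    # $2 = retired branches taken  (0x4300C4)
    # $3 = retired instructions    (0x4300C0)
    awk -v b="$1" -v t="$2" -v i="$3" \
        'BEGIN { printf "%.3f\n", ((b - t) * 1000) / i }'
}

# Made-up counts: 5M branches, 3M taken, 100M instructions.
branch_not_taken_pti 5000000 3000000 100000000   # prints: 20.000
```

With these counts, 2,000,000 branches were not taken, which is 20 per thousand retired instructions.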
Examples
Note
The following example commands assume that the AMDuProfSys binary is in the current working directory (CWD), <Installed Directory>/bin/AMDPerf. The commands shown are valid on Linux. To run on Windows, use the appropriate executable path (.\AMDuProfSys.exe if located in the CWD). Adjust the paths in these examples to match your setup.
Task Description |
Command |
|---|---|
Collect any user defined custom metrics from command line. |
|
Collect core events with multiplexing interval set to 32 ms. |
|
Collect multiple metric-types using predefined config files and generate report (in 2 steps) |
|
Collect using custom config files and generate report in two steps. |
To generate a report file sci_perf.csv containing computed metrics.
|
Default metrics (core, L3 and DF) system-wide collection for 100 s. |
|
Display all help |
|
Launch the workload /tmp/scimark2 with core affinity set to core 0 and monitor that core and generate profile report. |
|
Monitor the entire system to collect core events defined in config files for 50 seconds and generate the profiled metrics report. |
|
Monitor the entire system to collect raw data over 20 s and generate metric reports in a new folder /tmp/all for core, L3, DF, UMC and HSMP profile types. |
|
Time series profile data for core metrics (core 0-5) using Linux Perf with a logging interval of 1000 ms and set affinity of the workload (/tmp/scimark2) to core 0. |
|
The number of events that can be collected at a time is limited by the number of hardware PMCs available on the platform. AMDuProfSys divides the list of events into groups, each containing at most as many events as there are PMCs, and multiplexes these event groups during a run. As a result, the reported data may not be 100% accurate, especially if the profiling run is very short.
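As a rough illustration of this grouping, the number of multiplexed groups is the event count divided by the PMC count, rounded up. In the sketch below, `group_count` is a hypothetical helper and the figure of 6 counters is an assumption used only for the example; the actual PMC count depends on the platform.

```shell
# Hypothetical helper (not part of AMDuProfSys): number of event groups needed
# when at most <pmc_count> events fit in one group (ceiling division).
group_count() {
    # $1 = number of events, $2 = number of hardware PMCs
    echo $(( ($1 + $2 - 1) / $2 ))
}

group_count 14 6   # 14 events on an assumed 6 counters: prints 3
```

With three groups, each group is counted for only part of the run, which is why short runs are more affected by multiplexing.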
L3 metrics are not available for profiling on the following platforms:
Family 17, model 0x60 – 0x6F
Family 19, model 0x20 – 0x2F
DF metrics are not available for profiling on the following platforms:
Family 17, model 0x60 – 0x6F
Family 19, model 0x20 – 0x2F
Family 19, model 0x60 – 0x6F
Family 19, model 0x70 – 0x7F
Family 1A, model 0x60 – 0x63
UMC metrics are not available for profiling on the following platforms:
Family 17, model 0x30 – 0x3F
Family 17, model 0x60 – 0x6F
Family 19, model 0x0 – 0xF
Family 19, model 0x20 – 0x2F
Family 19, model 0x60 – 0x6F
Family 19, model 0x70 – 0x7F
Family 19, model 0x90 – 0x9F
Family 1A, model 0x60 – 0x63
Note
Family 1A Hardware Limitation: Metrics reporting remote and local outbound traffic/xGMI traffic could be incorrect.
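The three availability lists above can be combined into a single lookup. The following sketch encodes them as a small table and reports which PMU types are listed as unavailable for a given family and model. `unavailable_pmus` is a hypothetical helper, not part of AMDuProfSys; the family is passed as written in the lists (17, 19, 1A).

```shell
# Hypothetical lookup (not part of AMDuProfSys): list the PMU types that the
# lists above mark as unavailable for a given family and model.
# Table rows are: <pmu> <family as written above> <first model> <last model>
unavailable_pmus() {
    fam=$1
    model=$(($2))
    out=""
    while read -r pmu f lo hi; do
        if [ "$f" = "$fam" ] && [ "$model" -ge $(($lo)) ] && [ "$model" -le $(($hi)) ]; then
            out="$out${out:+ }$pmu"
        fi
    done <<EOF
l3 17 0x60 0x6F
l3 19 0x20 0x2F
df 17 0x60 0x6F
df 19 0x20 0x2F
df 19 0x60 0x6F
df 19 0x70 0x7F
df 1A 0x60 0x63
umc 17 0x30 0x3F
umc 17 0x60 0x6F
umc 19 0x0 0xF
umc 19 0x20 0x2F
umc 19 0x60 0x6F
umc 19 0x70 0x7F
umc 19 0x90 0x9F
umc 1A 0x60 0x63
EOF
    printf '%s\n' "$out"
}

unavailable_pmus 17 0x60   # prints: l3 df umc
unavailable_pmus 19 0x11   # prints an empty line (nothing listed as unavailable)
```

An empty result only means the model does not appear in these lists; it does not confirm that the processor itself is supported.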
| Hypervisor | PMUs Supported |
|---|---|
| Amazon AWS - Bare metal | core, l3, df |
| ESXi - Ubuntu 20.04 (guest) | core |
| ESXi - Red Hat Enterprise Linux release 9.0 (Plow) | core |
| KVM-Linux - Ubuntu 22.04.2 LTS (guest) | core |
| KVM-Linux - Ubuntu 22.04.2 LTS (host) | core, l3, df, umc |
| Windows Hyper-V (guest) | core |
| Windows Hyper-V (host) | core, l3, df, umc |