1. About AMD uProf

AMD uProf is a performance analysis tool for applications running on Windows and Linux operating systems; its command-line tools AMDuProfCLI and AMDuProfPcm also support FreeBSD. It helps developers understand and improve the runtime performance of their applications.

AMD uProf offers the following functionalities:

1.1. User Interfaces

AMD uProf has the following user interfaces:

Table 1.1 User Interfaces

| Executable          | Description                                  | Supported Operating Systems  |
|---------------------|----------------------------------------------|------------------------------|
| AMDuProf            | GUI to perform CPU and power profiling       | Windows and Linux            |
| AMDPerf/AMDuProfSys | CLI tool to perform system analysis          | Windows and Linux            |
| AMDuProfPcm         | CLI tool to perform system analysis          | Windows, Linux, and FreeBSD  |
| AMDuProfCLI         | CLI tool to perform CPU and power profiling  | Windows, Linux, and FreeBSD  |

AMD uProf can effectively:

1.2. Hardware Sources

1.2.1. Performance Monitor Counters (PMC)

AMD processors have Performance Monitor Counters (PMCs) that help monitor various micro-architectural events in a CPU core. The PMCs are used in two modes:

The number of hardware performance event counters available in each processor is implementation-dependent. For the exact number of hardware performance counters, refer to the Processor Programming Reference (PPR) of the specific processor. The operating system and/or BIOS can reserve one or more counters for internal use, so the number of counters actually available may be less than the number implemented in hardware. The CPU Profiler uses all available counters for profiling.

1.2.2. Instruction-Based Sampling (IBS)

IBS is a code profiling mechanism that enables the processor to select a random instruction fetch or micro-Op after a programmed interval has expired and to record specific performance information about that operation. When the operation completes, an interrupt is generated as specified by the IBS Control MSR, and an interrupt handler can then read the performance information that was logged for the operation.
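The selection scheme described above can be pictured with a small software model. This is a hypothetical sketch, not AMD's hardware logic or a uProf API: it assumes a stream of micro-Ops, a programmed `period`, and a random `jitter` added on every reload so that the tagged operation varies from sample to sample. The `Op` type and the `ibs_like_sample` function are illustrative names only.

```python
import random
from dataclasses import dataclass


@dataclass
class Op:
    rip: int      # address of the instruction this micro-Op belongs to
    latency: int  # hypothetical per-op performance data (e.g., cycles)


def ibs_like_sample(ops, period, jitter, seed=0):
    """Tag one op each time a randomized countdown expires (model only).

    The countdown is reloaded with period + a random offset in [0, jitter],
    so samples do not lock onto a fixed stride in a tight loop.
    """
    if period < 1 or jitter < 0:
        raise ValueError("period must be >= 1 and jitter >= 0")
    rng = random.Random(seed)
    samples = []
    countdown = period + rng.randint(0, jitter)
    for op in ops:
        countdown -= 1
        if countdown == 0:
            # Stand-in for the interrupt handler reading the logged data.
            samples.append((op.rip, op.latency))
            countdown = period + rng.randint(0, jitter)
    return samples
```

With `jitter=0` the model degenerates to strictly periodic sampling, which can repeatedly hit the same instruction in a loop whose length divides the period; a nonzero jitter spreads the samples.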

The IBS mechanism is split into two parts:

Instruction fetch sampling provides information about instruction TLB and instruction cache behavior for fetched instructions.

Instruction execution sampling provides information about micro-Op execution behavior.

The data collected for the instruction fetch performance is independent of the data collected for the instruction execution performance.

Instruction execution performance is profiled by tagging one micro-Op associated with an instruction. Instructions that decode to more than one micro-Op return different performance data depending on which of the instruction's micro-Ops is tagged. These micro-Ops are associated with the RIP of the next instruction.

In this mode, the CPU Profiler uses the IBS hardware supported by the AMD processor to observe the effect of instructions on the processor and on the memory subsystem. In IBS, the hardware events are linked with the instruction that caused them. The CPU Profiler also uses these hardware events to derive various metrics, such as data cache latency.
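To show how linking events to the instruction that caused them lets a profiler derive a metric such as data cache latency, the following sketch aggregates hypothetical IBS op samples by instruction address. The `IbsOpSample` fields and the `per_instruction_metrics` function are assumptions chosen for illustration; they do not reproduce the actual IBS register layout or uProf's report format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class IbsOpSample:
    rip: int          # instruction address the sample is attributed to
    is_load: bool     # whether the tagged op was a load
    dc_miss: bool     # whether the load missed the data cache
    dc_miss_lat: int  # cycles spent on the miss (0 when there was no miss)


def per_instruction_metrics(samples):
    """Return {rip: (load_samples, miss_ratio, avg_miss_latency)} for loads."""
    loads = defaultdict(int)
    misses = defaultdict(int)
    lat_sum = defaultdict(int)
    for s in samples:
        if not s.is_load:
            continue  # only load samples contribute to data cache metrics
        loads[s.rip] += 1
        if s.dc_miss:
            misses[s.rip] += 1
            lat_sum[s.rip] += s.dc_miss_lat
    result = {}
    for rip, n in loads.items():
        m = misses[rip]
        avg_lat = lat_sum[rip] / m if m else 0.0
        result[rip] = (n, m / n, avg_lat)
    return result
```

Because every sample carries its own instruction address, the metrics land on the exact instruction responsible rather than on a nearby one, which is the property the paragraph above attributes to IBS.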

1.2.3. L3 Cache Performance Monitor Counters (L3PMC)

A Core Complex (CCX) is a group of CPU cores that share L3 cache resources: all the cores in a CCX share a single L3 cache. L3PMCs are available on AMD “Zen”-based processors to monitor the performance of L3 resources. For more information, refer to the PPR for the respective processor.
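On Linux, one way to see which logical CPUs share an L3 cache (and therefore, given the definition above, which CPUs sit in the same CCX) is the kernel's sysfs cache topology under `/sys/devices/system/cpu/cpuN/cache/indexK/`. The sketch below assumes that layout and its `level` and `shared_cpu_list` files; `l3_groups` is a hypothetical helper, it does not use uProf, and it does not program the L3PMCs.

```python
import glob
import os
from collections import defaultdict


def l3_groups(sysfs_root="/sys/devices/system/cpu"):
    """Map each L3 'shared_cpu_list' string to the sorted CPUs reporting it."""
    groups = defaultdict(list)
    for cpu_dir in glob.glob(os.path.join(sysfs_root, "cpu[0-9]*")):
        cpu = int(os.path.basename(cpu_dir)[3:])  # "cpu12" -> 12
        for idx_dir in glob.glob(os.path.join(cpu_dir, "cache", "index[0-9]*")):
            # Check the cache level instead of assuming index3 is always L3.
            with open(os.path.join(idx_dir, "level")) as f:
                if f.read().strip() != "3":
                    continue
            with open(os.path.join(idx_dir, "shared_cpu_list")) as f:
                groups[f.read().strip()].append(cpu)
    return {key: sorted(cpus) for key, cpus in groups.items()}
```

On a Linux machine, calling `l3_groups()` with no argument returns one entry per L3 cache; offline CPUs without a `cache` directory are simply skipped.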

1.2.4. Data Fabric Performance Monitor Counters (DFPMC)

For AMD “Zen”-based processors, DFPMCs are available to monitor the performance of Data Fabric resources. For more information, refer to the Processor Programming Reference (PPR) for the respective processor.

1.3. Support

For support options, the latest documentation, and downloads, refer to the AMD uProf page.