AMD uProf is a performance analysis tool for applications running on Windows and Linux operating systems. It allows developers to understand and improve the runtime performance of their application.
AMD uProf offers the following functionalities:
Performance Analysis (CPU Profile): To identify runtime performance bottlenecks of the application.
System Analysis: To monitor system performance metrics, such as IPC and memory bandwidth.
Live Power Profile: To monitor thermal and power characteristics of the system.
AMD uProf has the following user interfaces:
| Executable | Description | Supported Operating System |
|---|---|---|
| AMDuProf | GUI to perform CPU and Power Profile | Windows and Linux |
| AMDPerf/AMDuProfSys | CLI tool to perform System Analysis | Windows and Linux |
| AMDuProfPcm | CLI to perform System Analysis | Windows, Linux, and FreeBSD |
| AMDuProfCLI | CLI to perform CPU and Power Profile | Windows, Linux, and FreeBSD |
AMD uProf can effectively:
Analyze the performance of one or more processes/applications.
Track down the performance bottlenecks in the source code.
Identify ways to optimize the source code for better performance and power efficiency.
Examine the behavior of kernels, drivers, and system modules.
Observe system level thermal and power characteristics.
Observe system metrics, such as IPC and memory bandwidth.
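As a rough illustration of the two system metrics named above, the following sketch derives IPC and memory bandwidth from raw counts. The function names and inputs are hypothetical and chosen for illustration only; they are not part of AMD uProf, which reports these metrics itself.

```python
def instructions_per_cycle(retired_instructions: int, cpu_cycles: int) -> float:
    """IPC: retired instructions divided by elapsed CPU cycles."""
    if cpu_cycles <= 0:
        raise ValueError("cpu_cycles must be positive")
    return retired_instructions / cpu_cycles


def memory_bandwidth_gb_per_s(bytes_transferred: int, elapsed_seconds: float) -> float:
    """Memory bandwidth in GB/s, using 1 GB = 10**9 bytes."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return bytes_transferred / elapsed_seconds / 1e9
```

For example, 3,000,000 retired instructions over 2,000,000 cycles gives an IPC of 1.5.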
AMD processors have Performance Monitor Counters (PMCs) that help monitor various micro-architectural events in a CPU core. The PMCs are used in two modes:
Counting mode: The counters are used to count specific events that occur in a CPU core.
Sampling mode: The counters are programmed to count a specific number of events. Once the count reaches the programmed number (called the sampling interval), an interrupt is triggered. During the interrupt handling, the CPU Profiler collects the profile data.
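The difference between the two modes can be sketched with a toy software model. This is only a simulation under stated assumptions: the event stream is a plain list whose entries name the function that was running when the event occurred, and `count_events` and `sample_events` are hypothetical helpers, not a uProf or hardware interface.

```python
from collections import Counter
from typing import Iterable


def count_events(event_stream: Iterable[str]) -> int:
    """Counting mode: count every event that occurs."""
    return sum(1 for _ in event_stream)


def sample_events(event_stream: Iterable[str], sampling_interval: int) -> Counter:
    """Sampling mode: record one sample each time the counter reaches the interval."""
    if sampling_interval <= 0:
        raise ValueError("sampling_interval must be positive")
    samples: Counter = Counter()
    counter = 0
    for location in event_stream:
        counter += 1
        if counter == sampling_interval:
            samples[location] += 1  # stands in for the interrupt handler
            counter = 0             # re-arm the counter for the next interval
    return samples


# Toy workload: 9000 events in hot_loop followed by 1000 events in helper.
stream = ["hot_loop"] * 9000 + ["helper"] * 1000
total = count_events(stream)
samples = sample_events(stream, sampling_interval=100)
# Each sample represents roughly one sampling interval's worth of events.
estimated = {name: n * 100 for name, n in samples.items()}
```

Counting mode yields only a total, while sampling mode attributes events to code locations at the cost of one interrupt per sampling interval.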
The number of hardware performance event counters available in each processor is implementation-dependent. For the exact number of hardware performance counters, refer to the Processor Programming Reference (PPR) of the specific processor. The operating system and/or BIOS can reserve one or more counters for internal use. Thus, the number of counters actually available for profiling may be less than the number implemented in hardware. The CPU Profiler uses all available counters for profiling.
Instruction-Based Sampling (IBS) is a code profiling mechanism that enables the processor to select a random instruction fetch or micro-Op after a programmed time interval has expired and record specific performance information about the operation. An interrupt is generated when the operation is complete, as specified by the IBS Control MSR. An interrupt handler can then read the performance information that was logged for the operation.
The IBS mechanism is split into two parts:
Instruction Fetch Performance
Instruction Execution Performance
The instruction fetch sampling provides information about instruction TLB and instruction cache behavior for fetched instructions.
Instruction execution sampling provides information about micro-Op execution behavior.
The data collected for the instruction fetch performance is independent of the data collected for the instruction execution performance.
Instruction execution performance is profiled by tagging one micro-Op associated with an instruction. Instructions that decode to more than one micro-Op return different performance data depending upon which micro-Op associated with the instruction is tagged. These micro-Ops are associated with the RIP of the next instruction.
In IBS mode, the CPU Profiler uses the IBS hardware supported by the AMD processor to observe the effect of instructions on the processor and on the memory subsystem. In IBS, the hardware events are linked with the instruction that caused them. The CPU Profiler also uses these hardware events to derive various metrics, such as data cache latency.
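Because IBS ties each event to the instruction that caused it, a derived metric can be computed per instruction address. The sketch below averages data cache miss latency per RIP. The `IbsOpSample` record and its field names are hypothetical stand-ins for illustration; they do not reflect the actual IBS register layout or the CPU Profiler's internal data format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class IbsOpSample:
    rip: int              # address of the sampled instruction (hypothetical field)
    dc_miss: bool         # True if the operation missed the data cache
    dc_miss_latency: int  # miss latency in cycles; 0 when dc_miss is False


def average_miss_latency(samples: Iterable[IbsOpSample]) -> Dict[int, float]:
    """Return the average data cache miss latency per RIP, for RIPs with a miss."""
    totals = defaultdict(lambda: [0, 0])  # rip -> [latency sum, miss count]
    for sample in samples:
        if sample.dc_miss:
            totals[sample.rip][0] += sample.dc_miss_latency
            totals[sample.rip][1] += 1
    return {rip: latency / misses for rip, (latency, misses) in totals.items()}
```

Instructions whose samples never miss the data cache are omitted from the result instead of being reported with a latency of zero.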
A Core Complex (CCX) is a group of CPU cores that share a single L3 cache. L3 Performance Monitor Counters (L3PMCs) are available for AMD “Zen”-based processors to monitor the performance of L3 resources. For more information, refer to the respective PPR for the processor.
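To make the CCX grouping concrete, the following sketch groups logical CPUs by the set of CPUs they share an L3 cache with. It assumes the sharing information is given as Linux-style CPU list strings such as "0-3". On Linux such strings are commonly read from sysfs cache entries (for example a `shared_cpu_list` file under `/sys/devices/system/cpu/`), but the exact path and cache index vary by system and are an assumption here. Both helper functions are hypothetical and not part of AMD uProf.

```python
from typing import Dict, FrozenSet, List


def parse_cpu_list(text: str) -> FrozenSet[int]:
    """Parse a CPU list such as "0-3,8-11" into a set of CPU IDs."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return frozenset(cpus)


def group_by_shared_l3(shared_lists: Dict[int, str]) -> List[List[int]]:
    """Group CPUs that report the same L3 sharing set (one group per CCX)."""
    groups = {parse_cpu_list(text) for text in shared_lists.values()}
    return sorted((sorted(g) for g in groups if g), key=lambda g: g[0])
```

With eight CPUs where CPUs 0 to 3 report "0-3" and CPUs 4 to 7 report "4-7", the function returns two groups, each corresponding to one set of cores sharing an L3 cache.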
For AMD “Zen”-based processors, Data Fabric Performance Monitor Counters (DFPMCs) are available to monitor the performance of Data Fabric resources. For more information, refer to the respective Processor Programming Reference (PPR) for the processor.
For support options, the latest documentation, and downloads, refer to the AMD uProf page.