8. Code Profiling

8.1. Profiling a .NET/CLR Application

AMD uProf supports profiling .NET/CLR applications through the common language runtime (CLR) Profiling API.

AMD uProf provides a CLR agent library, AMDClrProfAgent.dll, on Windows. This library must be loaded during startup of the target managed process.

8.1.1. Prerequisites

Run the dotnet --version command to verify that .NET is installed on the system.
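The same prerequisite check can be scripted. The following is a minimal Python sketch, assuming only that the dotnet executable, if installed, is on PATH; the function name dotnet_version is illustrative and not part of AMD uProf.

```python
import shutil
import subprocess
from typing import Optional


def dotnet_version(executable: str = "dotnet") -> Optional[str]:
    """Return the output of `<executable> --version`, or None if it cannot be run."""
    path = shutil.which(executable)
    if path is None:
        return None  # not on PATH, so .NET is not installed (or not visible)
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return None  # the command failed; treat .NET as not usable
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = dotnet_version()
    print(f".NET version: {version}" if version else ".NET was not found on PATH")
```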

8.1.2. Launching a .NET Application

When AMD uProf launches the .NET application, it sets the COR_PROFILER environment variable to the CLSID of the AMDClrProfAgent library, which uses the ICLRProfiling interface to attach the agent to the running CLR process. AMD uProf can then collect the profile data and attribute the samples to the CLR process.

8.1.2.1. Using GUI

  1. Launch the AMDuProf GUI and go to Home > Welcome page.

  2. Click Profile an Application on the Welcome page.

  3. Provide application path, application options, working directory, and environment variables, if any. Click Next.

    Specify the following configuration parameters:

    1. Application Path: Path to .NET binary

    2. Application Options: Launch app arguments

    3. Working Directory: Launch app path

  4. From Predefined Configs, select Required Configuration.

  5. Click Start Profile to start the profiling.

8.1.2.2. Using CLI

8.1.3. Analyzing Profiled .NET Data

Click the ANALYZE tab to identify the hottest .NET functions.

Figure 8.1 Function Hotspots

8.1.3.1. .NET Source View

AMD uProf attributes the profile samples to .NET methods, and the Source tab shows the .NET source lines with the samples attributed to them.

Refer to the section Source and Assembly for more information on this screen. The following figure shows the source view of a .NET method:

Figure 8.2 Source view - .NET Method

8.1.4. Limitations

8.2. Profiling a Java Application

AMD uProf supports profiling Java applications running on a JVM. To support this, it uses the JVM Tool Interface (JVMTI).

AMD uProf provides JVMTI agent libraries: AMDJvmtiAgent.dll on Windows and libAMDJvmtiAgent.so on Linux. The agent library must be loaded during startup of the target JVM process.

8.2.1. Prerequisites

Run the which java command to verify that Java is installed on the system.

Run echo $JAVA_HOME to verify that it points to a supported Java version.
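A minimal Python sketch that combines both checks and also reports whether the java found on PATH actually belongs to JAVA_HOME (a common source of confusion when several JDKs are installed). It does not check whether the version is supported, since that list is not given here; the function names are illustrative.

```python
import os
import shutil
from typing import Mapping, Optional


def java_home_matches(java_path: Optional[str], java_home: Optional[str]) -> bool:
    """True if the java binary found on PATH lives in JAVA_HOME/bin."""
    if not java_path or not java_home:
        return False
    # Resolve symlinks such as /usr/bin/java -> /usr/lib/jvm/<jdk>/bin/java.
    java_dir = os.path.dirname(os.path.realpath(java_path))
    return java_dir == os.path.realpath(os.path.join(java_home, "bin"))


def check_java(env: Mapping[str, str] = os.environ) -> dict:
    """Equivalent of `which java` plus `echo $JAVA_HOME`, with a consistency flag."""
    java_path = shutil.which("java", path=env.get("PATH"))
    java_home = env.get("JAVA_HOME")
    return {
        "java": java_path,
        "JAVA_HOME": java_home,
        "consistent": java_home_matches(java_path, java_home),
    }


if __name__ == "__main__":
    for key, value in check_java().items():
        print(f"{key}: {value}")
```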

8.2.2. Configuration

8.2.2.1. Launching a Java Application

When the Java application is launched by AMD uProf, the tool can collect the profile data and attribute the samples to interpreted Java functions.

Using GUI

  1. Launch the AMDuProf GUI and go to Home > Welcome page.

  2. Click Profile an Application on the Welcome page.

  3. Provide application path, application options, working directory, and environment variables, if any. Click Next.

    Specify the following configuration parameters:

    1. Application Path: Path to Java binary

    2. Application Options: Launch app arguments

    3. Working Directory: Launch app path

  4. From Predefined Configs, select Required Configuration.

  5. Set the timer interval and profiling signal.

  6. Only if using Linux: From Advanced Options, select the Callstack Collection and Callstack Unwind Depth.

  7. Click Start Profile to start the profiling.

Using CLI

Note

Use absolute paths as arguments when running a Java executable.
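A minimal Python sketch that applies this note by resolving relative paths before building the launch command. It reuses the options from the call-stack example in Java Call Stack and Flame Graph (collect --config tbp -g -w); the function name and example paths are hypothetical, and the main class name is passed through unchanged because it is not a path.

```python
import os
from typing import List


def java_launch_command(java_exe: str, app_dir: str, main_class: str,
                        uprof_cli: str = "./AMDuProfCLI") -> List[str]:
    """Build an AMDuProfCLI launch command with absolute paths for the Java arguments."""
    return [
        uprof_cli, "collect", "--config", "tbp", "-g",
        "-w", os.path.abspath(app_dir),   # working directory of the Java app
        os.path.abspath(java_exe),        # path to the java executable
        main_class,                       # main class name, not a path
    ]


if __name__ == "__main__":
    # Hypothetical relative paths, resolved against the current directory.
    print(" ".join(java_launch_command("jdk/bin/java", "myapp", "com.example.Main")))
```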

8.2.3. Attaching a Java Process to Profile

8.2.3.1. On Linux Platforms

You can attach to a running Java application using the --pid option. AMD uProf can then collect the profile data and attribute the samples to interpreted Java functions.

8.2.3.1.1. Using GUI

Launch the AMDuProf GUI and go to Home > Welcome page.

  1. Click Profile an Application on the Welcome page.

  2. Click Profile running Process(es) on the Welcome page.

  3. From Predefined Configs, select GPU Profile.

    Figure 8.3 Profile Target Java Process

  4. Select the process Id.

  5. From Advanced Options, select Callstack Collection and Callstack Unwind Depth.

  6. Click Start Profile to start the profiling.

8.2.3.1.2. Using CLI

Note

Default duration is 30 seconds.

8.2.3.1.3. On Windows and FreeBSD Platforms

AMD uProf cannot attach JvmtiAgent dynamically to an already running JVM. Hence, for any JVM process profiled through the attach-process mechanism, AMD uProf cannot capture any class information unless the JvmtiAgent library is loaded during JVM process startup.

To profile an already running Java process, pass the -agentpath:<path-to-agent-lib> option when launching the Java application, so that AMD uProf can later attach to the Java PID and collect profile data.

For a 64-bit JVM on Windows and FreeBSD

C:\> java -agentpath:"C:\Program Files\AMD\AMDuProf\bin\ProfileAgents\x64\AMDJvmtiAgent.dll" <java-app-launch-options>
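Because the agent path usually contains spaces, quoting matters in a command prompt. A minimal Python sketch that launches the JVM with the agent and prints the PID to use when attaching; passing arguments as a list avoids shell quoting entirely. The agent path assumes the default installation directory, and myapp.jar is a hypothetical application.

```python
import subprocess
from typing import List, Sequence

# Assumption: default AMD uProf installation directory; adjust to your install.
AGENT_DLL = r"C:\Program Files\AMD\AMDuProf\bin\ProfileAgents\x64\AMDJvmtiAgent.dll"


def java_with_agent(agent_path: str, app_args: Sequence[str], java: str = "java") -> List[str]:
    """Build a JVM command line that loads the JVMTI agent at startup."""
    # -agentpath:<path> is the standard JVM option for loading a native agent library.
    return [java, f"-agentpath:{agent_path}", *app_args]


if __name__ == "__main__":
    # "myapp.jar" is a hypothetical application.
    jvm = subprocess.Popen(java_with_agent(AGENT_DLL, ["-jar", "myapp.jar"]))
    print(f"JVM started with PID {jvm.pid}; select this PID in AMD uProf to attach.")
```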

Note the process id (PID) of this JVM instance.

8.2.3.1.4. Using GUI

Launch the AMDuProf GUI and go to Home > Welcome page.

  1. Click Profile an Application on the Welcome page.

  2. Click Profile running Process(es) on the Welcome page.

  3. From Select Profile Target, select Process(es).

  4. Select the Java process Id.

  5. From Advanced Options, select Callstack Collection and Callstack Unwind Depth.

  6. Click Start Profile to start the profiling.

8.2.3.2. Java Source View

AMD uProf attributes the profile samples to Java methods, and the Source tab shows the Java source lines with the samples attributed to them.

Refer to the section Source and Assembly for more information on the source screen.

The following figure shows the source view of the Java method:

Figure 8.4 Java Method - Source View

8.2.3.3. Java Call Stack and Flame Graph

Note

When attaching to a running Java process on Linux, launch the target application with the JVM option -XX:+PreserveFramePointer so that AMD uProf collects correct Java call stacks.

To collect call stacks while profiling a Java application:

$ ./AMDuProfCLI collect --config tbp -g -w <java-app-dir> <path-to-java-exe> <java-app-main>

Figure 8.5 Java Application - Flame Graph
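A minimal Python sketch of the attach workflow from the note above: start the JVM with -XX:+PreserveFramePointer, record its PID, and build an attach command. Combining --config tbp -g with the --pid option described in Attaching a Java Process to Profile is an assumption, so check the AMDuProfCLI collect options for your version; the class path and main class are hypothetical.

```python
import subprocess
from typing import List, Sequence


def jvm_command(app_args: Sequence[str], java: str = "java") -> List[str]:
    """JVM launch command that keeps frame pointers for correct Java call stacks."""
    return [java, "-XX:+PreserveFramePointer", *app_args]


def attach_command(pid: int, uprof_cli: str = "./AMDuProfCLI") -> List[str]:
    """Attach command (assumption: tbp config and -g combined with --pid)."""
    return [uprof_cli, "collect", "--config", "tbp", "-g", "--pid", str(pid)]


if __name__ == "__main__":
    # Hypothetical class path and main class.
    jvm = subprocess.Popen(jvm_command(["-cp", "/path/to/app", "Main"]))
    print("Attach with:", " ".join(attach_command(jvm.pid)))
```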

8.2.4. Limitations

Java profiling has the following limitations:

8.3. Profiling a Python Application

AMD uProf provides comprehensive profiling capabilities for Python applications on Linux, enabling developers to identify and analyze performance bottlenecks and long-running code. The profiler uses the Python interpreter's runtime support mechanisms to perform detailed Hotspot Analysis for performance characterization and application optimization.

Key Features

Profiling Modes

AMD uProf provides two profiling modes for Python applications, each with distinct capabilities and version support:

  1. eBPF Sampling Mode: Available exclusively for Python 3.10

  2. Tracing Mode: Available for Python 3.10, 3.11, 3.12, and 3.13

The following table details the feature support matrix for each profiling mode:

Table 8.1 Python Profiling Modes - Feature Support Matrix

Feature                                 eBPF Sampling Mode             Tracing Mode
--------------------------------------  ----------------------------   ------------
Function-Level Attribution              Yes                            Yes
Source-Level Attribution                Yes                            No
Mixed-Mode Profiling (Native + Python)  Yes                            Yes
Call Stack Analysis                     Yes (requires frame pointer)   Yes
Multi-Process Support                   Yes                            Yes
Multi-Threaded Support                  Yes                            Yes
Launch Application Profiling            Yes                            Yes
Attach to Running Process               Yes                            No
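The version requirements above can be checked from the interpreter that will be profiled. A minimal sketch that encodes only the version lists and the attach row stated in this section; the function names are illustrative.

```python
import sys
from typing import List, Tuple

# Versions as listed above for each profiling mode.
EBPF_VERSIONS = {(3, 10)}
TRACING_VERSIONS = {(3, 10), (3, 11), (3, 12), (3, 13)}


def supported_modes(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> List[str]:
    """Return the profiling modes available for a (major, minor) Python version."""
    key = tuple(version[:2])
    modes = []
    if key in EBPF_VERSIONS:
        modes.append("ebpf")
    if key in TRACING_VERSIONS:
        modes.append("tracing")
    return modes


def attach_supported(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Attaching to a running process requires eBPF Sampling Mode (see the table)."""
    return "ebpf" in supported_modes(version)


if __name__ == "__main__":
    print(sys.version.split()[0], "->", supported_modes() or "no supported mode")
```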

8.3.1. eBPF Sampling Mode

eBPF Sampling mode utilizes the Linux kernel’s extended Berkeley Packet Filter (eBPF) functionality to collect performance samples with minimal overhead. This mode provides comprehensive profiling capabilities including source-level attribution and mixed-mode analysis.

8.3.1.1. Key Characteristics

8.3.1.2. Prerequisites

sudo ./AMDuProfSetup.sh

8.3.1.3. Configuration

The eBPF Sampling mode integrates with Hotspot Analysis to profile Python applications through both launch and attach profiling workflows. When a Python interpreter path is specified as the target executable, the profiler enables Python-specific sample collection in parallel with standard Hotspot Analysis. This collection encompasses samples from all child processes and threads spawned by the target Python script.

The Python Profiling Agent supports profiling applications launched directly via Python interpreter as well as those initiated through shell scripts.

8.3.1.3.1. Launching a Python Application
8.3.1.3.1.1. Using GUI
  1. Launch AMDuProf and navigate to Home > Welcome page.

  2. Select Profile an Application.

  3. Configure the target application parameters including the executable path, command-line arguments, working directory, and environment variables as needed. Click Next to proceed.

    Specify the following configuration parameters:

    1. Application Path: Path to Python binary

    2. Application Options: Application arguments

    3. Working Directory: Application path

Figure 8.6 Python Application Configuration

  4. From the Predefined Configs section, select Hotspots Configuration.

Figure 8.7 Hotspots Profile Selection

  5. Expand the Advanced Options section and enable eBPF mode Python Profiling.

Figure 8.8 eBPF-based Python Profiling Activation

  6. Click Start Profile to begin the profiling session.

8.3.1.3.1.2. Using CLI

The --python option enables eBPF-based sampling for Python profiling.

AMDuProfCLI collect --config hotspots -o <output-dir> --python python <application.py>
AMDuProfCLI collect --config hotspots -g -o <output-dir> --python python <application.py>

The -g option enables call stack sample collection for both Python and native code execution.

Note

The Python binary must be compiled with frame pointer support enabled to ensure accurate call stack collection.
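One heuristic for checking whether a Python build keeps frame pointers is to inspect the compiler flags recorded in sysconfig. A minimal sketch, assuming GCC/Clang-style flags and that the build recorded them in CFLAGS (some builds record them elsewhere, so a False result is not conclusive); this is not an AMD uProf check.

```python
import sysconfig
from typing import Optional


def built_with_frame_pointers(cflags: Optional[str] = None) -> bool:
    """Heuristic: True if the recorded CFLAGS keep frame pointers."""
    if cflags is None:
        cflags = sysconfig.get_config_var("CFLAGS") or ""
    # When both flags appear, the last one on the command line takes effect.
    for flag in reversed(cflags.split()):
        if flag == "-fno-omit-frame-pointer":
            return True
        if flag == "-fomit-frame-pointer":
            return False
    return False  # no explicit flag: optimized builds usually omit frame pointers


if __name__ == "__main__":
    print("frame pointers:", built_with_frame_pointers())
```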

To profile a Python application launched from a shell script:

AMDuProfCLI collect --config hotspots -o <output-dir> --python <application.sh>
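To try these commands, any CPU-bound script can serve as <application.py>. The hypothetical script below has one deliberately hot pure-Python function called from several threads, so it would be expected to dominate the function hotspots and flame graph views; the names and sizes are arbitrary.

```python
"""application.py: a small CPU-bound workload for trying Python hotspot profiling."""
import threading
from typing import List


def hot_loop(n: int) -> int:
    """Deliberately hot function: sum of squares computed in pure Python."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def cold_setup() -> List[str]:
    """Cheap setup work that should receive few samples."""
    return [str(i) for i in range(1000)]


def worker(n: int, results: List[int], index: int) -> None:
    results[index] = hot_loop(n)


def main(num_threads: int = 4, n: int = 2_000_000) -> List[int]:
    cold_setup()
    results = [0] * num_threads
    threads = [threading.Thread(target=worker, args=(n, results, i))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(sum(main()))
```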
8.3.1.3.2. Attaching to a Running Python Process

You can attach AMD uProf to running Python processes using the --pid option to collect profile data and attribute the samples to Python functions. The --python flag activates eBPF-based sampling for Python profiling.

8.3.1.3.2.1. Using GUI
  1. Launch the AMDuProf GUI and navigate to Home > Welcome page.

  2. Click Profile an Application.

  3. Click Profile running Process(es).

  4. From Predefined Configs, select Hotspots Configuration.

  5. Select the process ID.

  6. Click Advanced Options and select eBPF mode Python Profiling.

  7. Click Start Profile.

8.3.1.3.2.2. Using CLI
AMDuProfCLI collect --config hotspots -g -o <output-dir> --python -p <python-process-id>

Specify the process ID of the target Python process using the -p option.
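A minimal Python sketch for finding candidate process IDs on Linux by reading /proc/<pid>/comm. It assumes the interpreter's command name starts with "python", so processes that rename themselves will be missed; <output-dir> is kept as a placeholder.

```python
import os
from typing import List


def find_python_pids(proc_root: str = "/proc") -> List[int]:
    """Return PIDs whose command name (/proc/<pid>/comm) starts with 'python'."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no procfs (non-Linux system or unusual mount)
    pids = []
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # the process exited or is not readable
        if comm.startswith("python"):
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    for pid in find_python_pids():
        print(f"AMDuProfCLI collect --config hotspots -g -o <output-dir> --python -p {pid}")
```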

8.3.1.4. Analyze the Data

Alongside the native timing metrics, the Python profiler also reports hotspots for the Python application. For information on identifying the hottest functions, refer to Hotspots Analysis.

Figure 8.9 Python Hotspots

8.3.1.5. Identify the hot code paths

Use the Flame Graph to identify the hottest code paths of an application; these code paths contain both native and Python functions.

Figure 8.10 Function Hotspots

For more information on identifying code paths, refer to Hotspots Analysis.

8.3.1.6. Python Source View

AMD uProf attributes performance samples to Python functions and displays corresponding source line annotations in the Source tab view.

For detailed information on source-level analysis capabilities, refer to Source and Assembly.

Figure 8.11 Source view - Python Method

Note

Assembly-level profiling is not supported for Python applications.

8.3.1.7. Limitations

In addition to the Hotspots Analysis Limitations, Python profiling has the following constraints:

8.3.2. Tracing Mode

Tracing mode employs Python’s native tracing hooks to monitor function execution and collect performance data. This mode offers broad compatibility across multiple Python versions while maintaining function-level profiling capabilities.

8.3.2.1. Key Characteristics

8.3.2.2. Prerequisites

Run the which python command to verify that python points to a supported Python interpreter version.

8.3.2.3. Configuration

Use Hotspot Analysis with the launch-application profile scope to profile a Python application. When the target executable is the path to a Python interpreter, the profiler automatically collects Python samples alongside the Hotspot Analysis. The samples gathered by the Python Profiling Agent include those from child processes and threads created by the target Python script.

Additionally, the Python Profiling Agent also supports profiling Python applications launched from shell scripts.

8.3.2.3.1. Launching a Python Application
8.3.2.3.1.1. Using GUI

Launch the AMDuProf GUI and go to Home > Welcome page.

  1. Click Profile an Application on the Welcome page.

  2. Click Profile running Process(es) on the Welcome page.

  3. From Select Profile Target, select Hotspots.

  4. Set the timer interval and profiling signal.

  5. From Advanced Options, select Callstack Collection and Callstack Unwind Depth.

  6. Click Start Profile to start the profiling.

8.3.2.3.1.2. Using CLI
AMDuProfCLI collect --config hotspots -o <output-dir> python <application.py>
AMDuProfCLI collect --config hotspots -g -o <output-dir> python <application.py>
AMDuProfCLI collect --config hotspots -o <output-dir> <application.sh>

To show Python interpreter functions in the call graph and flame graph, use:

AMDuProfCLI report -i <python_session> --detail --python-show-all

Note

Default sampling interval is 10 seconds.

For more information, refer to Hotspots Analysis.

8.3.2.4. Analyze the Data

Use Hotspot Analysis to identify the hottest Python functions and a mixed-mode call stack (both native and Python).

8.3.2.5. Identify the hottest function

Alongside the native timing metrics, the Python profiler reports hotspots for the Python application. For information on identifying the hottest functions, refer to Hotspots Analysis.

Figure 8.12 Function Hotspots

8.3.2.6. Identify the hot code paths

Use the Flame Graph to identify the hottest code paths of an application; these code paths contain both native and Python functions.

Figure 8.13 Function Hotspots

For more information on identifying code paths, refer to Hotspots Analysis.

8.3.2.7. Limitations

Along with Hotspots Analysis Limitations, Python profiling has a few other limitations:

8.4. MPI Code Profiling

AMD uProf can profile MPI programs launched through the mpirun or mpiexec launcher programs. To profile MPI applications and analyze the data, complete the following steps:

  1. Collect the profile data using CLI collect command.

  2. Process the profile data using CLI translate command which will generate the profile database.

  3. Import the profile database in the GUI or generate the CSV report using CLI report command.

  4. Multiple ranks profiling requires a higher memory lock limit. Set it using one of the following methods:

    - Increase the memory lock limit using the ulimit -l command, depending on the number of ranks to be profiled on the target node.

    - Set /proc/sys/kernel/perf_event_paranoid to -1 or another value appropriate for the profile configuration and scope.

    - Profile MPI applications with root privilege.

  5. Multiple ranks profiling might require a high number of file descriptors. If the file descriptor limit is reached during profile data collection, an error message will be displayed. You can increase this limit in the file /etc/security/limits.conf.

  6. For multiple ranks profiling, if the /proc/sys/kernel/perf_event_paranoid value is greater than -1, you must increase the /proc/sys/kernel/perf_event_mlock_kb value depending on the number of ranks to profile. Alternatively, use the -m option to decrease the number of memory data buffer pages used by each instance of AMDuProfCLI.
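A minimal Python sketch (Linux, standard library only) that reports the limits mentioned in the steps above for the current shell, so they can be compared before and after tuning. Note that resource.getrlimit reports RLIMIT_MEMLOCK in bytes, while ulimit -l reports KiB. The function names are illustrative, and the sketch does not estimate how much memory each rank needs.

```python
import resource
from typing import Optional


def read_proc_int(path: str) -> Optional[int]:
    """Read a single integer from a procfs file, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def fmt_limit(value: int) -> str:
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def mpi_profiling_limits() -> dict:
    """Collect the limits relevant to multi-rank profiling on this node."""
    memlock_soft, memlock_hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    nofile_soft, nofile_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {
        "memlock_bytes": (fmt_limit(memlock_soft), fmt_limit(memlock_hard)),
        "open_files": (fmt_limit(nofile_soft), fmt_limit(nofile_hard)),
        "perf_event_paranoid": read_proc_int("/proc/sys/kernel/perf_event_paranoid"),
        "perf_event_mlock_kb": read_proc_int("/proc/sys/kernel/perf_event_mlock_kb"),
    }


if __name__ == "__main__":
    for key, value in mpi_profiling_limits().items():
        print(f"{key}: {value}")
```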

Support Matrix

For the MPI components and versions supported for profiling, refer to the MPI Trace Support Matrix.

8.4.1. Collecting Data Using CLI

The MPI jobs are launched using MPI launchers such as mpirun and mpiexec. Use AMDuProfCLI to collect the CPU profile data for an MPI application.

The MPI job launch through mpirun uses the following syntax:

$ mpirun [options] <program> [<args>]

AMDuProfCLI takes the place of <program>, and the MPI application and its arguments are passed as AMDuProfCLI's arguments. So, use the following syntax to profile an MPI application using AMDuProfCLI:

$ mpirun [options] AMDuProfCLI [options] <program> [<args>]

The specific AMDuProfCLI flags for profiling MPI applications:

A typical command uses the following syntax:

$ mpirun -np <np> /tmp/AMDuProf/bin/AMDuProfCLI collect --config <config-type> --trace mpi --output-dir <output_dir> [mpi_app] [<mpi_app_options>]
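The wrapping pattern above can be scripted. A minimal Python sketch that builds the mpirun argument list with AMDuProfCLI inserted in front of the application, using only the options shown in this section; the function name and parameters are illustrative.

```python
import shlex
from typing import List, Sequence


def mpi_profile_command(np: int, uprof_cli: str, config: str, output_dir: str,
                        app: Sequence[str], mpirun_opts: Sequence[str] = ()) -> List[str]:
    """mpirun [options] AMDuProfCLI collect [options] <program> [<args>]"""
    return [
        "mpirun", "-np", str(np), *mpirun_opts,
        # AMDuProfCLI takes the place of <program> ...
        uprof_cli, "collect", "--config", config, "--trace", "mpi",
        "--output-dir", output_dir,
        # ... and the MPI application becomes AMDuProfCLI's trailing arguments.
        *app,
    ]


if __name__ == "__main__":
    cmd = mpi_profile_command(16, "/tmp/AMDuProf/bin/AMDuProfCLI", "tbp",
                              "/tmp/myapp-perf", ["myapp.exe"],
                              mpirun_opts=["-H", "host1,host2"])
    print(shlex.join(cmd))
```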

If an MPI application is launched on multiple nodes, AMDuProfCLI profiles all the MPI rank processes running on all the nodes. You can analyze the data for processes that ran on one, several, or all nodes.

Method 1 - Profile All the Ranks On Single/Multiple Node(s)

To collect profile data for all the ranks running on a single node, execute the following command:

$ mpirun -np 16 /tmp/AMDuProf/bin/AMDuProfCLI collect --config tbp --trace mpi --output-dir /tmp/myapp-perf myapp.exe

To collect profile data for all the ranks on multiple nodes, use the mpirun option -H/--host or specify -hostfile <hostfile>:

$ mpirun -np 16 -H host1,host2 /tmp/AMDuProf/bin/AMDuProfCLI collect --config tbp --trace mpi --output-dir /tmp/myapp-perf myapp.exe
$ mpirun -np 16 -H host1,host2 /tmp/AMDuProf/bin/AMDuProfCLI collect --config tbp --mpi --output-dir /tmp/myapp-perf myapp.exe

Method 2 - Profiling Specific Rank(s)

To profile only a single rank running on host2, execute the following commands:

$ export AMDUPROFCLI_CMD="/tmp/AMDuProf/bin/AMDuProfCLI collect --config tbp --trace mpi --output-dir /tmp/myapp-perf"
$ mpirun -np 4 -host host1 myapp.exe : -host host2 -np 1 $AMDUPROFCLI_CMD myapp.exe

To profile only a single rank in a setup where 256 ranks run on 2 hosts (128 ranks per host):

$ mpirun -host host1:128 -np 1 $AMDUPROFCLI_CMD myapp.exe : -host host2:128,host1:128 -np 255 --map-by core myapp.exe

mpirun also accepts a configuration file as input, and AMDuProfCLI can be used within the configuration file to profile the MPI application.

Config file (myapp_config):

#MPI - myapp config file
-host host1 -n 4 myapp.exe
-host host2 -n 2 /tmp/AMDuProf/bin/AMDuProfCLI collect --config tbp --trace mpi \
--output-dir /tmp/myapp-perf myapp.exe

To run this config to collect data only for the MPI processes running on host2, execute the following command:

$ mpirun --app myapp_config
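A minimal Python sketch that generates such a configuration file, running only the ranks on one host under AMDuProfCLI. It keeps each entry on a single line; the function name, header comment, and parameters are illustrative.

```python
from typing import Tuple


def mpi_app_file(app: str, unprofiled: Tuple[str, int], profiled: Tuple[str, int],
                 uprof_cmd: str) -> str:
    """Build an mpirun app file in which only the ranks on one host run under AMDuProfCLI."""
    plain_host, plain_n = unprofiled
    prof_host, prof_n = profiled
    lines = [
        "#MPI - app config file",
        f"-host {plain_host} -n {plain_n} {app}",
        # Each entry is kept on a single line.
        f"-host {prof_host} -n {prof_n} {uprof_cmd} {app}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    uprof = ("/tmp/AMDuProf/bin/AMDuProfCLI collect --config tbp --trace mpi "
             "--output-dir /tmp/myapp-perf")
    with open("myapp_config", "w") as f:
        f.write(mpi_app_file("myapp.exe", ("host1", 4), ("host2", 2), uprof))
    # Then run: mpirun --app myapp_config
```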

8.4.2. Analyzing the Data with CLI

The data collected for MPI processes can be analyzed using the CSV report generated by the AMDuProfCLI report command. The generated report is saved to the file report.csv in the <output-dir>/<SESSION-DIR> folder.

Use the following commands to generate the report for each host:

$ AMDuProfCLI report --input-dir /tmp/myapp-perf/<SESSION-DIR> --host host1
$ AMDuProfCLI report --input-dir /tmp/myapp-perf/<SESSION-DIR> --host host2

Note

The --host option is not required to create the report file for the local host.

8.4.3. Analyze the Data with GUI

To analyze the profile data in the GUI, complete the following steps:

  1. To generate the profile database, refer to Analyzing the Data with CLI.

  2. To import the profile database, refer to Importing Profile Database.

8.4.4. Limitations

8.5. Profiling Linux System Modules

To attribute samples to system modules (for example, glibc and libm), AMD uProf uses the corresponding debug info files. Linux distributions do not include the debug info files by default, but most popular distributions provide options to download them.

Refer to the following resources for more information on how to download the debug info files:

Before you start profiling, ensure that you have downloaded the debug info files for the required system modules on your Linux distribution.

8.6. Profiling Linux Kernel

To profile and analyze the Linux kernel modules and functions:

  1. Enable the kernel symbol resolution.

  2. Do one of the following:

Note

Supported OS: Ubuntu 18.04 LTS, Ubuntu 20.04 LTS, and RHEL 8.

8.6.1. Enabling Kernel Symbol Resolution

To attribute kernel samples to the appropriate kernel functions, AMD uProf extracts the required information from the /proc/kallsyms file. Exposing the kernel symbol addresses through /proc/kallsyms requires setting the appropriate value in the /proc/sys/kernel/kptr_restrict file as follows:

Set the perf_event_paranoid value using one of the following:

$ echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
$ sudo sysctl -w kernel.perf_event_paranoid=-1

Set the kptr_restrict value using one of the following:

$ echo 0 | sudo tee /proc/sys/kernel/kptr_restrict
$ sudo sysctl -w kernel.kptr_restrict=0
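A minimal Python sketch for verifying the result. It reads both settings and checks whether /proc/kallsyms shows non-zero addresses to the current user (hidden addresses are displayed as zeros). The values -1 and 0 are taken from the commands above; the function names are illustrative.

```python
from typing import Optional


def read_sysctl(path: str) -> Optional[int]:
    """Read an integer sysctl value from procfs, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def settings_match_guide(paranoid: Optional[int], kptr_restrict: Optional[int]) -> bool:
    """True if the values match the ones set above (-1 and 0)."""
    return paranoid == -1 and kptr_restrict == 0


def kallsyms_addresses_visible(path: str = "/proc/kallsyms") -> bool:
    """True if /proc/kallsyms shows real (non-zero) addresses to the current user."""
    try:
        with open(path) as f:
            first = f.readline()
        return int(first.split()[0], 16) != 0
    except (OSError, ValueError, IndexError):
        return False


if __name__ == "__main__":
    paranoid = read_sysctl("/proc/sys/kernel/perf_event_paranoid")
    kptr = read_sysctl("/proc/sys/kernel/kptr_restrict")
    print("perf_event_paranoid:", paranoid, "kptr_restrict:", kptr)
    print("matches the values above:", settings_match_guide(paranoid, kptr))
    print("kallsyms addresses visible:", kallsyms_addresses_visible())
```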

8.6.2. Downloading and Installing Kernel Debug Symbol Packages

On a Linux system, the /boot directory contains either a compressed (vmlinuz) or an uncompressed (vmlinux) kernel image. These kernel files are stripped and have no symbol or debug information. Without debug information, AMD uProf cannot attribute samples to kernel functions; hence, by default, AMD uProf cannot report kernel functions.

Some Linux distros provide debug symbol files for their kernel which can be used for profiling purposes.

Ubuntu

To download kernel debug info and source code on Ubuntu systems (verified on Ubuntu 18.04.03 LTS):

  1. To trust the debug symbol signing key, execute the following commands:

    // Ubuntu 18.04 LTS and later:
    $ sudo apt install ubuntu-dbgsym-keyring
    // For earlier releases of Ubuntu:
    $ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys F2EDC64DC5AEE1F6B9C621F0C8CAB6595FDFF622
    
  2. Add the debug symbol repository as follows:

    $ echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse
    deb http://ddebs.ubuntu.com $(lsb_release -cs)-security main restricted universe multiverse
    deb http://ddebs.ubuntu.com $(lsb_release -cs)-updates main restricted universe multiverse
    deb http://ddebs.ubuntu.com $(lsb_release -cs)-proposed main restricted universe multiverse" | \
    sudo tee -a /etc/apt/sources.list.d/ddebs.list
    
  3. Retrieve the list of available debug symbol packages:

    $ sudo apt update
    
  4. Install the debug symbols for the current kernel version:

    $ sudo apt install --yes linux-image-$(uname -r)-dbgsym
    
  5. Download the kernel source using one of the following methods:

    $ sudo apt source linux-image-unsigned-$(uname -r)
    
    $ sudo apt source linux-image-$(uname -r)
    

After the kernel debug info file is downloaded, it can be found at the default path:

/usr/lib/debug/boot/vmlinux-`uname -r`

RHEL

Follow the steps in the Red Hat knowledgebase to download the RHEL kernel debug info.

After the kernel debug info file is downloaded, it can be found at the default path: /usr/lib/debug/lib/modules/`uname -r`/vmlinux.
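A minimal Python sketch that looks for the kernel debug info file at the two default paths given above for Ubuntu and RHEL. It assumes the file was installed at the default location for the running kernel; the function name is illustrative.

```python
import os
import platform
from typing import Optional


def find_vmlinux_debug(release: Optional[str] = None) -> Optional[str]:
    """Return the first default kernel debug-info path that exists, or None."""
    release = release or platform.release()  # same value as `uname -r` on Linux
    candidates = [
        f"/usr/lib/debug/boot/vmlinux-{release}",         # Ubuntu (dbgsym package)
        f"/usr/lib/debug/lib/modules/{release}/vmlinux",  # RHEL (debuginfo package)
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    path = find_vmlinux_debug()
    print(path or "No kernel debug info found at the default paths")
```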

8.6.3. Build Linux Kernel with Debug Symbols

If debug symbol packages are not available for the pre-built kernel image, analyzing kernel functions at the source level requires recompiling the Linux kernel with debug information enabled (for example, the CONFIG_DEBUG_INFO kernel configuration option).

8.6.4. Analyzing Hotspots in Kernel Functions

If the debug info for the kernel modules is available, any subsequent CPU performance analysis attributes the kernel-space samples to the [vmlinux] module and displays the hot kernel functions. Otherwise, kernel samples are attributed to [kernel.kallsyms]_text.

During the hotspot analysis, do consider the following:

8.6.5. Linux Kernel Callstack Sampling

In a system-wide profile, call stack samples can be collected for kernel functions. For example, the following command collects the kernel call stack:

# AMDuProfCLI collect -a -g -o /tmp /usr/bin/stress-ng --cpu 8 --io 4 --vm 2 --vm-bytes 128M --fork 4 --timeout 20s

To display the source lines of system-module functions, use the --show-sys-src option.

Example

./AMDuProfCLI report --detail --show-sys-src --src-path /usr/src/linux-version/ -i <session_path>

Pass the path to the kernel source files directory using the --src-path option.

8.6.6. Limitations

Note these constraints:

8.7. Profiling FreeBSD Kernel Modules (Pre-release Builds)

To view source code for FreeBSD kernel modules in pre-release (ALPHA, BETA, RC, etc.) builds, AMDuProf requires debug symbol files.

Note

In FreeBSD pre-release builds, debug files are stored in a custom build path instead of the standard path /usr/lib/debug/boot/kernel. Use the --symbol-path option to specify where pre-release kernel debug files are located.

Example

To generate a report with the debug symbol path for pre-release builds:

$ ./AMDuProfCLI report --detail --show-sys-src --symbol-path /path/to/debug-file/ -i <session-path>

For more information on the --symbol-path option, refer to AMDuProfCLI Report Command Options.