Profiling Runtime Data of AI Tasks

Applicability

Product                                                             Supported (Yes/No)
Atlas A3 Training Series Product                                    Yes
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product    Yes
Atlas 200/500 A2 Inference Product                                  Yes
Atlas Inference Series Product                                      Yes
Atlas Training Series Product                                       Yes

Function Description

msprof collects the runtime profile data of AI tasks. After collection, it automatically parses the profile data and flushes the corresponding files to disk.

Precautions

  • Ensure that an AI task can run properly in the operating environment.
  • Ensure that operations in Before You Start have been completed.

The Python call stack and the PyTorch or MindSpore framework-layer data cannot be profiled. Use the framework APIs to profile such data.

Syntax (Ascend EP)

Log in to the environment where the CANN Toolkit package and ops operator package are installed, and run the following command in any directory:

  • (Recommended) Method 1: Pass the user application or execution script at the end of the msprof command.
    msprof [options] <app> 
  • Method 2: Pass the user application or execution script using the --application option.
    msprof [options] --application=<app> 

Syntax (Ascend RC)

Log in to the operating environment, go to the /var directory where the msprof tool is located, and run the following command:

  • (Recommended) Method 1: Pass the user application or execution script at the end of the msprof command.
    ./msprof [options] <app> 
  • Method 2: Pass the user application or execution script using the --application option.
    ./msprof [options] --application=<app> 

Options

Table 1 Options

Option

Required/Optional

Description

Applicability

Profile Data File

<app>

Required

(Supported only in method 1) User application whose profile data is to be collected. Enter the user application or execution script at the end of the msprof command.

Configuration examples:

msprof --output=/home/projects/output python3 /home/projects/MyApp/out/sample_run.py parameter1 parameter2

msprof --output=/home/projects/output main

msprof --output=/home/projects/output /home/projects/MyApp/out/main

msprof --output=/home/projects/output /home/projects/MyApp/out/main parameter1 parameter2

msprof --output=/home/projects/output /home/projects/MyApp/out/sample_run.sh parameter1 parameter2

NOTE:
  • You are advised not to configure AI tasks in directories owned by other users or directories writable by other users to avoid privilege escalation risks. You are advised not to configure high-risk operations, such as deleting files or directories, changing passwords, and running privilege escalation commands. Do not use pmupload as the application name.
  • This option is required if you collect all profile data, AI task runtime profile data, or msproftx data.

    This option is optional if you collect Ascend AI Processor system data.

    This option is optional if you collect the host-side system data.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-

--application=<app>

Required

(Supported only in method 2) User application whose profile data is to be collected. You can use this option to pass the user application name and input parameters.

Configuration examples:

Inference scenario: msprof --application="/home/projects/MyApp/out/main parameter1 parameter2 ..."

Training scenario: msprof --application="/home/projects/mindspore/scripts/run_standalone_train.sh parameter1 parameter2 ..."

If the parameters contain special characters, the parameters cannot be recognized. In that case, use method 1 to pass the user application.

NOTE:
  • You are advised not to configure AI tasks in directories owned by other users or directories writable by other users to avoid privilege escalation risks. You are advised not to configure high-risk operations, such as deleting files or directories, changing passwords, and running privilege escalation commands. Do not use pmupload as the application name.
  • This option is required if you collect all profile data, AI task runtime profile data, or msproftx data.

    This option is optional if you collect Ascend AI Processor system data.

    This option is optional if you collect the host-side system data.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-

--ascendcl=<ascendcl-value>

Optional

acl profiling switch, either on (default) or off.

You can collect acl profile data, including the synchronous/asynchronous memory copy latencies between the host and devices and between devices.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

The CANN_AscendCL layer in msprof_*.json and the api_statistic_*.csv file

COMMUNICATION_TASK_INFO in the .db file

CANN_API in the .db file

--model-execution=<model-execution-value>

Optional

GE model execution profiling switch, either on (default) or off.

NOTE:

This switch will be deprecated in later versions. Use the --task-time switch to control related data collection.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

fusion_op_*.csv

--runtime-api=<runtime-api-value>

Optional

Runtime API profiling switch, either on or off (default). You can collect Runtime API profile data, including the synchronous/asynchronous memory copy latencies between the host and devices and between devices.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

The CANN_Runtime layer in msprof_*.json and the api_statistic_*.csv file

MEMCPY_INFO in the .db file

--hccl=<hccl-value>

Optional

HCCL profiling switch, either on or off (default). The data is generated only in multi-rank, multi-node, or cluster scenarios.

NOTE:

This switch will be deprecated in later versions. Use the --task-time switch to control related data collection.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

The Communication layer in msprof_*.json and the communication_statistic_*.csv file

api_statistic_*.csv

COMMUNICATION_TASK_INFO in the .db file

COMMUNICATION_OP in the .db file

--task-time=<task-time-value>

Optional

Switch that controls the profiling of operator delivery and execution durations. The duration data is written to the task_time, op_summary, and op_statistic files. Possible values are as follows:

  • l0: collects operator delivery and execution duration data. Unlike l1, l0 does not collect basic operator information, so the collection overhead is lower and the duration statistics are more accurate.
  • l1: collects operator delivery and execution duration data, as well as basic operator information, to provide more comprehensive performance analysis data. This option supports collecting the collective communication operator data.
  • on: switch on. This is the default value, delivering the same effect as l1.
  • off: switch off.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

The CANN layer in msprof_*.json and the api_statistic_*.csv file

The Ascend Hardware layer in msprof_*.json and the task_time_*.csv file

The Communication layer in msprof_*.json and the communication_statistic_*.csv file

step_trace (iteration trace data)

op_summary_*.csv

op_statistic_*.csv

fusion_op_*.csv

TASK in the .db file

COMPUTE_TASK_INFO in the .db file

COMMUNICATION_TASK_INFO in the .db file

COMMUNICATION_OP in the .db file
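
As a rough illustration of the --task-time levels described above, the following sketch prints (rather than runs) an msprof command for a chosen level. The helper name build_task_time_cmd and the paths are placeholders introduced here for illustration and are not part of msprof. Only the --output and --task-time options come from this document.

```shell
# Sketch only: prints an msprof command line for a given --task-time level.
# build_task_time_cmd is a hypothetical helper, not part of msprof.
build_task_time_cmd() {
    level=$1      # one of: l0, l1, on, off
    out_dir=$2    # directory passed to --output
    app=$3        # user application (method 1: placed at the end)

    case "$level" in
        l0|l1|on|off) ;;
        *)
            printf '%s\n' "invalid --task-time value: $level" >&2
            return 1
            ;;
    esac

    printf '%s\n' "msprof --output=$out_dir --task-time=$level $app"
}

# l0: durations only, lower collection overhead.
build_task_time_cmd l0 /home/projects/output /home/projects/MyApp/out/main
# l1 (same effect as the default "on"): durations plus basic operator information.
build_task_time_cmd l1 /home/projects/output /home/projects/MyApp/out/main
```

Printing the command first makes it easy to review the chosen level before starting a potentially long collection run.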

--aicpu=<aicpu-value>

Optional

Switch for collecting AI CPU operator details, such as computing time and data copy time. The value is either on or off (default).

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

aicpu_*.csv

dp_*.csv

--ai-core=<aicore-value>

Optional

AI Core profiling switch.

The value can be on or off. When --task-time is set to on or l1, this parameter is set to on by default. When --task-time is set to off or l0, this parameter is set to off by default.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

op_summary_*.csv

TASK_PMU_INFO in the .db file
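
The dependency between --task-time and the default of --ai-core described above can be summarized in a few lines. This is a minimal sketch: default_ai_core is a hypothetical helper written for this illustration, and it only mirrors the documented defaults without querying msprof.

```shell
# Sketch only: default_ai_core is a hypothetical helper that mirrors the
# documented default of --ai-core for a given --task-time value.
default_ai_core() {
    case "$1" in
        on|l1)  printf '%s\n' on ;;
        off|l0) printf '%s\n' off ;;
        *)
            printf '%s\n' "invalid --task-time value: $1" >&2
            return 1
            ;;
    esac
}

default_ai_core l1   # prints: on
default_ai_core l0   # prints: off
```

A check like this can flag a command that sets --task-time=l0 and therefore leaves AI Core profiling off unless --ai-core is set explicitly.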

--aic-mode=<aic-mode-value>

Optional

AI Core profiling mode, either task-based or sample-based. This option must be used in conjunction with --ai-core set to on.

In task-based mode, the profile data is collected by task, while in sample-based mode, the profile data is collected at a fixed interval. You are advised to adopt the task-based mode to collect AI task profile data. If this option is not set, the task-based mode is used by default.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-

--aic-freq=<aic-freq-value>

Optional

Profiling frequency (Hz) in sample-based mode. Defaults to 100. Must be in the range [1, 100]. This option must be used in conjunction with --ai-core set to on.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-

--aic-metrics=<aic-metrics-value>

Optional

AI Core metric to trace. This option must be used in conjunction with --ai-core set to on. For details about the profiling metrics, see op_summary_*.csv.

The values include:

  • Atlas 200/500 A2 Inference Product: ArithmeticUtilization, PipeUtilization, Memory, MemoryL0, MemoryUB, ResourceConflictRatio, L2Cache, and PipelineExecuteUtilization (default)
  • Atlas Inference Series Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, and ResourceConflictRatio
  • Atlas Training Series Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, and ResourceConflictRatio
  • Atlas A2 Training Series Product/Atlas 800I A2 Inference Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, ResourceConflictRatio, MemoryAccess, and L2Cache
  • Atlas A3 Training Series Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, ResourceConflictRatio, MemoryAccess, and L2Cache
NOTE:
The registers whose data is to be collected can be customized, for example, --aic-metrics=Custom:0x49,0x8,0x15,0x1b,0x64,0x10.
  • The Custom field indicates the customization type. It is set to specific register values in the range of [0x1, 0x6E].
  • A maximum of eight registers can be configured. Separate them with commas (,).
  • The register value can be in hexadecimal or decimal format.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

op_summary_*.csv

TASK_PMU_INFO in the .db file
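
The rules for the Custom register list of --aic-metrics (at most eight registers, each in [0x1, 0x6E], hexadecimal or decimal) can be checked before a run. The sketch below assumes a hypothetical helper named check_custom_registers, which is not part of msprof. It also rejects decimal values with leading zeros, which is a simplification made here, not a documented msprof rule.

```shell
# Sketch only: check_custom_registers is a hypothetical helper, not part of msprof.
# It checks a comma-separated register list against the documented rules:
# at most eight registers, each in [0x1, 0x6E], hexadecimal or decimal.
check_custom_registers() {
    old_ifs=$IFS
    IFS=,
    set -f            # no pathname expansion while splitting on commas
    set -- $1
    set +f
    IFS=$old_ifs

    if [ "$#" -lt 1 ] || [ "$#" -gt 8 ]; then
        printf '%s\n' "expected 1 to 8 registers, got $#" >&2
        return 1
    fi

    for reg in "$@"; do
        case "$reg" in
            0[xX]*)
                digits=${reg#0[xX]}
                case "$digits" in
                    ''|*[!0-9a-fA-F]*)
                        printf '%s\n' "bad hexadecimal value: $reg" >&2
                        return 1
                        ;;
                esac
                ;;
            ''|0*|*[!0-9]*)
                printf '%s\n' "bad decimal value: $reg" >&2
                return 1
                ;;
        esac
        # Guard against overflow before converting to a number.
        if [ "${#reg}" -gt 10 ]; then
            printf '%s\n' "value too long: $reg" >&2
            return 1
        fi
        value=$(($reg))
        if [ "$value" -lt 1 ] || [ "$value" -gt 110 ]; then   # 0x6E is 110
            printf '%s\n' "register out of range [0x1, 0x6E]: $reg" >&2
            return 1
        fi
    done
}

# The example list from this document passes the check.
regs=0x49,0x8,0x15,0x1b,0x64,0x10
if check_custom_registers "$regs"; then
    printf '%s\n' "--aic-metrics=Custom:$regs"
fi
```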

--sys-hardware-mem=<sys-hardware-mem-value>

Optional

Task-level on-chip memory profiling switch, either on or off (default).

Profiling memory data in an environment where glibc 2.34 or earlier is installed may trigger the known Bug 19329. Upgrading glibc resolves this problem.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

The support for different products varies.

The NPU MEM layer in msprof_*.json

npu_module_mem_*.csv

NPU_MODULE_MEM in the .db file

HBM in the .db file

DDR in the .db file

--sys-hardware-mem-freq=<sys-hardware-mem-freq-value>

Optional

Task-level on-chip memory profiling frequency (Hz). Defaults to 50. Must be in the range [1, 100].

This option must be used in conjunction with --sys-hardware-mem set to on.

NOTE:

For the following products, you are advised not to increase the profiling frequency after the profiling task is complete. Otherwise, SoC transmission bandwidth data may be lost.

Atlas 200I/500 A2 inference products

Atlas A2 training products/Atlas A2 inference products

Atlas A3 training products/Atlas A3 inference products

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-
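
Both --aic-freq and --sys-hardware-mem-freq accept a frequency in the range [1, 100] Hz. The following sketch shows one way to validate such a value before building a command. check_freq is a hypothetical helper introduced for this illustration, not an msprof feature, and the printed option strings are only examples.

```shell
# Sketch only: check_freq is a hypothetical helper, not part of msprof.
# It accepts an option name (used only in messages) and a candidate value.
check_freq() {
    name=$1
    value=$2

    case "$value" in
        ''|*[!0-9]*)
            printf '%s\n' "$name must be a positive integer, got: $value" >&2
            return 1
            ;;
    esac

    # Reject long digit strings first so the numeric tests cannot overflow.
    if [ "${#value}" -gt 3 ] || [ "$value" -lt 1 ] || [ "$value" -gt 100 ]; then
        printf '%s\n' "$name must be in [1, 100], got: $value" >&2
        return 1
    fi
}

# Print the option strings only if the frequency is acceptable.
check_freq --sys-hardware-mem-freq 50 &&
    printf '%s\n' "--sys-hardware-mem=on --sys-hardware-mem-freq=50"
check_freq --aic-freq 100 &&
    printf '%s\n' "--ai-core=on --aic-mode=sample-based --aic-freq=100"
```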

--l2=<l2-value>

Optional

L2 cache data hit ratio profiling switch, either on or off (default).

  • Atlas 200/500 A2 Inference Product: --aic-metrics=L2Cache is recommended for analyzing the number of hits on L2 cache from the AI Core.
  • Atlas A2 Training Series Product/Atlas 800I A2 Inference Product: L2 cache hit ratio profiling. --aic-metrics=L2Cache is recommended for analyzing the number of hits on L2 cache from the AI Core.
  • Atlas A3 Training Series Product: L2 cache hit ratio profiling. --aic-metrics=L2Cache is recommended for analyzing the number of hits on L2 cache from the AI Core.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

l2_cache_*.csv

--ge-api=<ge-api-value>

Optional

Switch that controls profiling of the time consumption data of dynamic-shape operators in the host scheduling phase. Related data is generated in the msprof_*.json and api_statistic_*.csv files.

Possible values are as follows:

  • off: switch off. The default value is off.
  • l0: profiles the time consumption data of dynamic-shape operators in the main host scheduling phase to facilitate accurate statistics.
  • l1: profiles finer-grained time consumption data of dynamic-shape operators in the host scheduling phase to provide more comprehensive profile data.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

The CANN layer in msprof_*.json and the api_statistic_*.csv file

--task-memory=<task-memory-value>

Optional

Switch for collecting the memory usage of CANN operators. The collected data helps you optimize memory usage.

  • on: enable
  • off: disable (default value)

In the graph-mode single-operator scenario, the operator memory size and lifecycle information are collected based on the GE component and operator dimensions (the GE component memory is not collected in the single-operator API execution scenario). In the static graph and static subgraph scenarios, the operator memory size and lifecycle information is collected based on the operator dimension.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

Generated in the graph-mode single-operator scenario:

memory_record_*.csv

operator_memory_*.csv

Generated in static graph and static subgraph scenarios:

static_op_mem_*.csv

NPU_OP_MEM in the .db file

Example (Ascend EP)

  1. Log in to the environment where the CANN Toolkit package and ops operator package are installed.
  2. Run the following command in any directory to collect profile data:
    msprof --output=/home/projects/output --ascendcl=on --runtime-api=on --task-time=on --aicpu=on --ai-core=on /home/projects/MyApp/out/main
  3. Find the PROF_XXX directory generated in the directory specified by --output. This directory stores the automatically parsed profile data. For details about the result files, see Table 1.
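
When several runs share one --output directory, step 3 amounts to picking the newest PROF_XXX directory. A minimal sketch follows, assuming the generated directory names start with PROF_ as shown above. latest_prof_dir is a hypothetical helper, not part of msprof.

```shell
# Sketch only: latest_prof_dir is a hypothetical helper, not part of msprof.
# It prints the most recently modified PROF_* directory under the given
# --output directory, or fails if none exists.
latest_prof_dir() {
    out_dir=$1
    # The trailing slash restricts the match to directories; -t sorts newest first.
    newest=$(ls -1dt "$out_dir"/PROF_*/ 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        printf '%s\n' "no PROF_* directory found in $out_dir" >&2
        return 1
    fi
    printf '%s\n' "${newest%/}"
}

# Example use after a collection run (path taken from the example above):
#   prof_dir=$(latest_prof_dir /home/projects/output) && ls "$prof_dir"
```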

Example (Ascend RC)

  1. Log in to the operating environment.
  2. Go to the /var directory where the msprof tool is located and run the following command to collect profile data:
    ./msprof --output=/home/projects/output --ascendcl=on --runtime-api=on --task-time=on --aicpu=on --ai-core=on /home/projects/MyApp/out/main
  3. Find the PROF_XXX directory generated in the directory specified by --output. Files in this generated directory cannot be viewed without being parsed. You need to upload the PROF_XXX directory to the development environment where the Toolkit package is installed for data parsing. For details, see Offline Parsing. For details about the generated result files, see Table 1.
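
Because the PROF_XXX directory has to be moved to the development environment before parsing, it can help to pack it into a single archive first. The sketch below is one possible approach and assumes a tar implementation with gzip support is available in the operating environment. pack_prof_dir is a hypothetical helper, and the transfer itself (for example with scp) depends on your setup and is not shown.

```shell
# Sketch only: pack_prof_dir is a hypothetical helper, not part of msprof.
# It creates <PROF dir>.tar.gz next to the directory and prints the archive path.
pack_prof_dir() {
    prof_dir=${1%/}
    if [ ! -d "$prof_dir" ]; then
        printf '%s\n' "not a directory: $prof_dir" >&2
        return 1
    fi
    archive=$prof_dir.tar.gz
    # -C keeps only the PROF_XXX directory name inside the archive.
    tar -czf "$archive" -C "$(dirname "$prof_dir")" "$(basename "$prof_dir")" || return 1
    printf '%s\n' "$archive"
}

# Example use (PROF_XXX stands for the actual generated directory name):
#   pack_prof_dir /home/projects/output/PROF_XXX
```

After the archive is copied and extracted in the development environment, continue with the steps in Offline Parsing.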