Collecting AI Job Runtime Profile Data

msprof supports the collection of AI job runtime profile data. After the collection, it automatically parses the profile data and flushes the corresponding files to disk.

Prerequisites

  • Ensure that an AI project can run properly in the operating environment.
  • Ensure that operations in Before You Start have been completed.

Command Example (Ascend EP)

Log in to the environment where the Ascend-CANN-Toolkit is located, and run the following command to collect profile data:

msprof --output=/home/projects/output --ascendcl=on --runtime-api=on --task-time=on --aicpu=on --ai-core=on /home/projects/MyApp/out/main

For details about the options supported by the command, see Table 1. When collecting AI job runtime profile data, the user application executable must be passed on the command line.

After the command is executed, find the PROF_XXX directory generated in the directory specified by --output. This directory stores the automatically parsed profile data. For details about the related result files, see Table 1.
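For scripted runs, the command line above can also be assembled programmatically. The helper below is a hypothetical sketch (build_msprof_cmd is not part of the toolkit); it only concatenates flags from Table 1 and appends the user application last, as msprof expects.

```python
import shlex

def build_msprof_cmd(output_dir, app, **switches):
    """Assemble an msprof collection command line.

    Hypothetical helper, not part of the toolkit: keyword names use
    underscores in Python and are mapped to the hyphenated CLI flags.
    """
    parts = ["msprof", f"--output={output_dir}"]
    for name, value in switches.items():
        parts.append(f"--{name.replace('_', '-')}={value}")
    parts.append(app)  # the user application must be passed last
    return " ".join(shlex.quote(p) for p in parts)

cmd = build_msprof_cmd(
    "/home/projects/output",
    "/home/projects/MyApp/out/main",
    ascendcl="on", runtime_api="on", task_time="on",
    aicpu="on", ai_core="on",
)
print(cmd)
# → msprof --output=/home/projects/output --ascendcl=on --runtime-api=on --task-time=on --aicpu=on --ai-core=on /home/projects/MyApp/out/main
```

shlex.quote keeps paths shell-safe if a workspace directory ever contains spaces or other special characters.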

Command Example (Ascend RC)

Log in to the operating environment, go to the /var directory where the msprof tool is located, and run the following command to collect profile data:

./msprof --output=/home/projects/output --ascendcl=on --runtime-api=on --task-time=on --ai-core=on /home/projects/MyApp/out/main

For details about the options supported by the command, see Table 1. When collecting AI job runtime profile data, the user application executable must be passed on the command line.

After the command is executed, find the PROF_XXX directory generated in the directory specified by --output. Files in this directory cannot be viewed until they are parsed. Upload the PROF_XXX directory to the development environment where the Toolkit package is installed for data parsing. For details, see Profile Data Parsing and Export (msprof Command). For details about the generated result files, see Table 1.
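Before uploading, it can help to locate the PROF_XXX directory that the run just produced. The snippet below is a minimal sketch, assuming msprof has already written at least one PROF_* directory under the path given to --output; latest_prof_dir is a hypothetical helper, not part of the toolkit.

```python
from pathlib import Path

def latest_prof_dir(output_dir):
    """Return the most recently modified PROF_* directory under output_dir.

    Sketch only: assumes at least one PROF_XXX directory already exists
    in the directory that was passed to msprof via --output.
    """
    profs = [p for p in Path(output_dir).glob("PROF_*") if p.is_dir()]
    if not profs:
        raise FileNotFoundError(f"no PROF_* directory under {output_dir}")
    return max(profs, key=lambda p: p.stat().st_mtime)
```

The returned path is what you would then copy (for example, with scp) to the development environment for parsing.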

Command-line Options

Table 1 Command-line options

Each entry below lists the option's description, the supported models, and the result files it produces.

--ascendcl

AscendCL data collection switch, either on (default) or off. Collects AscendCL profile data, including the synchronous/asynchronous memory copy latencies between the host and device and between devices.

Supported models: Atlas 200/300/500 Inference Product, Atlas Training Series Product

Result files: the CANN level in msprof_*.json and the api_statistic_*.csv file

--model-execution

GE model execution data collection switch, either on (default) or off.

NOTE: This switch will be deprecated in later versions. Use the --task-time switch to control the related data collection.

Supported models: Atlas 200/300/500 Inference Product, Atlas Training Series Product

Result files: fusion_op_*.csv

--runtime-api

Runtime API data collection switch, either on or off (default). Collects Runtime API profile data, including the synchronous/asynchronous memory copy latencies between the host and device and between devices.

Supported models: Atlas 200/300/500 Inference Product, Atlas Training Series Product

Result files: the CANN_Runtime level in msprof_*.json and the api_statistic_*.csv file

--hccl

HCCL data collection switch, either on or off (default). The data is generated only in multi-card, multi-node, or cluster scenarios.

NOTE: This switch will be deprecated in later versions. Use the --task-time switch to control the related data collection.

Supported models: Atlas 200/300/500 Inference Product, Atlas Training Series Product

Result files: the HCCL level in msprof_*.json and the hccl_statistic_*.csv file; api_statistic_*.csv

--task-time

Switch that controls collection of operator delivery and execution durations. The duration data is output to the task_time, op_summary, and op_statistic files. Possible values:

  • l0: collects operator delivery and execution duration data only. Compared with l1, l0 does not collect basic operator information, so the collection overhead is smaller and the duration statistics are more accurate.
  • l1: collects operator delivery and execution duration data as well as basic operator information, providing more comprehensive performance analysis data. This value also supports collecting collective communication operator data.
  • on: the default value, delivering the same effect as l1.
  • off: switch off.

Supported models: Atlas 200/300/500 Inference Product, Atlas Training Series Product

Result files:

  • The CANN level in msprof_*.json and the api_statistic_*.csv file
  • The Ascend Hardware level in msprof_*.json and the task_time_*.csv file
  • The HCCL level in msprof_*.json and the hccl_statistic_*.csv file
  • step_trace (iteration trace data)
  • op_summary_*.csv
  • op_statistic_*.csv
  • fusion_op_*.csv

--aicpu

Switch for collecting AI CPU operator details, such as computing time and data copy time, either on or off (default).

Supported models: Atlas 200/300/500 Inference Product, Atlas Training Series Product

Result files: aicpu_*.csv, dp_*.csv

--ai-core

AI Core data collection switch, either on or off. When --task-time is set to on or l1, this option defaults to on; when --task-time is set to off or l0, it defaults to off.

Supported models: Atlas 200/300/500 Inference Product, Atlas Training Series Product

Result files: op_summary_*.csv

--aic-mode

AI Core hardware data collection mode, either task-based (default) or sample-based. Must be used with --ai-core set to on. In task-based mode, profile data is collected per task; in sample-based mode, it is collected at a fixed interval. The task-based mode is recommended for collecting AI job profile data.

Supported models: Atlas 200/300/500 Inference Product, Atlas Training Series Product

Result files: -

--aic-freq

Sampling frequency (Hz) in sample-based profiling. The value range is [1, 100]; the default is 100. Must be used with --ai-core set to on.

Supported models: Atlas 200/300/500 Inference Product, Atlas Training Series Product

Result files: -

--aic-metrics

AI Core metrics to trace. Must be used with --ai-core set to on. The values include:

  • Atlas 200/300/500 Inference Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, and ResourceConflictRatio
  • Atlas Training Series Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, and ResourceConflictRatio

NOTE: The registers whose data is to be collected can be customized, for example, --aic-metrics=Custom:0x49,0x8,0x15,0x1b,0x64,0x10.

  • The Custom field indicates the customization type and is set to specific register values. The value range is [0x1, 0x6E].
  • A maximum of eight registers can be configured. Separate them with commas (,).
  • A register value can be in hexadecimal or decimal format.

Supported models: Atlas 200/300/500 Inference Product, Atlas Training Series Product

Result files: op_summary_*.csv

--sys-hardware-mem

Task-level on-chip memory data collection switch, either on or off (default).

NOTE: Sampling memory data in an environment where glibc 2.34 or earlier is installed may trigger the known glibc Bug 19329. Upgrading glibc solves this problem.

Supported models: Atlas 200/300/500 Inference Product, Atlas Training Series Product (the support varies between products)

Result files: the NPU MEM level in msprof_*.json; npu_module_mem_*.csv

--sys-hardware-mem-freq

Task-level on-chip memory data collection frequency (Hz). The value range is [1, 100]; the default is 50. Must be used with --sys-hardware-mem set to on.

Supported models: Atlas 200/300/500 Inference Product, Atlas Training Series Product

Result files: -

--l2

L2 cache hit ratio collection switch, either on or off (default).

Supported models: Atlas Training Series Product

Result files: l2_cache_*.csv
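The constraints that Table 1 places on the Custom form of --aic-metrics (registers in [0x1, 0x6E], at most eight, hex or decimal) can be checked before a run. The following is an illustrative sketch, not msprof's own validation logic; parse_custom_metrics is a hypothetical name.

```python
def parse_custom_metrics(value):
    """Validate an --aic-metrics=Custom:... value and return the register list.

    Illustrative sketch of the documented constraints; msprof performs
    its own validation.
    """
    prefix = "Custom:"
    if not value.startswith(prefix):
        raise ValueError("expected the form Custom:<reg>[,<reg>...]")
    regs = []
    for token in value[len(prefix):].split(","):
        reg = int(token, 0)  # base 0 accepts 0x-prefixed hex and plain decimal
        if not 0x1 <= reg <= 0x6E:
            raise ValueError(f"register {token} outside [0x1, 0x6E]")
        regs.append(reg)
    if len(regs) > 8:
        raise ValueError("at most eight registers can be configured")
    return regs

print(parse_custom_metrics("Custom:0x49,0x8,0x15,0x1b,0x64,0x10"))
# → [73, 8, 21, 27, 100, 16]
```

Running such a check before launching a long profiling job avoids discovering a malformed register list only after the collection starts.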