Profiling the Ascend AI Processor System

Applicability

Product

Supported (Yes/No)

Atlas A3 Training Series Product

Yes

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Yes

Atlas 200/500 A2 Inference Product

Yes

Atlas Inference Series Product

Yes

Atlas Training Series Product

Yes

Function Description

msprof supports the collection of Ascend AI Processor system data. After the collection, it automatically parses the profile data and flushes the resulting files to disk.

Precautions

  • Ensure that an AI task can run properly in the operating environment.
  • Ensure that operations in Before You Start have been completed.

Python call stacks and PyTorch or MindSpore framework-layer data cannot be profiled this way. Use the framework APIs to profile such data.

Command Example (Ascend EP)

Log in to the environment where the CANN Toolkit package and ops operator package are installed, and run the following command to collect profile data:

msprof --output=/home/projects/output --sys-devices=<ID> --sys-period=<period> --ai-core=on --sys-hardware-mem=on --sys-cpu-profiling=on --sys-profiling=on --sys-pid-profiling=on --dvpp-profiling=on

For details about the supported options, see Table 1.
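If you script repeated runs, it can help to assemble the system-only command as an argument list rather than a shell string. The following is a minimal Python sketch. The helper name build_msprof_cmd is hypothetical. The device ID 0 and the period of 60 seconds are illustrative stand-ins for <ID> and <period>. Only options documented in this section are used.

```python
import shlex


def build_msprof_cmd(output, sys_devices, sys_period, switches, binary="msprof"):
    """Hypothetical helper: build an msprof argv list for system-only profiling.

    `switches` maps option names (without the leading dashes) to values,
    for example {"ai-core": "on"}. Dict order is preserved in the result.
    """
    argv = [
        binary,
        f"--output={output}",
        f"--sys-devices={sys_devices}",
        f"--sys-period={sys_period}",
    ]
    for name, value in switches.items():
        argv.append(f"--{name}={value}")
    return argv


# Illustrative values: device 0, a 60-second profiling period.
cmd = build_msprof_cmd(
    "/home/projects/output", "0", 60,
    {
        "ai-core": "on",
        "sys-hardware-mem": "on",
        "sys-cpu-profiling": "on",
        "sys-profiling": "on",
        "sys-pid-profiling": "on",
        "dvpp-profiling": "on",
    },
)
print(shlex.join(cmd))  # printable form, useful for logging the exact invocation
```

In the EP environment, the list could be passed to subprocess.run(cmd, check=True). Keeping it as a list avoids shell quoting issues and makes each option easy to inspect.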

Profiling system data of the Ascend AI Processor:

  • If no user application is passed, the tool profiles only the system data of the Ascend AI Processor. In this case, the --output, --sys-period, and --sys-devices options are required.
  • If both a user application and Ascend AI Processor system data options are passed, the --sys-period and --sys-devices options do not take effect.
  • For Ascend EP, suppose you use the msprof CLI to collect network-wide inference profile data (that is, a user application is passed) with the --llc-profiling, --sys-cpu-profiling, --sys-profiling, and --sys-pid-profiling options. In this case, only --sys-cpu-profiling collects data, and it collects only TS CPU profile data. If no user application is passed, all of these options collect data.
  • For the Atlas A2 Training Series Product/Atlas 800I A2 Inference Product and the Atlas A3 Training Series Product, --instr-profiling is mutually exclusive with --ascendcl, --model-execution, --runtime-api, --hccl, --task-time, --aicpu, --ai-core, --aic-mode, --aic-freq, --aic-metrics, and --l2. These options cannot be used in the same run.
  • For the following products, the --sys-profiling, --sys-pid-profiling, and --sys-cpu-profiling options cannot collect data from two devices that share the same OS. For example, suppose a product has devices 0 to 7 and the OS is shared by the pairs 0 and 1, 2 and 3, 4 and 5, and 6 and 7. In that case, --sys-devices cannot include both devices of any pair. It can be set to 0,2,4,6 or 1,3,5,7.
    • Atlas A3 Training Series Product
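Two of the constraints above can be checked before launching msprof. The following is a minimal Python sketch, not part of the tool:

  • The helpers instr_conflicts, parse_sys_devices, and os_sharing_conflicts are hypothetical.
  • The device check assumes the OS is shared by consecutive device pairs, as in the example above (0 and 1, 2 and 3, and so on). The actual grouping depends on the product.

```python
# Options the list above says cannot be combined with --instr-profiling
# (Atlas A2/A3 training series products).
INSTR_EXCLUSIVE = frozenset({
    "ascendcl", "model-execution", "runtime-api", "hccl", "task-time",
    "aicpu", "ai-core", "aic-mode", "aic-freq", "aic-metrics", "l2",
})


def instr_conflicts(options):
    """Return the explicitly passed options that conflict with --instr-profiling=on.

    `options` maps option names (without leading dashes) to values.
    Only options present in `options` are checked; tool defaults are not modeled.
    """
    if options.get("instr-profiling") != "on":
        return set()
    return set(INSTR_EXCLUSIVE.intersection(options))


def parse_sys_devices(value):
    """Parse a --sys-devices value such as '0,2,4,6' into a list of ints."""
    return [int(tok) for tok in value.split(",")]


def os_sharing_conflicts(devices, group_size=2):
    """Return groups of selected devices that share an OS.

    Assumes consecutive grouping: with group_size=2, devices 0 and 1 share
    an OS, 2 and 3 share an OS, and so on.
    """
    groups = {}
    for dev in sorted(set(devices)):
        groups.setdefault(dev // group_size, []).append(dev)
    return [tuple(g) for g in groups.values() if len(g) > 1]


print(os_sharing_conflicts(parse_sys_devices("0,1,4")))
```

An empty result from both checks means neither constraint is violated for the chosen options and devices.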

After the command is executed, find the PROF_XXX directory generated in the directory specified by --output. This generated directory stores the automatically parsed profile data. For details about related result files, see Table 1.
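When several runs write to the same output directory, picking the newest result by hand is error-prone. The following is a minimal Python sketch that returns the most recently modified PROF_* directory. It assumes the generated directories sit directly under the --output directory and that their names start with "PROF_". The helper name latest_prof_dir is hypothetical.

```python
from pathlib import Path


def latest_prof_dir(output_dir):
    """Hypothetical helper: return the newest PROF_* directory under output_dir.

    Returns None if no such directory exists. Plain files that happen to
    match the pattern are ignored.
    """
    candidates = [p for p in Path(output_dir).glob("PROF_*") if p.is_dir()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


print(latest_prof_dir("/home/projects/output"))
```

Sorting by modification time rather than by name avoids relying on any particular naming scheme for the XXX part.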

Command Example (Ascend RC)

Log in to the operating environment, go to the /var directory where the msprof tool is located, and run the following command to collect profile data:

./msprof --output=/home/projects/output --sys-devices=<ID> --sys-period=<period> --ai-core=on --sys-hardware-mem=on --sys-cpu-profiling=on --sys-profiling=on --sys-pid-profiling=on

For details about the supported options, see Table 1.

Profiling system data of the Ascend AI Processor:

  • If no user application is passed, the tool profiles only the system data of the Ascend AI Processor. In this case, the --output, --sys-period, and --sys-devices options are required.
  • If both a user application and Ascend AI Processor system data options are passed, the --sys-period and --sys-devices options do not take effect.
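The rules above decide which options are required and which are ignored. The following is a minimal Python sketch of a pre-flight check that encodes them, together with the --sys-period range from Table 1. The helper names check_required and valid_sys_period are hypothetical. Option names are given without leading dashes.

```python
# --sys-period upper bound from Table 1: 30 days, in seconds.
MAX_SYS_PERIOD = 30 * 24 * 3600  # 2592000


def check_required(options, has_app):
    """Hypothetical helper: return (missing, ignored) option names.

    Without a user application, --output, --sys-period, and --sys-devices
    are required. With a user application, --sys-period and --sys-devices
    do not take effect, so they are reported as ignored.
    """
    sys_only = ("sys-period", "sys-devices")
    if has_app:
        return [], [name for name in sys_only if name in options]
    required = ("output",) + sys_only
    return [name for name in required if name not in options], []


def valid_sys_period(seconds):
    """True if the period lies in the documented range (0, 2592000]."""
    return 0 < seconds <= MAX_SYS_PERIOD


print(check_required({"output": "/home/projects/output"}, has_app=False))
```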

After the command is executed, find the PROF_XXX directory generated in the directory specified by --output. The files in this directory are not parsed and cannot be viewed directly. Upload the PROF_XXX directory to the development environment where the Toolkit package is installed, and parse the data there. For details, see Offline Parsing. For details about the generated result files, see Table 1.
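Before transferring the results to the development environment, it can be convenient to pack the PROF_XXX directory into a single archive. The following is a minimal Python sketch using the standard shutil.make_archive. The helper name archive_prof_dir and the .tar.gz format are assumptions for illustration, not requirements of the tool. After transfer, unpack the archive and run offline parsing as described in Offline Parsing.

```python
import shutil
from pathlib import Path


def archive_prof_dir(prof_dir, dest_dir):
    """Hypothetical helper: pack prof_dir into <dest_dir>/<PROF_name>.tar.gz.

    The archive keeps the PROF_XXX directory as its top-level entry, so
    unpacking it recreates the original directory name.
    """
    prof_dir = Path(prof_dir)
    base_name = Path(dest_dir) / prof_dir.name
    return shutil.make_archive(
        str(base_name), "gztar",
        root_dir=str(prof_dir.parent), base_dir=prof_dir.name,
    )
```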

Options

Table 1 Options

Option

Description

Applicability

Profile Data File

--sys-period

System profiling period (s). Must be in the range (0, 2592000], that is, up to 30 days (30*24*3600 s).

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-

--sys-devices

IDs of the devices to profile. Set to all, or to one or more device IDs separated by commas (,).

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-

--ai-core

AI Core and AI Vector Core profiling switch, either on (default) or off. For details about the profiling metrics, see op_summary_*.csv.

  • Atlas 200/500 A2 Inference Product: controls AI Core and AI Vector Core profiling.
  • Atlas Inference Series Product: controls AI Core profiling.
  • Atlas Training Series Product: controls AI Core profiling.
  • Atlas A2 Training Series Product/Atlas 800I A2 Inference Product: controls AI Core and AI Vector Core profiling.
  • Atlas A3 Training Series Product: controls AI Core and AI Vector Core profiling.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-

--aic-mode

AI Core and AI Vector Core profiling mode, either task-based or sample-based. This option must be used in conjunction with --ai-core set to on.

In task-based mode, profile data is collected task by task; in sample-based mode, profile data is collected at a fixed interval.

You are advised to use the sample-based mode to collect Ascend AI Processor system data. If this option is not set, the sample-based mode is used by default.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

The AI Core Utilization layer in msprof_*.json and the ai_core_utilization_*.csv file

ai_vector_core_utilization_*.csv

SAMPLE_PMU_TIMELINE in the .db file

SAMPLE_PMU_SUMMARY in the .db file

--aic-freq

Profiling frequency (Hz) in sample-based mode. Defaults to 100. Must be in the range [1, 100]. This option must be used in conjunction with --ai-core set to on.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-

--aic-metrics

AI Core and AI Vector Core performance metrics to profile. This option must be used in conjunction with --ai-core set to on.

The values include:

  • Atlas 200/500 A2 Inference Product: ArithmeticUtilization, PipeUtilization, Memory, MemoryL0, MemoryUB, ResourceConflictRatio, L2Cache, and PipelineExecuteUtilization (default)
  • Atlas Inference Series Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, and ResourceConflictRatio
  • Atlas Training Series Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, and ResourceConflictRatio
  • Atlas A2 Training Series Product/Atlas 800I A2 Inference Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, ResourceConflictRatio, MemoryAccess, and L2Cache
  • Atlas A3 Training Series Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, ResourceConflictRatio, MemoryAccess, and L2Cache
NOTE:
The registers whose data is to be collected can be customized, for example, --aic-metrics=Custom:0x49,0x8,0x15,0x1b,0x64,0x10.
  • Custom indicates a custom register set. The register values that follow it must be in the range [0x1, 0x6E].
  • A maximum of eight registers can be configured. Separate them with commas (,).
  • The register value can be in hexadecimal or decimal format.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

The AI Core Utilization layer in msprof_*.json and the ai_core_utilization_*.csv file

ai_vector_core_utilization_*.csv

SAMPLE_PMU_TIMELINE in the .db file

SAMPLE_PMU_SUMMARY in the .db file

--sys-hardware-mem

Switch for profiling data about the on-chip memory read/write rate, QoS transmission bandwidth, LLC read/write rate/usage/bandwidth (recommended to be used together with --llc-profiling), Acc PMU, SoC transmission bandwidth, and component memory usage. It can be set to on or off (default).

Component memory data can be collected only when AI task profiling is enabled (that is, a user application is passed).

Profiling memory data in an environment with glibc 2.34 or earlier may trigger known glibc bug 19329. Upgrading glibc resolves this problem.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

The support for different products varies.

On-chip memory read/write rate file

The LLC layer in msprof_*.json and the llc_read_write_*.csv file

The acc_pmu layer in msprof_*.json

The Stars Soc Info layer in msprof_*.json

The NPU MEM layer in msprof_*.json and the npu_mem_*.csv file

The QoS layer in msprof_*.json

npu_module_mem_*.csv (passing a user application is required)

QOS in the .db file

ACC_PMU in the .db file

SOC_BANDWIDTH_LEVEL in the .db file

LLC in the .db file

NPU_MEM in the .db file

NPU_MODULE_MEM in the .db file

HBM in the .db file

DDR in the .db file

--sys-hardware-mem-freq

--sys-hardware-mem profiling frequency (Hz). Defaults to 50. Must be in the range [1, 100].

This option must be used in conjunction with --sys-hardware-mem set to on.

NOTE:

For the following products, you are advised not to increase the profiling frequency after the profiling task is complete. Otherwise, SoC transmission bandwidth data may be lost.

Atlas 200/500 A2 Inference Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-

--llc-profiling

LLC profiling events. --sys-hardware-mem must be set to on. Defaults to read. The values include:

  • Atlas 200/500 A2 Inference Product:
    • read: read events, that is, the L3 cache read rate.
    • write: write events, that is, the L3 cache write rate.
  • Atlas Inference Series Product:
    • read: read events, that is, the L3 cache read rate.
    • write: write events, that is, the L3 cache write rate.
  • Atlas Training Series Product:
    • read: read events, that is, the L3 cache read rate.
    • write: write events, that is, the L3 cache write rate.
  • Atlas A2 Training Series Product/Atlas 800I A2 Inference Product:
    • read: read events, that is, the L3 cache read rate.
    • write: write events, that is, the L3 cache write rate.
  • Atlas A3 Training Series Product:
    • read: read events, that is, the L3 cache read rate.
    • write: write events, that is, the L3 cache write rate.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-

--sys-cpu-profiling

CPU (AI CPU, TS CPU, and Ctrl CPU) profiling switch, either on or off (default).

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

ai_cpu_top_function_*.csv

ai_cpu_pmu_events_*.csv

ctrl_cpu_top_function_*.csv

ctrl_cpu_pmu_events_*.csv

ts_cpu_top_function_*.csv

ts_cpu_pmu_events_*.csv

--sys-cpu-freq

CPU profiling frequency (Hz). Defaults to 50. Must be in the range [1, 50].

This option must be used in conjunction with --sys-cpu-profiling set to on.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-

--sys-profiling

Profiling switch for system CPU usage and system memory, either on or off (default).

NOTE:

After this option is used, the Profiling tool calls the perf tool on the device. The perf tool collects only profile data and cannot obtain other runtime information, so the actual risk is low.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

cpu_usage_*.csv

sys_mem_*.csv

--sys-sampling-freq

Profiling frequency (Hz) for system CPU usage and system memory. Defaults to 10. Must be in the range [1, 10].

This option must be used in conjunction with --sys-profiling set to on.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-

--sys-pid-profiling

Profiling switch for the CPU usage and memory of all processes, either on or off (default).

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

process_cpu_usage_*.csv

process_mem_*.csv

--sys-pid-sampling-freq

Profiling frequency (Hz) for the CPU usage and memory of all processes. Defaults to 10. Must be in the range [1, 10].

This option must be used in conjunction with --sys-pid-profiling set to on.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-

--sys-io-profiling

NIC, MAC, and RoCE profiling switch, either on or off (default).

  • Atlas 200/500 A2 Inference Product: supports NIC profiling only in RC scenarios. This option does not take effect in container scenarios.
  • Atlas Training Series Product: supports NIC and RoCE profiling.
  • Atlas A2 Training Series Product/Atlas 800I A2 Inference Product: supports NIC, MAC, and RoCE profiling.
  • Atlas A3 Training Series Product: supports NIC, MAC, and RoCE profiling.

Atlas 200/500 A2 Inference Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

The NIC layer in msprof_*.json and the nic_*.csv file

The RoCE layer in msprof_*.json and the roce_*.csv file

NIC in the .db file

ROCE in the .db file

NETDEV_STATS in the .db file

--sys-io-sampling-freq

NIC, MAC, and RoCE profiling frequency (Hz). Defaults to 100. Must be in the range [1, 100].

This option must be used in conjunction with --sys-io-profiling set to on.

Atlas 200/500 A2 Inference Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-

--sys-interconnection-profiling

Profiling switch for PCIe data, HCCS bandwidth, SIO, and inter-chip transmission bandwidth, either on or off (default).

  • Atlas Inference Series Product: supports PCIe data profiling.
  • Atlas Training Series Product: supports HCCS and PCIe data profiling.
  • Atlas A2 Training Series Product/Atlas 800I A2 Inference Product: supports HCCS, PCIe data, and inter-chip transmission bandwidth profiling.
  • Atlas A3 Training Series Product: supports HCCS, PCIe data, inter-chip transmission bandwidth, and SIO profiling.

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

The PCIe layer in msprof_*.json and the pcie_*.csv file

The HCCS layer in msprof_*.json and the hccs_*.csv file

The Stars Chip Trans layer in msprof_*.json

The SIO layer in msprof_*.json

HCCS in the .db file

PCIE in the .db file

--sys-interconnection-freq

Profiling frequency (Hz) for PCIe data, HCCS bandwidth, SIO, and inter-chip transmission bandwidth. Defaults to 50. Must be in the range [1, 50].

This option must be used in conjunction with --sys-interconnection-profiling set to on.

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-

--dvpp-profiling

DVPP profiling switch, either on or off (default).

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

For the Atlas Inference Series Product, parsing of this type of profile data is not supported.

dvpp_*.csv

--dvpp-freq

DVPP profiling frequency (Hz). Defaults to 50. Must be in the range [1, 100].

This option must be used in conjunction with --dvpp-profiling set to on.

Atlas 200/500 A2 Inference Product

Atlas Inference Series Product

Atlas Training Series Product

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-

--instr-profiling

AI Core (including AIC and AIV cores) bandwidth and latency profiling switch, either on or off (default).

This data can be collected only in single-operator scenarios, with AI task profiling enabled (that is, a user application is passed).

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product: supported only in single-operator scenarios.

Atlas A3 Training Series Product: supported only in single-operator scenarios.

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

biu_group, aic_core_group, and aiv_core_group levels in msprof_*.json

--instr-profiling-freq

AI Core (including AIC and AIV cores) bandwidth and latency profiling period, in processor cycles. Defaults to 1000. Must be in the range [300, 30000]. The actual AI Core bandwidth and latency profiling frequency equals the processor operating frequency divided by the value of this option. For example, if the AI Core operating frequency is 5000 Hz and this option is set to 1000, the profiling frequency is 5000/1000 = 5 Hz, that is, 5 samples per second.

This option can be used only when --instr-profiling is set to on.

Atlas A2 Training Series Product/Atlas 800I A2 Inference Product

Atlas A3 Training Series Product

-
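The frequency options in Table 1 share a pattern: each has a default, an inclusive range, and a switch that must be on. The following Python sketch collects those values in one table and checks a set of options against them. It also applies the --instr-profiling-freq formula and the --aic-metrics=Custom register limits. All helper names are hypothetical, and option names are given without leading dashes. The sketch assumes --ai-core defaults to on and every other switch defaults to off, as Table 1 states.

```python
# option: (default, min, max, switch that must be on), values from Table 1
FREQ_OPTIONS = {
    "aic-freq": (100, 1, 100, "ai-core"),
    "sys-hardware-mem-freq": (50, 1, 100, "sys-hardware-mem"),
    "sys-cpu-freq": (50, 1, 50, "sys-cpu-profiling"),
    "sys-sampling-freq": (10, 1, 10, "sys-profiling"),
    "sys-pid-sampling-freq": (10, 1, 10, "sys-pid-profiling"),
    "sys-io-sampling-freq": (100, 1, 100, "sys-io-profiling"),
    "sys-interconnection-freq": (50, 1, 50, "sys-interconnection-profiling"),
    "dvpp-freq": (50, 1, 100, "dvpp-profiling"),
}


def check_freqs(options):
    """Return error messages for out-of-range frequencies or missing switches."""
    errors = []
    for name, (_default, lo, hi, switch) in FREQ_OPTIONS.items():
        if name not in options:
            continue
        value = int(options[name])
        if not lo <= value <= hi:
            errors.append(f"--{name}={value} is outside [{lo}, {hi}]")
        # --ai-core defaults to on; the other switches default to off.
        switch_default = "on" if switch == "ai-core" else "off"
        if options.get(switch, switch_default) != "on":
            errors.append(f"--{name} requires --{switch}=on")
    return errors


def instr_sampling_hz(core_freq_hz, cycles=1000):
    """Sampling frequency = processor operating frequency / --instr-profiling-freq."""
    if not 300 <= cycles <= 30000:
        raise ValueError("--instr-profiling-freq must be in [300, 30000]")
    return core_freq_hz / cycles


def parse_custom_metrics(value):
    """Parse 'Custom:0x49,0x8,...' into register numbers within documented limits.

    Values may be hexadecimal (0x prefix) or decimal without leading zeros.
    """
    prefix = "Custom:"
    if not value.startswith(prefix):
        raise ValueError("expected the Custom: prefix")
    regs = [int(tok, 0) for tok in value[len(prefix):].split(",")]
    if len(regs) > 8:
        raise ValueError("at most eight registers can be configured")
    bad = [r for r in regs if not 0x1 <= r <= 0x6E]
    if bad:
        raise ValueError(f"registers outside [0x1, 0x6E]: {bad}")
    return regs


print(check_freqs({"sys-cpu-freq": 60, "sys-cpu-profiling": "on"}))
print(instr_sampling_hz(5000, 1000))
```

Keeping the limits in a single table makes it easy to compare them with Table 1 when the tool is updated.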