Profiling the Ascend AI Processor System
Applicability
| Product | Supported (Yes/No) |
|---|---|
| Atlas A3 Training Series Product | Yes |
| Atlas A2 Training Series Product/Atlas 800I A2 Inference Product | Yes |
| Atlas 200/500 A2 Inference Product | Yes |
| Atlas Inference Series Product | Yes |
| Atlas Training Series Product | Yes |
Function Description
msprof supports the collection of Ascend AI Processor system data. After the collection, it automatically parses the profile data and flushes the corresponding files to disk.
Precautions
- Ensure that an AI task can run properly in the operating environment.
- Ensure that operations in Before You Start have been completed.
The Python call stack and the PyTorch or MindSpore framework-layer data cannot be profiled. Use the framework APIs to profile such data.
Command Example (Ascend EP)
Log in to the environment where the CANN Toolkit package and ops operator package are installed, and run the following command to collect profile data:
msprof --output=/home/projects/output --sys-devices=<ID> --sys-period=<period> --ai-core=on --sys-hardware-mem=on --sys-cpu-profiling=on --sys-profiling=on --sys-pid-profiling=on --dvpp-profiling=on
For details about the supported options, see Table 1.
Profiling system data of the Ascend AI Processor:
- If no user application is passed, the tool profiles only the system data of the Ascend AI Processor. In this case, the --output, --sys-period, and --sys-devices options are required.
- If both the user application and the Ascend AI Processor system data parameters are passed, the --sys-period and --sys-devices options are invalid.
- For Ascend EP, when you collect network-wide inference profile data using the msprof CLI and include the --llc-profiling, --sys-cpu-profiling, --sys-profiling, and --sys-pid-profiling options, only --sys-cpu-profiling produces data (the TS CPU profile data); the other three options collect nothing. If no user application is passed, all four options collect data.
- For the Atlas A2 Training Series Product/Atlas 800I A2 Inference Product, --instr-profiling is mutually exclusive with --ascendcl, --model-execution, --runtime-api, --hccl, --task-time, --aicpu, --ai-core, --aic-mode, --aic-freq, --aic-metrics, and --l2. It cannot be specified together with any of these options.
- For the Atlas A3 Training Series Product, --instr-profiling is mutually exclusive with --ascendcl, --model-execution, --runtime-api, --hccl, --task-time, --aicpu, --ai-core, --aic-mode, --aic-freq, --aic-metrics, and --l2. It cannot be specified together with any of these options.
- For the following products, the --sys-profiling, --sys-pid-profiling, and --sys-cpu-profiling options cannot collect data from two devices that share the same OS at the same time. For example, if a product has devices 0 to 7 and they share OSs in pairs (0 and 1, 2 and 3, 4 and 5, 6 and 7), then --sys-devices cannot contain both devices of any pair. It can be set to 0, 2, 4, and 6, or 1, 3, 5, and 7.
  - Atlas A3 Training Series Product
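The shared-OS restriction in the example above can be checked before msprof is launched. The following is a minimal sketch, assuming the shared-OS groups are known in advance. The function name `shared_os_conflicts` and the constant `OS_GROUPS` are hypothetical helpers for illustration and are not part of msprof.

```python
def shared_os_conflicts(selected, os_groups):
    """Return the OS groups that contain more than one selected device."""
    chosen = set(selected)
    return [group for group in os_groups if len(chosen & set(group)) > 1]


# Pairing taken from the example in the text; the real grouping
# depends on the product and must be confirmed for your hardware.
OS_GROUPS = [(0, 1), (2, 3), (4, 5), (6, 7)]

if __name__ == "__main__":
    for devices in ([0, 2, 4, 6], [1, 3, 5, 7], [0, 1, 4]):
        conflicts = shared_os_conflicts(devices, OS_GROUPS)
        status = "ok" if not conflicts else f"conflict in {conflicts}"
        print(devices, "->", status)
```

A selection passes when the returned list is empty, that is, when at most one device from each group appears in --sys-devices.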
After the command is executed, find the PROF_XXX directory generated in the directory specified by --output. This generated directory stores the automatically parsed profile data. For details about related result files, see Table 1.
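The mutual-exclusion rule for --instr-profiling listed above can also be checked on an argument list before the command is run. This is a sketch only: the helper names are hypothetical, and it conservatively flags the mere presence of an excluded option regardless of its on/off value, which is an assumption rather than documented msprof behavior.

```python
# Options the text lists as mutually exclusive with --instr-profiling.
INSTR_EXCLUSIVE = {
    "--ascendcl", "--model-execution", "--runtime-api", "--hccl",
    "--task-time", "--aicpu", "--ai-core", "--aic-mode", "--aic-freq",
    "--aic-metrics", "--l2",
}


def option_name(arg):
    """Strip the '=value' part from an argument such as '--ai-core=on'."""
    return arg.split("=", 1)[0]


def instr_profiling_conflicts(args):
    """Return the excluded options that appear alongside --instr-profiling."""
    names = {option_name(arg) for arg in args}
    if "--instr-profiling" not in names:
        return []
    return sorted(names & INSTR_EXCLUSIVE)


if __name__ == "__main__":
    args = ["--output=/home/projects/output", "--instr-profiling=on", "--ai-core=on"]
    print(instr_profiling_conflicts(args))
```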
Command Example (Ascend RC)
Log in to the operating environment, go to the /var directory where the msprof tool is located, and run the following command to collect profile data:
./msprof --output=/home/projects/output --sys-devices=<ID> --sys-period=<period> --ai-core=on --sys-hardware-mem=on --sys-cpu-profiling=on --sys-profiling=on --sys-pid-profiling=on
For details about the supported options, see Table 1.
Profiling system data of the Ascend AI Processor:
- If no user application is passed, the tool profiles only the system data of the Ascend AI Processor. In this case, the --output, --sys-period, and --sys-devices options are required.
- If both the user application and the Ascend AI Processor system data parameters are passed, the --sys-period and --sys-devices options are invalid.
After the command is executed, find the PROF_XXX directory generated in the directory specified by --output. Files in this generated directory cannot be viewed without being parsed. You need to upload the PROF_XXX directory to the development environment where the Toolkit package is installed for data parsing. For details, see Offline Parsing. For details about the generated result files, see Table 1.
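The rule that --output, --sys-period, and --sys-devices are required when no user application is passed can be enforced by a small wrapper that assembles the command line. The sketch below is illustrative: `build_system_only_command` is a hypothetical helper, and the device ID and period values are placeholders, not recommendations.

```python
import shlex

# Options the text marks as required when no user application is passed.
REQUIRED_WITHOUT_APP = ("--output", "--sys-period", "--sys-devices")


def build_system_only_command(options, msprof="msprof"):
    """Build an argument list from a dict of option names to values.

    Raises ValueError if a required option is missing or empty.
    On Ascend RC, pass msprof="./msprof" when running from the tool directory.
    """
    missing = [name for name in REQUIRED_WITHOUT_APP if not options.get(name)]
    if missing:
        raise ValueError("missing required options: " + ", ".join(missing))
    return [msprof] + [f"{name}={value}" for name, value in options.items()]


if __name__ == "__main__":
    cmd = build_system_only_command({
        "--output": "/home/projects/output",
        "--sys-devices": "0",       # placeholder device ID
        "--sys-period": "10",       # placeholder period in seconds
        "--sys-hardware-mem": "on",
    })
    # Print the command for review; pass cmd to subprocess.run to execute it.
    print(shlex.join(cmd))
```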
Options
| Option | Description | Applicability | Profile Data File |
|---|---|---|---|
| --sys-period | System profiling period (s). Must be in the range (0, 30*24*3600], that is, up to 30 days. | Atlas 200/500 A2 Inference Product; Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | - |
| --sys-devices | Device ID. The value can be all or multiple device IDs separated with commas (,). | Atlas 200/500 A2 Inference Product; Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | - |
| --ai-core | AI Core and AI Vector Core profiling switch, either on (default) or off. For details about the profiling metrics, see op_summary_*.csv. | Atlas 200/500 A2 Inference Product; Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | - |
| --aic-mode | AI Core and AI Vector Core profiling mode, either task-based or sample-based. This option must be used in conjunction with --ai-core set to on. In task-based mode, profile data is collected task by task; in sample-based mode, profile data is collected at a fixed interval. You are advised to use the sample-based mode to collect Ascend AI Processor system data. If this option is not set, the sample-based mode is used by default. | Atlas 200/500 A2 Inference Product; Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | The AI Core Utilization layer in msprof_*.json; ai_core_utilization_*.csv; ai_vector_core_utilization_*.csv |
| --aic-freq | Profiling frequency (Hz) in sample-based mode. Defaults to 100. Must be in the range [1, 100]. This option must be used in conjunction with --ai-core set to on. | Atlas 200/500 A2 Inference Product; Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | - |
| --aic-metrics | AI Core and AI Vector Core performance metrics to profile. This option must be used in conjunction with --ai-core set to on. The values include: NOTE: The registers whose data is to be collected can be customized, for example, --aic-metrics=Custom:0x49,0x8,0x15,0x1b,0x64,0x10. | Atlas 200/500 A2 Inference Product; Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | The AI Core Utilization layer in msprof_*.json; ai_core_utilization_*.csv; ai_vector_core_utilization_*.csv |
| --sys-hardware-mem | Switch for profiling data about the on-chip memory read/write rate, QoS transmission bandwidth, LLC read/write rate/usage/bandwidth (recommended to be used together with --llc-profiling), Acc PMU, SoC transmission bandwidth, and component memory usage. It can be set to on or off (default). Specific component memory data can only be collected when AI task profile data collection is enabled (that is, a user application is passed). Profiling memory data in an environment where glibc (2.34 or earlier) is installed may trigger the known Bug 19329. This problem can be solved by upgrading the glibc version. | Atlas 200/500 A2 Inference Product; Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product. The support for different products varies. | On-chip memory read/write rate file; the LLC layer in msprof_*.json and the llc_read_write_*.csv file; the acc_pmu layer in msprof_*.json; the Stars Soc Info layer in msprof_*.json; the NPU MEM layer in msprof_*.json and the npu_mem_*.csv file; the QoS layer in msprof_*.json; npu_module_mem_*.csv (passing a user application is required); SOC_BANDWIDTH_LEVEL in the .db file |
| --sys-hardware-mem-freq | --sys-hardware-mem profiling frequency (Hz). Defaults to 50. Must be in the range [1, 100]. This option must be used in conjunction with --sys-hardware-mem set to on. NOTE: For the following products, you are advised not to increase the profiling frequency after the profiling task is complete. Otherwise, SoC transmission bandwidth data may be lost. | Atlas 200/500 A2 Inference Product; Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | - |
| --llc-profiling | LLC profiling events. --sys-hardware-mem must be set to on. The values include: | Atlas 200/500 A2 Inference Product; Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | - |
| --sys-cpu-profiling | CPU (AI CPU, TS CPU, and Ctrl CPU) profiling switch, either on or off (default). | Atlas 200/500 A2 Inference Product; Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | |
| --sys-cpu-freq | CPU profiling frequency (Hz). Defaults to 50. Must be in the range [1, 50]. This option must be used in conjunction with --sys-cpu-profiling set to on. | Atlas 200/500 A2 Inference Product; Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | - |
| --sys-profiling | Profiling switch for system CPU usage and system memory, either on or off (default). NOTE: After this option is used, the Profiling tool calls the perf tool on the device. The perf tool only collects profile data and cannot obtain other runtime information. The actual risk is low. | Atlas 200/500 A2 Inference Product; Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | |
| --sys-sampling-freq | Profiling frequency (Hz) for system CPU usage and system memory. Defaults to 10. Must be in the range [1, 10]. This option must be used in conjunction with --sys-profiling set to on. | Atlas 200/500 A2 Inference Product; Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | - |
| --sys-pid-profiling | Profiling switch for the CPU usage and memory of all processes, either on or off (default). | Atlas 200/500 A2 Inference Product; Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | |
| --sys-pid-sampling-freq | Profiling frequency (Hz) for the CPU usage and memory of all processes. Defaults to 10. Must be in the range [1, 10]. This option must be used in conjunction with --sys-pid-profiling set to on. | Atlas 200/500 A2 Inference Product; Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | - |
| --sys-io-profiling | NIC, MAC, and RoCE profiling switch, either on or off (default). | Atlas 200/500 A2 Inference Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | The NIC layer in msprof_*.json and the nic_*.csv file |
| --sys-io-sampling-freq | NIC, MAC, and RoCE profiling frequency (Hz). Defaults to 100. Must be in the range [1, 100]. This option must be used in conjunction with --sys-io-profiling set to on. | Atlas 200/500 A2 Inference Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | - |
| --sys-interconnection-profiling | Profiling switch for PCIe data, HCCS bandwidth, SIO, and inter-chip transmission bandwidth, either on or off (default). | Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | The PCIe layer in msprof_*.json and the pcie_*.csv file; the HCCS layer in msprof_*.json and the hccs_*.csv file; the Stars Chip Trans layer in msprof_*.json |
| --sys-interconnection-freq | Profiling frequency (Hz) for PCIe data, HCCS bandwidth, SIO, and inter-chip transmission bandwidth. Defaults to 50. Must be in the range [1, 50]. This option must be used in conjunction with --sys-interconnection-profiling set to on. | Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | - |
| --dvpp-profiling | DVPP profiling switch, either on or off (default). | Atlas 200/500 A2 Inference Product; Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | For the Atlas Inference Series Product, parsing of this type of profile data is not supported. |
| --dvpp-freq | DVPP profiling frequency (Hz). Defaults to 50. Must be in the range [1, 100]. This option must be used in conjunction with --dvpp-profiling set to on. | Atlas 200/500 A2 Inference Product; Atlas Inference Series Product; Atlas Training Series Product; Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | - |
| --instr-profiling | AI Core (including AIC and AIV cores) bandwidth and latency profiling switch, either on or off (default). Specific profile data can only be collected when AI task profiling is enabled (that is, a user application is passed) in single-operator scenarios. Atlas A2 Training Series Product/Atlas 800I A2 Inference Product: supported only in single-operator scenarios. Atlas A3 Training Series Product: supported only in single-operator scenarios. | Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | biu_group, aic_core_group, and aiv_core_group levels in msprof_*.json |
| --instr-profiling-freq | AI Core (including AIC and AIV cores) bandwidth and latency profiling cycles. Defaults to 1000. Must be in the range [300, 30000]. Actual AI Core bandwidth and latency profiling frequency = Processor operating frequency/Value of this option. For example, if the AI Core operating frequency is 5000 Hz and the value of this option is 1000, the profiling frequency is 5 Hz, that is, 5 times per second. This option can be used only when --instr-profiling is set to on. | Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product | - |
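The numeric limits in the table above lend themselves to a pre-flight check, and the --instr-profiling-freq formula can be evaluated directly. The following is a minimal sketch: the ranges are copied from the table, while the function and constant names are hypothetical and not part of msprof.

```python
# Inclusive (low, high) ranges in Hz, copied from the option table.
FREQ_RANGES_HZ = {
    "--aic-freq": (1, 100),
    "--sys-hardware-mem-freq": (1, 100),
    "--sys-cpu-freq": (1, 50),
    "--sys-sampling-freq": (1, 10),
    "--sys-pid-sampling-freq": (1, 10),
    "--sys-io-sampling-freq": (1, 100),
    "--sys-interconnection-freq": (1, 50),
    "--dvpp-freq": (1, 100),
}

MAX_SYS_PERIOD_S = 30 * 24 * 3600  # upper bound of --sys-period, in seconds


def check_frequency(option, value):
    """Return True if value lies in the documented range for option."""
    low, high = FREQ_RANGES_HZ[option]
    return low <= value <= high


def check_sys_period(seconds):
    """--sys-period must be in the half-open range (0, 30*24*3600]."""
    return 0 < seconds <= MAX_SYS_PERIOD_S


def instr_profiling_rate_hz(operating_freq_hz, cycles=1000):
    """Actual profiling frequency = operating frequency / option value."""
    if not 300 <= cycles <= 30000:
        raise ValueError("--instr-profiling-freq must be in [300, 30000]")
    return operating_freq_hz / cycles


if __name__ == "__main__":
    print(check_frequency("--sys-cpu-freq", 60))   # out of range
    print(check_sys_period(3600))                  # one hour
    print(instr_profiling_rate_hz(5000, 1000))     # the table's worked example
```

With the values from the table's example (5000 Hz operating frequency, option value 1000), the last call yields 5.0, matching the stated 5 Hz.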