Collecting Ascend AI Processor System Data
msprof can collect Ascend AI Processor system data. After the collection, it automatically parses the profile data and flushes the corresponding files to disk.
Command Example (Ascend EP)
Log in to the environment where the Ascend-CANN-Toolkit is located, and run the following command to collect profile data:
msprof --output=/home/projects/output --sys-devices=<ID> --sys-period=<period> --ai-core=on --sys-hardware-mem=on --sys-cpu-profiling=on --sys-profiling=on --sys-pid-profiling=on --dvpp-profiling=on
For details about the options supported by the command, see Table 1. If you do not pass a user application, only the Ascend AI Processor system data is collected; in this case, the --output, --sys-period, and --sys-devices options are mandatory. If you pass both a user application and Ascend AI Processor system data options, the --sys-period and --sys-devices options do not take effect.
- For Ascend EP, when you use the msprof command line to collect network-wide inference profile data and include the --llc-profiling, --sys-cpu-profiling, --sys-profiling, and --sys-pid-profiling options, only the --sys-cpu-profiling option generates data, and that data is limited to TS CPU profile data; the other options generate no data. If no user application is passed, each of these options generates its corresponding profile data.
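The mandatory-option rule for system-only collection can be checked before msprof is invoked. The following is a minimal sketch, not part of the msprof tool: the function name build_sys_collect_cmd is hypothetical, and the function only prints a command line built from options documented in this section instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with msprof): print, but do not run,
# an msprof command line for system-only collection.
build_sys_collect_cmd() {
    output=$1
    devices=$2
    period=$3

    # Without a user application, --output, --sys-devices, and --sys-period
    # are all mandatory, so refuse to build the command if any is empty.
    if [ -z "$output" ] || [ -z "$devices" ] || [ -z "$period" ]; then
        echo "error: --output, --sys-devices, and --sys-period are mandatory" >&2
        return 1
    fi

    printf 'msprof --output=%s --sys-devices=%s --sys-period=%s --ai-core=on --sys-hardware-mem=on\n' \
        "$output" "$devices" "$period"
}
```

For example, build_sys_collect_cmd /home/projects/output 0 10 prints a command with device ID 0 and a 10-second period (both placeholder values), which can be reviewed before it is executed.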
After the command is executed, find the PROF_XXX directory generated in the directory specified by --output. This generated directory stores the automatically parsed profile data. For details about related result files, see Table 1.
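When several collections write to the same --output directory, the most recent PROF_XXX directory can be located with a short script. This is a sketch under the assumption that result directories are named with the PROF_ prefix and that modification time reflects collection order; latest_prof_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified PROF_* directory
# under the directory that was passed to --output.
latest_prof_dir() {
    out_dir=$1
    # ls -td lists the directories themselves, newest first. If nothing
    # matches, ls fails and the result is empty.
    latest=$(ls -td "$out_dir"/PROF_*/ 2>/dev/null | head -n 1)
    if [ -z "$latest" ]; then
        echo "no PROF_* directory found in $out_dir" >&2
        return 1
    fi
    # Strip the trailing slash that comes from the glob pattern.
    printf '%s\n' "${latest%/}"
}
```

For example, latest_prof_dir /home/projects/output prints the path of the newest result directory, or returns a non-zero status if none exists.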
Command Example (Ascend RC)
Log in to the operating environment, go to the /var directory where the msprof tool is located, and run the following command to collect profile data:
./msprof --output=/home/projects/output --sys-devices=<ID> --sys-period=<period> --ai-core=on --sys-hardware-mem=on --sys-cpu-profiling=on --sys-profiling=on --sys-pid-profiling=on
For details about the options supported by the command, see Table 1. If you do not pass a user application, only the Ascend AI Processor system data is collected; in this case, the --output, --sys-period, and --sys-devices options are mandatory. If you pass both a user application and Ascend AI Processor system data options, the --sys-period and --sys-devices options do not take effect.
After the command is executed, find the PROF_XXX directory generated in the directory specified by --output. Files in this generated directory cannot be viewed without being parsed. You need to upload the PROF_XXX directory to the development environment where the Toolkit package is installed for data parsing. For details, see Profile Data Parsing and Export (msprof Command). For details about the generated result files, see Table 1.
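Before uploading, it can be convenient to pack the PROF_XXX directory into a single archive. The sketch below is one possible approach, not a documented msprof step: pack_prof_dir is a hypothetical helper, and the transfer itself (for example, with a file copy tool of your choice) is left out because host names and target paths depend on your setup.

```shell
#!/bin/sh
# Hypothetical helper: pack a PROF_* directory into a .tar.gz archive stored
# next to it, and print the archive path.
pack_prof_dir() {
    prof_dir=${1%/}
    if [ ! -d "$prof_dir" ]; then
        echo "not a directory: $prof_dir" >&2
        return 1
    fi
    parent=$(dirname "$prof_dir")
    name=$(basename "$prof_dir")
    archive="$parent/$name.tar.gz"
    # -C keeps the paths inside the archive relative to the parent directory,
    # so unpacking recreates only the PROF_* directory itself.
    tar -czf "$archive" -C "$parent" "$name" || return 1
    printf '%s\n' "$archive"
}
```

After the archive is unpacked in the development environment, the PROF_XXX directory can be parsed as described in Profile Data Parsing and Export (msprof Command).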
Command-line Options
| Option | Description | Supported Model | Result File |
|---|---|---|---|
| --sys-period | System sampling period, in seconds. Must be in the range (0, 30 x 24 x 3600]. | - | |
| --sys-devices | Device ID. The value can be all or multiple device IDs separated by commas (,). | - | |
| --ai-core | AI Core and AI Vector Core data collection switch, either on (default) or off. | - | |
| --aic-mode | AI Core and AI Vector Core hardware data collection mode, either task-based or sample-based. This option must be used in conjunction with --ai-core set to on. In task-based mode, profile data is collected task by task; in sample-based mode, profile data is collected at a fixed interval. You are advised to use the sample-based mode to collect Ascend AI Processor system data. If this option is not set, the sample-based mode is used by default. | | The AI Core Utilization level in msprof_*.json and the ai_core_utilization_*.csv file |
| --aic-freq | Sampling frequency (Hz) in sample-based profiling. Defaults to 100. Must be in the range [1, 100]. This option must be used in conjunction with --ai-core set to on. | - | |
| --aic-metrics | AI Core and AI Vector Core performance metrics to profile. This option must be used in conjunction with --ai-core set to on. The values include: NOTE: The registers whose data is to be collected can be customized, for example, --aic-metrics=Custom:0x49,0x8,0x15,0x1b,0x64,0x10. | | The AI Core Utilization level in msprof_*.json and the ai_core_utilization_*.csv file |
| --sys-hardware-mem | Data collection switch for the on-chip memory read/write rate, LLC read/write rate, usage, and bandwidth (recommended to be used together with --llc-profiling), Acc PMU, SoC transmission bandwidth, and component memory usage. It can be set to on or off (default). Component memory data can be collected only when AI job profile data collection is enabled (that is, when a user application is passed). | The support for different products varies. | On-chip memory read/write rate file; the LLC of Ai CPU level in msprof_*.json and the llc_aicpu_*.csv file; the LLC of Ctrl CPU level in msprof_*.json and the llc_ctrlcpu_*.csv file; the LLC Bandwidth level in msprof_*.json and the llc_bandwidth_*.csv file; the LLC level in msprof_*.json and the llc_read_write_*.csv file; the NPU MEM level in msprof_*.json and the npu_mem_*.csv file; npu_module_mem_*.csv (passing a user application is required) |
| --sys-hardware-mem-freq | Sampling frequency (Hz) for --sys-hardware-mem. Defaults to 50. Must be in the range [1, 100]. This option must be used in conjunction with --sys-hardware-mem set to on. | - | |
| --llc-profiling | LLC profiling events. --sys-hardware-mem must be set to on. The values include: | - | The LLC of Ai CPU level in msprof_*.json and the llc_aicpu_*.csv file; the LLC of Ctrl CPU level in msprof_*.json and the llc_ctrlcpu_*.csv file; the LLC Bandwidth level in msprof_*.json and the llc_bandwidth_*.csv file |
| --sys-cpu-profiling | CPU (AI CPU, Ctrl CPU, and TS CPU) data collection switch, either on or off (default). | | |
| --sys-cpu-freq | CPU sampling frequency (Hz). Defaults to 50. Must be in the range [1, 50]. This option must be used in conjunction with --sys-cpu-profiling set to on. | - | |
| --sys-profiling | Data collection switch for system CPU usage and system memory, either on or off (default). | | |
| --sys-sampling-freq | Sampling frequency (Hz) for profiling system CPU usage and system memory. Defaults to 10. Must be in the range [1, 10]. This option must be used in conjunction with --sys-profiling set to on. | - | |
| --sys-pid-profiling | Data collection switch for the CPU usage and memory of all processes, either on or off (default). | | |
| --sys-pid-sampling-freq | Sampling frequency (Hz) for the CPU usage and memory of all processes. Defaults to 10. Must be in the range [1, 10]. This option must be used in conjunction with --sys-pid-profiling set to on. | - | |
| --sys-io-profiling | NIC and RoCE data collection switch, either on or off (default). | | |
| --sys-io-sampling-freq | NIC and RoCE sampling frequency (Hz). Defaults to 100. Must be in the range [1, 100]. This option must be used in conjunction with --sys-io-profiling set to on. | - | |
| --sys-interconnection-profiling | Data collection switch for PCIe, HCCS bandwidth, and inter-chip transmission bandwidth, either on or off (default). | | |
| --sys-interconnection-freq | Sampling frequency (Hz) for HCCS bandwidth, PCIe, and inter-chip transmission bandwidth data. Defaults to 50. Must be in the range [1, 50]. This option must be used in conjunction with --sys-interconnection-profiling set to on. | - | |
| --dvpp-profiling | DVPP data collection switch, either on or off (default). | | |
| --dvpp-freq | DVPP sampling frequency (Hz). Defaults to 50. Must be in the range [1, 100]. This option must be used in conjunction with --dvpp-profiling set to on. | - | |
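The value ranges listed in the table can be checked in a script before msprof is called. The following sketch assumes that all of these options take whole numbers, so the half-open range (0, 30 x 24 x 3600] for --sys-period becomes 1 to 2592000; option_range and check_option_value are hypothetical helper names, not msprof features.

```shell
#!/bin/sh
# Hypothetical helper: print "min max" for an option whose range is listed
# in the table above. Returns non-zero for options without a listed range.
option_range() {
    case $1 in
        --aic-freq|--sys-hardware-mem-freq|--sys-io-sampling-freq|--dvpp-freq)
            echo "1 100" ;;
        --sys-cpu-freq|--sys-interconnection-freq)
            echo "1 50" ;;
        --sys-sampling-freq|--sys-pid-sampling-freq)
            echo "1 10" ;;
        --sys-period)
            # (0, 30 x 24 x 3600] in whole seconds (assumption): 1..2592000.
            echo "1 $((30 * 24 * 3600))" ;;
        *)
            return 1 ;;
    esac
}

# Return 0 if the value is within the documented range, 1 if it is invalid,
# and 2 if the option has no range in the table.
check_option_value() {
    opt=$1
    val=$2
    range=$(option_range "$opt") || { echo "unknown option: $opt" >&2; return 2; }
    # Reject anything that is not a plain non-negative integer.
    case $val in
        ''|*[!0-9]*) echo "$opt: '$val' is not an integer" >&2; return 1 ;;
    esac
    min=${range% *}
    max=${range#* }
    if [ "$val" -lt "$min" ] || [ "$val" -gt "$max" ]; then
        echo "$opt: $val is outside [$min, $max]" >&2
        return 1
    fi
}
```

For example, check_option_value --sys-cpu-freq 51 fails because the table limits that option to [1, 50]. The check does not cover the dependency rules, such as a frequency option requiring its switch to be set to on.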