Profile Data Collection with the acl.json Configuration File
This section describes how to run the executable file of your application project with an acl.json file that carries the profiling configuration in offline inference scenarios. Profile data is then collected automatically. Afterwards, you can parse the collected profile data in the development environment where the Ascend-CANN-Toolkit package is installed and view the parsing results.
For details about parsing operations, see Profile Data Parsing and Export (msprof Command). For details about parsing result files, see Profile Data File References.
Collection of Raw Profile Data
Configure the acl.json file, and build and run the application project by taking the following steps:
- Open the code file of the inference application project where the aclInit() function is located and obtain the path of the acl.json file.
```cpp
// ACL init
const char *aclConfigPath = "../src/acl.json";
aclError ret = aclInit(aclConfigPath);
if (ret != ACL_ERROR_NONE) {
    ERROR_LOG("acl init failed");
    return FAILED;
}
INFO_LOG("acl init success");
```
If the acl.json file path is not passed in the aclInit() call, modify the call to pass the path of the file created in Step 2.
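Because a missing acl.json simply disables profiling, it can help to verify that the configuration file exists before calling aclInit(). The following is a minimal sketch; the helper name resolveAclConfigPath and its nullptr fallback policy are illustrative and not part of AscendCL, and aclInit itself is not called here:

```cpp
#include <cstdio>
#include <fstream>

// Return the config path if the file is readable, or nullptr so the
// caller can invoke aclInit without a profiling configuration.
// NOTE: helper name and fallback policy are illustrative only.
const char *resolveAclConfigPath(const char *path) {
    std::ifstream f(path);
    if (!f.good()) {
        std::fprintf(stderr, "acl.json not found at %s; profiling disabled\n", path);
        return nullptr;
    }
    return path;
}
```

A caller could then write, for example, `aclError ret = aclInit(resolveAclConfigPath("../src/acl.json"));` so that a missing configuration file is reported instead of silently ignored.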
- Modify the acl.json file in the directory (if the file does not exist, create it in the src directory after project build) and add the related profiling configuration in the following format.
```json
{
    "profiler": {
        "switch": "on",
        "output": "output"
    }
}
```
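The acl.json content is a plain JSON document, so it can also be generated at deployment time instead of being edited by hand. A sketch of that idea follows; the writeProfilerConfig helper is illustrative and not part of any CANN API:

```cpp
#include <fstream>
#include <string>

// Write a minimal profiler configuration like the one shown above.
// The helper name and the fixed parameter set are illustrative only.
bool writeProfilerConfig(const std::string &path, const std::string &outputDir) {
    std::ofstream f(path);
    if (!f) {
        return false;
    }
    f << "{\n"
         "    \"profiler\": {\n"
         "        \"switch\": \"on\",\n"
         "        \"output\": \"" << outputDir << "\"\n"
         "    }\n"
         "}\n";
    return f.good();
}
```

For instance, `writeProfilerConfig("../src/acl.json", "output")` would produce the configuration shown above with profiling switched on.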
Table 1 Profiler parameters

Each entry below lists the parameter name, followed by its description, its availability (supported products), and the profile data file(s) it produces (a "-" means no dedicated file).
switch
- Description: Profiling switch, either on or off. If this parameter is omitted or not set to on, profiling is disabled. Once profiling is enabled, AscendCL, Runtime API, and Task Scheduler data is collected automatically.
- Availability: Atlas 200/300/500 Inference Product; Atlas Training Series Product
- Profile data file: -
output
- Description: Path for dumping profile data to the disk. If this parameter is not set, the profile data is flushed by default to the directory containing the executable file of the application project.
  The following special characters are not allowed in the path: "\n", "\\n", "\f", "\\f", "\r", "\\r", "\b", "\\b", "\t", "\\t", "\v", "\\v", "\u007F", "\\u007F", "\"", "\\\"", "'", "\'", "\\", "\\\\", "%", "\\%", ">", "\\>", "<", "\\<", "|", "\\|", "&", "\\&", "$", "\\$", ";", "\\;", "`", "\\`".
  After data collection is complete, directories starting with PROF are generated in this directory to store the raw profile data. The path can be absolute or relative (relative to the path where commands are executed):
  - An absolute path starts with a slash (/).
  - A relative path starts with a directory name, for example, output.
  - Ensure that the running user configured during installation has read and write permissions on the directory specified by this option. Otherwise, the profile data is stored in the path of the executable file by default (ensure that the running user has read and write permissions on that default path).
  - This option takes precedence over ASCEND_WORK_PATH. For details, see the Environment Variables.
- Availability: Atlas 200/300/500 Inference Product; Atlas Training Series Product
- Profile data file: -
storage_limit
- Description: Maximum size of profile data files that can be stored in the specified disk directory. When the profile data files are about to use up the storage space specified by this option, or the total remaining disk space is about to run out (remaining space ≤ 20 MB), the earliest files are aged out and deleted.
  The value range is [200, 4294967295], in MB, for example, "storage_limit": "200MB". If this parameter is not set (the default), the limit is 90% of the available space of the disk where the profile data directory is located.
- Availability: Atlas Training Series Product
- Profile data file: -
aicpu
- Description: Whether to collect details about AI CPU operators, such as operator execution time and data copy time. The value can be on or off (default).
- Availability: Atlas 200/300/500 Inference Product; Atlas Training Series Product

aic_metrics
- Description: AI Core and AI Vector Core events to profile. This parameter takes effect only when task_time is set to on or l1; if task_time is set to l0 or off, the collection specified by this parameter is not executed.
  For both the Atlas 200/300/500 Inference Product and the Atlas Training Series Product, the value can be one of: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, and ResourceConflictRatio.
  NOTE: The registers whose data is to be collected can be customized, for example, "aic_metrics": "Custom:0x49,0x8,0x15,0x1b,0x64,0x10".
  - The Custom field indicates the customization type and is followed by the specific register values. The value range is [0x1, 0x6E].
  - A maximum of eight registers can be configured. Separate them with commas (,).
  - Register values can be in hexadecimal or decimal format.
- Availability: Atlas 200/300/500 Inference Product: supports AI Core collection; Atlas Training Series Product: supports AI Core collection.

l2
- Description: L2 cache data sampling switch, either on or off (default).
- Availability: Atlas Training Series Product

hccl
- Description: HCCL data collection switch. The data is generated only in multi-card, multi-node, or cluster scenarios.
  - This parameter can be set to on or off in the JSON file.
  - If this parameter is not set in the JSON file, the data is not collected by default. When task_time is set to on, this parameter is automatically set to on.
  NOTE: This switch will be deprecated in later versions. Use the task_time switch to control related data collection.
- Availability: Atlas 200/300/500 Inference Product; Atlas Training Series Product
- Profile data file: the HCCL level in msprof_*.json and the hccl_statistic_*.csv file
task_time
- Description: Switch that controls collection of operator delivery and execution durations. The related duration data is output to the task_time, op_summary, and op_statistic files. Possible values:
  - on (default): switch on.
  - off: switch off.
  - l0: collects operator delivery and execution duration data only. Compared with l1, l0 does not collect basic operator information, so the collection overhead is smaller and the duration statistics are more accurate.
  - l1: collects operator delivery and execution duration data as well as basic operator information, providing more comprehensive performance analysis data. The effect is the same as on.
- Availability: Atlas 200/300/500 Inference Product; Atlas Training Series Product
- Profile data files: the CANN level in msprof_*.json and the api_statistic_*.csv file; the Ascend Hardware level in msprof_*.json; the HCCL level in msprof_*.json and the hccl_statistic_*.csv file (generated only in multi-card, multi-node, or cluster scenarios)
ascendcl
- Description: AscendCL profile data collection switch, either on (default) or off. Collects AscendCL profile data, including synchronous/asynchronous memory copy latencies between the host and device and between devices.
- Availability: Atlas 200/300/500 Inference Product; Atlas Training Series Product
- Profile data file: the CANN_AscendCL level in msprof_*.json and the api_statistic_*.csv file
runtime_api
- Description: Runtime API data collection switch, either on (default) or off. Collects Runtime API profile data, including synchronous/asynchronous memory copy latencies between the host and device and between devices.
- Availability: Atlas 200/300/500 Inference Product; Atlas Training Series Product
- Profile data file: the CANN_Runtime level in msprof_*.json and the api_statistic_*.csv file
sys_hardware_mem_freq
- Description: Collection frequency for on-chip memory bandwidth and usage, LLC read/write bandwidth, Acc PMU data, SoC transmission bandwidth, and component memory data.
  NOTE: The value range is [1, 100]. The unit is Hz.
- Availability: Atlas 200/300/500 Inference Product; Atlas Training Series Product (support varies by product)
- Profile data files: the on-chip memory read/write rate file; the LLC of Ai CPU level in msprof_*.json and the llc_aicpu_*.csv file; the LLC of Ctrl CPU level in msprof_*.json and the llc_ctrlcpu_*.csv file; the LLC Bandwidth level in msprof_*.json and the llc_bandwidth_*.csv file; the LLC level in msprof_*.json and the llc_read_write_*.csv file; the NPU MEM level in msprof_*.json and the npu_mem_*.csv file
llc_profiling
- Description: LLC events to profile. Possible values:
  - Atlas 200/300/500 Inference Product: capacity (LLC capacity of the AI CPU and Ctrl CPU) or bandwidth (LLC bandwidth). Defaults to capacity.
  - Atlas Training Series Product: read (read events, that is, the L3 cache read rate) or write (write events, that is, the L3 cache write rate). Defaults to read.
  To collect this data, you must also set sys_hardware_mem_freq.
- Availability: Atlas 200/300/500 Inference Product; Atlas Training Series Product
- Profile data files: the LLC of Ai CPU level in msprof_*.json and the llc_aicpu_*.csv file; the LLC of Ctrl CPU level in msprof_*.json and the llc_ctrlcpu_*.csv file; the LLC Bandwidth level in msprof_*.json and the llc_bandwidth_*.csv file
sys_io_sampling_freq
- Description: NIC and RoCE data collection frequency. The value range is [1, 100]. The unit is Hz.
  Atlas 200/300/500 Inference Product: supports NIC collection. Atlas Training Series Product: supports NIC and RoCE collection.
- Availability: Atlas 200/300/500 Inference Product; Atlas Training Series Product

sys_interconnection_freq
- Description: HCCS bandwidth, PCIe, and inter-chip transmission bandwidth data collection frequency. The value range is [1, 50] and the default value is 50. The unit is Hz.
  Atlas Training Series Product: supports HCCS and PCIe data collection.
- Availability: Atlas Training Series Product

dvpp_freq
- Description: DVPP data collection frequency. The value range is [1, 100]. The unit is Hz.
- Availability: Atlas 200/300/500 Inference Product; Atlas Training Series Product

host_sys
- Description: Host-side profile data collection option. Possible values:
  - cpu: process CPU usage
  - mem: process memory usage
  You can select one or more options and separate them with commas (,), for example, "host_sys": "cpu,mem".
- Availability: Atlas 200/300/500 Inference Product; Atlas Training Series Product
- Profile data files: the CPU Usage level in msprof_*.json and the host_cpu_usage_*.csv file; the Memory Usage level in msprof_*.json and the host_mem_usage_*.csv file
host_sys_usage
- Description: Host-side system and process CPU and memory data collection option, selected from cpu and mem. You can select one or more options and separate them with commas (,), for example, "host_sys_usage": "cpu,mem".
- Availability: Atlas 200/300/500 Inference Product; Atlas Training Series Product
- Profile data file: CPU usage of processes on the host
host_sys_usage_freq
- Description: Host-side system and process CPU and memory data collection frequency. The value range is [1, 50] and the default value is 50. The unit is Hz.
- Availability: Atlas 200/300/500 Inference Product; Atlas Training Series Product
- Profile data file: -
msproftx
- Description: Switch that controls output of profile data from msproftx user and upper-layer framework programs, either on or off (default).
  Before enabling msproftx, call the msproftx APIs in your program to enable output of the profiling data stream. Calling the two APIs records the time span of specific events during application execution and writes the profile data file; use the msprof tool to parse the file and export the profile data.
  - For details about the MindStudio Tools Extension (mstx) APIs and sample code, see mstx API Reference.
  - For details about the profiling AscendCL APIs (msproftx APIs), see Profile Data Collection.
- Availability: Atlas 200/300/500 Inference Product; Atlas Training Series Product
- Profile data file: -

After the acl.json file is configured, rebuild and run the application project. For details, see the CANN AscendCL Application Software Development Guide (C&C++).
The output parameter specifies the path for storing the collected profile data, as shown in Figure 1.
If the acl.json file already exists, modify its content to add the profiling configuration; you do not need to rebuild the application project.
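Putting several of the options from Table 1 together, a fuller acl.json might look like the following. This is a sketch: the chosen values are illustrative, and you should check each parameter's availability for your product before using it.

```json
{
    "profiler": {
        "switch": "on",
        "output": "output",
        "storage_limit": "1000MB",
        "aicpu": "on",
        "task_time": "l1",
        "aic_metrics": "PipeUtilization",
        "host_sys": "cpu,mem"
    }
}
```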
