Using the acl.json Configuration File for Data Profiling
In the offline inference scenario, the inference application can read profiling parameters from the acl.json file passed to aclInit(), so that raw profile data is collected automatically.
After that, you can parse the collected raw profile data in the development environment where the CANN Toolkit package and ops operator package are installed, and check the displayed parsing results. For details about parsing operations, see General Description. For details about parsing result files, see Profile Data File References.
This section describes only how to enable data profiling in the inference application. For details about the complete development process of the inference application, see Application Development Guide (C&C++).
Prerequisites
- Before enabling data profiling, ensure that the inference application can be executed properly.
- Call aclInit() and aclFinalize() to complete initialization and deinitialization.
Procedure
Configure the acl.json file, and build and run the application project by taking the following steps:
- Open the code file of the inference application project where the aclInit() function is located and obtain the path of the acl.json file.
// ACL init
const char *aclConfigPath = "../src/acl.json";
aclError ret = aclInit(aclConfigPath);
if (ret != ACL_ERROR_NONE) {
    ERROR_LOG("acl init failed");
    return FAILED;
}
INFO_LOG("acl init success");
If the acl.json file path is not passed to the aclInit() call, modify the call and pass the path of the acl.json file described in the next step.
- Modify the acl.json file in the directory (if the file does not exist, create it in the src directory after project build) and add the related Profiling configuration in the following format.
{
    "profiler": {
        "switch": "on",
        "output": "output"
    }
}
For the Atlas A2 Training Series Product/Atlas 800I A2 Inference Product, instr_profiling_freq in Table 1 is mutually exclusive with aicpu, aic_metrics, l2, hccl, task_time, ascendcl, and runtime_api, and cannot be executed at the same time.
For the Atlas A3 Training Series Product, instr_profiling_freq in Table 1 is mutually exclusive with aicpu, aic_metrics, l2, hccl, task_time, ascendcl, and runtime_api, and cannot be executed at the same time.
Table 1 Profiler parameters
Parameter
Description
Availability
Profile Data File
switch
Profiling switch, either on or off.
If this parameter is not included or is not set to on, profiling is disabled.
After profiling is enabled, the Runtime API and Task Scheduler data is automatically collected.
Atlas 200/500 A2 Inference Product
Atlas Inference Series Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
-
output
Path for dumping profile data to the disk. If this parameter is not set, the profile data is flushed to the directory where the executable file of the application project is located by default.
The following special characters are not allowed in the path: "\n", "\\n", "\f", "\\f", "\r", "\\r", "\b", "\\b", "\t", "\\t", "\v", "\\v", "\u007F", "\\u007F", "\"", "\\\"", "'", "\'", "\\", "\\\\", "%", "\\%", ">", "\\>", "<", "\\<", "|", "\\|", "&", "\\&", "$", "\\$", ";", "\\;", "`", "\\`".
After profiling is complete, directories starting with PROF are generated in this directory to store the raw profile data. The path can be an absolute path or a relative path (relative to the path where commands are executed).
- An absolute path starts with a slash (/).
- A relative path starts with a directory name, for example, output.
- Ensure that the running user configured during installation has the read and write permissions on the directory specified by this parameter. If the user does not have the read and write permissions on this directory, the profile data will be stored in the path of the executable file by default (ensure that the running user has the read and write permissions on this default path).
- This parameter has a higher priority than ASCEND_WORK_PATH. For details, see Environment Variables.
Atlas 200/500 A2 Inference Product
Atlas Inference Series Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
-
storage_limit
Maximum size of files that can be stored in a specified disk directory. If the size of profile data files in the disk is about to use up the maximum storage space specified by this option or the total remaining disk space is about to be used up (remaining space ≤ 20 MB), the earliest files in the disk are aged and deleted.
The value range is [200, 4294967295], in MB, for example, storage_limit=200MB. By default, this option is not set.
If this parameter is not set, the default value is 90% of the available space of the disk where the directory for storing profile data files is located.
Atlas 200/500 A2 Inference Product
Atlas Inference Series Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
-
aicpu
Whether to profile details about the AI CPU operator, such as the operator execution time and data copy time, either on or off (default).
Atlas 200/500 A2 Inference Product
Atlas Inference Series Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
aic_metrics
AI Core and AI Vector Core events to profile. This parameter takes effect only when task_time is set to on or l1. If task_time is set to l0 or off, profiling specified by this parameter will not be executed.
The value can be set to one of the following, depending on the product:
- Atlas 200/500 A2 Inference Product: ArithmeticUtilization, PipeUtilization, Memory, MemoryL0, MemoryUB, ResourceConflictRatio, L2Cache, and PipelineExecuteUtilization (default)
- Atlas Inference Series Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, and ResourceConflictRatio
- Atlas Training Series Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, and ResourceConflictRatio
- Atlas A2 Training Series Product/Atlas 800I A2 Inference Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, ResourceConflictRatio, MemoryAccess, and L2Cache
- Atlas A3 Training Series Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, ResourceConflictRatio, MemoryAccess, and L2Cache
NOTE: The registers whose data is to be collected can be customized, for example, "aic_metrics":"Custom:0x49,0x8,0x15,0x1b,0x64,0x10".
- The Custom field indicates the customization type. It is set to specific register values in the range of [0x1, 0x6E].
- A maximum of eight registers can be configured. Separate them with commas (,).
- The register value can be in hexadecimal or decimal format.
Atlas 200/500 A2 Inference Product: supports AI Core and AI Vector Core profiling.
Atlas Inference Series Product: supports AI Core profiling.
Atlas Training Series Product: supports AI Core profiling.
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product: supports AI Core and AI Vector Core profiling.
Atlas A3 Training Series Product: supports AI Core and AI Vector Core profiling.
l2
L2 cache profiling switch, either on or off (default).
- Atlas 200/500 A2 Inference Product: setting aic_metrics to L2Cache is recommended for analyzing the number of L2 cache hits from the AI Core.
- Atlas A2 Training Series Product/Atlas 800I A2 Inference Product: setting aic_metrics to L2Cache is recommended for analyzing the number of L2 cache hits from the AI Core.
- Atlas A3 Training Series Product: setting aic_metrics to L2Cache is recommended for analyzing the number of L2 cache hits from the AI Core.
Atlas 200/500 A2 Inference Product
Atlas Inference Series Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
hccl
Communication data profiling switch. The data is generated only in multi-rank, multi-node, or cluster scenarios.
- This parameter can be set to on or off in the .json file.
- If this parameter is not set in the .json file, the data is not collected by default. When task_time is set to on, this parameter is automatically set to on.
NOTE: This switch will be deprecated in later versions. Use the task_time switch to control related profiling.
Atlas 200/500 A2 Inference Product
Atlas Inference Series Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
The Communication layer in msprof_*.json and the communication_statistic_*.csv file
task_time
Switch that controls the profiling of the operator delivery and execution durations. The related duration data is output to the task_time, op_summary, and op_statistic files. Possible configuration values are as follows:
- on: switch on. The default value is on.
- off: switch off.
- l0: profiles operator delivery and execution duration data. Compared with l1, l0 does not profile basic operator information, so the performance overhead during profiling is smaller, and this enables more accurate profiling on time consumption data.
- l1: profiles operator delivery and execution duration data, as well as basic operator information, to provide more comprehensive profile data. The effect is the same as that when this parameter is set to on.
Atlas 200/500 A2 Inference Product
Atlas Inference Series Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
The CANN layer in msprof_*.json and the api_statistic_*.csv file
The Ascend Hardware layer in msprof_*.json
The Communication layer in msprof_*.json and the communication_statistic_*.csv file (The data is generated only in multi-rank, multi-node, or cluster scenarios.)
ascendcl
acl profiling switch, either on (default) or off.
You can collect acl profile data, including the synchronous/asynchronous memory copy latencies between the host and devices and between devices.
Atlas 200/500 A2 Inference Product
Atlas Inference Series Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
The CANN_AscendCL layer in msprof_*.json and the api_statistic_*.csv file
runtime_api
Runtime API profiling switch, either on (default) or off.
You can collect Runtime API profile data, including the synchronous/asynchronous memory copy latencies between the host and devices and between devices.
Atlas 200/500 A2 Inference Product
Atlas Inference Series Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
The CANN_Runtime layer in msprof_*.json and the api_statistic_*.csv file
sys_hardware_mem_freq
Profiling frequency for on-chip memory, QoS, bandwidth and memory, LLC read/write bandwidth, Acc PMU data, SoC transmission bandwidth, and component memory.
The value range is [1, 100], in Hz.
Profiling memory data in the environment where glibc (2.34 or earlier) is installed may trigger a known Bug 19329. This problem can be solved by upgrading the glibc version.
NOTE: For the following products, you are advised not to increase the profiling frequency after the profiling task is complete. Otherwise, SoC transmission bandwidth data may be lost.
- Atlas 200I/500 A2 inference products
- Atlas A2 training products/Atlas A2 inference products
- Atlas A3 training products/Atlas A3 inference products
Atlas 200/500 A2 Inference Product
Atlas Inference Series Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
The support for different products varies.
On-chip memory read/write rate file
The LLC layer in msprof_*.json and the llc_read_write_*.csv file
The acc_pmu layer in msprof_*.json
The Stars Soc Info layer in msprof_*.json
The NPU MEM layer in msprof_*.json and the npu_mem_*.csv file
llc_profiling
LLC events to profile. Possible values are as follows:
- Atlas 200/500 A2 Inference Product:
- read: read events, that is, the L3 cache read rate.
- write: write events, that is, the L3 cache write rate. Defaults to read.
- Atlas Inference Series Product:
- read: read events, that is, the L3 cache read rate.
- write: write events, that is, the L3 cache write rate. Defaults to read.
- Atlas Training Series Product:
- read: read events, that is, the L3 cache read rate.
- write: write events, that is, the L3 cache write rate. Defaults to read.
- Atlas A2 Training Series Product/Atlas 800I A2 Inference Product:
- read: read events, that is, the L3 cache read rate.
- write: write events, that is, the L3 cache write rate. Defaults to read.
- Atlas A3 Training Series Product:
- read: read events, that is, the L3 cache read rate.
- write: write events, that is, the L3 cache write rate. Defaults to read.
Atlas 200/500 A2 Inference Product
Atlas Inference Series Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
To profile the data, you need to set sys_hardware_mem_freq.
sys_io_sampling_freq
NIC and RoCE profiling frequency (Hz). Must be in the range [1, 100].
- Atlas 200/500 A2 Inference Product: supports NIC profiling only in RC scenarios. This option does not take effect in container scenarios.
- Atlas Training Series Product: supports NIC and RoCE profiling.
- Atlas A2 Training Series Product/Atlas 800I A2 Inference Product: supports NIC and RoCE profiling.
- Atlas A3 Training Series Product: supports NIC and RoCE profiling.
Atlas 200/500 A2 Inference Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
sys_interconnection_freq
Profiling frequency for PCIe data, HCCS bandwidth, SIO, and inter-chip transmission bandwidth.
The value range is [1, 50] and the default value is 50. The unit is Hz.
- Atlas Inference Series Product: supports PCIe data profiling.
- Atlas Training Series Product: supports HCCS and PCIe data profiling.
- Atlas A2 Training Series Product/Atlas 800I A2 Inference Product: supports HCCS, PCIe data, and inter-chip transmission bandwidth profiling.
- Atlas A3 Training Series Product: supports HCCS, PCIe data, inter-chip transmission bandwidth, and SIO profiling.
Atlas Inference Series Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
The PCIe layer in msprof_*.json and the pcie_*.csv file
dvpp_freq
DVPP profiling frequency.
The value range is [1, 100], in Hz.
Atlas 200/500 A2 Inference Product
For the Atlas Inference Series Product, data can be collected but cannot be parsed.
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
For the Atlas Inference Series Product, parsing of this type of profile data is not supported.
instr_profiling_freq
AI Core and AI Vector bandwidth and latency profiling frequency (Hz). Must be in the range [300, 30000].
NOTE: This parameter is supported only in single-operator scenarios.
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
The biu_group, aic_core_group, and aiv_core_group layers in msprof_*.json
host_sys
Host-side profiling option. Possible values include:
- cpu: process CPU usage
- mem: process memory usage
You can select one or more options and separate them with commas (,), for example, "host_sys": "cpu,mem".
Atlas 200/500 A2 Inference Product
Atlas Inference Series Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
The CPU Usage layer in msprof_*.json and the host_cpu_usage_*.csv file
The Memory Usage layer in msprof_*.json and the host_mem_usage_*.csv file
host_sys_usage
Host-side system and process CPU and memory profiling option, selected from cpu and mem.
You can select one or more options and separate them with commas (,), for example, "host_sys_usage": "cpu,mem".
Atlas 200/500 A2 Inference Product
Atlas Inference Series Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
CPU usage of processes on the host
host_sys_usage_freq
Host-side system and process CPU and memory profiling frequency.
The value range is [1, 50] and the default value is 50. The unit is Hz.
Atlas 200/500 A2 Inference Product
Atlas Inference Series Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
-
msproftx
Switch that controls whether user applications and upper-layer frameworks output profile data through msproftx, either on or off (default).
Before enabling msproftx, call the msproftx APIs in the application to enable the output of profiling data streams. These APIs record the time span of specific events during application execution and write the data to the profile data file. Then use the msprof tool to parse the file and export the profile data. The following two sets of APIs are available:
- For details about the MindStudio Tools Extension (mstx) APIs, see mstx API Use Case.
- For details about the msproftx APIs, see Profile Data Collection in Application Development Guide (C&C++).
Atlas 200/500 A2 Inference Product
Atlas Inference Series Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
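As a sketch of how several Table 1 parameters can be combined in one acl.json file, the following configuration enables profiling, writes the data to a relative output directory, and collects operator timing with basic operator information, AI CPU operator details, one AI Core metric group, and host-side process CPU and memory usage. The selection of switches and the directory name output are illustrative assumptions, not a recommended setup. Check Table 1 to confirm that each parameter and value is available on your product.

```json
{
    "profiler": {
        "switch": "on",
        "output": "output",
        "task_time": "l1",
        "aicpu": "on",
        "aic_metrics": "PipeUtilization",
        "host_sys": "cpu,mem"
    }
}
```

According to the parameter descriptions in Table 1, changing task_time to l0 in this sketch would reduce profiling overhead, but the aic_metrics profiling would then not be executed.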
- After the acl.json file is configured, rebuild and run the application project. For details, see Application Development Guide (C&C++).
output specifies the path for storing collected profile data, as shown in Figure 1.
If the acl.json file already exists, you only need to modify its content to add the Profiling configuration. You do not need to rebuild the application project.
