Profile Data Collection with the acl.json Configuration File

This section describes how to collect profile data in offline inference scenarios. You run the executable file of your application project, which reads the profiling configuration from the acl.json file, and profile data is collected automatically. You can then parse the collected profile data in a development environment where the Ascend-CANN-Toolkit package is installed and view the parsing results.

For details about parsing operations, see Profile Data Parsing and Export (msprof Command). For details about parsing result files, see Profile Data File References.

For details about building and running an application project, see the CANN AscendCL Application Software Development Guide (C&C++).
In addition, you must call aclInit() to initialize AscendCL and call aclFinalize() to deinitialize AscendCL.

Collection of Raw Profile Data

Configure the acl.json file, and build and run the application project by taking the following steps:

Open the code file of the inference application project where the aclInit() function is located and obtain the path of the acl.json file.

        
    // ACL init: pass the acl.json path so that AscendCL reads the profiler configuration
    const char *aclConfigPath = "../src/acl.json";
    aclError ret = aclInit(aclConfigPath);
    if (ret != ACL_ERROR_NONE) {
        ERROR_LOG("acl init failed");
        return FAILED;
    }
    INFO_LOG("acl init success");

If the aclInit() call does not pass the path of the acl.json file, modify the call so that it passes the path of the acl.json file that you configure in the next step.

Modify the acl.json file in that directory and add the profiling configuration in the following format. If the file does not exist, create it in the src directory after the project is built.

        
    {
        "profiler": {
            "switch": "on",
            "output": "output"
        }
    }

**Table 1** Profiler parameters

| Parameter | Description | Availability | Profile Data File |
| --- | --- | --- | --- |
| switch | Profiling switch, either on or off. If this parameter is omitted or is not set to on, profiling is disabled. After profiling is enabled, AscendCL, Runtime API, and Task Scheduler data is collected automatically. | Atlas 200/300/500 Inference Product; Atlas Training Series Product | - |
| output | Path where profile data is written to disk. If this parameter is not set, profile data is written by default to the directory where the executable file of the application project is located. The following special characters are not allowed in the path: "\n", "\\n", "\f", "\\f", "\r", "\\r", "\b", "\\b", "\t", "\\t", "\v", "\\v", "\u007F", "\\u007F", "\"", "\\\"", "'", "\'", "\\", "\\\\", "%", "\\%", ">", "\\>", "<", "\\<", "\|", "\\\|", "&", "\\&", "$", "\\$", ";", "\\;", "`", "\\`". After data collection is complete, directories whose names start with PROF are generated in the specified directory to store the raw profile data. The path can be absolute or relative (relative to the directory where the command is executed). An absolute path starts with a slash (/), for example, /home/HwHiAiUser/output. A relative path starts with a directory name, for example, output. Ensure that the running user configured during installation has read and write permissions on the specified directory. Otherwise, the profile data is stored in the directory of the executable file by default (ensure that the running user has read and write permissions on this default directory). This parameter has a higher priority than ASCEND_WORK_PATH. For details, see Environment Variables. | Atlas 200/300/500 Inference Product; Atlas Training Series Product | - |
| storage_limit | Maximum size of files that can be stored in the specified disk directory. When the profile data files are about to reach this limit, or the remaining disk space is about to run out (remaining space ≤ 20 MB), the earliest files are aged out and deleted. The value range is [200, 4294967295], in MB, for example, storage_limit=200MB. This parameter is not set by default, in which case the limit is 90% of the available space on the disk where the profile data directory is located. | Atlas Training Series Product | - |
| aicpu | Whether to collect details about AI CPU operators, such as the operator execution time and data copy time. The value can be on or off (default). | Atlas 200/300/500 Inference Product; Atlas Training Series Product | `aicpu_*.csv`; `dp_*.csv` |
| aic_metrics | AI Core and AI Vector Core events to profile. This parameter takes effect only when task_time is set to on or l1. If task_time is set to l0 or off, the collection specified by this parameter is not performed. The value can be any one of the following. Atlas 200/300/500 Inference Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, and ResourceConflictRatio. Atlas Training Series Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, and ResourceConflictRatio. NOTE: The registers whose data is collected can be customized, for example, `"aic_metrics":"Custom:0x49,0x8,0x15,0x1b,0x64,0x10"`. The Custom field indicates the customization type and is set to specific register values. The value range is [0x1, 0x6E]. A maximum of eight registers can be configured, separated by commas (,). Register values can be in hexadecimal or decimal format. | Atlas 200/300/500 Inference Product: supports AI Core collection. Atlas Training Series Product: supports AI Core collection. | `op_summary_*.csv` |
| l2 | L2 cache data sampling switch, either on or off (default). | Atlas Training Series Product | `l2_cache_*.csv` |
| hccl | HCCL data collection switch. The data is generated only in multi-card, multi-node, or cluster scenarios. This parameter can be set to on or off in the JSON file. If it is not set, the data is not collected by default. When task_time is set to on, this parameter is automatically set to on. NOTE: This switch will be deprecated in later versions. Use the task_time switch to control the related data collection. | Atlas 200/300/500 Inference Product; Atlas Training Series Product | The HCCL level in `msprof_*.json` and the `hccl_statistic_*.csv` file; `api_statistic_*.csv` |
| task_time | Switch that controls collection of operator delivery and execution durations. The related duration data is written to the task_time, op_summary, and op_statistic files. Possible values: on (default): enabled, with the same effect as l1. off: disabled. l0: collects operator delivery and execution duration data. Unlike l1, l0 does not collect basic operator information, so the collection overhead is lower and the duration statistics are more accurate. l1: collects operator delivery and execution duration data as well as basic operator information, providing more comprehensive performance analysis data. | Atlas 200/300/500 Inference Product; Atlas Training Series Product | The CANN level in `msprof_*.json` and the `api_statistic_*.csv` file; the Ascend Hardware level in `msprof_*.json`; the HCCL level in `msprof_*.json` and the `hccl_statistic_*.csv` file (generated only in multi-card, multi-node, or cluster scenarios); step_trace (iteration trace data); `op_summary_*.csv`; `op_statistic_*.csv`; `fusion_op_*.csv` |
| ascendcl | AscendCL profile data collection switch, either on (default) or off. The collected AscendCL profile data includes the synchronous/asynchronous memory replication latencies between the host and device and between devices. | Atlas 200/300/500 Inference Product; Atlas Training Series Product | The CANN_AscendCL level in `msprof_*.json` and the `api_statistic_*.csv` file |
| runtime_api | Runtime API data collection switch, either on (default) or off. The collected Runtime API profile data includes the synchronous/asynchronous memory replication latencies between the host and device and between devices. | Atlas 200/300/500 Inference Product; Atlas Training Series Product | The CANN_Runtime level in `msprof_*.json` and the `api_statistic_*.csv` file |
| sys_hardware_mem_freq | Collection frequency for on-chip memory bandwidth and memory, LLC read/write bandwidth, Acc PMU data, SoC transmission bandwidth, and component memory data. The value range is [1,100], in Hz. NOTE: Sampling memory data in an environment where glibc 2.34 or earlier is installed may trigger the known Bug 19329. Upgrading glibc resolves this problem. | Atlas 200/300/500 Inference Product; Atlas Training Series Product. The support for different products varies. | On-chip memory read/write rate file; the LLC of Ai CPU level in `msprof_*.json` and the `llc_aicpu_*.csv` file; the LLC of Ctrl CPU level in `msprof_*.json` and the `llc_ctrlcpu_*.csv` file; the LLC Bandwidth level in `msprof_*.json` and the `llc_bandwidth_*.csv` file; the LLC level in `msprof_*.json` and the `llc_read_write_*.csv` file; the NPU MEM level in `msprof_*.json` and the `npu_mem_*.csv` file; `npu_module_mem_*.csv` |
| llc_profiling | LLC events to profile. Possible values: Atlas 200/300/500 Inference Product: capacity (LLC capacity of the AI CPU and Ctrl CPU) or bandwidth (LLC bandwidth); defaults to capacity. Atlas Training Series Product: read (read events, that is, the L3 cache read rate) or write (write events, that is, the L3 cache write rate); defaults to read. | Atlas 200/300/500 Inference Product; Atlas Training Series Product | The LLC of Ai CPU level in `msprof_*.json` and the `llc_aicpu_*.csv` file; the LLC of Ctrl CPU level in `msprof_*.json` and the `llc_ctrlcpu_*.csv` file; the LLC Bandwidth level in `msprof_*.json` and the `llc_bandwidth_*.csv` file. To collect this data, you must also set sys_hardware_mem_freq. |
| sys_io_sampling_freq | NIC and RoCE data collection frequency. The value range is [1,100], in Hz. Atlas 200/300/500 Inference Product: supports NIC collection. Atlas Training Series Product: supports NIC and RoCE collection. | Atlas 200/300/500 Inference Product; Atlas Training Series Product | The NIC level in `msprof_*.json` and the `nic_*.csv` file; the RoCE level in `msprof_*.json` and the `roce_*.csv` file |
| sys_interconnection_freq | Collection frequency for HCCS bandwidth, PCIe, and inter-chip transmission bandwidth data. The value range is [1, 50], in Hz, and the default value is 50. Atlas Training Series Product: supports HCCS and PCIe data collection. | Atlas Training Series Product | The PCIe level in `msprof_*.json` and the `pcie_*.csv` file; the HCCS level in `msprof_*.json` and the `hccs_*.csv` file |
| dvpp_freq | DVPP collection frequency. The value range is [1,100], in Hz. | Atlas 200/300/500 Inference Product; Atlas Training Series Product | `dvpp_*.csv` |
| host_sys | Host-side profile data collection option. Possible values: cpu (process CPU usage) and mem (process memory usage). You can select one or more options, separated by commas (,), for example, "host_sys": "cpu,mem". | Atlas 200/300/500 Inference Product; Atlas Training Series Product | The CPU Usage level in `msprof_*.json` and the `host_cpu_usage_*.csv` file; the Memory Usage level in `msprof_*.json` and the `host_mem_usage_*.csv` file |
| host_sys_usage | Host-side system and process CPU and memory data collection option, selected from cpu and mem. You can select one or more options, separated by commas (,), for example, "host_sys_usage": "cpu,mem". | Atlas 200/300/500 Inference Product; Atlas Training Series Product | System CPU usage on the host; CPU usage of processes on the host; system memory usage on the host; memory usage of processes on the host |
| host_sys_usage_freq | Host-side system and process CPU and memory data collection frequency. The value range is [1, 50], in Hz, and the default value is 50. | Atlas 200/300/500 Inference Product; Atlas Training Series Product | - |
| msproftx | Switch that controls whether the msproftx user and upper-layer framework program output profile data, either on or off (default). Before enabling msproftx, you must call the msproftx APIs in the program to enable the output of profiling data streams. The following two sets of APIs record the time span of specific events during application execution and write the profile data file: (1) MindStudio Tools Extension (mstx) APIs; for details about the APIs and sample code, see mstx API Reference. (2) Profiling AscendCL APIs (msproftx APIs); for details, see Profile Data Collection. Then use the msprof tool to parse the file and export the profile data. | Atlas 200/300/500 Inference Product; Atlas Training Series Product | msproftx Data Description |
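
As an illustration of how several parameters from Table 1 might be combined, the following acl.json sketch enables profiling, collects operator durations with basic operator information (task_time set to l1), selects the default AI Core metric group, turns on AI CPU operator details, and samples host-side process CPU and memory usage. The parameter names and values are taken from Table 1. The particular combination, and the assumption that every value is written as a JSON string in the same way as the switch, output, and host_sys examples, are illustrative only and should be checked against your product and CANN version.

```json
{
    "profiler": {
        "switch": "on",
        "output": "output",
        "task_time": "l1",
        "aic_metrics": "PipeUtilization",
        "aicpu": "on",
        "host_sys": "cpu,mem"
    }
}
```

Because aic_metrics takes effect only when task_time is on or l1, the two are set together here. Parameters marked as available only on the Atlas Training Series Product, such as l2 and storage_limit, are left out so that the sketch is not tied to one product line.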

After the acl.json file is configured, rebuild and run the application project. For details, see the CANN AscendCL Application Software Development Guide (C&C++).
The collected profile data is stored in the path specified by the output parameter, as shown in Figure 1.

Figure 1 Profile data of the application project

If the acl.json file already exists, modify the file content and add Profiling configurations. You do not need to rebuild the application project.
