Profile Data Collection with the acl.json Configuration File

This section describes how to collect profile data in offline inference scenarios. You run the executable file of your application project, which loads the acl.json file to read the profiling configuration, and profile data is collected automatically. You can then parse the collected profile data in a development environment where the Ascend-CANN-Toolkit package is installed and view the parsing results.

For details about parsing operations, see Profile Data Parsing and Export (msprof Command). For details about parsing result files, see Profile Data File References.

Collection of Raw Profile Data

Configure the acl.json file, and then build and run the application project as follows:

  1. Open the code file of the inference application project where the aclInit() function is located and obtain the path of the acl.json file.
    // ACL init: pass the acl.json path so that the profiling configuration is read.
    const char *aclConfigPath = "../src/acl.json";
    aclError ret = aclInit(aclConfigPath);
    if (ret != ACL_ERROR_NONE) {
        ERROR_LOG("acl init failed");
        return FAILED;
    }
    INFO_LOG("acl init success");
    

    If the aclInit() call does not pass an acl.json file path, modify the call so that it passes the path of the acl.json file described in Step 2.

  2. Modify the acl.json file in that directory and add the profiling configuration in the following format. If the file does not exist, create it in the src directory after the project is built.
    {
        "profiler": {
            "switch": "on",
            "output": "output"
        }
    }
    
    Table 1 Profiler parameters

    Each entry below gives, in order: Parameter, Description, Availability, and Profile Data File.
    switch
      Description:
        Profiling switch, either on or off. If this parameter is not included or is not set to on, profiling is disabled.
        After profiling is enabled, the AscendCL, Runtime API, and Task Scheduler data is collected automatically.
      Availability:
        Atlas 200/300/500 Inference Product
        Atlas Training Series Product
      Profile Data File:
        -

    output
      Description:
        Path for dumping profile data to the disk. If this parameter is not set, the profile data is flushed by default to the directory where the executable file of the application project is located.
        The following special characters are not allowed in the path: "\n", "\\n", "\f", "\\f", "\r", "\\r", "\b", "\\b", "\t", "\\t", "\v", "\\v", "\u007F", "\\u007F", "\"", "\\\"", "'", "\'", "\\", "\\\\", "%", "\\%", ">", "\\>", "<", "\\<", "|", "\\|", "&", "\\&", "$", "\\$", ";", "\\;", "`", "\\`".
        After data collection is complete, directories whose names start with PROF are generated in the specified directory to store the raw profile data. The path can be absolute or relative (relative to the path where commands are executed).
        • An absolute path starts with a slash (/), for example, /home/HwHiAiUser/output.
        • A relative path starts with a directory name, for example, output.
        • Ensure that the running user configured during installation has read and write permissions on the directory specified by this option. Otherwise, the profile data is stored by default in the path of the executable file (ensure that the running user has read and write permissions on this default path).
        • This option has a higher priority than ASCEND_WORK_PATH. For details, see the Environment Variables.
      Availability:
        Atlas 200/300/500 Inference Product
        Atlas Training Series Product
      Profile Data File:
        -

    storage_limit
      Description:
        Maximum size of files that can be stored in the specified disk directory. When the profile data files are about to use up the storage space specified by this option, or the total remaining disk space is about to be used up (remaining space ≤ 20 MB), the earliest files on the disk are aged out and deleted.
        The value range is [200, 4294967295], in MB, for example, storage_limit=200MB.
        By default, this parameter is not set. In that case, the limit is 90% of the available space of the disk where the directory for storing profile data files is located.
      Availability:
        Atlas Training Series Product
      Profile Data File:
        -

    aicpu
      Description:
        Whether to collect details about AI CPU operators, such as the operator execution time and data copy time. The value can be on or off (default).
      Availability:
        Atlas 200/300/500 Inference Product
        Atlas Training Series Product
      Profile Data File:
        aicpu_*.csv
        dp_*.csv

    aic_metrics
      Description:
        AI Core and AI Vector Core events to profile. This parameter takes effect only when task_time is set to on or l1. If task_time is set to l0 or off, the collection specified by this parameter is not performed.
        The value can be set to one of the following:
        • Atlas 200/300/500 Inference Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, and ResourceConflictRatio
        • Atlas Training Series Product: ArithmeticUtilization, PipeUtilization (default), Memory, MemoryL0, MemoryUB, and ResourceConflictRatio
        NOTE:
        The registers whose data is to be collected can be customized, for example, "aic_metrics":"Custom:0x49,0x8,0x15,0x1b,0x64,0x10".
        • The Custom field indicates the customization type and is set to specific register values. The value range is [0x1, 0x6E].
        • A maximum of eight registers can be configured. Separate them with commas (,).
        • The register value can be in hexadecimal or decimal format.
      Availability:
        Atlas 200/300/500 Inference Product: supports AI Core collection.
        Atlas Training Series Product: supports AI Core collection.
      Profile Data File:
        op_summary_*.csv

    l2
      Description:
        L2 cache data sampling switch, either on or off (default).
      Availability:
        Atlas Training Series Product
      Profile Data File:
        l2_cache_*.csv

    hccl
      Description:
        HCCL data collection switch. The data is generated only in multi-card, multi-node, or cluster scenarios.
        • This parameter can be set to on or off in the JSON file.
        • If this parameter is not set in the JSON file, the data is not collected by default. However, when task_time is set to on, this parameter is automatically set to on.
        NOTE:
        This switch will be deprecated in later versions. Use the task_time switch to control the related data collection.
      Availability:
        Atlas 200/300/500 Inference Product
        Atlas Training Series Product
      Profile Data File:
        The HCCL level in msprof_*.json and the hccl_statistic_*.csv file
        api_statistic_*.csv

    task_time
      Description:
        Switch that controls collection of the operator delivery and execution durations. The related duration data is output to the task_time, op_summary, and op_statistic files. Possible values are as follows:
        • on: enables collection. This is the default value.
        • off: disables collection.
        • l0: collects operator delivery and execution duration data. Unlike l1, l0 does not collect basic operator information, so the collection overhead is smaller and the duration statistics are more accurate.
        • l1: collects operator delivery and execution duration data, as well as basic operator information, to provide more comprehensive performance analysis data. The effect is the same as setting this parameter to on.
      Availability:
        Atlas 200/300/500 Inference Product
        Atlas Training Series Product
      Profile Data File:
        The CANN level in msprof_*.json and the api_statistic_*.csv file
        The Ascend Hardware level in msprof_*.json
        The HCCL level in msprof_*.json and the hccl_statistic_*.csv file (generated only in multi-card, multi-node, or cluster scenarios)
        step_trace (iteration trace data)
        op_summary_*.csv
        op_statistic_*.csv
        fusion_op_*.csv

    ascendcl
      Description:
        AscendCL profile data collection switch, either on (default) or off.
        The collected AscendCL profile data includes the synchronous/asynchronous memory copy latencies between the host and device and between devices.
      Availability:
        Atlas 200/300/500 Inference Product
        Atlas Training Series Product
      Profile Data File:
        The CANN_AscendCL level in msprof_*.json and the api_statistic_*.csv file

    runtime_api
      Description:
        Runtime API data collection switch, either on (default) or off.
        The collected Runtime API profile data includes the synchronous/asynchronous memory copy latencies between the host and device and between devices.
      Availability:
        Atlas 200/300/500 Inference Product
        Atlas Training Series Product
      Profile Data File:
        The CANN_Runtime level in msprof_*.json and the api_statistic_*.csv file

    sys_hardware_mem_freq
      Description:
        Collection frequency for on-chip memory bandwidth and memory data, LLC read/write bandwidth, Acc PMU data, SoC transmission bandwidth, and component memory data.
        The value range is [1,100]. The unit is Hz.
        NOTE:
        Sampling memory data in an environment where glibc 2.34 or an earlier version is installed may trigger the known Bug 19329. Upgrading glibc resolves this problem.
      Availability:
        Atlas 200/300/500 Inference Product
        Atlas Training Series Product
        The support for different products varies.
      Profile Data File:
        On-chip memory read/write rate file
        The LLC of Ai CPU level in msprof_*.json and the llc_aicpu_*.csv file
        The LLC of Ctrl CPU level in msprof_*.json and the llc_ctrlcpu_*.csv file
        The LLC Bandwidth level in msprof_*.json and the llc_bandwidth_*.csv file
        The LLC level in msprof_*.json and the llc_read_write_*.csv file
        The NPU MEM level in msprof_*.json and the npu_mem_*.csv file
        npu_module_mem_*.csv

    llc_profiling
      Description:
        LLC events to profile. Possible values are as follows:
        • Atlas 200/300/500 Inference Product (default: capacity):
          • capacity: LLC capacity of the AI CPU and Ctrl CPU.
          • bandwidth: LLC bandwidth.
        • Atlas Training Series Product (default: read):
          • read: read events, that is, the L3 cache read rate.
          • write: write events, that is, the L3 cache write rate.
      Availability:
        Atlas 200/300/500 Inference Product
        Atlas Training Series Product
      Profile Data File:
        The LLC of Ai CPU level in msprof_*.json and the llc_aicpu_*.csv file
        The LLC of Ctrl CPU level in msprof_*.json and the llc_ctrlcpu_*.csv file
        The LLC Bandwidth level in msprof_*.json and the llc_bandwidth_*.csv file
        To collect this data, you must also set sys_hardware_mem_freq.

    sys_io_sampling_freq
      Description:
        NIC and RoCE data collection frequency. The value range is [1,100]. The unit is Hz.
        • Atlas 200/300/500 Inference Product: supports NIC collection.
        • Atlas Training Series Product: supports NIC and RoCE collection.
      Availability:
        Atlas 200/300/500 Inference Product
        Atlas Training Series Product
      Profile Data File:
        The NIC level in msprof_*.json and the nic_*.csv file
        The RoCE level in msprof_*.json and the roce_*.csv file

    sys_interconnection_freq
      Description:
        Collection frequency for HCCS bandwidth, PCIe, and inter-chip transmission bandwidth data.
        The value range is [1, 50] and the default value is 50. The unit is Hz.
        • Atlas Training Series Product: supports HCCS and PCIe data collection.
      Availability:
        Atlas Training Series Product
      Profile Data File:
        The PCIe level in msprof_*.json and the pcie_*.csv file
        The HCCS level in msprof_*.json and the hccs_*.csv file

    dvpp_freq
      Description:
        DVPP collection frequency.
        The value range is [1,100]. The unit is Hz.
      Availability:
        Atlas 200/300/500 Inference Product
        Atlas Training Series Product
      Profile Data File:
        dvpp_*.csv

    host_sys
      Description:
        Host-side profile data collection option. Possible values include:
        • cpu: process CPU usage
        • mem: process memory usage
        You can select one or more options and separate them with commas (,), for example, "host_sys": "cpu,mem".
      Availability:
        Atlas 200/300/500 Inference Product
        Atlas Training Series Product
      Profile Data File:
        The CPU Usage level in msprof_*.json and the host_cpu_usage_*.csv file
        The Memory Usage level in msprof_*.json and the host_mem_usage_*.csv file

    host_sys_usage
      Description:
        Host-side system and process CPU and memory data collection option, selected from cpu and mem.
        You can select one or more options and separate them with commas (,), for example, "host_sys_usage": "cpu,mem".
      Availability:
        Atlas 200/300/500 Inference Product
        Atlas Training Series Product
      Profile Data File:
        System CPU usage on the host
        CPU usage of processes on the host
        System memory usage on the host
        Memory usage of processes on the host

    host_sys_usage_freq
      Description:
        Host-side system and process CPU and memory data collection frequency.
        The value range is [1, 50] and the default value is 50. The unit is Hz.
      Availability:
        Atlas 200/300/500 Inference Product
        Atlas Training Series Product
      Profile Data File:
        -

    msproftx
      Description:
        Switch that controls whether the msproftx user program and the upper-layer framework program output profile data, either on or off (default).
        Before enabling msproftx, call the msproftx APIs in your program to enable the output of profiling data streams. These APIs record the time span of specific events during application execution and write the profile data file. Then use the msprof tool to parse the file and export the profile data.
      Availability:
        Atlas 200/300/500 Inference Product
        Atlas Training Series Product
      Profile Data File:
        msproftx Data Description
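
Several values in Table 1 are constrained (forbidden characters in output, the MB range of storage_limit, and the Hz ranges of the frequency parameters), so it can help to check them before writing acl.json. The following is a minimal C++17 sketch of such checks, not part of AscendCL. The helper names hasForbiddenPathChar, parseStorageLimitMb, and isValidFrequencyHz are hypothetical, and the sketch assumes that storage_limit is written as a number followed by MB, as in the 200MB example above.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: true if `path` contains a character that Table 1
// disallows in the "output" parameter.
bool hasForbiddenPathChar(const std::string& path) {
    // Control characters, DEL (0x7F), quotes, backslash, shell metacharacters.
    static const std::string kForbidden =
        "\n\f\r\b\t\v" "\x7F" "\"'\\%><|&$;`";
    return path.find_first_of(kForbidden) != std::string::npos;
}

// Hypothetical helper: parses a value such as "200MB" (assumed format) and
// returns the size in MB if it lies in [200, 4294967295]; otherwise nullopt.
std::optional<std::uint64_t> parseStorageLimitMb(const std::string& value) {
    const std::string suffix = "MB";
    if (value.size() <= suffix.size() ||
        value.compare(value.size() - suffix.size(), suffix.size(), suffix) != 0) {
        return std::nullopt;  // missing number or missing "MB" suffix
    }
    const std::string digits = value.substr(0, value.size() - suffix.size());
    if (digits.size() > 10) {
        return std::nullopt;  // more digits than 4294967295 has
    }
    std::uint64_t mb = 0;
    for (char c : digits) {
        if (c < '0' || c > '9') {
            return std::nullopt;  // non-numeric character
        }
        mb = mb * 10 + static_cast<std::uint64_t>(c - '0');
    }
    if (mb < 200 || mb > 4294967295ULL) {
        return std::nullopt;  // outside the documented range
    }
    return mb;
}

// Hypothetical helper: frequency parameters accept [1, maxHz], where maxHz is
// 100 or 50 depending on the parameter (see Table 1).
bool isValidFrequencyHz(int hz, int maxHz) {
    return hz >= 1 && hz <= maxHz;
}
```

For example, a caller might reject a configuration when hasForbiddenPathChar(outputPath) returns true, or when isValidFrequencyHz(freq, 50) returns false for sys_interconnection_freq.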

  3. After the acl.json file is configured, rebuild and run the application project. For details, see the CANN AscendCL Application Software Development Guide (C&C++).

    The collected profile data is stored in the path specified by output, as shown in Figure 1.

    Figure 1 Profile data of the application project

    If the acl.json file already exists, you only need to modify its content to add the profiling configuration. You do not need to rebuild the application project.
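
After a run, one way to confirm that collection took place is to look for the directories starting with PROF in the output path. The following is a minimal C++17 sketch using std::filesystem. The helper name listProfDirs is hypothetical, and the sketch assumes only what is stated above, namely that the result directory names start with PROF.

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: returns the names of the subdirectories of `outputDir`
// whose names start with "PROF", sorted alphabetically.
std::vector<std::string> listProfDirs(const fs::path& outputDir) {
    std::vector<std::string> names;
    if (!fs::is_directory(outputDir)) {
        return names;  // path is missing or is not a directory
    }
    for (const fs::directory_entry& entry : fs::directory_iterator(outputDir)) {
        const std::string name = entry.path().filename().string();
        // rfind(prefix, 0) == 0 is a starts-with test.
        if (entry.is_directory() && name.rfind("PROF", 0) == 0) {
            names.push_back(name);
        }
    }
    std::sort(names.begin(), names.end());
    return names;
}
```

Calling listProfDirs("output") from the directory where the application was launched returns an empty list if no profile data was written there, which may indicate that the data was stored in the default path of the executable file instead.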