Collecting and Parsing Profile Data
This section describes how to use the msprof command line to collect and parse profile data in inference scenarios, and how to analyze performance bottlenecks based on the generated result files. Before using the msprof command line, set up the environment and learn about the restrictions and basic parameters by referring to the Performance Tuning Tool User Guide.
Collecting, Parsing, and Exporting Profile Data
- Log in to the operating environment where the Toolkit software package is installed and run the following command to collect, parse, and export profile data in a single step:
msprof --output=/home/HwHiAiUser/profiling_output /home/HwHiAiUser/HIAI_PROJECTS/MyAppname/out/main
Table 1 Command-line options

| Option | Description | Required/Optional |
| --- | --- | --- |
| --output | Directory for storing the collected profile data. Defaults to the AI job file directory. The directory path cannot contain the following special characters: "\n", "\f", "\r", "\b", "\t", "\v", and "\u007F". | Optional |
After the command is executed, find the PROF_XXX directory generated in the directory specified by --output. This directory stores the automatically parsed profile data. (The following shows the profile data files only.)

├── device_{id}        // Raw data. You can ignore this directory.
│   └── data
└── mindstudio_profiler_output
    ├── msprof_{timestamp}.json
    ├── step_trace_{timestamp}.json
    ├── xx_*.csv
    ├── ...
    └── README.txt

- Access the mindstudio_profiler_output directory to view the corresponding profile data files.
For details about the files collected by default, see Table 2.
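Because each msprof run creates a new PROF_XXX directory under the output path, a post-processing script often needs to locate the most recent one. The following is a minimal sketch; the helper name and the modification-time heuristic are illustrative assumptions, not part of the tool itself:

```python
# Hypothetical helper: locate the newest PROF_* result directory under the
# msprof --output path. The PROF_XXX naming follows the output layout shown
# above; the function name and mtime-based selection are illustrative.
import glob
import os


def latest_prof_dir(output_dir):
    """Return the most recently modified PROF_* directory, or None if absent."""
    candidates = [d for d in glob.glob(os.path.join(output_dir, "PROF_*"))
                  if os.path.isdir(d)]
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)
```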
Table 2 Profile data files collected by msprof by default

| File Name | Description |
| --- | --- |
| msprof_*.json | Timeline report. |
| step_trace_*.json | Iteration trace data, which records the time required for each iteration. This profile data file does not exist in single-operator scenarios. |
| op_summary_*.csv | AI Core and AI CPU operator data. |
| op_statistic_*.csv | Number of times the AI Core and AI CPU operators are called, and their time consumption. |
| step_trace_*.csv | Iteration trace data. This profile data file does not exist in single-operator scenarios. |
| task_time_*.csv | Task Scheduler data. |
| fusion_op_*.csv | Operator fusion summary in a model. This profile data file does not exist in single-operator scenarios. |
| api_statistic_*.csv | Time spent by API execution at the CANN layer. |
| prof_rule_0_*.json | Optimization suggestions. |

Note: The asterisk (*) indicates the timestamp.
- To open a timeline .json file, enter chrome://tracing in the address box of Google Chrome, drag the file into the blank area, and use the keyboard shortcuts (w: zoom in; s: zoom out; a: move left; d: move right) to navigate. The file shows the running timeline of the current AI job, such as the API call sequence during job execution, as shown in Figure 1.
- You can open a summary .csv file directly. Summary files contain the software and hardware data collected during AI job execution, such as the time each operator takes to run on the AI processor. The sortable fields help you quickly find the information you need, as shown in Figure 2.
Performance Analysis
As shown above, many profile data files are generated and the analysis methods are flexible. The following introduces several important files and the corresponding analysis methods.
- View the msprof*.json file to check the running timeline information of an AI job from a holistic perspective and analyze possible bottlenecks.
Figure 3 Example of the msprof*.json file
- Area 1: data at the CANN layer, including the time consumption data of components (such as AscendCL and Runtime) and nodes (operators).
- Area 2: bottom-layer NPU data, including the time consumption data and iteration trace data of each task stream under Ascend Hardware and other Ascend AI Processor system data.
- Area 3: details about each operator and API in a timeline (displayed when you click a timeline color block).
From the figure above, you can roughly identify the APIs, operators, and task streams that take a long time. Then, follow the arrow directions to find the corresponding delivery relationships, identify the specific bottom-layer tasks that consume the most time during inference, check the time-consuming APIs and operators in area 3, and perform quantitative analysis based on the .csv files to locate the performance bottlenecks.
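A timeline .json file that chrome://tracing can render follows the Chrome trace-event format, so it can also be aggregated programmatically. The sketch below sums the duration per event name; the sample events and names (such as aclmdlExecute) are made up for illustration, and a real msprof_*.json file contains the actual API and operator events:

```python
# Minimal sketch: aggregate total duration per event name from a trace in
# Chrome trace-event format. The sample events below are illustrative only.
import json
from collections import defaultdict

sample = json.dumps({
    "traceEvents": [
        {"name": "aclmdlExecute", "ph": "X", "ts": 100, "dur": 500},
        {"name": "aclmdlExecute", "ph": "X", "ts": 700, "dur": 450},
        {"name": "MatMul",        "ph": "X", "ts": 120, "dur": 300},
    ]
})


def total_duration_by_name(trace_json):
    """Sum the 'dur' field of complete ('X') events, grouped by event name."""
    data = json.loads(trace_json)
    events = data["traceEvents"] if isinstance(data, dict) else data
    totals = defaultdict(int)
    for ev in events:
        if ev.get("ph") == "X" and "dur" in ev:
            totals[ev["name"]] += ev["dur"]
    return dict(totals)


print(total_duration_by_name(sample))  # {'aclmdlExecute': 950, 'MatMul': 300}
```

Sorting the resulting totals gives a quick ranking of where time is spent before you drill into the .csv files.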
- Analyze the op_statistic_*.csv file to obtain the total calling time and the total number of calls for each type of operator, check whether any operator type has a long total execution time, and determine whether those operators can be optimized.
Figure 4 Example of the op_statistic_*.csv file
You can sort the operators by Total Time to find the operator types that take a long time.
- Analyze the op_summary_*.csv file to obtain the basic information and time consumption of specific operators, find the operators with high time consumption, and check whether those operators can be optimized.
Figure 5 Example of the op_summary_*.csv file
The Task Duration field gives the operator time consumption. You can sort operators by Task Duration to find time-consuming operators, or sort them by Task Type to view the time-consuming operators running on different cores (AI Core and AI CPU).
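Both views described above (sort by Task Duration, split by Task Type) can be sketched in a few lines. The sample rows and column names ("Op Name", "Task Type", "Task Duration(us)") are assumptions for illustration; verify them against the header of your real op_summary_*.csv:

```python
# Illustrative sketch: find the slowest operators in an op_summary_*.csv-style
# table and total the duration per Task Type. Column names are assumed.
import csv
import io
from collections import defaultdict

sample_csv = """Op Name,Task Type,Task Duration(us)
conv1,AI_CORE,800
relu1,AI_CORE,50
argmax,AI_CPU,600
conv2,AI_CORE,950
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Top operators by Task Duration, longest first.
slowest = sorted(rows, key=lambda r: float(r["Task Duration(us)"]), reverse=True)
print([r["Op Name"] for r in slowest[:2]])  # ['conv2', 'conv1']

# Total duration per Task Type (AI Core vs AI CPU).
per_core = defaultdict(float)
for r in rows:
    per_core[r["Task Type"]] += float(r["Task Duration(us)"])
print(dict(per_core))  # {'AI_CORE': 1800.0, 'AI_CPU': 600.0}
```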

