Profiling Quick Start (TensorFlow Training/Online Inference)

In TensorFlow training and online inference scenarios, you are advised to use the APIs provided by TensorFlow Adapter to enable profile data collection, and then upload the result files to the development environment where Ascend-CANN-Toolkit is installed, where you can parse the data and analyze performance bottlenecks.

Prerequisites

  • Ensure that operations in Before You Start have been completed.
  • The training/online inference script is successfully executed on the Ascend AI Processor.

Collecting, Parsing, and Exporting Profile Data

  1. Modify the training/online inference script and enable profile data collection.

    The following uses a script in TensorFlow 1.15 session_run mode as an example.

    Use the session configuration options profiling_mode and profiling_options to enable data collection of the Profiling tool. The sample code is as follows:
    import tensorflow as tf
    from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig
    from npu_bridge.npu_init import *  # TF Adapter plugin that registers NpuOptimizer

    config = tf.ConfigProto()
    custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
    custom_op.name = "NpuOptimizer"
    custom_op.parameter_map["use_off_line"].b = True
    # Enable profile data collection.
    custom_op.parameter_map["profiling_mode"].b = True
    # Profile data collection options:
    #   output: output path of the collection result.
    #   task_trace: enables task trace collection.
    #   training_trace: enables iteration trace collection. fp_point (start point of
    #   the forward propagation operator) and bp_point (end point of the backward
    #   propagation operator) are required for iteration traces. Leave them empty to
    #   let the system obtain the values; configure them manually only when data
    #   collection is abnormal.
    custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes('{"output":"/home/HwHiAiUser/profiling_output","training_trace":"on","task_trace":"on","fp_point":"","bp_point":"","aic_metrics":"PipeUtilization"}')
    config.graph_options.rewrite_options.remapping = RewriterConfig.OFF  # Disable remapping.
    with tf.Session(config=config) as sess:
        sess.run(train_op)  # train_op: the training operation defined in your script
    
  2. Run the training/online inference script again to collect profile data during the training.

    After the training/online inference is complete, the PROF_XXX folder is generated in the directory specified by output and stores the collected raw profile data. The data can be viewed only after being parsed by the msprof tool.

  3. Run the msprof command to parse and export the profile data.
    msprof --export=on --output=/home/HwHiAiUser/profiling_output/PROF_XXX

    --output indicates the path that stores the profile data files, that is, the PROF_XXX directory generated during profile data collection.

    After the command is executed, the parsed results are generated in the mindstudio_profiler_output subdirectory under the directory specified by --output, alongside the collected raw data. The directory structure is as follows (only profile data is displayed):

    ├── host                          // Raw data. You can ignore this directory.
    │   └── data
    ├── device_{id}                   // Raw data. You can ignore this directory.
    │   └── data
    └── mindstudio_profiler_output
        ├── msprof_{timestamp}.json
        ├── step_trace_{timestamp}.json
        ├── xx_*.csv
        ├── ...
        └── README.txt
    
  4. Access the mindstudio_profiler_output directory to view corresponding profile data files.

    For details about the files collected by default, see Table 1.

    Table 1 Profile data files collected by msprof by default

    • msprof_*.json: Timeline report.
    • step_trace_*.json: Iteration trace data, which records the time required for each iteration. This file does not exist in single-operator scenarios.
    • op_summary_*.csv: AI Core and AI CPU operator data.
    • op_statistic_*.csv: Number of calls and time consumption of AI Core and AI CPU operators, aggregated by operator type.
    • step_trace_*.csv: Iteration trace data. This file does not exist in single-operator scenarios.
    • task_time_*.csv: Task Scheduler data.
    • fusion_op_*.csv: Operator fusion summary in a model. This file does not exist in single-operator scenarios.
    • api_statistic_*.csv: Time spent on API execution at the CANN layer.

    Note: The asterisk (*) indicates the timestamp.

    • To open a timeline .json file, enter chrome://tracing in the address bar of Google Chrome, drag the file into the blank area, and use the keyboard shortcuts (w: zoom in; s: zoom out; a: move left; d: move right) to navigate. The file shows the runtime timeline of the current AI job, such as the API call sequence during execution, as shown in Figure 1.
      Figure 1 Viewing a .json file
    • A summary .csv file can be opened directly. Summary files contain the software and hardware data collected while the AI job runs, such as the execution time of each operator on the AI Processor. The sorted fields help you quickly locate the information you need, as shown in Figure 2.
      Figure 2 Viewing a .csv file
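Beyond chrome://tracing, a timeline file can also be inspected programmatically. The sketch below assumes msprof_{timestamp}.json follows the Chrome trace event format (events carrying "name", "ph", "ts", and "dur" fields); the inline sample and its event names are illustrative, not real msprof output.

```python
import json
from collections import defaultdict

# Inline stand-in for a msprof_{timestamp}.json file in Chrome trace event format.
sample = '''{"traceEvents": [
  {"name": "Add",    "ph": "X", "ts": 100, "dur": 30},
  {"name": "Add",    "ph": "X", "ts": 200, "dur": 50},
  {"name": "MatMul", "ph": "X", "ts": 300, "dur": 120}
]}'''

# Accumulate total duration per event name.
totals = defaultdict(float)
for ev in json.loads(sample)["traceEvents"]:
    if ev.get("ph") == "X":  # "X" marks a complete event that carries a duration
        totals[ev["name"]] += ev.get("dur", 0.0)

# Print event names from most to least time-consuming.
for name, dur in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {dur:.1f} us")
```

To analyze a real export, replace the inline sample with `json.load(open(path))` on a file from mindstudio_profiler_output.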

Performance Analysis

As shown above, many profile data files are produced and they can be analyzed in flexible ways. The following describes several important files and how to analyze them.

  • Analyze the step_trace_*.csv file to obtain the iteration trace data. This file records the duration of each iteration.
    Figure 3 Example of the step_trace_*.csv file
    The main fields are as follows:
    • Iteration Time: computation time of an iteration, including the time of the FP/BP and Grad Refresh phases.
    • FP to BP Time: computation time of forward and backward propagation on the network.
    • Iteration Refresh: iteration trailing time.
    • Data Aug Bound: interval between two adjacent iterations.
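As a sketch of this analysis, the snippet below computes the mean iteration time from a step trace file. The column names and the inline sample are assumptions modeled on the fields listed above; check the header row of your actual step_trace_*.csv and adjust.

```python
import csv
import io

# Hypothetical excerpt of a step_trace_*.csv file; real column names may differ.
sample = """Iteration ID,Iteration Time(us),FP to BP Time(us),Iteration Refresh(us),Data Aug Bound(us)
1,1200,900,200,100
2,1100,850,180,70
3,1150,880,190,80
"""

# Read all iteration records and average the per-iteration duration.
rows = list(csv.DictReader(io.StringIO(sample)))
avg = sum(float(r["Iteration Time(us)"]) for r in rows) / len(rows)
print(f"iterations: {len(rows)}, mean iteration time: {avg:.1f} us")
```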
  • Analyze the op_statistic_*.csv file to obtain the total calling time and number of calls for each operator type, identify operator types with long total execution time, and evaluate whether they can be optimized.
    Figure 4 Example of the op_statistic_*.csv file

    You can sort the operators by Total Time to find out which type of operators takes a long time.
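The sorting step can be scripted as below. The column names ("OP Type", "Total Time(us)") and the inline sample are assumptions for illustration; verify them against the header of your actual op_statistic_*.csv.

```python
import csv
import io

# Hypothetical excerpt of an op_statistic_*.csv file.
sample = """OP Type,Core Type,Count,Total Time(us),Avg Time(us)
MatMul,AI_CORE,120,5400,45
Add,AI_CORE,300,900,3
Cast,AI_CPU,40,1600,40
"""

# Sort operator types by total time, most expensive first.
rows = sorted(csv.DictReader(io.StringIO(sample)),
              key=lambda r: float(r["Total Time(us)"]), reverse=True)
for r in rows:
    print(r["OP Type"], r["Total Time(us)"])
```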

  • Analyze the op_summary_*.csv file to obtain the basic information and time consumption of each operator, find operators with high time consumption, and check whether they can be optimized.
    Figure 5 Example of the op_summary_*.csv file

    The Task Duration field specifies the operator time consumption. You can sort operators by Task Duration to find time-consuming operators, or sort them by Task Type to view the time-consuming operators running on different cores (such as AI Core and AI CPU).
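Both views can be scripted as below: ranking individual operators by Task Duration, and totaling time per core type via Task Type. The column names and the inline sample are assumptions for illustration; verify them against the header of your actual op_summary_*.csv.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of an op_summary_*.csv file.
sample = """Op Name,Task Type,Task Duration(us)
matmul_1,AI_CORE,420
add_1,AI_CORE,12
cast_1,AI_CPU,95
matmul_2,AI_CORE,400
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Top operators by Task Duration.
for r in sorted(rows, key=lambda r: float(r["Task Duration(us)"]), reverse=True)[:3]:
    print("op:", r["Op Name"], r["Task Duration(us)"])

# Total time per core type (AI Core vs. AI CPU).
per_core = defaultdict(float)
for r in rows:
    per_core[r["Task Type"]] += float(r["Task Duration(us)"])
for core, t in sorted(per_core.items()):
    print("core:", core, t)
```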