Using msprof-analyze to Analyze Profile Data
Prerequisites
- Complete the steps in Profile Data Collection to obtain profile data from the NPU environment.
- Run the following command to install msprof-analyze:
pip3 install msprof-analyze
If the following information is displayed, the installation is successful:
Successfully installed msprof-analyze-{version}
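You can also check the installation from Python instead of reading pip's output. Python's standard library can query the installed distribution version through importlib.metadata. The sketch below is a convenience helper and is not part of msprof-analyze. It assumes the distribution name is msprof-analyze, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "msprof-analyze") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"msprof-analyze {v}" if v else "msprof-analyze is not installed")
```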
Performing Advisor Analysis
The advisor function of msprof-analyze analyzes the profile data collected by Ascend PyTorch Profiler and outputs performance tuning suggestions. Run the following command:
msprof-analyze advisor all -d $HOME/profiling_data/
The analysis result is printed to the terminal. The mstt_advisor_{timestamp}.html and mstt_advisor_{timestamp}.xlsx files are also generated in the directory where the command is executed.
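Running advisor repeatedly leaves several mstt_advisor_{timestamp} reports in the same directory. The following is a minimal sketch for picking the newest one. To avoid assuming anything about the timestamp format, it orders files by modification time instead of parsing the {timestamp} part of the name. The helper name latest_advisor_report is hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_advisor_report(directory: str = ".", suffix: str = ".html") -> Optional[Path]:
    """Return the most recently modified mstt_advisor_*<suffix> file, or None.

    Hypothetical helper: ordering is by file modification time, not by the
    timestamp embedded in the file name.
    """
    reports = list(Path(directory).glob(f"mstt_advisor_*{suffix}"))
    if not reports:
        return None
    return max(reports, key=lambda p: p.stat().st_mtime)
```

For example, latest_advisor_report(".", ".xlsx") returns the newest spreadsheet report in the current directory.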
The analysis result of the advisor tool mainly provides expert suggestions on possible performance problems. The following is an example:
Comparing Performance Using compare_tools
compare_tools helps locate the cause of deterioration after a PyTorch training project is migrated from the GPU to the NPU, when the NPU performance is worse or the NPU profile data differs from the GPU data. The procedure is as follows:
- Copy the profile data collected in the GPU environment to the NPU environment.
- Run the performance comparison:
msprof-analyze compare -d $HOME/profiling_data/ -bp $HOME/gpu/profiling_data/ --output_path=./compare_result/profiler_compare
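If you script the comparison, you can assemble the command as an argument list so that paths containing spaces are passed intact. The following is a minimal sketch that uses only the options shown in the command above (-d for the NPU data, -bp for the GPU baseline data, --output_path). The helper names build_compare_cmd and run_compare are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def build_compare_cmd(npu_dir: str, gpu_dir: str, output_path: str) -> List[str]:
    """Build the msprof-analyze compare command from the example as an argv list."""
    return [
        "msprof-analyze", "compare",
        "-d", npu_dir,    # NPU profile data, as passed to -d in the example
        "-bp", gpu_dir,   # GPU (baseline) profile data, as passed to -bp in the example
        f"--output_path={output_path}",
    ]


def run_compare(npu_dir: str, gpu_dir: str, output_path: str) -> None:
    """Check that both input directories exist, then run the comparison."""
    for p in (npu_dir, gpu_dir):
        if not Path(p).is_dir():
            raise FileNotFoundError(f"profile directory not found: {p}")
    cmd = build_compare_cmd(npu_dir, gpu_dir, output_path)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Python does not expand $HOME inside argument lists, so pass expanded paths. For example: run_compare(os.path.expanduser("~/profiling_data"), os.path.expanduser("~/gpu/profiling_data"), "./compare_result/profiler_compare").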
The analysis result is printed to the terminal. The performance_comparison_result_{timestamp}.xlsx file is generated in the directory specified by --output_path (the current directory if the option is omitted).
The performance comparison tool divides overall performance into training duration and memory usage. Training duration is further broken down into three dimensions: operators (including operators and nn.Module), communication, and scheduling. The overall indicators are printed on the screen to help users locate where the deterioration occurs. The tool also generates performance_comparison_result_{timestamp}.xlsx, which lists the execution time, communication time, and memory usage of each operator. To find deteriorated operators, filter for rows whose DIFF value is greater than 0. For details, see "Comparison Result Description" in compare_tools.
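The DIFF > 0 filter can be applied to rows read from the spreadsheet. The following is a minimal sketch that takes rows as a list of dicts, keeps those whose DIFF value is positive, and sorts them with the largest deterioration first. The key name "DIFF" is taken from the description above. The exact column headers and sheet names in your file are assumptions you should verify. The helper name deteriorated_rows is hypothetical.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def deteriorated_rows(rows: Iterable[Row], diff_key: str = "DIFF") -> List[Row]:
    """Return rows whose diff value is > 0, largest deterioration first.

    diff_key defaults to "DIFF" (assumed header; check your spreadsheet).
    Empty or non-numeric cells are skipped.
    """
    picked = []
    for row in rows:
        try:
            diff = float(row.get(diff_key))
        except (TypeError, ValueError):
            continue  # blank cell (None) or text that is not a number
        if diff > 0:
            picked.append((diff, row))
    picked.sort(key=lambda item: item[0], reverse=True)
    return [row for _, row in picked]
```

If pandas and an Excel engine such as openpyxl are installed, pd.read_excel(path, sheet_name=name).to_dict("records") produces the list of dicts this function expects. This section does not specify which sheet to read.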
(Optional) Performing Cluster Analysis
The data in this example was not collected in a cluster scenario, so it cannot be analyzed by cluster_analyse. This section provides only the operation guide.
In cluster scenarios, cluster_analyse is used to analyze cluster data. It currently analyzes mainly the iteration time, communication time, and communication matrix of each communicator (communication group), in order to locate slow cards, slow nodes, and slow links. The procedure is as follows:
- Copy the profile data of all cards in the cluster into a single directory.
- Run the cluster analysis:
msprof-analyze cluster -d $HOME/profiling_data/ -m all
The analysis results are written to the cluster_analysis_output folder in the directory specified by the -d option. The output includes the cluster_step_trace_time.csv, cluster_communication_matrix.json, and cluster_communication.json files.
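To find a candidate slow card, you can compare the per-rank rows of cluster_step_trace_time.csv against each other. The following is a minimal sketch with two assumptions. First, it assumes the file has columns named Type, Index, and Computing, and that per-rank rows are marked with Type equal to "rank". Verify these against your file's header. Second, its flagging rule is an illustrative heuristic and not the rule cluster_analyse applies: a rank is flagged when its total computing time exceeds the cross-rank median by a ratio (1.2 by default). The helper name slow_ranks is hypothetical.

```python
import csv
import statistics
from collections import defaultdict
from typing import Dict, List


def slow_ranks(csv_path: str, metric: str = "Computing", ratio: float = 1.2) -> List[str]:
    """Return rank indices whose summed `metric` exceeds ratio * median, slowest first.

    Assumed columns: Type (per-rank rows are "rank"), Index (rank id), and `metric`.
    Illustrative heuristic only.
    """
    totals: Dict[str, float] = defaultdict(float)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if row.get("Type") != "rank":  # assumed marker for per-rank rows
                continue
            value = row.get(metric)
            if value in (None, ""):
                continue
            totals[row["Index"]] += float(value)  # sum across steps per rank
    if not totals:
        return []
    median = statistics.median(totals.values())
    flagged = [r for r, t in totals.items() if t > ratio * median]
    return sorted(flagged, key=lambda r: totals[r], reverse=True)
```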
For details, see cluster_analyse.
The deliverables of cluster_analyse are displayed on MindStudio Insight. For details, see Using MindStudio Insight to Display Profile Data.