Migration Analysis
The PyTorch Analyse tool provides analysis scripts to help users analyze the APIs, third-party library suites, affinity APIs, and dynamic shapes in PyTorch training scripts on GPUs before migration. For details, see Table 1.
| Analysis Mode | Analysis Script | Analysis Result | Optimization Suggestion |
|---|---|---|---|
| Third-party library suite analysis mode | You need to provide the source code of a third-party library suite. | Quickly obtain unsupported third-party library APIs and CUDA information in the source code. NOTE: Third-party library APIs refer to functions in the third-party library code. If a function uses an unsupported Torch operator or custom CUDA operator in its body, then this function represents an unsupported API in the third-party library. If other functions in the third-party library call these unsupported APIs, then these functions also represent unsupported APIs. | - |
| API support analysis mode | You need to provide the PyTorch training script to be analyzed. | Quickly obtain information about Torch APIs and CUDA APIs that are not supported in the training script. | Output expert suggestions on API precision and performance tuning in the training script. |
| Dynamic shape analysis mode | You need to provide the PyTorch training script to be analyzed. | Quickly obtain the dynamic shape information contained in the training script. | - |
| Affinity API analysis mode | You need to provide the PyTorch training script to be analyzed. | Quickly obtain information about affinity APIs that can be replaced in the training script. | - |
Prerequisites
Install the following dependencies before running the analysis tool:
pip3 install pandas       # The pandas version must be 1.2.4 or later.
pip3 install libcst       # Python syntax tree parser, used to parse Python files.
pip3 install prettytable  # Used to display data as tables.
pip3 install jedi         # Mandatory for third-party library suite analysis and affinity API analysis.
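The dependency list above can be checked from Python before starting an analysis. The following is a minimal sketch, not part of the tool: the helper names (`missing_packages`, `version_tuple`, `meets_minimum`) are hypothetical, and the version comparison only looks at the leading numeric components of the version string.

```python
import importlib.metadata
import importlib.util
import re

# Packages named in the prerequisites above.
REQUIRED = ["pandas", "libcst", "prettytable", "jedi"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def version_tuple(text):
    """Convert '1.2.4' or '2.1.0rc1' to (1, 2, 4) or (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """Compare two version strings by their leading numeric components."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("missing packages:", missing or "none")
    if "pandas" not in missing:
        try:
            installed = importlib.metadata.version("pandas")
        except importlib.metadata.PackageNotFoundError:
            installed = None
        if installed is not None:
            ok = meets_minimum(installed, "1.2.4")
            print("pandas", installed, "is sufficient" if ok else "is older than 1.2.4")
```

Running the check with the same interpreter that will run the analysis tool matters, because `pip3` may install into a different environment.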
Starting an Analysis Task
- Go to the path where the analysis tool is located.
cd ${INSTALL_DIR}/cann/tools/ms_fmk_transplt/
Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann.
- Start an analysis task. Run the following command, using the configuration options in Table 2:
./pytorch_analyse.sh -i /home/xxx/analysis -o /home/xxx/analysis_output -v 2.1.0 [-m torch_apis]
In this example, /home/xxx/analysis is the path of the script to be analyzed, /home/xxx/analysis_output is the analysis result output path, 2.1.0 is the framework version of the script to be analyzed, and torch_apis is the analysis mode.
Square brackets ([]) enclose optional parameters, which can be omitted in actual use. If the analysis mode specified by -m/--mode is dynamic_shape, after the analysis task is complete you must also modify the training script as described in Training Configuration to obtain the dynamic shape analysis report.
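When the tool is launched from a Python wrapper rather than typed by hand, the command line can be assembled and checked first. The sketch below is an assumption-laden illustration: `build_analyse_command` is a hypothetical helper, and only the `-i`, `-o`, `-v`, and `-m` options and the four mode names documented for this tool are used. It also checks that the output directory already exists, since the tool reports an error otherwise.

```python
from pathlib import Path

# Mode names documented for -m/--mode.
MODES = ("torch_apis", "third_party", "affinity_apis", "dynamic_shape")


def build_analyse_command(input_dir, output_dir, version, mode=None,
                          script="./pytorch_analyse.sh"):
    """Return the argument list for one analysis run (hypothetical helper)."""
    if mode is not None and mode not in MODES:
        raise ValueError(f"unknown analysis mode: {mode!r}")
    # The output path must exist before the tool runs.
    if not Path(output_dir).is_dir():
        raise FileNotFoundError(f"output path does not exist: {output_dir}")
    command = [script, "-i", str(input_dir), "-o", str(output_dir), "-v", version]
    if mode is not None:
        # -m is optional; when it is omitted the tool uses torch_apis.
        command += ["-m", mode]
    return command
```

The returned list could then be passed to `subprocess.run(command, check=True, cwd=tool_dir)`, where `tool_dir` is the ms_fmk_transplt directory from the first step.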
Table 2 Parameter description

| Parameter | Description | Example Value |
|---|---|---|
| -i, --input | Path to the folder where the script file to be analyzed is located or the folder where the source code of the third-party library suite is located. Mandatory. | /home/xxx/analysis |
| -o, --output | Output path of the analysis result file. The xxxx_analysis folder is generated in this path. Mandatory. NOTE: Ensure that the output path of the analysis result file exists before running the tool. Otherwise, the tool displays an error message. | /home/xxx/analysis_output |
| -v, --version | PyTorch version of the script file to be analyzed or the source code of the third-party library suite. Mandatory. | 1.11, 2.1.0, 2.2.0, 2.3.1, 2.4.0, 2.5.1, 2.6.0. NOTE: In automatic migration mode, PyTorch 1.11.0 does not support Atlas A3 training products/Atlas A3 inference products. |
| -m, --mode | Analysis mode. Currently, the torch_apis (API support analysis), third_party (third-party library suite analysis), affinity_apis (affinity API analysis), and dynamic_shape (dynamic shape analysis) modes are supported. Optional. | torch_apis (default), third_party, affinity_apis, dynamic_shape |
| -env, --env-path | Path of the PYTHONPATH environment variable added during analysis. This option takes effect only after jedi is installed. Path of the third-party library to be analyzed. The third-party library APIs that are not supported by the current script are analyzed. Optional. | /home/xxx/transformers/src /home/xxx/transformers/utils (Use spaces to separate multiple file paths.) |
| -api, --api-files | Result file of analysis on APIs not supported by the third-party library. Optional. See the NOTE below this table. | /home/xxx/mmcv_analysis/full_unsupported_results.csv /home/xxx/transformers_analysis/full_unsupported_results.csv (Use spaces to separate multiple file paths.) |
| -h, --help | Displays help information. | - |

NOTE (-api/--api-files):
If the third-party library contains unsupported APIs and the custom function calls an unsupported torch API, you can use the torch API analysis function.
- Use the third_party (third-party library suite analysis) analysis function in -m to obtain the list of APIs (CSV file) that cannot be migrated in the third-party library. The following is an example:
pytorch_analyse.sh -i third_party_input_path -o third_party_output_path -v 2.1.0 -m third_party
third_party_input_path indicates the path of the third-party library folder, third_party_output_path indicates the result output path, and 2.1.0 indicates the framework version of the script to be analyzed.
- Import the CSV file obtained in the preceding step to -api to obtain the third-party library APIs that do not support migration in the current training script. The following is an example:
pytorch_analyse.sh -i input_path -o output_path -v 2.1.0 -api third_party_output_path/framework_unsupported_op.csv
input_path indicates the path of the model script folder, output_path indicates the result output path, and third_party_output_path/framework_unsupported_op.csv indicates the result file of analysis on APIs not supported by the third-party library in step 1.
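The two-step use of -api (a third_party run first, then a training-script run that consumes its CSV file) can be chained in a small script. This is only a sketch under stated assumptions: both helper names are hypothetical, and because the exact name of the generated xxxx_analysis folder is not fixed here, the result file is located by a recursive search for its file name rather than by a hard-coded path.

```python
from pathlib import Path


def find_result_files(output_dir, filename):
    """Search output_dir recursively for result files with the given name."""
    return sorted(str(path) for path in Path(output_dir).rglob(filename))


def build_api_files_command(input_dir, output_dir, version, api_files,
                            script="./pytorch_analyse.sh"):
    """Build the second-step command that passes CSV result files to -api."""
    command = [script, "-i", str(input_dir), "-o", str(output_dir), "-v", version]
    if api_files:
        # Multiple files are separate arguments, matching the
        # space-separated form used on the command line.
        command += ["-api", *api_files]
    return command
```

After the third_party run finishes, `find_result_files(third_party_output_path, "framework_unsupported_op.csv")` would return the files to pass as `api_files` in the second run.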
- After the analysis is complete, go to the script analysis result output path and view the analysis report. For details, see Analysis Report Overview.
Analysis Report Overview
- When the analysis mode is torch_apis, the analysis result is as follows:
├── xxxx_analysis // Output directory of the analysis result.
│   ├── cuda_op_list.csv // List of CUDA APIs.
│   ├── unknown_api.csv // List of APIs whose support statuses are not clear.
│   ├── unsupported_api.csv // List of unsupported APIs.
│   ├── api_precision_advice.csv // Expert suggestions on API accuracy tuning.
│   ├── api_performance_advice.csv // Expert suggestions on API performance tuning.
│   ├── pytorch_analysis.txt // Analysis process log.
Table 3 CSV files in torch_apis mode

| File Name | Introduction |
|---|---|
| unsupported_api.csv | For APIs that are not supported by the current framework, you can seek help from the Ascend open-source community. (Figure 1 Example of an unsupported API list) |
| cuda_op_list.csv | Information about CUDA APIs contained in the current training script. |
| unknown_api.csv | List of APIs with unclear support statuses. For details about PyTorch APIs, see Table 4. If the training fails, you can seek help from the Ascend open-source community. |
| api_precision_advice.csv | Expert suggestions on accuracy tuning in the current training script. In addition, you can refer to the Accuracy Debugging Tool Guide to improve the accuracy. |
| api_performance_advice.csv | Expert suggestions and guidance for performance tuning in the current training script. You can also refer to the Profiling Instructions to tune the performance. |

NOTE: The analysis result is based on the API information of the native PyTorch framework. For details, see Table 4.
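A quick way to triage a torch_apis run is to count how many entries each CSV report contains. The sketch below assumes that each report starts with one header row and is UTF-8 encoded; neither point is stated here, so treat both as assumptions. `count_report_rows` is a hypothetical helper, and it does not depend on any column names.

```python
import csv
from pathlib import Path

# Report files listed for torch_apis mode.
TORCH_APIS_REPORTS = (
    "unsupported_api.csv",
    "cuda_op_list.csv",
    "unknown_api.csv",
    "api_precision_advice.csv",
    "api_performance_advice.csv",
)


def count_report_rows(analysis_dir):
    """Map each report name to its number of data rows, or None if absent."""
    counts = {}
    for name in TORCH_APIS_REPORTS:
        path = Path(analysis_dir) / name
        if not path.is_file():
            counts[name] = None
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        # Assumption: the first row is a header, so it is not counted.
        counts[name] = max(len(rows) - 1, 0)
    return counts
```

A non-zero count for unsupported_api.csv or unknown_api.csv marks the reports to read first.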
- When the analysis mode is third_party, the analysis result is as follows:
├── xxxx_analysis // Output directory of the analysis result.
│   ├── cuda_op.csv // List of CUDA APIs.
│   ├── framework_unsupported_op.csv // List of APIs unsupported by the framework.
│   ├── full_unsupported_results.csv // List of all unsupported APIs.
│   ├── migration_needed_op.csv // List of APIs to be migrated.
│   ├── unknown_op.csv // List of APIs whose support statuses are not clear.
│   ├── pytorch_analysis.txt // Analysis process log.
- When the analysis mode is affinity_apis, the analysis result is as follows:
├── xxxx_analysis // Output directory of the analysis result.
│   ├── affinity_api_call.csv // List of native APIs that can be replaced by affinity APIs.
│   ├── pytorch_analysis.txt // Analysis process log.
The affinity_api_call.csv analysis report contains the call information of native APIs and classifies the APIs into the following types: class, function, Torch (PyTorch framework APIs), and special. Based on the analysis report, you can manually replace the native APIs in the training script with the specified affinity APIs, and then run the modified script to achieve better performance. The following is an example of the analysis report.
Figure 3 Example of an affinity API analysis report
- When the analysis mode is dynamic_shape, the analysis result is as follows:
├── xxxx_analysis // Output directory of the analysis result.
│   ├── generated_script_file // The directory structure is the same as that of the script file before analysis.
│   ├── msft_dynamic_analysis
│   │   ├── hook.py // Includes function parameters for dynamic shape analysis.
│   │   ├── __init__.py
After the dynamic shape analysis result file is generated, modify the for loop for reading the training dataset in the training script file in the analysis result output directory and manually enable dynamic shape detection. For details, see the following example.
Before the modification:
for i, (ings, targets, paths, _) in pbar:
After the modification (the iterable is wrapped in DETECTOR.start()):
for i, (ings, targets, paths, _) in DETECTOR.start(pbar):
When you run the analyzed and modified training script, an analysis report (msft_dynamic_shape_analysis_report.csv) that records the dynamic shapes is generated in the root directory where the analysis result file is located.
- It is recommended that the model training script file obtained through dynamic shape analysis be executed on GPUs. If the model training script file has been migrated and needs to be run on NPUs, the running time of operators with dynamic shapes will be long.
- If the generated msft_dynamic_shape_analysis_report.csv file is empty, the training script does not use dynamic shapes.
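To see why wrapping the data loop in DETECTOR.start() does not change what the loop body receives, consider the toy stand-in below. It is a hypothetical illustration only and is not the implementation in hook.py: `ShapeRecorder`, its `is_dynamic` method, and the nested-list "tensors" are all assumptions made for the example. The wrapper is a generator that yields each batch unchanged while noting the shape of the first element.

```python
class ShapeRecorder:
    """Hypothetical stand-in for DETECTOR; not the tool's hook.py code."""

    def __init__(self):
        self.shapes = set()

    def start(self, batches):
        """Yield every batch unchanged while recording its input shape."""
        for batch in batches:
            inputs = batch[0]
            # Nested lists stand in for tensors: (rows, columns).
            self.shapes.add((len(inputs), len(inputs[0])))
            yield batch

    def is_dynamic(self):
        """More than one distinct input shape was observed."""
        return len(self.shapes) > 1


DETECTOR = ShapeRecorder()
loader = [
    ([[1, 2, 3], [4, 5, 6]], "target_0"),  # inputs of shape 2 x 3
    ([[1, 2], [3, 4]], "target_1"),        # inputs of shape 2 x 2
]
seen_targets = []
for i, (inputs, targets) in enumerate(DETECTOR.start(loader)):
    seen_targets.append(targets)  # the loop body is unaffected by the wrapper

print(DETECTOR.is_dynamic())
```

Because the wrapper only observes and re-yields each batch, the training loop itself needs no other change, which is consistent with the one-line modification shown above.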
