Migration Analysis

The PyTorch Analyse tool provides analysis scripts that examine GPU-based PyTorch training scripts before migration. It reports on API support, third-party library suites, affinity APIs, and dynamic shapes. For details, see Table 1.

Table 1 Analysis mode description

| Analysis Mode | Analysis Script | Analysis Result | Optimization Suggestion |
|---|---|---|---|
| Third-party library suite analysis mode | You need to provide the source code of a third-party library suite. | Quickly obtain the third-party library APIs and CUDA APIs that are not supported in the source code. See the NOTE below the table. | - |
| API support analysis mode | You need to provide the PyTorch training script to be analyzed. | Quickly obtain information about Torch APIs and CUDA APIs that are not supported in the training script. | Output expert suggestions on API precision and performance tuning in the training script. |
| Dynamic shape analysis mode | You need to provide the PyTorch training script to be analyzed. | Quickly obtain the dynamic shape information contained in the training script. | - |
| Affinity API analysis mode | You need to provide the PyTorch training script to be analyzed. | Quickly obtain information about affinity APIs that can be replaced in the training script. | - |

NOTE:

Third-party library APIs refer to functions in the third-party library code. If a function uses an unsupported Torch operator or custom CUDA operator in its body, then this function represents an unsupported API in the third-party library. If other functions in the third-party library call these unsupported APIs, then these functions also represent unsupported APIs.
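The NOTE in Table 1 describes a transitive rule: a function is unsupported if its body uses an unsupported operator, and so is every function that calls it, directly or indirectly. The following is a minimal sketch of that rule on a toy call graph. The function names and the graph are invented for illustration, and the code is not the tool's actual implementation.

```python
from collections import deque


def propagate_unsupported(calls, directly_unsupported):
    """Return every function that is unsupported directly or through its callees.

    calls maps each function name to the set of functions it calls.
    directly_unsupported holds functions whose bodies use an unsupported
    Torch operator or a custom CUDA operator.
    """
    # Invert the graph so we can walk from a callee to its callers.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    unsupported = set(directly_unsupported)
    queue = deque(directly_unsupported)
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in unsupported:
                unsupported.add(caller)
                queue.append(caller)
    return unsupported


# Hypothetical third-party library: nms_cuda wraps a custom CUDA operator.
toy_calls = {
    "nms_cuda": set(),
    "batched_nms": {"nms_cuda"},
    "postprocess": {"batched_nms", "resize"},
    "resize": set(),
}
result = propagate_unsupported(toy_calls, {"nms_cuda"})
print(sorted(result))  # ['batched_nms', 'nms_cuda', 'postprocess']
```

In this toy graph, resize stays supported because it never reaches nms_cuda, while postprocess is marked unsupported through batched_nms.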

Prerequisites

Before using PyTorch Analyse, install the following dependencies. The installation commands can be run from any directory. If you run them as a non-root user, append --user to each command, for example, pip3 install pandas --user.
pip3 install pandas         # The pandas version must be 1.2.4 or later.
pip3 install libcst         # Python syntax tree parser, which is used to parse Python files.
pip3 install prettytable    # This dependency is used to display data as formatted tables.
pip3 install jedi           # Mandatory for third-party library suite and affinity API analysis.
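
The prerequisites above can be checked before the tool is run. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata). The helper names parse_version and check_dependencies are hypothetical and are not part of PyTorch Analyse. The sketch reports which of the four packages are missing and whether pandas meets the 1.2.4 minimum.

```python
from importlib import metadata

REQUIRED = ["pandas", "libcst", "prettytable", "jedi"]
MIN_PANDAS = (1, 2, 4)


def parse_version(text):
    """Turn '1.2.4' or '2.1.0rc1' into a tuple of leading integers, e.g. (1, 2, 4)."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def check_dependencies(get_version=metadata.version):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for name in REQUIRED:
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if name == "pandas" and parse_version(installed) < MIN_PANDAS:
            problems.append(f"pandas {installed} is older than 1.2.4")
    return problems


if __name__ == "__main__":
    for problem in check_dependencies():
        print(problem)
```

The get_version parameter exists only so that the check can be exercised without changing the installed packages.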

Starting an Analysis Task

  1. Go to the path where the analysis tool is located.
    cd Ascend-cann-toolkit_installation_path/ascend-toolkit/latest/tools/ms_fmk_transplt/
  2. Start an analysis task.
    Run the following command to start an analysis task based on the configuration options in Table 2:
    ./pytorch_analyse.sh -i path_of_the_script_to_be_analyzed -o analysis_result_output_path -v framework_version_of_the_script_to_be_analyzed [-m analysis_mode]

    Square brackets ([]) enclose optional parameters, which can be omitted. If the analysis mode specified by -m/--mode is dynamic_shape, this command alone does not produce the dynamic shape analysis report. After the analysis task is complete, modify the training script according to the description in Training Configuration and run it to obtain the report.

    Table 2 Options

    | Option | Description | Example Value |
    |---|---|---|
    | -i, --input | Path to the folder that contains the script file to be analyzed, or to the folder that contains the source code of the third-party library suite. Mandatory. | /home/xxx/analysis |
    | -o, --output | Output path of the analysis result file. The xxxx_analysis folder is generated in this path. Mandatory. | /home/xxx/analysis_output |
    | -v, --version | PyTorch version of the script file to be analyzed or of the third-party library suite source code. Mandatory. | 1.11.0, 2.1.0, or 2.2.0 |
    | -m, --mode | Analysis mode. Currently, the torch_apis (API support analysis), third_party (third-party library suite analysis), affinity_apis (affinity API analysis), and dynamic_shape (dynamic shape analysis) modes are supported. Optional. | torch_apis (default), third_party, affinity_apis, or dynamic_shape |
    | -env, --env-path | Path of the PYTHONPATH environment variable added during analysis. This option takes effect only after jedi is installed. Also the path of the third-party library to be analyzed: the third-party library APIs that are not supported by the current script are analyzed. Optional. | /home/xxx/transformers/src /home/xxx/transformers/utils (use spaces to separate multiple file paths) |
    | -api, --api-files | Result file of the analysis on APIs not supported by the third-party library. Optional. | /home/xxx/mmcv_analysis/full_unsupported_results.csv /home/xxx/transformers_analysis/full_unsupported_results.csv (use spaces to separate multiple file paths) |
    | -h, --help | Help information. | - |

    NOTE:

    • -o/--output: Ensure that the output path of the analysis result file exists before running the tool. Otherwise, the tool displays an error message.
    • -api/--api-files: If the third-party library contains unsupported APIs and a custom function calls an unsupported Torch API, you can use the Torch API analysis function as follows:
      1. Use the third_party (third-party library suite analysis) mode of -m to obtain the list of APIs (CSV file) that cannot be migrated in the third-party library. The following is an example:
        ./pytorch_analyse.sh -i third_party_library_folder_path -o result_output_path -v 2.1.0 -m third_party
      2. Pass the CSV file obtained in the preceding step to -api to obtain the third-party library APIs that do not support migration in the current training script. The following is an example:
        ./pytorch_analyse.sh -i script_folder_path -o result_output_path -v 2.1.0 -api unsupported_api_result_file_from_step_1


  3. After the analysis is complete, go to the script analysis result output path and view the analysis report. For details, see Analysis Report Overview.
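
The steps above can be scripted. The following is a minimal sketch of a wrapper that assembles the command line from the options in Table 2 and creates the output directory first, because the tool reports an error when that path does not exist. The names build_command and run_analysis are hypothetical helpers, not part of the tool. The flags and allowed values are taken from Table 2.

```python
import os
import subprocess

SUPPORTED_VERSIONS = ("1.11.0", "2.1.0", "2.2.0")
SUPPORTED_MODES = ("torch_apis", "third_party", "affinity_apis", "dynamic_shape")


def build_command(input_path, output_path, version, mode="torch_apis",
                  env_paths=(), api_files=()):
    """Assemble the argument list for pytorch_analyse.sh from the options in Table 2."""
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported PyTorch version: {version}")
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported analysis mode: {mode}")
    command = ["./pytorch_analyse.sh", "-i", input_path, "-o", output_path,
               "-v", version, "-m", mode]
    if env_paths:
        command += ["-env", *env_paths]  # multiple paths become separate arguments
    if api_files:
        command += ["-api", *api_files]
    return command


def run_analysis(tool_dir, **options):
    """Create the output directory if needed, then run the tool from tool_dir.

    Use an absolute output_path: the directory is created relative to the
    current working directory, while the tool itself runs inside tool_dir.
    """
    os.makedirs(options["output_path"], exist_ok=True)
    return subprocess.run(build_command(**options), cwd=tool_dir, check=True)


print(build_command("/home/xxx/analysis", "/home/xxx/analysis_output", "2.1.0"))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces. Calling run_analysis requires tool_dir to be the ms_fmk_transplt directory from step 1.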

Analysis Report Overview

  • When the analysis mode is torch_apis, the analysis result is as follows:
    ├── xxxx_analysis     // Output directory of the analysis result.
       ├── cuda_op_list.csv             // List of CUDA APIs.
       ├── unknown_api.csv              // List of APIs whose support statuses are not clear. For details, see the API information of versions 2.2.0, 2.1.0, and 1.11.0 in the Ascend Extension for PyTorch API Reference.
       ├── unsupported_api.csv          // List of unsupported APIs.
       ├── api_precision_advice.csv    // Expert suggestions on API accuracy tuning.
       ├── api_performance_advice.csv  // Expert suggestions on API performance tuning.
       ├── pytorch_analysis.txt         // Analysis process log.
    
    Table 3 CSV files in torch_apis mode

    | File Name | Introduction |
    |---|---|
    | unsupported_api.csv | For APIs that are not supported by the current framework, you can seek help from the Ascend open source community. See Figure 1. |
    | cuda_op_list.csv | Information about CUDA APIs contained in the current training script. |
    | unknown_api.csv | List of APIs whose support statuses are not clear. If the training fails, you can seek help from the Ascend open source community. |
    | api_precision_advice.csv | Expert suggestions on accuracy tuning in the current training script. In addition, you can refer to the Accuracy Debugging Tool Guide to improve the accuracy. |
    | api_performance_advice.csv | Expert suggestions and guidance for performance tuning in the current training script. You can also refer to the Performance Tuning Tool User Guide to tune the performance. |

    Figure 1 Example of an unsupported API list

    NOTE:

    The analysis result is based on the API information of the native PyTorch framework 2.2.0/2.1.0/1.11.0. For details, see Ascend Extension for PyTorch API Reference.
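
A quick way to see which of the report files need attention is to count the entries in each one. The following is a minimal sketch using only the standard library. The helper name summarize_reports is hypothetical, and the code assumes that every CSV file starts with a single header row and is UTF-8 encoded, which this document does not state.

```python
import csv
from pathlib import Path


def summarize_reports(analysis_dir):
    """Map each CSV file name in analysis_dir to its number of data rows.

    Assumption: the first row of every CSV file is a header row.
    """
    summary = {}
    for csv_path in sorted(Path(analysis_dir).glob("*.csv")):
        with open(csv_path, newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        summary[csv_path.name] = max(len(rows) - 1, 0)
    return summary
```

For example, summarize_reports("/home/xxx/analysis_output/xxxx_analysis") returns a dictionary keyed by file name, where a count of 0 for unsupported_api.csv would mean that no unsupported APIs were listed.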

  • When the analysis mode is third_party, the analysis result is as follows:
    ├── xxxx_analysis     // Output directory of the analysis result.
       ├── cuda_op.csv                  // List of CUDA APIs.
       ├── framework_unsupported_op.csv // List of APIs unsupported by the framework.
       ├── full_unsupported_results.csv // List of all unsupported APIs.
       ├── migration_needed_op.csv      // List of APIs to be migrated.
       ├── unknown_op.csv              // List of APIs whose support statuses are not clear.
       ├── pytorch_analysis.txt         // Analysis process log.
    
    Table 4 CSV files in third_party mode

    | File Name | Introduction |
    |---|---|
    | framework_unsupported_op.csv | List of third-party library APIs, found in the third-party library source code, that are not supported by the current framework. For these APIs, you can seek help from the Ascend open source community. See Figure 2. |
    | cuda_op.csv | Information about CUDA APIs contained in the source code of the current third-party library. |
    | full_unsupported_results.csv | List of all unsupported third-party library APIs, covering those that depend on CUDA and those that depend on PyTorch framework APIs that are not supported. When you analyze another training script that calls the source code of the analyzed third-party library, you can pass this list with -api to quickly obtain the analysis result. |
    | migration_needed_op.csv | List of APIs to be migrated. APIs in this list can be migrated using the migration tools. |
    | unknown_op.csv | List of APIs whose support statuses are not clear. If the training fails, you can seek help from the Ascend open source community. |

    Figure 2 List of APIs not supported by the framework
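
The full_unsupported_results.csv file links two runs of the tool: a third_party run on the library, then a run on the training script with -api. The following is a minimal sketch that builds both command lines. The helper name third_party_then_script is hypothetical, and the name of the generated xxxx_analysis folder is passed in as a parameter because this document does not specify how it is derived.

```python
import posixpath


def third_party_then_script(lib_dir, lib_out, script_dir, script_out, version,
                            analysis_folder):
    """Return the two command lines for the third_party -> -api workflow.

    analysis_folder is the name of the xxxx_analysis folder that step 1
    creates inside lib_out; pass the real folder name after step 1 has run.
    """
    step1 = ["./pytorch_analyse.sh", "-i", lib_dir, "-o", lib_out,
             "-v", version, "-m", "third_party"]
    api_file = posixpath.join(lib_out, analysis_folder,
                              "full_unsupported_results.csv")
    step2 = ["./pytorch_analyse.sh", "-i", script_dir, "-o", script_out,
             "-v", version, "-api", api_file]
    return step1, step2


step1, step2 = third_party_then_script(
    "/home/xxx/mmcv", "/home/xxx", "/home/xxx/analysis",
    "/home/xxx/analysis_output", "2.1.0", "mmcv_analysis")
print(" ".join(step1))
print(" ".join(step2))
```

The paths in the usage lines are placeholders in the style of Table 2. Step 2 omits -m, so the default torch_apis mode applies.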

  • When the analysis mode is affinity_apis, the analysis result is as follows:
    ├── xxxx_analysis // Output directory of the analysis result.
       ├── affinity_api_call.csv       // List of native APIs that can be replaced by affinity APIs.
       ├── pytorch_analysis.txt        // Analysis process log.
    
    The analysis report affinity_api_call.csv contains the call information of native APIs and classifies the APIs into the following types: class, function, Torch (PyTorch framework APIs), and special. Based on the report, you can manually replace each native API in the training script with the specified affinity API. The modified script performs better when running on Ascend AI Processors. The following is an example of the analysis report.
    Figure 3 Example of an affinity API analysis report
  • When the analysis mode is dynamic_shape, the analysis result is as follows:
    ├── xxxx_analysis     // Output directory of the analysis result.
       ├── generated_script_file    // The directory structure is the same as that of the script file before analysis.
       ├── msft_dynamic_analysis
             ├── hook.py         // Includes function parameters for dynamic shape analysis.
             ├── __init__.py
    

    After the dynamic shape analysis result files are generated, open the training script in the analysis result output directory, locate the for loop that reads the training dataset, and modify it to manually enable dynamic shape detection. For details, see the following example.

    Before the modification:
    for i, (imgs, targets, paths, _) in pbar:

    After the modification (pbar is wrapped in DETECTOR.start()):
    for i, (imgs, targets, paths, _) in DETECTOR.start(pbar):

    When you run the analyzed and modified training script, an analysis report named msft_dynamic_shape_analysis_report.csv, which records the dynamic shapes, is generated in the root directory where the analysis result file is located.

    • Run the model training script obtained through dynamic shape analysis on the GPU where possible. If the script has already been migrated and must run on the NPU, operators with dynamic shapes will take a long time to run.
    • If the generated msft_dynamic_shape_analysis_report.csv file is empty, the training script does not use dynamic shapes.
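
Wrapping the loop iterable is enough for a detector to observe every iteration without further changes to the loop body. The following toy sketch illustrates that idea only. ToyShapeDetector and FakeTensor are invented for illustration, the generated hook.py is not shown in this document, and the real tool may work quite differently, for example by inspecting operators rather than input batches.

```python
class ToyShapeDetector:
    """Toy stand-in for a detector object; NOT the hook.py implementation."""

    def __init__(self):
        self.seen_shapes = []

    def start(self, iterable):
        """Yield items unchanged while recording the shapes found in each one."""
        for item in iterable:
            self.seen_shapes.append(self._shapes_of(item))
            yield item

    def _shapes_of(self, item):
        # Record .shape for tensor-like objects, recursing into tuples and lists.
        if hasattr(item, "shape"):
            return tuple(item.shape)
        if isinstance(item, (tuple, list)):
            return tuple(self._shapes_of(part) for part in item)
        return None

    def has_dynamic_shapes(self):
        """True if the recorded shapes differ between iterations."""
        return len(set(self.seen_shapes)) > 1


class FakeTensor:
    """Minimal object with a .shape attribute, used instead of a real tensor."""

    def __init__(self, *shape):
        self.shape = shape


detector = ToyShapeDetector()
batches = [(FakeTensor(8, 3, 640, 640), FakeTensor(8, 6)),
           (FakeTensor(8, 3, 640, 640), FakeTensor(5, 6))]
pbar = enumerate(batches)
for i, (images, targets) in detector.start(pbar):
    pass  # the training step would run here
print(detector.has_dynamic_shapes())  # True: targets changes from (8, 6) to (5, 6)
```

Because start() is a generator that yields each item unchanged, the loop variables keep their original structure, which is why only the for statement itself needs to be edited.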