Migration Analysis

The PyTorch Analyse tool provides analysis scripts that examine a PyTorch training script written for GPUs before it is migrated. The scripts report on API support, third-party library suites, affinity APIs, and dynamic shapes. Table 1 describes each analysis mode.

Table 1 Analysis mode description

| Analysis Mode | Analysis Script | Analysis Result | Optimization Suggestion |
| --- | --- | --- | --- |
| Third-party library suite analysis mode | You need to provide the source code of a third-party library suite. | Quickly obtain unsupported third-party library APIs and CUDA information in the source code. See the NOTE below. | - |
| API support analysis mode | You need to provide the PyTorch training script to be analyzed. | Quickly obtain information about Torch APIs and CUDA APIs that are not supported in the training script. | Output expert suggestions on API precision and performance tuning in the training script. |
| Dynamic shape analysis mode | You need to provide the PyTorch training script to be analyzed. | Quickly obtain the dynamic shape information contained in the training script. | - |
| Affinity API analysis mode | You need to provide the PyTorch training script to be analyzed. | Quickly obtain information about native APIs in the training script that can be replaced with affinity APIs. | - |

NOTE:

Third-party library APIs refer to functions in the third-party library code. If a function uses an unsupported Torch operator or a custom CUDA operator in its body, the function is an unsupported API of the third-party library. Any other function in the third-party library that calls an unsupported API is also an unsupported API.
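The rule in the NOTE for Table 1 (a function is unsupported if it uses an unsupported operator, and so is every function that calls it) amounts to propagating a flag backwards through a call graph. The following is only a sketch of that idea, not the tool's implementation. The call graph, the function names, and the helper propagate_unsupported are hypothetical.

```python
from collections import deque


def propagate_unsupported(call_graph, directly_unsupported):
    """Return every function that is unsupported directly or through a call chain.

    call_graph maps a function name to the names it calls. This input format
    is an assumption made for illustration only.
    """
    # Invert the graph so that we can walk from a callee to its callers.
    callers = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    unsupported = set(directly_unsupported)
    queue = deque(unsupported)
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in unsupported:
                unsupported.add(caller)
                queue.append(caller)
    return unsupported


# Hypothetical library: "nms" uses a custom CUDA operator in its body.
call_graph = {
    "nms": [],
    "postprocess": ["nms", "clip"],
    "detect": ["postprocess"],
    "clip": [],
}
result = propagate_unsupported(call_graph, {"nms"})
print(sorted(result))
```

In this example, clip stays supported because nothing on its call path reaches nms.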

Prerequisites

Before using PyTorch Analyse, install the following dependencies. The installation commands can be run from any path. If you run them as a non-root user, add --user at the end of each command, for example, pip3 install pandas --user.
pip3 install pandas         # The pandas version must be 1.2.4 or later.
pip3 install libcst         # Python syntax tree parser, which is used to parse Python files.
pip3 install prettytable    # This dependency is used to display data in tables.
pip3 install jedi           # Mandatory for third-party library suite and affinity API analysis.
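
Before starting an analysis, it can help to confirm that the dependencies above are present. The sketch below uses only the Python standard library and checks the one minimum version stated above (pandas 1.2.4). The helper names check_dependencies and parse_version are illustrative and not part of the tool.

```python
from importlib import metadata

# Packages listed above; pandas carries the only stated minimum version.
REQUIRED = {"pandas": (1, 2, 4), "libcst": None, "prettytable": None, "jedi": None}


def parse_version(text):
    """Turn '1.2.4' or '2.0.0rc1' into a tuple of the leading integers."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def check_dependencies(required=REQUIRED, get_version=metadata.version):
    """Return a list of problems; an empty list means nothing is missing."""
    problems = []
    for name, minimum in required.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if minimum is not None and parse_version(installed) < minimum:
            wanted = ".".join(map(str, minimum))
            problems.append(f"{name} {installed} is older than {wanted}")
    return problems


for problem in check_dependencies():
    print(problem)
```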

Starting an Analysis Task

  1. Go to the path where the analysis tool is located.
    cd ${INSTALL_DIR}/cann/tools/ms_fmk_transplt/
    Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann.
  2. Start an analysis task.
    Run the following command to start an analysis task based on the configuration options in Table 2:
    ./pytorch_analyse.sh -i /home/xxx/analysis -o /home/xxx/analysis_output -v 2.1.0 [-m torch_apis]
    In this command, /home/xxx/analysis is the path of the script to be analyzed, /home/xxx/analysis_output is the output path of the analysis result, 2.1.0 is the framework version of the script to be analyzed, and torch_apis is the analysis mode.

    Square brackets ([]) enclose optional parameters, which can be omitted in actual use.

    If the analysis mode specified by -m/--mode is dynamic_shape, the dynamic shape analysis report is not produced by the analysis task itself. After the task is complete, modify the training script as described in Training Configuration to obtain the report.

    Table 2 Parameter description

    Parameter

    Description

    Example Value

    -i

    --input

    • Path to the folder where the script file to be analyzed is located or the folder where the source code of the third-party library suite is located.
    • Mandatory.

    /home/xxx/analysis

    -o

    --output

    • Output path of the analysis result file.
    • The xxxx_analysis folder is generated in this path.
    • Mandatory.
      NOTE:

      Ensure that the output path of the analysis result file exists before running the tool. Otherwise, the tool displays an error message.

    /home/xxx/analysis_output

    -v

    --version

    • PyTorch version of the script file to be analyzed or the source code of the third-party library suite.
    • Mandatory.
    • 1.11
    • 2.1.0
    • 2.2.0
    • 2.3.1
    • 2.4.0
    • 2.5.1
    • 2.6.0
      NOTE:

      In automatic migration mode, PyTorch 1.11.0 does not support Atlas A3 training products/Atlas A3 inference products.

    -m

    --mode

    • Analysis mode. Currently, the torch_apis (API support analysis), third_party (third-party library suite analysis), affinity_apis (affinity API analysis), and dynamic_shape (dynamic shape analysis) modes are supported.
    • Optional.
    • torch_apis (default)
    • third_party
    • affinity_apis
    • dynamic_shape

    -env

    --env-path

    • Path added to the PYTHONPATH environment variable during analysis. This option takes effect only after jedi is installed.
    • Path of the third-party library to be analyzed. The third-party library APIs that are not supported by the current script are analyzed.
    • Optional.

    /home/xxx/transformers/src /home/xxx/transformers/utils

    Use spaces to separate multiple file paths.

    -api

    --api-files

    • Result file of analysis on APIs not supported by the third-party library.
    • Optional.
      NOTE:

      If the third-party library contains unsupported APIs and the custom function calls an unsupported torch API, you can use the torch API analysis function.

      1. Run the tool in third_party mode (-m third_party, third-party library suite analysis) to obtain the list of third-party library APIs that cannot be migrated, as a CSV file. The following is an example:
        pytorch_analyse.sh -i third_party_input_path -o third_party_output_path -v 2.1.0 -m third_party
        In this command, third_party_input_path is the path of the third-party library folder, third_party_output_path is the result output path, and 2.1.0 is the framework version of the script to be analyzed.
      2. Pass the CSV file obtained in step 1 to -api to obtain the third-party library APIs in the current training script that cannot be migrated. The following is an example:
        pytorch_analyse.sh -i input_path -o output_path -v 2.1.0 -api third_party_output_path/framework_unsupported_op.csv
        In this command, input_path is the path of the model script folder, output_path is the result output path, and third_party_output_path/framework_unsupported_op.csv is the result file produced in step 1.

    /home/xxx/mmcv_analysis/full_unsupported_results.csv /home/xxx/transformers_analysis/full_unsupported_results.csv

    Use spaces to separate multiple file paths.

    -h

    --help

    Displays help information.

    -

  3. After the analysis is complete, go to the script analysis result output path and view the analysis report. For details, see Analysis Report Overview.
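
The steps above can also be driven from Python. The following sketch assembles the command line from the options in Table 2 and creates the output directory first, because the tool reports an error if that path does not exist. The helper names build_analyse_command and run_analysis are illustrative. Passing several -env or -api values as separate arguments is assumed to be equivalent to the space-separated form described in Table 2.

```python
import subprocess
from pathlib import Path

# Values copied from Table 2.
SUPPORTED_VERSIONS = {"1.11", "2.1.0", "2.2.0", "2.3.1", "2.4.0", "2.5.1", "2.6.0"}
SUPPORTED_MODES = {"torch_apis", "third_party", "affinity_apis", "dynamic_shape"}


def build_analyse_command(input_dir, output_dir, version, mode="torch_apis",
                          env_paths=(), api_files=()):
    """Return the argument list for pytorch_analyse.sh."""
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported PyTorch version: {version}")
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported analysis mode: {mode}")
    command = ["./pytorch_analyse.sh", "-i", str(input_dir),
               "-o", str(output_dir), "-v", version, "-m", mode]
    if env_paths:
        command += ["-env", *map(str, env_paths)]
    if api_files:
        command += ["-api", *map(str, api_files)]
    return command


def run_analysis(tool_dir, output_dir, command):
    """Create the output directory, then run the tool from its own directory."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    return subprocess.run(command, cwd=tool_dir, check=True)


command = build_analyse_command("/home/xxx/analysis", "/home/xxx/analysis_output", "2.1.0")
print(" ".join(command))
```

To run the analysis, call run_analysis with the directory from step 1 as tool_dir.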

Analysis Report Overview

  • When the analysis mode is torch_apis, the analysis result is as follows:
    ├── xxxx_analysis     // Output directory of the analysis result.
       ├── cuda_op_list.csv             // List of CUDA APIs.
       ├── unknown_api.csv              // List of APIs whose support statuses are not clear.
       ├── unsupported_api.csv          // List of unsupported APIs.
       ├── api_precision_advice.csv    // Expert suggestions on API accuracy tuning.
       ├── api_performance_advice.csv  // Expert suggestions on API performance tuning.
       ├── pytorch_analysis.txt         // Analysis process log.
    
    Table 3 CSV files in torch_apis mode

    | File Name | Introduction |
    | --- | --- |
    | unsupported_api.csv | For APIs that are not supported by the current framework, you can seek help from the Ascend open-source community. Figure 1: Example of an unsupported API list. |
    | cuda_op_list.csv | Information about CUDA APIs contained in the current training script. |
    | unknown_api.csv | List of APIs with unclear support statuses. For details about PyTorch APIs, see Table 4. If the training fails, you can seek help from the Ascend open-source community. |
    | api_precision_advice.csv | Expert suggestions on accuracy tuning in the current training script. In addition, you can refer to the Accuracy Debugging Tool Guide to improve the accuracy. |
    | api_performance_advice.csv | Expert suggestions and guidance for performance tuning in the current training script. You can also refer to the Profiling Instructions to tune the performance. |
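
For a quick overview of a torch_apis result, the CSV files in the output directory can be counted without knowing their columns. This is a minimal sketch that assumes each CSV file is UTF-8 encoded and starts with one header row. Neither assumption is stated by the tool, and summarize_reports is an illustrative helper.

```python
import csv
from pathlib import Path


def summarize_reports(analysis_dir):
    """Map each CSV file name in analysis_dir to its number of data rows."""
    summary = {}
    for path in sorted(Path(analysis_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        # Assumption: the first row is a header, so it is not counted.
        summary[path.name] = max(len(rows) - 1, 0)
    return summary
```

For example, summarize_reports("/home/xxx/analysis_output/xxxx_analysis") returns a dictionary such as {"unsupported_api.csv": <row count>, ...}, which shows at a glance which reports are non-empty.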

    NOTE:

    The analysis result is based on the API information of the native PyTorch framework. For details, see Table 4.

    Table 4 PyTorch API information

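    | PyTorch Framework Version | API Information Reference Link | Ascend Extension for PyTorch Version | CANN Version |
    | --- | --- | --- | --- |
    | 2.8.0 | PyTorch 2.8.0 | 7.2.0 | Commercial edition: 8.3.RC1; Community edition: 8.3.RC1 |
    | 2.7.1 | PyTorch 2.7.1 | 7.2.0 | Commercial edition: 8.3.RC1; Community edition: 8.3.RC1 |
    | 2.6.0 | PyTorch 2.6.0 | 7.1.0 | Commercial edition: 8.2.RC1; Community edition: 8.2.RC1 |
    | 2.5.1 | PyTorch 2.5.1 | 7.1.0 | Commercial edition: 8.2.RC1; Community edition: 8.2.RC1 |
    | 2.3.1 | PyTorch 2.3.1 | 7.1.0 | Commercial edition: 8.2.RC1; Community edition: 8.2.RC1 |
    | 2.5.1 | PyTorch 2.5.1 | 7.0.0 | Commercial edition: 8.1.RC1; Community edition: 8.1.RC1 |
    | 2.4.0 | PyTorch 2.4.0 | 7.0.0 | Commercial edition: 8.1.RC1; Community edition: 8.1.RC1 |
    | 2.3.1 | PyTorch 2.3.1 | 7.0.0 | Commercial edition: 8.1.RC1; Community edition: 8.1.RC1 |
    | 2.1.0 | PyTorch 2.1.0 | 7.0.0 | Commercial edition: 8.1.RC1; Community edition: 8.1.RC1 |
    | 2.1.0 | PyTorch 2.1.0 | 6.0.0 | Commercial edition: 8.0.0; Community edition: 8.0.0.beta1 |
    | 2.3.1 | PyTorch 2.3.1 | 6.0.0 | Commercial edition: 8.0.0; Community edition: 8.0.0.beta1 |
    | 2.4.0 | PyTorch 2.4.0 | 6.0.0 | Commercial edition: 8.0.0; Community edition: 8.0.0.beta1 |
    | 2.1.0 | PyTorch 2.1.0 | 6.0.RC3 | Commercial edition: 8.0.RC3; Community edition: 8.0.RC3.beta1 |
    | 2.3.1 | PyTorch 2.3.1 | 6.0.RC3 | Commercial edition: 8.0.RC3; Community edition: 8.0.RC3.beta1 |
    | 2.4.0 | PyTorch 2.4.0 | 6.0.RC3 | Commercial edition: 8.0.RC3; Community edition: 8.0.RC3.beta1 |
    | 1.11.0 | PyTorch 1.11.0 | 6.0.RC2 | Commercial edition: 8.0.RC2; Community edition: 8.0.RC2.beta1 |
    | 2.1.0 | PyTorch 2.1.0 | 6.0.RC2 | Commercial edition: 8.0.RC2; Community edition: 8.0.RC2.beta1 |
    | 2.2.0 | PyTorch 2.2.0 | 6.0.RC2 | Commercial edition: 8.0.RC2; Community edition: 8.0.RC2.beta1 |
    | 2.3.1 | PyTorch 2.3.1 | 6.0.RC2 | Commercial edition: 8.0.RC2; Community edition: 8.0.RC2.beta1 |
    | 1.11.0 | PyTorch 1.11.0 | 6.0.RC1 | Commercial edition: 8.0.RC1; Community edition: 8.0.RC1.beta1 |
    | 2.1.0 | PyTorch 2.1.0 | 6.0.RC1 | Commercial edition: 8.0.RC1; Community edition: 8.0.RC1.beta1 |
    | 2.2.0 | PyTorch 2.2.0 | 6.0.RC1 | Commercial edition: 8.0.RC1; Community edition: 8.0.RC1.beta1 |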

  • When the analysis mode is third_party, the analysis result is as follows:
    ├── xxxx_analysis     // Output directory of the analysis result.
       ├── cuda_op.csv                  // List of CUDA APIs.
       ├── framework_unsupported_op.csv // List of APIs unsupported by the framework.
       ├── full_unsupported_results.csv // List of all unsupported APIs.
       ├── migration_needed_op.csv      // List of APIs to be migrated.
       ├── unknown_op.csv              // List of APIs whose support statuses are not clear.
       ├── pytorch_analysis.txt         // Analysis process log.
    
    Table 5 CSV files in third_party mode

    | File Name | Introduction |
    | --- | --- |
    | framework_unsupported_op.csv | List of APIs that are not supported by the framework. It shows which third-party library APIs in the third-party library source code are not supported by the current framework. For these APIs, you can seek help from the Ascend open-source community. Figure 2: List of APIs not supported by the framework. |
    | cuda_op.csv | Information about CUDA APIs contained in the source code of the current third-party library. |
    | full_unsupported_results.csv | List of all unsupported third-party library APIs, covering both those that are unsupported because of CUDA and those that are unsupported by the PyTorch framework. When you analyze another training script that calls the source code of the analyzed third-party library, you can pass this list with -api to quickly obtain the analysis result. |
    | migration_needed_op.csv | List of APIs to be migrated. APIs in this list can be migrated using the migration tools. |
    | unknown_op.csv | List of APIs whose support statuses are not clear. If the training fails, you can seek help from the Ascend open-source community. |
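
When several third-party libraries have been analyzed, their full_unsupported_results.csv files can be gathered for the -api option. The sketch below assumes that all results sit under one output root and that each result folder name ends in _analysis, as in the example values of Table 2. The helper collect_api_files is illustrative.

```python
from pathlib import Path


def collect_api_files(output_root, file_name="full_unsupported_results.csv"):
    """Return the sorted paths of file_name found in *_analysis folders."""
    root = Path(output_root)
    return sorted(str(path) for path in root.glob(f"*_analysis/{file_name}"))
```

The returned paths can then be passed after -api, separated by spaces.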

  • When the analysis mode is affinity_apis, the analysis result is as follows:
    ├── xxxx_analysis // Output directory of the analysis result.
       ├──  affinity_api_call.csv      // List of native APIs that can be replaced by affinity APIs.
       ├──  pytorch_analysis.txt       // Analysis process log.
    
    The analysis report affinity_api_call.csv contains the call information of native APIs and classifies the APIs into the following types: class, function, Torch (PyTorch framework APIs), and special. Based on the report, you can manually replace the native APIs in the training script with the specified affinity APIs, and then run the modified script to achieve better performance. The following is an example of the analysis report.
    Figure 3 Example of an affinity API analysis report
  • When the analysis mode is dynamic_shape, the analysis result is as follows:
    ├── xxxx_analysis     // Output directory of the analysis result.
       ├── generated_script_file    // The directory structure is the same as that of the script file before analysis.
       ├── msft_dynamic_analysis
             ├── hook.py         // Includes function parameters for dynamic shape analysis.
             ├── __init__.py
    

    After the dynamic shape analysis result file is generated, modify the for loop for reading the training dataset in the training script file in the analysis result output directory and manually enable dynamic shape detection. For details, see the following example.

    Before the modification:
    for i, (imgs, targets, paths, _) in pbar:

    After the modification (pbar is wrapped in DETECTOR.start()):

    for i, (imgs, targets, paths, _) in DETECTOR.start(pbar):

    Running the analyzed and modified training script generates an analysis report, msft_dynamic_shape_analysis_report.csv, which records the dynamic shapes. The report is saved in the root directory where the analysis result file is located.

    • Run the training script produced by dynamic shape analysis on GPUs where possible. If the script has already been migrated and must run on NPUs, operators with dynamic shapes take a long time to run.
    • If the generated msft_dynamic_shape_analysis_report.csv file is empty, the training script does not use dynamic shapes.
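
To illustrate why wrapping the data loop is enough to observe dynamic shapes, the sketch below shows a generator that passes every batch through unchanged while recording the shapes it sees. It is a hypothetical stand-in and not the implementation of DETECTOR or hook.py. ShapeRecorder, iter_shapes, and FakeTensor are made-up names, and FakeTensor replaces real tensors so that the example runs without PyTorch.

```python
def iter_shapes(item, path=()):
    """Yield (position, shape) for every nested element with a .shape attribute."""
    if hasattr(item, "shape"):
        yield path, tuple(item.shape)
    elif isinstance(item, (list, tuple)):
        for index, child in enumerate(item):
            yield from iter_shapes(child, path + (index,))


class ShapeRecorder:
    """Hypothetical wrapper that records the shapes seen at each batch position."""

    def __init__(self):
        self.shapes = {}  # position in the batch -> set of shapes seen there

    def start(self, iterable):
        for item in iterable:
            for path, shape in iter_shapes(item):
                self.shapes.setdefault(path, set()).add(shape)
            yield item  # the training loop receives the batch unchanged

    def dynamic_positions(self):
        """Return the positions where more than one shape was observed."""
        return sorted(p for p, seen in self.shapes.items() if len(seen) > 1)


class FakeTensor:
    """Stand-in for a tensor; only the shape attribute matters here."""

    def __init__(self, *shape):
        self.shape = shape


batches = [
    (FakeTensor(16, 3, 640, 640), FakeTensor(40, 6)),
    (FakeTensor(16, 3, 640, 640), FakeTensor(52, 6)),
]
recorder = ShapeRecorder()
for i, (imgs, targets) in recorder.start(enumerate(batches)):
    pass  # the training step would run here
print(recorder.dynamic_positions())
```

Here the second element of each batch changes its first dimension between iterations, so only that position is reported as dynamic.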