Migration Using PyTorch GPU2Ascend

Prerequisites

Before using PyTorch GPU2Ascend to migrate a PyTorch training script, make sure that the following dependencies are installed. If you run the installation commands as a non-root user, append --user to each command, for example, pip3 install pandas --user. The commands can be run from any directory.

pip3 install pandas         # The pandas version must be 1.2.4 or later.
pip3 install libcst         # Python syntax tree parser, which is used to parse Python files.
pip3 install prettytable    # Used to display data in tabular form.
pip3 install jedi           # (Optional) This dependency is used for cross-file parsing. You are advised to install it.
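The following is a minimal sketch for confirming these prerequisites before starting a migration. It assumes that python3 is the interpreter that pip3 installs into and that sort -V (GNU coreutils) is available for version comparison; the function names version_ge and check_migration_deps are illustrative and are not part of the toolkit.

```shell
#!/usr/bin/env bash
# Sketch: check that the migration tool's Python dependencies are importable
# and that pandas meets the documented minimum version (1.2.4).
# Assumption: python3 is the interpreter that pip3 installs packages into.

# version_ge A B: succeed if version A >= version B (relies on sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_migration_deps: report missing modules; return 1 if a required one
# is missing or pandas is older than 1.2.4.
check_migration_deps() {
    local status=0 mod pandas_ver
    for mod in pandas libcst prettytable; do
        if ! python3 -c "import ${mod}" 2>/dev/null; then
            echo "missing required module: ${mod}"
            status=1
        fi
    done
    # jedi is optional (cross-file parsing), so only print a warning.
    if ! python3 -c "import jedi" 2>/dev/null; then
        echo "optional module not found: jedi"
    fi
    pandas_ver=$(python3 -c "import pandas; print(pandas.__version__)" 2>/dev/null)
    if [ -n "$pandas_ver" ] && ! version_ge "$pandas_ver" "1.2.4"; then
        echo "pandas ${pandas_ver} is older than 1.2.4"
        status=1
    fi
    return "$status"
}
```

After sourcing the snippet, call check_migration_deps in the same shell; a non-zero return status means that a required module is missing or pandas is too old.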

Restrictions

Because the migrated script runs on a different platform from the original script, debugging or running it may raise an exception (for example, due to operator differences) that terminates the process. Such exceptions must be debugged and resolved based on the specific exception information.
After analysis and migration are complete, train the model by following the training process provided with the original script.

Starting a Migration Task

Go to the path where the migration tool is located.

cd Ascend-cann-toolkit_installation_path/ascend-toolkit/latest/tools/ms_fmk_transplt/

Start a migration task.

Run the following command, setting the options described in Table 1:

./pytorch_gpu2npu.sh -i <original_script_path> -o <output_path> -v <original_pytorch_version> [-s] [distributed -m <training_entry_file> [-t <target_model_variable>]]

If you use distributed, place it and its options -m and -t at the end of the command.

Example:

# Single-device
./pytorch_gpu2npu.sh -i /home/train/ -o /home/out -v 2.1.0 [-s]
# Distributed
./pytorch_gpu2npu.sh -i /home/train/ -o /home/out -v 2.1.0 [-s] distributed -m /home/train/train.py [-t model]

Parameters enclosed in [] are optional and can be omitted. Do not type the brackets themselves.
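Because the position of distributed matters, a small wrapper can keep the arguments in the required order. The sketch below is a hypothetical helper (gpu2npu_cmd is not a toolkit command); it only prints the command line so that you can review it before running it from the ms_fmk_transplt directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the toolkit): assemble a pytorch_gpu2npu.sh
# command line so that "distributed" and its -m/-t options always come last.
# Usage: gpu2npu_cmd INPUT OUTPUT VERSION [SPECIFY(0|1)] [MAIN_PY] [TARGET_MODEL]
gpu2npu_cmd() {
    local input=$1 output=$2 version=$3 specify=${4:-0} main_py=${5:-} target=${6:-}
    local -a cmd=(./pytorch_gpu2npu.sh -i "$input" -o "$output" -v "$version")
    if [ "$specify" = "1" ]; then
        cmd+=(-s)
    fi
    if [ -n "$main_py" ]; then
        # distributed and its options must be at the end of the command.
        cmd+=(distributed -m "$main_py")
        # -t is optional; the tool's default variable name is "model".
        if [ -n "$target" ]; then
            cmd+=(-t "$target")
        fi
    elif [ -n "$target" ]; then
        echo "error: -t can only be used together with distributed" >&2
        return 1
    fi
    # Print the assembled command (space-separated) for review.
    printf '%s\n' "${cmd[*]}"
}
```

For example, gpu2npu_cmd /home/train/ /home/out 2.1.0 0 /home/train/train.py prints the distributed example without -s and -t. Paths that contain spaces are not re-quoted in the printed string.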

**Table 1** Command-line options
Option	Description	Example Value
-i --input	Path of the folder where the original script file to be migrated is located. Mandatory.	/home/username/fmktransplt
-o --output	Output path for the script migration result. If distributed is not specified (single-device migration), the result directory is named xxx_msft. If distributed is specified, it is named xxx_msft_multi. In both cases, xxx is the name of the folder that contains the original script. Mandatory.	/home/username/fmktransplt_output
-v --version	PyTorch version of the script to be migrated. Mandatory.	1.11.0, 2.1.0, 2.2.0
-s --specify-device	(Advanced) Specifies the device through the DEVICE_ID environment variable. Note that the distributed functionality of the original script may no longer work. Optional.	-
distributed	Migrates a GPU single-device script into an NPU multi-device script. This option can be used only in scenarios where data is loaded with torch.utils.data.DataLoader. Its options are: -m, --main: (Mandatory) entry Python file of the training script. -t, --target_model: (Optional) variable name of the instantiated model in the script to be migrated. The default value is model; if the variable has a different name, set this option. For example, if the script contains my_model = Model(), specify -t my_model. The -t/--target_model option can be specified only together with distributed.	-
-h --help	Help information.	-
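To locate the results afterward, the expected directory name can be derived from the -i path as described for -o. This sketch assumes that the result folder is created directly under the -o path; expected_result_dir is a hypothetical helper, not part of the toolkit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print where the migration result is expected, assuming
# the tool creates <output>/<input folder name>_msft, or _msft_multi when
# distributed was specified.
# Usage: expected_result_dir INPUT OUTPUT [DISTRIBUTED(0|1)]
expected_result_dir() {
    # Strip one trailing slash so basename returns the folder name.
    local input=${1%/} output=${2%/} distributed=${3:-0}
    local suffix=_msft
    if [ "$distributed" = "1" ]; then
        suffix=_msft_multi
    fi
    printf '%s/%s%s\n' "$output" "$(basename "$input")" "$suffix"
}
```

For example, expected_result_dir /home/train/ /home/out 1 prints /home/out/train_msft_multi.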

After the script migration is complete, go to the output path of the script migration result to view the result file.
- Migration analysis runs as part of script migration and uses the torch_apis and affinity_apis analysis modes by default. For the corresponding result files, see Analysis Report Overview.
- If distributed was specified during migration, see Migrating GPU Single-device Scripts to NPU Multi-device Scripts for the result files.
Run the migrated model script on the Ascend NPU platform by following the training process of the original script, as described in Training Configuration.
If the weights are saved successfully, the weight-saving logic has been migrated successfully.
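If the script was migrated with -s, the device is selected through the DEVICE_ID environment variable. A minimal sketch for setting it per launch follows; run_on_device is hypothetical, and the device ID and training command are placeholders for your own setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one command with DEVICE_ID set for that command only.
# Relevant when the script was migrated with -s; the device ID and command are
# placeholders for your own training setup.
# Usage: run_on_device DEVICE_ID COMMAND [ARGS...]
run_on_device() {
    local device_id=$1
    shift
    # Reject empty or non-numeric device IDs.
    case $device_id in
        ''|*[!0-9]*)
            echo "error: DEVICE_ID must be a non-negative integer" >&2
            return 1
            ;;
    esac
    # The prefix assignment exports DEVICE_ID to the launched command only.
    DEVICE_ID="$device_id" "$@"
}
```

For example, run_on_device 0 python3 train.py, where train.py stands for the migrated entry script, starts training with DEVICE_ID=0.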
