Migration Operation
PyTorch GPU2Ascend provides the script analysis and migration functions to help users analyze the operators and APIs supported by a PyTorch training script and migrate the script to one that can run on NPUs.
Migration Procedure
- Use any of the following methods to start a script migration task:
  - Choose Ascend > Migration Tools > PyTorch GPU2Ascend on the toolbar.
  - Click the PyTorch GPU2Ascend icon on the toolbar.
  - Right-click the training project and choose PyTorch GPU2Ascend from the shortcut menu.
- Click the Transfer tab. The migration task page is displayed.
- Configure the parameters as required. Figure 1 shows the Transfer page.
Table 1 Transfer parameters

- PyTorch Version: (Required) PyTorch version of the script to be migrated. PyTorch 1.11.0, 2.1.0, and 2.2.0 are currently supported. The default version is 1.11.0.
- Input Path: (Required) Directory of the script files to be migrated. Click the folder icon to select one.
- Output Path: (Required) Output path of the script migration result files. Click the folder icon to select one.
  - If Distributed Rule is disabled (migration to single-device scripts), the output directory is named xxx_msft.
  - If Distributed Rule is enabled (migration to multi-device scripts), the output directory is named xxx_msft_multi.
  - xxx indicates the name of the folder that houses the original scripts.
- Distributed Rule: (Optional) Migrates a GPU single-device script to an NPU multi-device script. This option applies only when data is loaded in torch.utils.data.DataLoader mode. After enabling it, configure the following parameters:
  - Main File (required): Click the folder icon and select the entry Python file of the training script.
  - Target Model (optional): Variable name of the instantiated model in the script to be migrated. The default value is model. If the variable name is not model, set this parameter. For example, if the script contains my_model = Model(), set this parameter to -t my_model.
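The output-directory naming rule described under Output Path can be sketched as a small helper. migrated_output_dir is an illustrative function written for this document, not part of the tool, and the sample path is hypothetical:

```python
from pathlib import Path

def migrated_output_dir(input_path: str, distributed: bool) -> str:
    """Illustrative helper: derive the name of the directory that
    GPU2Ascend creates under Output Path for a given Input Path."""
    # xxx is the name of the folder that houses the original scripts.
    name = Path(input_path).name
    # Distributed Rule disabled -> xxx_msft; enabled -> xxx_msft_multi.
    suffix = "_msft_multi" if distributed else "_msft"
    return name + suffix

print(migrated_output_dir("/home/me/resnet_scripts", distributed=False))
print(migrated_output_dir("/home/me/resnet_scripts", distributed=True))
```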
- Click Transplant to execute the migration task.
After the migration, check the result file in the Output Path directory.
```
├── xxx_msft/xxx_msft_multi            // Directory storing the script migration result
│   ├── generated script files         // Same directory structure as the script directory before migration
│   ├── transplant_result_file         // Files that store the migration result
│   │   ├── msFmkTranspltlog.txt       // Script migration log. Max 1 MB per file; if exceeded, rotated into multiple files, 10 at most
│   │   ├── cuda_op_list.csv           // List of analyzed CUDA operators
│   │   ├── unknown_api.csv            // List of APIs whose support status is not clear
│   │   ├── unsupported_api.csv        // List of unsupported APIs
│   │   ├── api_precision_advice.csv   // Expert suggestions on API accuracy tuning
│   │   ├── api_performance_advice.csv // Expert suggestions on API performance tuning
│   │   ├── change_list.csv            // Change history file
│   ├── run_distributed_npu.sh         // Multi-device boot shell script
```
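As a quick post-migration check, the result CSVs can be scanned with Python's csv module. The sketch below works on an in-memory sample because the exact header of unsupported_api.csv is an assumption here; check the header row of your generated file before adapting it:

```python
import csv
import io

# Hypothetical excerpt of unsupported_api.csv; the real column layout may differ.
sample = io.StringIO(
    "api_name,location\n"
    "torch.cuda.default_generators,train.py:42\n"
    "amp_C.multi_tensor_adam,optim.py:10\n"
)

# Collect the names of APIs flagged as unsupported.
reader = csv.DictReader(sample)
unsupported = [row["api_name"] for row in reader]

print(f"{len(unsupported)} unsupported APIs found:")
for name in unsupported:
    print(" -", name)
```

Replacing the in-memory sample with `open("unsupported_api.csv", newline="")` applies the same scan to a real result file.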
Follow-up Operations
- To improve the model running speed, you are advised to use binary operators. After installing the binary operator package (OPP) by following the CANN Software Installation Guide, enable binary operators as follows:
- In the single-device scenario, modify the training entry point file (for example, main.py) and add the torch.npu.set_compile_mode call after import torch_npu:

```python
import torch
import torch_npu
torch.npu.set_compile_mode(jit_compile=False)
......
```
- In the multi-device scenario, if multi-device training is started through mp.spawn, torch.npu.set_compile_mode(jit_compile=False) must be added to the main function of the spawned process to enable binary operators. Otherwise, enable them in the same way as in the single-device scenario.

```python
if is_distributed:
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
else:
    main_worker(args.gpu, ngpus_per_node, args)

def main_worker(gpu, ngpus_per_node, args):
    # Add to the main function of the spawned process.
    torch.npu.set_compile_mode(jit_compile=False)
    ......
```
- If Distributed Rule is enabled, the run_distributed_npu.sh file is generated after the migration. Before running the migrated model, replace the please input your shell script here line in the file with the original training shell command of the model. When run_distributed_npu.sh is executed, a log file is generated for each specified NPU.
```shell
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29688
export HCCL_WHITELIST_DISABLE=1
NPUS=($(seq 0 7))
export RANK_SIZE=${#NPUS[@]}
rank=0
for i in ${NPUS[@]}
do
    export DEVICE_ID=${i}
    export RANK_ID=${rank}
    echo run process ${rank}
    please input your shell script here > output_npu_${i}.log 2>&1 &
    let rank++
done
```

Table 2 Parameters

- MASTER_ADDR: IP address of the training server.
- MASTER_PORT: Port of the training server.
- HCCL_WHITELIST_DISABLE: Whether to disable the HCCL communication whitelist. 1 indicates that the whitelist is disabled.
- NPUS: IDs of the NPUs to run on.
- RANK_SIZE: Number of devices to be invoked.
- DEVICE_ID: ID of the device to be invoked.
- RANK_ID: Logical ID of the device to be invoked.
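The per-device loop in run_distributed_npu.sh can be dry-run to see the DEVICE_ID/RANK_ID pair each worker receives. The sketch below replaces the training command with echo and reduces the device count to 4 for brevity; it launches nothing:

```shell
# Dry run of the run_distributed_npu.sh loop: print, rather than launch,
# the environment each worker process would receive.
NPUS=($(seq 0 3))
RANK_SIZE=${#NPUS[@]}
rank=0
for i in ${NPUS[@]}
do
    DEVICE_ID=${i}
    RANK_ID=${rank}
    echo "worker ${RANK_ID}/${RANK_SIZE}: DEVICE_ID=${DEVICE_ID}"
    rank=$((rank+1))
done
```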
- If the user training script contains the amp_C module that is not supported by the Ascend NPU platform, manually delete it before training.
- If the user training script contains the torch.nn.DataParallel API that is not supported by the Ascend NPU platform, manually change it to torch.nn.parallel.DistributedDataParallel for multi-device training. For details, see Model Script and Startup Script Configuration.
- If the user training script contains the torch.cuda.default_generators API that is not supported by the Ascend NPU platform, manually change it to torch_npu.npu.default_generators.
- If the user training script contains the torch.cuda.get_device_capability API, None is returned when the migrated script runs on the Ascend AI Processor. If this causes an error, manually change None to a fixed value. When the migrated torch.cuda.get_device_properties API runs on the Ascend AI Processor, its return value does not contain the minor and major attributes, so you are advised to comment out any code that reads these two attributes.
- After the analysis and migration, you can perform training by following the instructions in Model Training.
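For the torch.cuda.get_device_capability note above, one way to substitute a fixed value is a small guard function. This is an illustrative pattern, not part of the tool, and the fallback (7, 0) is an arbitrary placeholder the user should replace with a value appropriate for their model:

```python
def capability_or_fixed(cap, fixed=(7, 0)):
    """On the Ascend AI Processor the migrated get_device_capability call
    returns None; fall back to a user-chosen fixed (major, minor) tuple
    when downstream code cannot handle None."""
    return cap if cap is not None else fixed

# None (the NPU result) falls back to the fixed value;
# a real capability tuple passes through unchanged.
print(capability_or_fixed(None))
print(capability_or_fixed((8, 0)))
```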
