Migration Using PyTorch GPU2Ascend
Prerequisites
Before using PyTorch GPU2Ascend to migrate a PyTorch training script, ensure that the following dependencies are installed. The installation commands can be run from any path. If you run them as a non-root user, append --user to each command, for example, pip3 install pandas --user.
pip3 install pandas       # The pandas version must be 1.2.4 or later.
pip3 install libcst       # Python syntax tree parser, used to parse Python files.
pip3 install prettytable  # Used to display data as tables.
pip3 install jedi         # (Optional) Used for cross-file parsing. Installing it is recommended.
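The dependency list above can be checked before the tool is run. The following is a minimal sketch and not part of PyTorch GPU2Ascend: the helper names parse_version, meets_minimum, and check_dependencies are hypothetical, and the version comparison only looks at the leading numeric part of a version string such as 1.2.4.

```python
import re
from importlib import metadata

# Packages named in the prerequisites; jedi is optional.
REQUIRED = {"pandas": "1.2.4", "libcst": None, "prettytable": None}
OPTIONAL = {"jedi": None}


def parse_version(text):
    """Turn the leading numeric part of '1.2.4' into (1, 2, 4)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(p) for p in match.group().split(".")) if match else ()


def meets_minimum(installed, minimum):
    """Compare two version strings by their numeric components."""
    return parse_version(installed) >= parse_version(minimum)


def check_dependencies(packages):
    """Map each package name to 'missing', 'too old (...)', or 'ok (...)'."""
    report = {}
    for name, minimum in packages.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum and not meets_minimum(installed, minimum):
            report[name] = f"too old ({installed} < {minimum})"
        else:
            report[name] = f"ok ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_dependencies({**REQUIRED, **OPTIONAL}).items():
        print(f"{name}: {status}")
```

A "missing" entry for jedi is not an error, because that dependency is optional.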
Restrictions
- The migrated script runs on a different platform from the original script. While debugging or running it, the process may therefore terminate with an exception caused by, for example, operator differences. Debug and resolve such exceptions based on the specific exception information.
- After the analysis and migration, you can perform training according to the training process provided by the original script.
Starting a Migration Task
- Go to the path where the migration tool is located.
cd ${INSTALL_DIR}/cann/tools/ms_fmk_transplt/ # Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann.
- Start a migration task. Run the following command to start a migration task based on the configuration options in Table 1:
./pytorch_gpu2npu.sh -i /home/username/fmktransplt -o /home/username/fmktransplt_output -v 2.1.0 [-s] [distributed -m /home/train/train.py -t model]
# /home/username/fmktransplt: path of the original script.
# /home/username/fmktransplt_output: output path of the script migration result.
# 2.1.0: framework version of the original script.
# /home/train/train.py: entry file of the training script.
# model: variable name of the target model.
Specify distributed and its options -m and -t at the end of the statement.
Example:
# Single-device
./pytorch_gpu2npu.sh -i /home/train/ -o /home/out -v 2.1.0 [-s]
# Distributed
./pytorch_gpu2npu.sh -i /home/train/ -o /home/out -v 2.1.0 [-s] distributed -m /home/train/train.py [-t model]
[] encloses optional parameters, which can be omitted in actual use.
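The ordering rule above (distributed and its options go last) can be made explicit by assembling the argument list programmatically. This is a sketch under stated assumptions: build_command is a hypothetical helper, not something shipped with the tool, and the supported version list is copied from Table 1.

```python
# Versions accepted by -v according to Table 1.
SUPPORTED_VERSIONS = ("1.11.0", "2.1.0", "2.2.0", "2.3.1", "2.4.0", "2.5.1", "2.6.0")


def build_command(input_dir, output_dir, version, specify_device=False,
                  main_file=None, target_model=None):
    """Return the argument list for ./pytorch_gpu2npu.sh (hypothetical helper)."""
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported PyTorch version: {version}")
    if target_model and not main_file:
        raise ValueError("-t is only valid together with distributed -m")

    cmd = ["./pytorch_gpu2npu.sh", "-i", input_dir, "-o", output_dir, "-v", version]
    if specify_device:
        cmd.append("-s")
    # distributed and its options -m and -t must come at the end.
    if main_file:
        cmd += ["distributed", "-m", main_file]
        if target_model:
            cmd += ["-t", target_model]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command("/home/train/", "/home/out", "2.1.0")))
    print(" ".join(build_command("/home/train/", "/home/out", "2.1.0",
                                 main_file="/home/train/train.py",
                                 target_model="model")))
```

The returned list can be passed to subprocess.run from the tool directory. Building a list rather than a single string avoids quoting problems when paths contain spaces.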
Table 1 Parameter description
| Parameter | Description | Example Value |
| --- | --- | --- |
| -i, --input | Path of the folder that contains the original script files to be migrated. Mandatory. | /home/username/fmktransplt |
| -o, --output | Output path of the script migration result files. If distributed is not enabled (migration to single-device), the output directory name is xxx_msft. If distributed is enabled (migration to multi-device), the output directory name is xxx_msft_multi. xxx indicates the name of the folder where the original script is located. Mandatory. | /home/username/fmktransplt_output |
| -v, --version | PyTorch version of the script to be migrated. Mandatory. | 1.11.0, 2.1.0, 2.2.0, 2.3.1, 2.4.0, 2.5.1, 2.6.0 |
| -s, --specify-device | Uses the environment variable DEVICE_ID to specify a device (advanced feature). When this option is used, the distributed function in the original script may become invalid. Optional. | - |
| distributed | Migrates single-device scripts from GPUs to multi-device scripts on NPUs. This parameter can be used only in scenarios where data is loaded through torch.utils.data.DataLoader. After this parameter is specified, -t/--target_model can be set. -m/--main: (Mandatory) entry Python file of the training script. -t/--target_model: (Optional) variable name of the instantiated model in the script to be migrated. The default value is model. If the variable name is not model, set this parameter. For example, if the model is instantiated as my_model = Model(), set -t my_model. | - |
| -h, --help | Displays help information. | - |
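When the model variable is not named model, the value for -t is the name on the left-hand side of the assignment that instantiates the model. The sketch below uses Python's standard ast module to list such names in a script. The function name find_instances and the sample source are hypothetical, and the migration tool itself may identify the model differently.

```python
import ast


def find_instances(source, class_name):
    """Return variable names assigned from a direct call to class_name(...)."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        call = node.value
        if not isinstance(call, ast.Call):
            continue
        func = call.func
        # Match both Model(...) and package.Model(...).
        called = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if called != class_name:
            continue
        names += [t.id for t in node.targets if isinstance(t, ast.Name)]
    return names


sample = """
my_model = Model()
optimizer = SGD(my_model.parameters(), lr=0.1)
"""
print(find_instances(sample, "Model"))  # ['my_model']
```

For the sample above, the matching command-line option would be -t my_model.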
- After the script migration is complete, go to the output path of the script migration result to view the result files.
- During script migration, migration analysis is started. By default, the torch_apis and affinity_apis analysis modes are used. You can view the corresponding result files by referring to Analysis Report Overview.
- If the distributed parameter is enabled during the migration, you can obtain the result files by referring to Migrating Single-Device Scripts from GPUs to Multi-Device Scripts on NPUs.
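The name of the result directory follows from the -o description in Table 1. The sketch below assumes that the result directory is created directly under the -o path; the text gives the directory name but not its parent, so treat that as an assumption. result_dir is a hypothetical helper.

```python
from pathlib import PurePosixPath


def result_dir(input_dir, output_dir, distributed=False):
    """Return the expected result directory for a migration run."""
    source_name = PurePosixPath(input_dir).name  # a trailing slash is ignored
    suffix = "_msft_multi" if distributed else "_msft"
    return PurePosixPath(output_dir) / f"{source_name}{suffix}"


print(result_dir("/home/train/", "/home/out"))                    # /home/out/train_msft
print(result_dir("/home/train/", "/home/out", distributed=True))  # /home/out/train_msft_multi
```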
- Run the modified model script on the Ascend NPU platform according to the training process provided by the original script in the Training Configuration.
- If the weights are saved successfully, the weight-saving part of the migration is successful.
- After training completes, the migration tool automatically saves the weights, which indicates that the migration is successful.