Model Porting and Training
This section uses the ResNet-50 training script for the ImageNet dataset as an example to describe how to port a script to the Ascend platform through automated porting.
Automated porting means importing a script conversion library into the training script and then executing the script to perform training. During execution, the APIs in the script are automatically replaced with the NPU APIs supported by Ascend AI Processors, so conversion happens while training runs.
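Conceptually, the conversion library acts as an import-time shim: loading it redirects the CUDA API namespace to the NPU one, so an unmodified script transparently runs on the NPU. The following is a minimal sketch of that idea using hypothetical stand-in objects, not the real torch_npu internals:

```python
import types

# Stand-ins for two device backends (hypothetical; for illustration only).
_cuda_backend = types.SimpleNamespace(name="cuda", is_available=lambda: False)
_npu_backend = types.SimpleNamespace(name="npu", is_available=lambda: True)

# A fake "torch" module whose CUDA namespace the shim will redirect.
fake_torch = types.SimpleNamespace(cuda=_cuda_backend, npu=_npu_backend)

def transfer_to_npu_shim(torch_mod):
    """Redirect torch.cuda.* lookups to the NPU backend, as an import-time shim would."""
    torch_mod.cuda = torch_mod.npu

transfer_to_npu_shim(fake_torch)
print(fake_torch.cuda.name)            # "npu": cuda calls now reach the NPU backend
print(fake_torch.cuda.is_available())  # True
```

This is why the training script can keep calling `torch.cuda.*` after porting: the calls are resolved against the NPU backend at run time.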
- Obtain the ResNet-50 network training script in the PyTorch framework and the corresponding dataset.
- Obtain the ImageNet-based model training script main.py. Because the current Ascend-adapted PyTorch version does not have the torch.backends.mps module, you need to comment out all code related to that module in the source code before porting the script as follows:
Search for torch.backends.mps and comment out related code.
Before the modification:

```python
if not torch.cuda.is_available() and not torch.backends.mps.is_available():
    print('using CPU, this will be slow')
elif args.distributed:
    ...
...
elif args.gpu is not None and torch.cuda.is_available():
    torch.cuda.set_device(args.gpu)
    model = model.cuda(args.gpu)
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    model = model.to(device)
else:
    ...
...
if torch.cuda.is_available():
    if args.gpu:
        device = torch.device('cuda:{}'.format(args.gpu))
    else:
        device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
...
def run_validate(loader, base_progress=0):
    ...
    if args.gpu is not None and torch.cuda.is_available():
        images = images.cuda(args.gpu, non_blocking=True)
    if torch.backends.mps.is_available():
        images = images.to('mps')
        target = target.to('mps')
    if torch.cuda.is_available():
        target = target.cuda(args.gpu, non_blocking=True)
...
def all_reduce(self):
    if torch.cuda.is_available():
        device = torch.device("cuda")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
    else:
        device = torch.device("cpu")
```

After the modification:

```python
if not torch.cuda.is_available():
    print('using CPU, this will be slow')
elif args.distributed:
    ...
...
elif args.gpu is not None and torch.cuda.is_available():
    torch.cuda.set_device(args.gpu)
    model = model.cuda(args.gpu)
# elif torch.backends.mps.is_available():
#     device = torch.device("mps")
#     model = model.to(device)
else:
    ...
...
if torch.cuda.is_available():
    if args.gpu:
        device = torch.device('cuda:{}'.format(args.gpu))
    else:
        device = torch.device("cuda")
# elif torch.backends.mps.is_available():
#     device = torch.device("mps")
else:
    device = torch.device("cpu")
...
def run_validate(loader, base_progress=0):
    ...
    if args.gpu is not None and torch.cuda.is_available():
        images = images.cuda(args.gpu, non_blocking=True)
    # if torch.backends.mps.is_available():
    #     images = images.to('mps')
    #     target = target.to('mps')
    if torch.cuda.is_available():
        target = target.cuda(args.gpu, non_blocking=True)
...
def all_reduce(self):
    if torch.cuda.is_available():
        device = torch.device("cuda")
    # elif torch.backends.mps.is_available():
    #     device = torch.device("mps")
    else:
        device = torch.device("cpu")
```
Upload the customized main.py file to a directory on the server, for example, /home/sample.
- Collect the ImageNet dataset and upload it to any directory on the server, for example, /home/sample/data/resnet50/imagenet.
- Configure environment variables. The default installation path of the root user is used as an example.
```shell
source /usr/local/Ascend/ascend-toolkit/set_env.sh
export PYTHONPATH=/usr/local/Ascend/ascend-toolkit/latest/tools/ms_fmk_transplt/torch_npu_bridge:$PYTHONPATH
```
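As a quick sanity check (a sketch; the path assumes the default root-user install from the doc), you can confirm that the bridge directory ended up on PYTHONPATH:

```shell
# Sketch: verify the torch_npu_bridge path is on PYTHONPATH (default root install assumed).
BRIDGE=/usr/local/Ascend/ascend-toolkit/latest/tools/ms_fmk_transplt/torch_npu_bridge
export PYTHONPATH=$BRIDGE:$PYTHONPATH
case ":$PYTHONPATH:" in
  *":$BRIDGE:"*) echo "bridge path on PYTHONPATH" ;;
  *)             echo "bridge path missing" ;;
esac
```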
- Import the library code to your training script.
```python
import torch
import torch_npu
...
from torch_npu.contrib import transfer_to_npu
```
- Only PyTorch 1.11.0 and later versions are supported.
- The from torch_npu.contrib import transfer_to_npu code needs to be imported only in the PyTorch framework.
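To respect the version requirement above, a script can check the installed PyTorch version before importing the bridge. A minimal sketch, assuming a plain `major.minor.patch` version string (the helper name is ours, not part of torch_npu):

```python
def supports_transfer_to_npu(torch_version: str) -> bool:
    """Check whether a torch version string is 1.11.0 or later (the documented minimum)."""
    major, minor = (int(part) for part in torch_version.split(".")[:2])
    return (major, minor) >= (1, 11)

print(supports_transfer_to_npu("1.11.0"))  # True
print(supports_transfer_to_npu("1.8.1"))   # False
```

In a real script you would pass `torch.__version__` (possibly stripping any local suffix such as `+cpu`) and skip or fail the bridge import when the check returns False.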
- Perform single-device training.
Run the following command in the path where the main.py script is located to start the training script. Porting is performed during the execution of the training script.
```shell
python3 main.py /home/sample/data/resnet50/imagenet --batch-size 128 --lr 0.1 --epochs 1 --arch resnet50 --world-size 1 --rank 0 --workers 40 --momentum 0.9 --weight-decay 1e-4 --gpu 0
```
The key parameters are described as follows:
- --batch-size: training batch size. Set this parameter to a multiple of the processor core quantity to improve performance.
- --lr: learning rate.
- --epochs: number of training epochs.
- --arch: model architecture.
- --world-size: number of nodes involved in training.
- --rank: rank of the current node (card number) in distributed training.
- --workers: number of processes for loading data.
- --momentum: momentum.
- --weight-decay: weight decay.
- --gpu: device ID. The parameter name is still gpu, but after porting the actual training device is defined as npu in the code.
If the checkpoint.pth.tar weight file is generated after the training is complete, the porting and training are successful.
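This check can also be scripted. A minimal sketch (the helper name is ours; the default path assumes the script was run from the directory containing main.py):

```python
import os

def training_produced_checkpoint(path: str = "checkpoint.pth.tar") -> bool:
    """Return True if the expected weight file exists after training."""
    return os.path.isfile(path)

print(training_produced_checkpoint())  # True once training has written the file
```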
Figure 1 Obtaining weight file