Model Migration and Training

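Using a ResNet50 training script based on the ImageNet dataset as an example, this section migrates the script to the Ascend platform through automatic migration.

With automatic migration, you import the script conversion library into the training script and then launch the script to start training. While the script runs, the interfaces it calls are automatically replaced with NPU interfaces supported by the Ascend AI Processor, so conversion happens as training proceeds.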
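The idea of replacing interfaces at run time can be illustrated with a toy example. The sketch below is purely conceptual: `fake_framework`, `_make_device`, and `redirect_cuda_to_npu` are hypothetical names invented for illustration, and nothing here is taken from torch or torch_npu. It only shows how wrapping a function after import lets unchanged calling code end up on a different device name.

```python
from types import SimpleNamespace


def _make_device(name):
    """Toy stand-in for a framework's device constructor."""
    return f"device({name})"


# Hypothetical stand-in for a framework module; this is NOT torch or torch_npu.
fake_framework = SimpleNamespace(device=_make_device)


def redirect_cuda_to_npu(framework):
    """Wrap framework.device so that 'cuda...' names are rewritten to 'npu...'."""
    original = framework.device

    def patched(name):
        if isinstance(name, str) and name.startswith("cuda"):
            name = "npu" + name[len("cuda"):]
        return original(name)

    framework.device = patched


# The "import" step: apply the patch once, before any training code runs.
redirect_cuda_to_npu(fake_framework)

# Existing calls stay untouched but now resolve to the rewritten device name.
print(fake_framework.device("cuda:0"))  # device(npu:0)
print(fake_framework.device("cpu"))     # device(cpu)
```

Because the patch is applied when the library is imported, the training script itself does not have to be rewritten by hand, which matches the "convert while training" behavior described above.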

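Obtain the PyTorch ResNet50 training script and the dataset.

Obtain the ImageNet-based training script main.py. The PyTorch version currently adapted for Ascend does not include the torch.backends.mps module, so comment out all mps-related code in the original script before following the later steps to migrate it.

Search for "torch.backends.mps" and comment out the related code.

Before: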

    if not torch.cuda.is_available() and not torch.backends.mps.is_available():
        print('using CPU, this will be slow')
    elif args.distributed:
    ...
    ...
    elif args.gpu is not None and torch.cuda.is_available():
        torch.cuda.set_device(args.gpu)
        model = model.cuda(args.gpu)
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
        model = model.to(device)
    else:
    ...
    ...
    if torch.cuda.is_available():
        if args.gpu:
            device = torch.device('cuda:{}'.format(args.gpu))
        else:
            device = torch.device("cuda")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
    else:
        device = torch.device("cpu")
    ...
    ...
    def run_validate(loader, base_progress=0):
    ...
                if args.gpu is not None and torch.cuda.is_available():
                    images = images.cuda(args.gpu, non_blocking=True)
                if torch.backends.mps.is_available():
                    images = images.to('mps')
                    target = target.to('mps')
                if torch.cuda.is_available():
                    target = target.cuda(args.gpu, non_blocking=True)
    ...
    ...
    def all_reduce(self):
        if torch.cuda.is_available():
            device = torch.device("cuda")
        elif torch.backends.mps.is_available():
            device = torch.device("mps")
        else:
            device = torch.device("cpu")

After:

    if not torch.cuda.is_available():
        print('using CPU, this will be slow')
    elif args.distributed:
    ...
    ...
    elif args.gpu is not None and torch.cuda.is_available():
        torch.cuda.set_device(args.gpu)
        model = model.cuda(args.gpu)
#    elif torch.backends.mps.is_available():
#        device = torch.device("mps")
#        model = model.to(device)
    else:
    ...
    ...
    if torch.cuda.is_available():
        if args.gpu:
            device = torch.device('cuda:{}'.format(args.gpu))
        else:
            device = torch.device("cuda")
#    elif torch.backends.mps.is_available():
#        device = torch.device("mps")
    else:
        device = torch.device("cpu")
    ...
    ...
    def run_validate(loader, base_progress=0):
    ...
                if args.gpu is not None and torch.cuda.is_available():
                    images = images.cuda(args.gpu, non_blocking=True)
#                if torch.backends.mps.is_available():
#                    images = images.to('mps')
#                    target = target.to('mps')
                if torch.cuda.is_available():
                    target = target.cuda(args.gpu, non_blocking=True)
    ...
    ...
    def all_reduce(self):
        if torch.cuda.is_available():
            device = torch.device("cuda")
#        elif torch.backends.mps.is_available():
#            device = torch.device("mps")
        else:
            device = torch.device("cpu")
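To find the places that need editing, a small helper along the following lines could be used. This is a minimal sketch using only the Python standard library; `find_mps_lines` and `find_mps_lines_in_file` are hypothetical helper names, not part of any Ascend tool. It reports only lines that mention `torch.backends.mps` directly, so the body of each reported branch (for example `device = torch.device("mps")`) still has to be reviewed by hand.

```python
import re
from pathlib import Path

MPS_PATTERN = re.compile(r"torch\.backends\.mps")


def find_mps_lines(source):
    """Return (line_number, text) pairs for uncommented lines mentioning torch.backends.mps."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # already commented out
        if MPS_PATTERN.search(line):
            hits.append((number, line.rstrip()))
    return hits


def find_mps_lines_in_file(path):
    """Convenience wrapper that reads a script from disk, e.g. main.py."""
    return find_mps_lines(Path(path).read_text(encoding="utf-8"))


# Small in-memory sample so the sketch runs without the real main.py.
sample = '''if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
#elif torch.backends.mps.is_available():
'''

for number, text in find_mps_lines(sample):
    print(number, text)
```

Running `find_mps_lines_in_file("main.py")` again after editing should return an empty list once every reference has been commented out or removed.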

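Upload the modified main.py file to the server, for example to the "/home/sample" directory.

Collect the ImageNet dataset yourself and upload it to any directory on the server, for example "/home/sample/data/resnet50/imagenet".
The dataset is available from the official ImageNet website, https://www.image-net.org/.

Configure the environment variable.

export PYTHONPATH={CANN installation directory}/ascend-toolkit/latest/tools/ms_fmk_transplt/torch_npu_bridge:$PYTHONPATH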
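Before launching training, it can help to confirm that the bridge directory is actually on PYTHONPATH. The following is a minimal sketch, assuming the directory layout shown in the export line above; `bridge_path` and `on_pythonpath` are hypothetical helper names, and the installation directory in the usage lines is a placeholder to be replaced with the real one.

```python
import os
from pathlib import Path


def bridge_path(cann_install_dir):
    """Build the torch_npu_bridge path under a given CANN installation directory."""
    return (Path(cann_install_dir) / "ascend-toolkit" / "latest" / "tools"
            / "ms_fmk_transplt" / "torch_npu_bridge")


def on_pythonpath(path, pythonpath=None):
    """Return True if `path` is one of the entries in PYTHONPATH."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    entries = [Path(entry) for entry in pythonpath.split(os.pathsep) if entry]
    return Path(path) in entries


# Placeholder installation directory, used for illustration only.
cann_dir = "/path/to/cann"
target = bridge_path(cann_dir)
print(target)
print("on PYTHONPATH:", on_pythonpath(target))
```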

Import the library code in the training script.
```
import torch 
import torch_npu 
..... 
from torch_npu.contrib import transfer_to_npu
```
- Supported only on PyTorch 1.11.0 and later.
- The `from torch_npu.contrib import transfer_to_npu` import is required only under the PyTorch framework.
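A quick pre-flight check can show whether the packages needed by these imports are installed, without importing them. This is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper name, and the check covers only the top-level `torch` and `torch_npu` packages, not their versions.

```python
import importlib.util

REQUIRED_MODULES = ("torch", "torch_npu")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names of top-level modules that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("torch and torch_npu were both found")
```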
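
Run the command to start single-device training.
In the directory containing main.py, run the following command to launch the training script. Migration takes place while the script runs.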
```
python3 main.py /home/sample/data/resnet50/imagenet --batch-size 128 --lr 0.1 --epochs 1 --arch resnet50 --world-size 1 --rank 0 --workers 40 --momentum 0.9 --weight-decay 1e-4 --gpu 0
```
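The key parameters are as follows:
- --batch-size: training batch size. Where possible, set it to a multiple of the number of processor cores for better performance.
- --lr: learning rate.
- --epochs: number of training epochs.
- --arch: model architecture.
- --world-size: number of nodes taking part in training.
- --rank: rank of the node for distributed training.
- --workers: number of data-loading processes.
- --momentum: momentum.
- --weight-decay: weight decay.
- --gpu: device ID. The parameter is still named gpu, but after migration the actual training device is defined in code as npu.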
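To see how the values in the command map onto these parameters, the flags can be parsed with a stand-alone argparse parser. This is only a sketch: the flag names come from the command above, while the types and the `build_parser` helper are assumptions made for illustration and are not copied from main.py.

```python
import argparse
import shlex


def build_parser():
    """Hypothetical parser mirroring the flags used in the training command."""
    parser = argparse.ArgumentParser(description="Sketch of the training flags")
    parser.add_argument("data", help="path to the dataset")
    parser.add_argument("--batch-size", type=int)
    parser.add_argument("--lr", type=float)
    parser.add_argument("--epochs", type=int)
    parser.add_argument("--arch")
    parser.add_argument("--world-size", type=int)
    parser.add_argument("--rank", type=int)
    parser.add_argument("--workers", type=int)
    parser.add_argument("--momentum", type=float)
    parser.add_argument("--weight-decay", type=float)
    parser.add_argument("--gpu", type=int)
    return parser


# Arguments exactly as they appear after "python3 main.py" in the command above.
command = ("/home/sample/data/resnet50/imagenet --batch-size 128 --lr 0.1 --epochs 1 "
           "--arch resnet50 --world-size 1 --rank 0 --workers 40 --momentum 0.9 "
           "--weight-decay 1e-4 --gpu 0")

args = build_parser().parse_args(shlex.split(command))
print(args.data)
print(args.arch, args.batch_size, args.epochs, args.gpu)
```

argparse converts dashes in flag names to underscores, so `--batch-size` is read as `args.batch_size` and `--weight-decay` as `args.weight_decay`.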

If the "checkpoint.pth.tar" weight file is generated after training finishes, the migration and training succeeded.

Figure 1 Obtaining the weight file
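The success check can also be scripted. The sketch below assumes that the weight file is written to the directory from which main.py was launched, which the text above does not state explicitly; `checkpoint_exists` is a hypothetical helper name. The demonstration uses a temporary directory and a placeholder file so that it runs without a real training job.

```python
import tempfile
from pathlib import Path

CHECKPOINT_NAME = "checkpoint.pth.tar"


def checkpoint_exists(workdir="."):
    """Return True if the weight file is present and non-empty in `workdir`."""
    path = Path(workdir) / CHECKPOINT_NAME
    return path.is_file() and path.stat().st_size > 0


# Demonstration with a placeholder file standing in for a real checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    before = checkpoint_exists(tmp)
    (Path(tmp) / CHECKPOINT_NAME).write_bytes(b"placeholder")
    after = checkpoint_exists(tmp)

print(before, after)  # False True
```

In practice, `checkpoint_exists("/home/sample")` would be called after the training command returns, with the path adjusted to wherever main.py was run.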
