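Perform the following steps on the NPU.
Open the training script for editing and add the following statement inside the for loop of the train function: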
with torch.utils.dumper(enabled=True, use_dump=False, use_load=True, dump_path="/home/Jason.h5", load_file_path="/home/resnet50_dump.h5") as dump:
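Here load_file_path specifies the data file generated by running model training on the GPU, and dump_path is the output file path for the current training run.
Only the training steps placed inside this with statement have their data extracted, so wrap the steps you need according to your actual scenario. For example: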
def train(train_loader, model, criterion, optimizer, epoch, args):
    batch_time = AverageMeter('Time', ':6.3f')
    data_time = AverageMeter('Data', ':6.3f')
    losses = AverageMeter('Loss', ':.4e')
    top1 = AverageMeter('Acc@1', ':6.2f')
    top5 = AverageMeter('Acc@5', ':6.2f')
    progress = ProgressMeter(
        len(train_loader),
        [batch_time, data_time, losses, top1, top5],
        prefix="Epoch: [{}]".format(epoch))

    # switch to train mode
    model.train()

    end = time.time()
    for i, (images, target) in enumerate(train_loader):
        # measure data loading time
        data_time.update(time.time() - end)

        if args.gpu is not None:
            images = images.cuda(args.gpu, non_blocking=True)
        if torch.cuda.is_available():
            target = target.cuda(args.gpu, non_blocking=True)

        with torch.utils.dumper(enabled=True, use_dump=False, use_load=True, dump_path="/home/Jason.h5", load_file_path="/home/resnet50_dump.h5") as dump:
            # compute output
            output = model(images)
            loss = criterion(output, target)

            # measure accuracy and record loss
            acc1, acc5 = accuracy(output, target, topk=(1, 5))
            losses.update(loss.item(), images.size(0))
            top1.update(acc1[0], images.size(0))
            top5.update(acc5[0], images.size(0))

            # compute gradient and do SGD step
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()

        if i % args.print_freq == 0:
            progress.display(i)
The device field in the script must be set to the NPU. Then start training with:
bash $PATH/train_full_1p.sh --data_path=/home/dataset
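Here $PATH stands for the directory that contains train_full_1p.sh. Because PATH is also the shell's executable search path variable, substitute the actual directory in the command rather than using $PATH literally.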
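A missing input file can cost a full launch before the problem shows up, so a small pre-flight check may help. The sketch below is a minimal example, assuming the script directory and both .h5 paths are passed in explicitly; build_train_command is a hypothetical helper, not part of any toolkit, and it only checks that the files exist before assembling the launch command.

```python
import os


def build_train_command(script_dir, data_path, load_file_path, dump_path):
    """Check the files the run depends on, then return the launch command.

    Hypothetical helper for illustration only.
    """
    script = os.path.join(script_dir, "train_full_1p.sh")
    if not os.path.isfile(script):
        raise FileNotFoundError(f"training script not found: {script}")

    # The file produced by the GPU run must exist before it can be loaded.
    if not os.path.isfile(load_file_path):
        raise FileNotFoundError(f"GPU dump file not found: {load_file_path}")

    # The output file is created by the run, so only its directory is checked.
    dump_dir = os.path.dirname(os.path.abspath(dump_path))
    if not os.path.isdir(dump_dir):
        raise NotADirectoryError(f"output directory does not exist: {dump_dir}")

    # An argument list avoids shell quoting issues and the $PATH name clash.
    return ["bash", script, f"--data_path={data_path}"]
```

The returned list can be passed to subprocess.run(cmd, check=True) from the standard library, with script_dir set to the real directory of train_full_1p.sh.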