Training Configuration

This section describes precautions to take when porting and training models in special scenarios.

  • To improve model execution speed, you are advised to enable binary operators. Install the binary kernels operator package by referring to "Installing CANN", and then enable binary operators as follows:
    • In single-device scenarios, modify the training entry file (for example, main.py) and add the torch_npu.npu.set_compile_mode(jit_compile=False) call after import torch_npu, as shown in the following code.
      import torch
      import torch_npu
      torch_npu.npu.set_compile_mode(jit_compile=False)
      ......
    • In multi-device scenarios, if training is started with mp.spawn, torch_npu.npu.set_compile_mode(jit_compile=False) must be added to the main function that each spawned process runs (main_worker in the following code) to enable binary operators. For other startup modes, enable binary operators in the same way as in single-device scenarios.
      if is_distributed:
          mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
      else:
          main_worker(args.gpu, ngpus_per_node, args)
      def main_worker(gpu, ngpus_per_node, args):
          # Add to the main function for process startup.
          torch_npu.npu.set_compile_mode(jit_compile=False)
          ......
      
  • If the training script uses the torch.nn.DataParallel API, which is not supported on the Ascend NPU platform, manually change it to torch.nn.parallel.DistributedDataParallel for multi-device training. For details, see "Migrating Single-Device Scripts from GPUs to Multi-Device Scripts on NPUs".
  • If the training script uses the amp_C module, which is not supported on the Ascend NPU platform, manually delete the code related to import amp_C before training.
  • If the training script calls the torch.cuda.get_device_capability API, the call returns None when the migrated script runs on the Ascend NPU platform.

    On the GPU platform, torch.cuda.get_device_capability returns the GPU compute capability as a value of the Tuple[int, int] type. The NPU platform has no such concept, so torch.npu.get_device_capability returns None. If this causes an error, manually replace None with a fixed value of the Tuple[int, int] type.

  • When the migrated script calls the torch.cuda.get_device_properties API on the Ascend NPU platform, the return value does not contain the major and minor attributes. You are advised to comment out the code that accesses these two attributes.
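
The last two items above can be illustrated with a small sketch. It is not part of the Ascend tooling: the helper names capability_or_default and major_minor_or_default and the fallback value (8, 0) are hypothetical, chosen only for illustration. The first helper replaces a None capability with a fixed Tuple[int, int] value, as described above. The second reads major and minor with a default, which is an alternative to commenting out the code that accesses them, not the approach this section recommends. Stand-in objects are used so the sketch runs without any device.

```python
from types import SimpleNamespace
from typing import Any, Optional, Tuple

# Hypothetical placeholder value; pick whatever your script's checks expect.
FALLBACK_CAPABILITY: Tuple[int, int] = (8, 0)


def capability_or_default(
    capability: Optional[Tuple[int, int]],
    default: Tuple[int, int] = FALLBACK_CAPABILITY,
) -> Tuple[int, int]:
    """Return the reported capability, or the fixed default when it is None."""
    if capability is None:
        return default
    return (int(capability[0]), int(capability[1]))


def major_minor_or_default(
    props: Any,
    default: Tuple[int, int] = FALLBACK_CAPABILITY,
) -> Tuple[int, int]:
    """Read props.major and props.minor, using the default for a missing one."""
    major = getattr(props, "major", default[0])
    minor = getattr(props, "minor", default[1])
    return (major, minor)


# Stand-in objects (assumptions) that mimic the two platforms' return values.
npu_like_props = SimpleNamespace(name="npu-device")  # no major/minor attributes
gpu_like_props = SimpleNamespace(name="gpu-device", major=7, minor=5)

print(capability_or_default(None))
print(capability_or_default((7, 5)))
print(major_minor_or_default(npu_like_props))
print(major_minor_or_default(gpu_like_props))
```

In a migrated script, the value returned by torch.npu.get_device_capability() or the object returned by the get_device_properties call would be passed to these helpers in place of the stand-in objects.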