PyTorch Training

Environment Setup

Install the PyTorch framework and mixed precision module. For details, see "Environment Setup" in the PyTorch Network Model Porting and Training.

Prepare the required training and validation image datasets and upload them to the train/ and val/ folders in the training environment, respectively. For details, see PyTorch Model > Training > Preparing a Dataset.
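
Before you start training, it can help to confirm that the upload completed. The following is a minimal sketch, not part of the template: it assumes that train/ and val/ sit under one common dataset root, and the function name check_dataset_root is hypothetical.

```shell
# Hypothetical helper: verify that a dataset root contains non-empty
# train/ and val/ folders. The common-root layout is an assumption.
check_dataset_root() {
    root=$1
    for sub in train val; do
        if [ ! -d "$root/$sub" ]; then
            echo "missing directory: $root/$sub" >&2
            return 1
        fi
        # An empty folder usually means the upload did not complete.
        if [ -z "$(ls -A "$root/$sub")" ]; then
            echo "empty directory: $root/$sub" >&2
            return 1
        fi
    done
    return 0
}

# Example call (replace the placeholder path with your own dataset root):
# check_dataset_root /path/to/dataset && echo "dataset layout looks complete"
```

The check only looks at folder presence and emptiness. It does not validate image formats or class subfolders.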

Environment Variable Configuration

  1. Log in as the running user, run the vi ~/.bashrc command in any directory to open the .bashrc file, and append the following content to the file (the default installation path of a non-root user is used as an example):
    # Ascend-CANN-Toolkit environment variable. Change it to the actual path.
    source ~/Ascend/ascend-toolkit/set_env.sh
    
    # PyTorch environment variable. Change it to the actual path.
    export LD_LIBRARY_PATH=~/.local/lib/python3.7/site-packages/torch/lib:$LD_LIBRARY_PATH
  2. Run the :wq! command to save the file and exit.
  3. Run the source ~/.bashrc command for the modification to take effect immediately.
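
After the steps above, you may want to confirm that the PyTorch library directory actually ended up on LD_LIBRARY_PATH. The sketch below is one way to check; the helper name path_list_contains is hypothetical, and TORCH_LIB mirrors the example path used above, so change it to your actual path.

```shell
# Return 0 if directory $1 is an entry in the colon-separated list $2.
path_list_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed path, copied from the example above. bash expands ~ to $HOME
# when the export line in .bashrc is executed.
TORCH_LIB="$HOME/.local/lib/python3.7/site-packages/torch/lib"

if path_list_contains "$TORCH_LIB" "${LD_LIBRARY_PATH:-}"; then
    echo "torch lib directory is on LD_LIBRARY_PATH"
else
    echo "torch lib directory is missing from LD_LIBRARY_PATH"
fi
```

Wrapping both values in colons makes the match exact per entry, so /a does not match an entry such as /a/b.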

Procedure

The following describes the overall procedure for creating a training project with the ResNet-50 for PyTorch template sample. For details about the project information and related pop-up windows, see Procedure.

  1. Click Ascend Training on the left of the page to create an Ascend training project, as shown in Figure 1.
    Figure 1 Project creation page
  2. On the training project selection page shown in Figure 1, select the ResNet-50 for PyTorch template under CANN Version and Samples.
  3. Click Next and configure other information about the training project. For details about the parameters, see Creating a Training Project.
  4. Click Finish. The training project is created.

    If this is the first time you create a training project, the tool automatically downloads the sample project template. Ensure that your device is connected to the network; otherwise, subsequent operations cannot be performed.

  5. View the ResNet-50 for PyTorch template project window as shown in Figure 2.
    Figure 2 Template project window

    If the error message "Unzip failed. There is problem occurred when unzipping file." is displayed when you create a sample training project on Windows, refer to What Do I Do If I Get Error "Unzip failed. There is problem occurred when unzipping file." When Creating a Sample Training Project on Windows? to rectify the fault.

  6. Find the run_xx.sh file in the directory on the left of the project page, and set the paths of the training and validation image datasets obtained in Environment Setup in the data field of the file. See Figure 3.
    Figure 3 Setting the dataset path

    The PyTorch ResNet-50 template of MindStudio has preset training parameters in the code of the training script. To customize the training parameters, you need to be familiar with the PyTorch framework code.

  7. Set the run configurations and run the project.
    1. Choose Run > Edit Configurations... on the training project page or click Edit Configurations... on the menu shown in Figure 4 to access the run configuration page.
      Figure 4 Shortcut to the run configuration page
    2. Set training parameters, as shown in Figure 5.
      Figure 5 Run configuration page

      Set run configurations of the training project on the right, as described in Table 1.

      Table 1 Run configurations of the training project

      | Parameter | Description | Example |
      |---|---|---|
      | Name | Project name (user-defined). For example: MyTraining3. | The name contains a maximum of 64 characters, starting with a letter and ending with a letter or digit. Only letters, digits, hyphens (-), and underscores (_) are allowed. |
      | Run Mode | Run mode. | Local Run |
      | Deployment | Run configurations. You can use the Deployment function to synchronize the files and folders in a specified project to a specified directory on a remote device. For details, see Deployment. | In this example, Run Mode is set to Local Run. Therefore, this parameter is not displayed. |
      | Executable | Entry point file of the training project. For example: run_1p.sh. | - |
      | Command Arguments | Command-line arguments for training. This parameter is optional. | Set this parameter as required. |
      | Environment Variables | Environment variables of the training project. This parameter is optional. | Set this parameter as required. |

    3. Click OK. The run configuration of the training project is saved.
    4. Choose Run > Run 'MyTraining3' (the name set in Table 1) on the project page or click the button shown in Figure 6 to perform training.
      Figure 6 Performing training using a shortcut
      Figure 7 shows the training process.
      Figure 7 Training process display
    5. After the training is complete, the generated model file is stored in the /result directory of the project.
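
The naming rule for the Name parameter in Table 1 (at most 64 characters, starting with a letter, ending with a letter or digit, and using only letters, digits, hyphens, and underscores) can be checked before you type a name into the dialog. The following is a sketch of that rule as stated in the table, not the tool's own validation logic; the function name is_valid_name is hypothetical.

```shell
# Hypothetical helper: return 0 if $1 satisfies the Name rule in Table 1.
is_valid_name() {
    name=$1
    # Length must be between 1 and 64 characters.
    [ "${#name}" -ge 1 ] && [ "${#name}" -le 64 ] || return 1
    case $name in
        *[!A-Za-z0-9_-]*) return 1 ;;  # contains a disallowed character
        [!A-Za-z]*)       return 1 ;;  # does not start with a letter
        *[!A-Za-z0-9])    return 1 ;;  # does not end with a letter or digit
    esac
    return 0
}

# Example call:
# is_valid_name "MyTraining3" && echo "name is acceptable"
```

A single letter passes the check because it both starts with a letter and ends with a letter. Whether the tool itself accepts one-character names is not stated in the table.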