PyTorch Training
Environment Setup
Install the PyTorch framework and mixed precision module. For details, see "Environment Setup" in the PyTorch Network Model Porting and Training.
Prepare the required training and validation image datasets and upload them to the train/ and val/ folders in the training environment, respectively. For details, see PyTorch Model > Training > Preparing a Dataset.
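Before creating the project, it can help to confirm that the uploaded dataset has the expected train/ and val/ folders. The following is a minimal Python sketch, not part of MindStudio. The function name summarize_dataset, the list of image suffixes, and the one-subfolder-per-class layout (the ImageNet-style convention) are assumptions made for illustration; adjust them to match your dataset.

```python
from pathlib import Path

# Assumption: images use one of these extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def summarize_dataset(root):
    """Return {split: {class_name: image_count}} for train/ and val/ under root.

    Assumes each split contains one subfolder per class (hypothetical layout).
    Raises FileNotFoundError if train/ or val/ is missing.
    """
    root = Path(root)
    summary = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing folder: {split_dir}")
        counts = {}
        # Count image files in every class subfolder of this split.
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
        summary[split] = counts
    return summary


# Example call (the path is a placeholder, change it to the actual dataset root):
# print(summarize_dataset("/path/to/dataset"))
```

A class folder reported with a count of 0, or a class present in train/ but absent from val/, usually points to an incomplete upload.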
Environment Variable Configuration
- Log in as the running user, run the vi ~/.bashrc command in any directory to open the .bashrc file, and append the following content to the file (the default installation path of a non-root user is used as an example):
    # Ascend-CANN-Toolkit environment variable. Change it to the actual path.
    source ~/Ascend/ascend-toolkit/set_env.sh
    # PyTorch environment variable. Change it to the actual path.
    export LD_LIBRARY_PATH=~/.local/lib/python3.7/site-packages/torch/lib:$LD_LIBRARY_PATH
- Run the :wq! command to save the file and exit.
- Run the source ~/.bashrc command for the modification to take effect immediately.
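After the modification takes effect, you may want to confirm that the PyTorch library directory really is on LD_LIBRARY_PATH. The following is a minimal Python sketch, not part of the MindStudio or CANN tooling. The helper name missing_library_dirs is hypothetical, and the path in the usage comment is the non-root default from the example above, so change it to the actual path.

```python
import os


def missing_library_dirs(required_dirs, env=None):
    """Return the entries of required_dirs that are absent from LD_LIBRARY_PATH.

    env defaults to the current process environment; pass a dict to check
    another set of variables.
    """
    env = os.environ if env is None else env
    # Normalize every configured entry so trailing slashes and "~" do not matter.
    current = {
        os.path.normpath(os.path.expanduser(p))
        for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
        if p
    }
    return [
        d
        for d in required_dirs
        if os.path.normpath(os.path.expanduser(d)) not in current
    ]


# Example call (default non-root path, change it to the actual path):
# print(missing_library_dirs(["~/.local/lib/python3.7/site-packages/torch/lib"]))
```

An empty list means every required directory was found. Run the check from a new shell, or after source ~/.bashrc, so that the updated variable is visible to the Python process.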
Procedure
The following describes the overall procedure for creating a training project with the ResNet-50 for PyTorch template sample. For details about the project information and related pop-up windows, see Procedure.
- Click Ascend Training on the left of the page to create an Ascend training project, as shown in Figure 1.
- On the training project selection page shown in Figure 1, select the ResNet-50 for PyTorch template under CANN Version and Samples.
- Click Next and configure other information about the training project. For details about the parameters, see Creating a Training Project.
- Click Finish. The training project is created.
- View the ResNet-50 for PyTorch template project window as shown in Figure 2.
If the error message "Unzip failed. There is problem occurred when unzipping file." is displayed when you create a sample training project on Windows, see What Do I Do If I Get Error "Unzip failed. There is problem occurred when unzipping file." When Creating a Sample Training Project on Windows? to rectify the fault.
- Find the run_xx.sh file in the directory tree on the left of the project page. In the data field of the file, set the paths of the training and validation image datasets prepared in Environment Setup. See Figure 3.
The PyTorch ResNet-50 template of MindStudio has preset training parameters in the code of the training script. To customize the training parameters, you need to be familiar with the PyTorch framework code.
- Set the run configurations and run the project.
- Choose on the training project page or click on the menu shown in Figure 4 to access the run configuration page.
- Set training parameters, as shown in Figure 5.
Set run configurations of the training project on the right, as described in Table 1.
Table 1 Run configurations of the training project
| Parameter | Description | Example |
| --- | --- | --- |
| Name | Project name (user-defined). The name contains a maximum of 64 characters, starting with a letter and ending with a letter or digit. Only letters, digits, hyphens (-), and underscores (_) are allowed. | For example: MyTraining3. |
| Run Mode | Run mode. | Local Run |
| Deployment | Run configurations. You can use the Deployment function to synchronize the files and folders in a specified project to a specified directory on a remote device. For details, see Deployment. | In this example, Run Mode is set to Local Run. Therefore, this parameter is not displayed. |
| Executable | Entry point file of the training project. | For example: run_1p.sh. |
| Command Arguments | Command-line arguments for training. This parameter is optional. | Set this parameter as required. |
| Environment Variables | Environment variables of the training project. This parameter is optional. | Set this parameter as required. |
- Click OK, and the training project information is created.
- Choose on the project page or click the button shown in Figure 6 to perform training.
- After the training is complete, the generated model file is stored in the /result directory of the project.
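The naming rule for the Name parameter in Table 1 (at most 64 characters, starting with a letter, ending with a letter or digit, and containing only letters, digits, hyphens, and underscores) can be expressed as a regular expression. The following Python sketch is for illustration only and is not MindStudio's own validation code. The function name is_valid_run_name is hypothetical, and the sketch assumes that "letters" means ASCII letters and that a single-letter name is acceptable.

```python
import re

# First character: a letter. Optional remainder: up to 62 allowed characters
# followed by a final letter or digit, for a total length of at most 64.
_NAME_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9_-]{0,62}[A-Za-z0-9])?")


def is_valid_run_name(name):
    """Return True if name satisfies the Name rule described in Table 1."""
    return _NAME_RE.fullmatch(name) is not None


# Example calls:
# is_valid_run_name("MyTraining3")   # valid: starts with a letter, ends with a digit
# is_valid_run_name("My-Training_")  # invalid: ends with an underscore
```

Checking a name this way before opening the run configuration page avoids a rejected entry in the Name field.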






