TensorFlow Training

Environment Setup

Refer to Installing the Framework Plugin Package and Installing TensorFlow in the CANN Software Installation Guide to install TFPlugin and TensorFlow 1.15.0.

Prepare a dataset and upload it to any directory in the training environment. For details, see TensorFlow Model > Training > Preparing a Dataset.

Environment Variable Configuration

  1. Log in as the running user, run the vi ~/.bashrc command in any directory to open the .bashrc file, and append the following content to the file (the default installation path of a non-root user is used as an example):
    # Ascend-CANN-Toolkit environment variable. Change it to the actual path.
    source ~/Ascend/ascend-toolkit/set_env.sh
    
    # TFPlugin environment variable. Change it to the actual path.
    source ~/Ascend/tfplugin/set_env.sh
  2. Run the :wq! command to save the file and exit.
  3. Run the source ~/.bashrc command for the modification to take effect immediately.
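
The three steps above can also be scripted instead of edited by hand in vi. The following is a minimal sketch, assuming the default non-root installation paths used in this example. The helper names append_once and configure_bashrc are hypothetical and are not part of CANN or TFPlugin. Each line is appended only if the file does not already contain it, so running the script twice does not duplicate entries.

```shell
#!/bin/sh
# append_once FILE LINE: append LINE to FILE unless an identical line exists.
# Hypothetical helper for illustration; not part of CANN or TFPlugin.
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# configure_bashrc [FILE]: add both set_env.sh lines to FILE (default ~/.bashrc).
# The paths assume the default non-root installation; change them to the actual paths.
configure_bashrc() {
    target=${1:-"$HOME/.bashrc"}
    append_once "$target" 'source ~/Ascend/ascend-toolkit/set_env.sh'
    append_once "$target" 'source ~/Ascend/tfplugin/set_env.sh'
}
```

Calling configure_bashrc without arguments targets ~/.bashrc. Passing a scratch file first, for example configure_bashrc /tmp/bashrc.test, shows what would be written. As in step 3, run source ~/.bashrc afterwards for the change to take effect in the current shell.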

Procedure

The following describes the overall procedure for creating a training project with the ResNet-50 for TensorFlow template sample. For details about the project information and related pop-up windows, see Procedure.

  1. Click Ascend Training on the left of the page to create an Ascend training project, as shown in Figure 1.
    Figure 1 Project creation page
  2. On the training project selection page shown in Figure 1, select the ResNet-50 for TensorFlow template under CANN Version and Samples. Then click Finish.
  3. Click Next and configure other information about the training project. For details about the parameters, see Creating a Training Project.
  4. Click Finish. The training project is created.

    The first time you create a sample project, the tool automatically downloads the sample project template. Ensure that your device is connected to the network; otherwise, subsequent operations cannot be performed.

  5. View the ResNet-50 for TensorFlow template project window as shown in Figure 2.
    Figure 2 Template project window

    If the error message "Unzip failed. There is problem occurred when unzipping file." is displayed when you create a sample training project on Windows, refer to What Do I Do If I Get Error "Unzip failed. There is problem occurred when unzipping file." When Creating a Sample Training Project on Windows? to rectify the fault.

  6. In the src > configs directory on the left of the project page, open the res50_256bs_xx.py file and set its data_url field to the path of the dataset prepared in Environment Setup. See Figure 3.
    Figure 3 Setting the dataset path

    The TensorFlow ResNet-50 template in MindStudio presets the training parameters in the training script. To customize them, you need to be familiar with the TensorFlow framework code.

  7. Set the run configurations and run the project.
    1. Choose Run > Edit Configurations... on the training project page or click Edit Configurations... on the menu shown in Figure 4 to access the run configuration page.
      Figure 4 Shortcut to the run configuration page
    2. Set training parameters, as shown in Figure 5.
      Figure 5 Run configuration page

      Set run configurations of the training project on the right, as described in Table 1.

      Table 1 Run configurations of the training project

      | Parameter | Description | Example |
      |---|---|---|
      | Name | Project name (user-defined). | For example: MyTraining2. The name contains a maximum of 64 characters, starting with a letter and ending with a letter or digit. Only letters, digits, hyphens (-), and underscores (_) are allowed. |
      | Run Mode | Run mode. | Local Run |
      | Deployment | Run configurations. You can use the Deployment function to synchronize the files and folders in a specified project to a specified directory on a remote device. For details, see Deployment. | In this example, Run Mode is set to Local Run. Therefore, this parameter is not displayed. |
      | Executable | Entry point file of the training project. For example: train_1p.sh. | - |
      | Command Arguments | Command-line arguments for training. This parameter is optional. | Set this parameter as required. |
      | Environment Variables | Environment variables of the training project. This parameter is optional. | Set this parameter as required. |

    3. Click OK. The run configuration of the training project is saved.
    4. Choose Run > Run 'MyTraining2' on the project page or click the button shown in Figure 6 to perform training.
      Figure 6 Performing training using a shortcut
      Figure 7 shows the training process.
      Figure 7 Training process display
    5. After the training is complete, the generated model file is stored in the /scripts/d_solution/ckptx_{time} directory of the project.
  8. For details about other operations, see the Training tab in the ResNet-50 model page at ModelZoo in the Ascend Community.
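
The naming rule given for the Name parameter in Table 1 (at most 64 characters, starting with a letter, ending with a letter or digit, and containing only letters, digits, hyphens, and underscores) can be checked before you enter a name. The following is a minimal sketch. The function is_valid_run_name is a hypothetical helper, not a MindStudio command, and it assumes ASCII letters and that a single letter is an acceptable name. MindStudio performs its own validation, which may differ in detail.

```shell
#!/bin/sh
# is_valid_run_name NAME: return 0 if NAME satisfies the Table 1 naming rule.
# Hypothetical helper for illustration; MindStudio applies its own check.
# The body runs in a subshell so that LC_ALL=C does not leak to the caller.
is_valid_run_name() (
    LC_ALL=C
    name=$1
    # Reject empty names and names longer than 64 characters.
    [ -n "$name" ] || return 1
    [ "${#name}" -le 64 ] || return 1
    case $name in
        [!A-Za-z]*)        return 1 ;;  # must start with a letter
        *[!A-Za-z0-9_-]*)  return 1 ;;  # only letters, digits, - and _
        *[_-])             return 1 ;;  # must end with a letter or digit
    esac
    return 0
)
```

For example, is_valid_run_name MyTraining2 succeeds, whereas a name that starts with a digit or ends with an underscore is rejected.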