Automated Porting

This section describes how to use the porting tool to automatically port the TensorFlow 1.15 network to the Ascend platform.

About the Porting Tool

  • Function Overview

    The Ascend platform provides a porting tool for TensorFlow 1.15. AI algorithm engineers can use the tool to analyze how well the TensorFlow and Horovod Python APIs used in a script are supported on the Ascend AI Processor, and to automatically port native TensorFlow training scripts to the Ascend AI Processor. For APIs that the tool cannot port, modify your training scripts according to the report generated by the tool.

  • How to Obtain
    • After the TF Adapter package is installed, the porting tool is stored in the ${TFPLUGIN_INSTALL_PATH}/npu_bridge/convert_tf2npu/ directory. ${TFPLUGIN_INSTALL_PATH} indicates the installation path of the TF Adapter package.
    • You can also obtain the convert_tf2npu folder from the Ascend Gitee Repo and upload the folder to any directory in your Linux or Windows environment.
  • [Restrictions] Check your original training script against the following restrictions before using the tool.
    1. The original script must run on the GPU or CPU and converge to the expected accuracy.
    2. The original script must be developed using official TensorFlow 1.15 APIs and official Horovod APIs. Otherwise, the porting tool cannot port the script. Consider the following cases:
      1. Native Keras APIs are not supported; only TensorFlow Keras (tf.keras) APIs are supported. (See the import sketch after this list.)
      2. CuPy APIs are not supported. Even if the original script runs properly on the GPU, successful execution on the Ascend AI Processor is not guaranteed.
    3. It is recommended that the TensorFlow and Horovod modules in the original script be imported as follows. Otherwise, an accurate porting report cannot be generated. (This does not affect the script porting itself.)
      import tensorflow as tf
      import tensorflow.compat.v1 as tf
      import horovod.tensorflow as hvd
      
    4. Currently, the loss scaling function of tf.keras and native Keras APIs is not supported after porting.
    5. For details about other constraints, see Restrictions.
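
    The following is a minimal sketch of restriction 2.a; the toy model is illustrative only and is not part of the porting tool:

      import tensorflow as tf

      # Supported: TensorFlow Keras (tf.keras) APIs.
      model = tf.keras.Sequential([tf.keras.layers.Dense(10)])

      # Not supported: native Keras APIs.
      # import keras
      # model = keras.Sequential([keras.layers.Dense(10)])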

Prerequisites

Before porting a model to the Ascend AI Processor, prepare a training model developed on TensorFlow 1.15 and a matching dataset, and run the model on the GPU or CPU to verify that the accuracy converges as expected. In addition, record the accuracy and performance results for later comparison on the Ascend AI Processor.
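
For example, the following is a minimal sketch of recording baseline numbers on the GPU or CPU; the toy model and variable names are illustrative only:

  import time
  import tensorflow as tf

  # Toy TF 1.15 training step, used only to illustrate baseline recording.
  x = tf.Variable(5.0)
  loss = tf.square(x)
  train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

  with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      start = time.time()
      _, loss_value = sess.run([train_op, loss])
      # Record these numbers and compare them with the NPU run after porting.
      print("loss = %f, step time = %.3f s" % (loss_value, time.time() - start))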

Procedure

  1. Install dependencies.
    pip3 install pandas==1.3.5
    pip3 install xlrd==1.2.0
    pip3 install openpyxl
    pip3 install tkintertable
    pip3 install google_pasta
    
  2. Perform script scanning and automated porting.
    This tool supports script porting in the Linux or Windows environment.
    • The following applies to the Linux environment:

      Go to the ${TFPLUGIN_INSTALL_PATH}/npu_bridge/convert_tf2npu/ directory where the porting tool is located.

      ${TFPLUGIN_INSTALL_PATH} indicates the installation path of the TF Adapter package. Run the following command to complete script scanning and automated porting in one step:

      python3 main.py -i /root/models/official/resnet
      

      main.py is the entry script of the tool. The following table describes the options.

      Table 1 Command-line options

      -i
        Description: Path of the training script to be ported, which must be a folder path.
        NOTE:
        • The tool scans and ports only the .py files in the folder specified by the -i option.
        • If the original scripts are stored in different directories, you are advised to arrange them in the same directory or run the porting command in each directory in sequence.
        Required: Yes

      -o
        Description: Path of the ported script. The path cannot be a subdirectory of the original script path. If not specified, the current path is used by default, for example, output_npu_20210401150929/xxx_npu_20210401150929.
        Required: No

      -r
        Description: Path of the porting report. The path cannot be a subdirectory of the original script path. If not specified, the current path is used by default, for example, report_npu_20210401150929.
        Required: No

      -m
        Description: Python execution entry point file. If the tf.keras or Horovod API is used and the script does not contain a main function, the porting tool cannot identify the entry point function, so NPU resource initialization and NPU training configuration cannot be performed. In that case, use -m to specify the Python entry point file so that the tool can completely port the user script for subsequent training. Example: -m /root/models/xxx.py
        Required: No

      -d
        Description: Distribution policy used by the original script. If the original script supports distributed training, specify this option so that the tool can automatically port the distributed script. Value:
        • tf_strategy: The original script uses the tf.distribute.Strategy distribution policy.
        • horovod: The original script uses the Horovod distribution policy.
        NOTE: Currently, sess.run distributed scripts cannot be automatically ported. After using the tool for automated porting, manual modifications are required based on How Do I Reconstruct the Sess.run Distributed Script After Automated Porting?.
        Required: Yes for distributed training

      Run python3 main.py -h to display the help information of the porting tool.
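
      For example, assuming the original script resides in /root/models/official/resnet and uses the tf.distribute.Strategy distribution policy (the output and report paths below are illustrative), the options can be combined as follows:

      python3 main.py -i /root/models/official/resnet -o /root/output -r /root/report -d tf_strategy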

    • The following applies to the Windows environment:
      python3 main_win.py

      Perform operations as prompted.

  3. During the porting, check for the following information, which indicates that the related files are being scanned and ported.
    Figure 1 Porting information
  4. After the porting is complete, check the resultant script and porting report.
    Figure 2 Porting completion information

(Optional) Follow-up Procedure

The Ascend platform provides functions such as function debugging and performance/accuracy tuning. After automated porting, you can enable these functions by adding the corresponding session configuration options. For details about the parameters, see Session Configuration.
  1. Check whether init_resource exists in the ported script.
    • If it exists, refer to the following example to pass session_config to the init_resource function. Note that only the configuration options supported by initialize_system can be set in the config argument of init_resource. To configure other functions, add them to the run configuration. For details, see step 2.
      if __name__ == '__main__':
        # Add allow_soft_placement=True for the session configurations to allow TensorFlow to automatically allocate devices.
        session_config = tf.ConfigProto(allow_soft_placement=True)
        # Add an NPU optimizer named NpuOptimizer. During network compilation, the NPU traverses only the session configurations under NpuOptimizer.
        custom_op = session_config.graph_options.rewrite_options.custom_optimizers.add()
        custom_op.name = "NpuOptimizer"
        # Configure session parameters.
        custom_op.parameter_map["profiling_mode"].b = True
        ... ...
      
        (npu_sess, npu_shutdown) = init_resource(config=session_config)
        tf.app.run()
        shutdown_resource(npu_sess, npu_shutdown)
        close_session(npu_sess)
      
    • If it does not exist, go to the next step.
  2. Add related session configuration to the run configuration.
    • For Estimator scripts, search for npu_run_config_init in the ported script, find the run configuration function (such as run_config in the example), and add related session parameters to the run configuration, such as the aoe_mode parameter in the following example:
      session_config = tf.ConfigProto(allow_soft_placement=True)
      # Add an NPU optimizer named NpuOptimizer. During network compilation, the NPU traverses only the session configurations under NpuOptimizer.
      custom_op = session_config.graph_options.rewrite_options.custom_optimizers.add()
      custom_op.name = 'NpuOptimizer'
      # Configure session parameters.
      custom_op.parameter_map["aoe_mode"].s = tf.compat.as_bytes("2")
      
      run_config = tf.estimator.RunConfig(
        train_distribute=distribution_strategy,
        session_config=session_config,
        save_checkpoints_secs=60*60*24)
      
      classifier = tf.estimator.Estimator(
        model_fn=model_function, model_dir=flags_obj.model_dir, config=npu_run_config_init(run_config=run_config))
      
    • For sess.run scripts, search for npu_config_proto in the ported script, find the run configuration function (such as session_config in the example), and add related session parameters to the run configuration, such as the aoe_mode parameter in the following example:
      session_config = tf.ConfigProto(allow_soft_placement=True)
      # Add an NPU optimizer named NpuOptimizer. During network compilation, the NPU traverses only the session configurations under NpuOptimizer.
      custom_op = session_config.graph_options.rewrite_options.custom_optimizers.add()
      custom_op.name = 'NpuOptimizer'
      # Configure session parameters.
      custom_op.parameter_map["aoe_mode"].s = tf.compat.as_bytes("2")
      config = npu_config_proto(config_proto=session_config)
      with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        interaction_table.init.run()
      
    • For Keras scripts, search for the set_keras_session_npu_config function in the ported script, find the run configuration function (such as config_proto in the example), and add related session parameters to the run configuration, such as the aoe_mode parameter in the following example:
      import tensorflow as tf
      import tensorflow.python.keras as keras
      from tensorflow.python.keras import backend as K
      from npu_bridge.npu_init import *
      
      config_proto = tf.ConfigProto(allow_soft_placement=True)
      # Add an NPU optimizer named NpuOptimizer. During network compilation, the NPU traverses only the session configurations under NpuOptimizer.
      custom_op = config_proto.graph_options.rewrite_options.custom_optimizers.add()
      custom_op.name = 'NpuOptimizer'
      # Configure session parameters.
      custom_op.parameter_map["aoe_mode"].s = tf.compat.as_bytes("2")
      npu_keras_sess = set_keras_session_npu_config(config=config_proto)
      
      # Preprocess data...
      # Construct a model...
      # Compile the model...
      # Train the model...
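
      After training, release the NPU resources. A minimal sketch, assuming the close_session helper shown in the init_resource example above also applies to the Keras session:

      # Assumption: close_session (from npu_bridge.npu_init) also releases the Keras session.
      close_session(npu_keras_sess)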