Porting with Estimator

If your original TensorFlow network is built with the Estimator API, this section describes the manual porting process.

About Estimator

The Estimator API is a high-level TensorFlow API that greatly simplifies machine learning programming. Estimator offers many advantages, for example, good support for distributed training, simplified model creation, and code sharing between model developers.

Develop your training script with the Estimator API as follows:

  1. Create an input function input_fn during data preprocessing.
  2. Create a model function model_fn during model construction.
  3. Instantiate the Estimator, passing a RunConfig object to configure the run.
  4. Call Estimator.train() to train your model for a fixed number of steps.

The following sections guide you through porting a training script developed with Estimator so that training can run on the Ascend AI Processor.

Header File

To import the NPU-related libraries, add the following header reference to the relevant Python files.

from npu_bridge.npu_init import *

After the preceding header file is imported, the training script is executed on the Ascend AI Processor by default.

Data Preprocessing

The code snippet is ready to use in normal cases. Manual tweaking is required only in the following scenario:

If the original network script relies on dataset.batch(batch_size), it may return tensors of dynamic shape: because the number of remaining samples in the data stream may be less than the batch size, the shape of the last step may differ from that of the preceding steps, which triggers dynamic-shape compilation. To improve network compilation performance, you are advised to set drop_remainder to True to discard the trailing samples and ensure that every step on the network has the same shape.
  dataset = dataset.batch(batch_size, drop_remainder=True)
Note that during inference, if the data volume of the last iteration is less than batch_size, you need to pad the inference data with blank data up to batch_size. Otherwise, an assertion in your script that the number of validation results equals the number of validation samples may fail, for example:
 assert num_written_lines == num_actual_predict_examples
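The amount of blank padding needed for the last batch can be computed in plain Python. The following is a minimal sketch; the helper pad_to_batch and its arguments are illustrative names, not part of any NPU API:

```python
def pad_to_batch(examples, batch_size, blank):
    """Pad the example list with blank entries so its length is a multiple of batch_size."""
    remainder = len(examples) % batch_size
    num_pad = (batch_size - remainder) % batch_size  # 0 when already aligned
    return examples + [blank] * num_pad, num_pad

# 10 samples with batch_size 4 -> 2 blank entries appended, 12 entries total
padded, num_pad = pad_to_batch(list(range(10)), batch_size=4, blank=None)
```

When collecting results, count only the first num_actual_predict_examples entries so the padding does not distort the validation output.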

Model Construction

The code snippet is ready to use in normal cases. Manual tweaking is required only in the following scenarios:

  • Replace dropout in the original network with the corresponding CANN API for better performance. You must also pay attention to the impact on the accuracy.
    • If tf.nn.dropout exists, modify it as follows:
      layers = npu_ops.dropout(x, keep_prob)
      
    • If tf.layers.dropout, tf.layers.Dropout, tf.keras.layers.Dropout, tf.keras.layers.SpatialDropout1D, tf.keras.layers.SpatialDropout2D, or tf.keras.layers.SpatialDropout3D exists, add the following header file reference:
      from npu_bridge.estimator.npu import npu_convert_dropout
      
  • Replace gelu in the original network with the corresponding CANN API to achieve optimal performance.
    Original TensorFlow code:
    def gelu(x):
      cdf = 0.5 * (1.0 + tf.tanh(
         (np.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3)))))
      return x * cdf
    layers = gelu(x)
    

    Code after porting:

    layers = npu_unary_ops.gelu(x)
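For reference, the tanh approximation used by the original gelu can be checked numerically in plain Python (math module only, no TensorFlow); this is a sanity check of the formula, not part of the porting itself:

```python
import math

def gelu_tanh(x):
    """Tanh approximation of GELU, matching the original network code."""
    cdf = 0.5 * (1.0 + math.tanh(math.sqrt(2 / math.pi)
                                 * (x + 0.044715 * x ** 3)))
    return x * cdf

# gelu(0) is exactly 0, and gelu(x) approaches x for large positive x
print(gelu_tanh(0.0))   # 0.0
print(gelu_tanh(5.0))   # close to 5.0
```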
    

Run Configuration Setting

TensorFlow uses RunConfig to configure run parameters. You need to port RunConfig to NPURunConfig. Because the NPURunConfig class inherits from RunConfig, you can modify your script as in the following example during porting, keeping most parameters unchanged.

Original TensorFlow code:
config=tf.estimator.RunConfig(
  model_dir=FLAGS.model_dir, 
  save_checkpoints_steps=FLAGS.save_checkpoints_steps,
  session_config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=False))

Code after porting:

npu_config=NPURunConfig(
  model_dir=FLAGS.model_dir,
  save_checkpoints_steps=FLAGS.save_checkpoints_steps,
  # If tf.device code is used on the original network, add the session configuration allow_soft_placement=True to allow TensorFlow to automatically allocate devices.
  session_config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=False) 
  )

However, some RunConfig parameters are not supported by NPURunConfig, including train_distribute, device_fn, protocol, eval_distribute, and experimental_distribute. Remove them if they are used in the original script.

If tf.device code is used on the original network, add the session configuration allow_soft_placement=True to allow TensorFlow to automatically allocate devices.

In addition, some parameters, such as iterations_per_loop and precision_mode, are added to NPURunConfig to improve training performance and accuracy. For details about the parameters, see NPURunConfig Constructor.
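For example, the NPU-specific parameters can be passed alongside the existing ones. The following is a sketch only; the parameter values shown (100 iterations, mixed precision) are illustrative, and you should choose them based on your network and the NPURunConfig Constructor documentation:

```python
npu_config = NPURunConfig(
    model_dir=FLAGS.model_dir,
    save_checkpoints_steps=FLAGS.save_checkpoints_steps,
    session_config=tf.ConfigProto(allow_soft_placement=True),
    # NPU-specific additions (values are illustrative):
    iterations_per_loop=100,               # iterations executed per training loop on the device
    precision_mode="allow_mix_precision")  # enable automatic mixed precision
```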

Creating an Estimator Object

You only need to port the TensorFlow Estimator object to NPUEstimator, which inherits from the Estimator class. Change the API by referring to the following example during porting and keep the parameters unchanged.

Original TensorFlow code:

mnist_classifier=tf.estimator.Estimator(
  model_fn=cnn_model_fn,
  config=config,
  model_dir="/tmp/mnist_convnet_model")

Code after porting:

mnist_classifier=NPUEstimator(
  model_fn=cnn_model_fn,
  config=npu_config,
  model_dir="/tmp/mnist_convnet_model"
  )

Training

When training your model, specify only the inputs. The code snippet is ready to use in normal cases.
mnist_classifier.train(
  input_fn=train_input_fn,
  steps=20000,
  hooks=[logging_hook])