Porting with Estimator

If the original TensorFlow network is built on the Estimator API, follow this section to port it manually.

You are advised to start from the model training sample provided by Rec SDK TensorFlow and adapt it to other models. If you start from an open source project instead, directly porting the corresponding APIs may cause compatibility issues.

About Estimator

The Estimator API is a high-level TensorFlow API that was introduced in TensorFlow 1.10, released in 2018. It greatly simplifies machine learning programming. Its advantages include good support for distributed training, simplified model creation, and code sharing between model developers.

Develop your training script with the Estimator API as follows:

  1. Data preprocessing: Create an input function input_fn.
  2. Model construction: Create a model function model_fn.
  3. Run configuration: Instantiate Estimator and pass the RunConfig object as the run parameter.
  4. Training: Call Estimator.train() to train your model for a fixed number of steps.

The following describes how to port the Estimator training script for training on the Ascend AI Processor.

Header File Inclusion

To import the NPU-related libraries, add the following import statement to the relevant Python files:

from npu_bridge.npu_init import *

After this import is added, the training script runs on the Ascend AI Processor by default.

Data Preprocessing

In most cases, the data preprocessing code of the original script can be used without modification. Manual changes are required only in the following scenario:

If the original network script calls dataset.batch(batch_size) without setting drop_remainder, the last batch may contain fewer samples than the batch size, so the shape of the last step differs from that of the earlier steps. This triggers dynamic-shape compilation. To improve network compilation performance, you are advised to set drop_remainder to True. The trailing samples that do not fill a complete batch are then discarded, and every step on the network has the same shape.
dataset = dataset.batch(batch_size, drop_remainder=True)
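To see why the last step can have a different shape, the following pure-Python sketch mimics the batch sizes that batching would yield. The helper batch_sizes is hypothetical and exists only for illustration; it does not call TensorFlow.

```python
def batch_sizes(num_samples, batch_size, drop_remainder=False):
    """Return the size of each batch produced when batching num_samples.

    Hypothetical helper for illustration only: it emulates the batch sizes
    that dataset.batch would yield and does not use TensorFlow.
    """
    full_batches, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full_batches
    if remainder and not drop_remainder:
        # The trailing partial batch has a different shape from earlier steps.
        sizes.append(remainder)
    return sizes


print(batch_sizes(10, 4))                       # [4, 4, 2]
print(batch_sizes(10, 4, drop_remainder=True))  # [4, 4]
```

With drop_remainder enabled, every step sees the same batch size, at the cost of discarding the last two samples in this example.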
Note that during inference, if the last iteration contains fewer samples than the batch size, you need to pad the inference data with blank data up to the batch size. Otherwise, the trailing samples are discarded, and a script that asserts that the number of inference results equals the number of inference samples, as in the following example, fails:
assert num_written_lines == num_actual_predict_examples
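The padding step can be sketched in plain Python as follows. Both helpers, pad_to_full_batches and trim_predictions, are hypothetical names introduced for illustration, and trimming the padded predictions afterwards is an assumed post-processing step so that a count check like the one above holds.

```python
def pad_to_full_batches(samples, batch_size, blank):
    """Pad samples with blank entries up to a multiple of batch_size.

    Returns the padded list and the number of real (unpadded) samples.
    Hypothetical helper for illustration only.
    """
    num_real = len(samples)
    shortfall = (-num_real) % batch_size
    return list(samples) + [blank] * shortfall, num_real


def trim_predictions(predictions, num_real):
    """Drop the predictions that correspond to the blank padding."""
    return list(predictions)[:num_real]


samples = ["a", "b", "c", "d", "e"]
padded, num_real = pad_to_full_batches(samples, batch_size=2, blank="")

# Stand-in for the model: one prediction per input, padding included.
predictions = [s.upper() for s in padded]
results = trim_predictions(predictions, num_real)

assert len(padded) % 2 == 0
assert len(results) == len(samples)
```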

Model Building

In most cases, the model building code of the original script can be used without modification. Manual changes are required only in the following scenarios:

  • Replace dropout in the original network with the corresponding CANN API for better performance. Check the impact on accuracy after the replacement.
    • If tf.nn.dropout exists, modify it as follows:
      layers = npu_ops.dropout()
      
    • If tf.layers.dropout, tf.layers.Dropout, tf.keras.layers.Dropout, tf.keras.layers.SpatialDropout1D, tf.keras.layers.SpatialDropout2D, or tf.keras.layers.SpatialDropout3D exists, add the following import statement:
      from npu_bridge.estimator.npu import npu_convert_dropout
      
  • Replace gelu in the original network with the corresponding CANN API:
    Original TensorFlow code:
    def gelu(x):
      cdf = 0.5 * (1.0 + tf.tanh(
          (np.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3)))))
      return x * cdf
    layers = gelu(x)
    

    Code after porting:

    layers = npu_unary_ops.gelu(x)
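When checking accuracy after the replacement, a small reference implementation can help. The sketch below evaluates the tanh-based formula from the original gelu function next to the exact GELU definition, using only the Python standard library. The source does not state which form the CANN API computes, so treat these values only as a rough reference; the function names gelu_tanh and gelu_exact are illustrative.

```python
import math


def gelu_tanh(x):
    """Tanh approximation of GELU, the same formula as the original gelu()."""
    cdf = 0.5 * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
    return x * cdf


def gelu_exact(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Print both forms side by side for a few sample inputs.
for value in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(value, gelu_tanh(value), gelu_exact(value))
```

The two forms agree closely over this range, so a large deviation in the ported network's activations more likely points to a porting error than to the choice of formula.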
    

Run Configuration Setting

TensorFlow uses RunConfig to configure run parameters. When porting, replace RunConfig with NPURunConfig. Because the NPURunConfig class inherits from the RunConfig class, most parameters can be kept unchanged, as shown in the following example.

Original TensorFlow code:
config = tf.estimator.RunConfig(
  model_dir=FLAGS.model_dir,
  save_checkpoints_steps=FLAGS.save_checkpoints_steps,
  session_config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=False))

Code after porting:

npu_config = NPURunConfig(
  model_dir=FLAGS.model_dir,
  save_checkpoints_steps=FLAGS.save_checkpoints_steps,
  # If tf.device code is used on the original network, add the session configuration
  # allow_soft_placement=True to allow TensorFlow to automatically allocate devices.
  session_config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)
  )

However, NPURunConfig does not support some RunConfig parameters, including train_distribute, device_fn, protocol, eval_distribute, and experimental_distribute. Remove them if they are used in the original script.
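If the run parameters of the original script are collected in a dictionary, the unsupported ones can be filtered out before the configuration object is built. The following is a minimal sketch under that assumption. The helper strip_unsupported_args is hypothetical and not part of TF Adapter, and the path and step values are placeholders.

```python
# Parameters that NPURunConfig does not accept, as listed above.
UNSUPPORTED_RUN_CONFIG_ARGS = (
    "train_distribute",
    "device_fn",
    "protocol",
    "eval_distribute",
    "experimental_distribute",
)


def strip_unsupported_args(run_config_kwargs):
    """Split RunConfig keyword arguments into kept and removed ones.

    Hypothetical helper, not part of TF Adapter: it only filters a dict.
    """
    kept = {key: value for key, value in run_config_kwargs.items()
            if key not in UNSUPPORTED_RUN_CONFIG_ARGS}
    removed = sorted(set(run_config_kwargs) - set(kept))
    return kept, removed


original_kwargs = {
    "model_dir": "/tmp/model",       # placeholder path
    "save_checkpoints_steps": 1000,  # placeholder value
    "train_distribute": None,
}
kept, removed = strip_unsupported_args(original_kwargs)
print(kept)
print(removed)
```

In a ported script, the kept dictionary could then be unpacked into the constructor, for example NPURunConfig(**kept).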

If tf.device code is used in the original network, add the session configuration allow_soft_placement=True to allow TensorFlow to automatically allocate devices.

In addition, some parameters, such as iterations_per_loop and precision_mode, are added to NPURunConfig to improve training performance and precision. For details about the parameters, see "NPURunConfig Constructor" in TF Adapter APIs (1.x).

Creating an Estimator Object

Replace the TensorFlow Estimator object with NPUEstimator, which inherits from the Estimator class. As shown in the following example, change the class name, pass the NPURunConfig object as config, and keep the other parameters unchanged.

Original TensorFlow code:

mnist_classifier = tf.estimator.Estimator(
  model_fn=cnn_model_fn,
  config=config,
  model_dir="/tmp/mnist_convnet_model")

Code after porting:

mnist_classifier = NPUEstimator(
  model_fn=cnn_model_fn,
  config=npu_config,
  model_dir="/tmp/mnist_convnet_model"
  )

Training

The training call specifies only the inputs, and in most cases the original code can be used without modification.
mnist_classifier.train(
  input_fn=train_input_fn,
  steps=20000,
  hooks=[logging_hook])

If an error is reported during porting or training, rectify the fault by referring to the FAQs or contact technical support.