Porting with sess.run

If your original TensorFlow network is built with the sess.run API, this section describes the manual porting process.

About sess.run

sess.run is a low-level TensorFlow API. It is more flexible than Estimator, but implementing a model with it is also more complex.

Develop your training script with the sess.run API as follows (a skeleton sketch follows the list):

  1. Preprocess data.
  2. Construct a model, calculate the loss, and update the gradient.
  3. Create a session and initialize resources.
  4. Start training.
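
For orientation, the following skeleton sketches these four steps before any NPU porting. It is a minimal sketch: build_dataset, build_model, and num_steps are placeholders for your own code, not part of any NPU API.

import tensorflow as tf

num_steps = 1000  # Placeholder: total number of training steps.

# 1. Preprocess data. build_dataset is a placeholder for your input pipeline.
dataset = build_dataset(batch_size=32)
iterator = dataset.make_initializable_iterator()
features, labels = iterator.get_next()

# 2. Construct the model, calculate the loss, and create the gradient update op.
logits = build_model(features)  # build_model is a placeholder for your network.
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# 3. Create a session and initialize resources.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(iterator.initializer)

    # 4. Start training.
    for step in range(num_steps):
        _, loss_value = sess.run([train_op, loss])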

The following sections guide you through porting a training script developed with sess.run so that it runs on the Ascend AI Processor.

Header File

To import the NPU-related libraries, add the following header file reference to the related Python files:

from npu_bridge.npu_init import *

After the preceding header file is imported, the training script is executed on the Ascend AI Processor by default.

Data Preprocessing

The data preprocessing code can be used as is in normal cases. Manual adaptation is required only in the following scenario:

If the original network script relies on dataset.batch(batch_size) and returns a dynamic shape, the shape of the last step may differ from that of the previous steps, because the number of samples remaining in the data stream may be less than batch_size. In this case, dynamic-shape compilation is triggered. To improve network compilation performance, you are advised to set drop_remainder to True, which discards the last incomplete batch and ensures that every step on the network has the same shape.
  dataset = dataset.batch(batch_size, drop_remainder=True)
Note that during inference, if the amount of inference data in the last iteration is less than batch_size, you need to pad the data up to batch_size with blank samples. Otherwise, an assertion in your script that checks that the number of validation results equals the number of validation samples may fail, for example:
 assert num_written_lines == num_actual_predict_examples
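
For example, a hedged sketch of padding the last batch (pad_to_batch and make_blank_example are illustrative helper names, not part of any NPU API):

def pad_to_batch(examples, batch_size):
    # Number of blank samples needed so that len(examples) becomes a multiple of batch_size.
    num_padding = (batch_size - len(examples) % batch_size) % batch_size
    # make_blank_example is a placeholder that builds an all-blank sample.
    examples = examples + [make_blank_example() for _ in range(num_padding)]
    return examples, num_padding

After inference, discard the last num_padding predictions so that the assertion above still holds.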

Model Construction, Loss Calculation, and Gradient Update

The model construction code can be used as is in normal cases. Manual adaptation is required only in the following scenarios:

  • Replace dropout in the original network with the corresponding CANN API for better performance, and watch for any impact on accuracy.
    • If tf.nn.dropout exists, modify it as follows:
      # Pass the same arguments that were passed to tf.nn.dropout.
      layers = npu_ops.dropout(x, keep_prob)
      
    • If tf.layers.dropout, tf.layers.Dropout, tf.keras.layers.Dropout, tf.keras.layers.SpatialDropout1D, tf.keras.layers.SpatialDropout2D, or tf.keras.layers.SpatialDropout3D exists, add the following header file reference:
      from npu_bridge.estimator.npu import npu_convert_dropout
      
  • Replace gelu in the original network with the corresponding CANN API to achieve optimal performance.
    Original TensorFlow code:
    def gelu(x):
        cdf = 0.5 * (1.0 + tf.tanh(
            np.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3))))
        return x * cdf

    layers = gelu(x)
    

    Code after porting:

    layers = npu_unary_ops.gelu(x)
    

Session Creation and Resource Initialization

When running your training script on the Ascend AI Processor using sess.run, note the following configurations:

  • The following configuration option is deactivated by default and should remain deactivated:

    rewrite_options.disable_model_pruning

  • The following configuration options are activated by default and should remain activated:
    • rewrite_options.function_optimization
    • rewrite_options.constant_folding
    • rewrite_options.shape_optimization
    • rewrite_options.arithmetic_optimization
    • rewrite_options.loop_optimization
    • rewrite_options.dependency_optimization
    • rewrite_options.layout_optimizer
  • The following configuration options are enabled by default and must be disabled explicitly:
    • rewrite_options.remapping
    • rewrite_options.memory_optimization
  • If tf.device is used in the original network, add the session configuration allow_soft_placement=True to allow TensorFlow to automatically allocate devices.

Original TensorFlow code:

# Construct an iterator.
iterator = Iterator.from_structure(train_dataset.output_types, train_dataset.output_shapes)

# Obtain the batch data.
next_batch = iterator.get_next()

# Initialize the iterator.
training_init_op = iterator.make_initializer(train_dataset)

# Initialize the variables.
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# Get the number of training/validation steps per epoch.
train_batches_per_epoch = int(np.floor(train_size / batch_size))

Code after porting:

# Construct an iterator.
iterator = Iterator.from_structure(train_dataset.output_types, train_dataset.output_shapes)

# Obtain the batch data.
next_batch = iterator.get_next()

# Initialize the iterator.
training_init_op = iterator.make_initializer(train_dataset)

# Initialize the variables.
init = tf.global_variables_initializer()

# Add allow_soft_placement=True to the session configuration to allow TensorFlow to automatically allocate devices.
config = tf.ConfigProto(allow_soft_placement=True)
# Add an NPU optimizer named NpuOptimizer. During network compilation, the NPU traverses only the session configurations under NpuOptimizer.
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
# Explicitly disable the remapping and memory_optimization functions of TensorFlow to avoid conflicts with the NPU.
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF
sess = tf.Session(config=config)
sess.run(init)

# Get the number of training/validation steps per epoch.
train_batches_per_epoch = int(np.floor(train_size / batch_size))

The Ascend platform supports all native functions of tf.Session.

It also allows you to enable functions such as automatic mixed precision. For details, see Session Configuration.
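
For example, automatic mixed precision can be enabled through the parameter map of the NpuOptimizer created in the code above. The snippet below is a sketch based on the precision_mode option; verify the exact option names against the Session Configuration reference for your CANN version.

# Continuing from the session configuration shown above:
custom_op.parameter_map["precision_mode"].s = tf.compat.as_bytes("allow_mix_precision")
sess = tf.Session(config=config)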

Training

The code snippet is ready to use. See the following example.

# Start epochs.
for epoch in range(num_epochs):
    # Initialize the iterator with the training dataset.
    sess.run(training_init_op)
    for step in range(train_batches_per_epoch):
        # Get the next batch of data.
        img_batch, label_batch = sess.run(next_batch)
        # Run the training op.
        _, train_loss = sess.run([train_op, loss], feed_dict={x: img_batch, y_: label_batch, is_training: True})

However, if you create a session without a with block (for example, if you define the session object as a class member), your ported script must explicitly call sess.close().

sess = tf.Session(config=config)
sess.run(...)
sess.close()

This is because the GEOP destructor is called in the close method of tf.Session. If you use a with block, which calls __exit__ to close the session automatically, there is no need to call sess.close().

with tf.Session(config=config) as sess:
    sess.run(...)

In other cases, for example when the session object is a member of a user-defined class, explicitly call sess.close() to exit the session.
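
A minimal sketch of this pattern (the ModelRunner class and its method names are illustrative, not part of any NPU API):

class ModelRunner:
    def __init__(self, config):
        # The session is held as a class member, so no with block ever closes it.
        self.sess = tf.Session(config=config)

    def train_step(self, fetches, feed_dict=None):
        return self.sess.run(fetches, feed_dict=feed_dict)

    def close(self):
        # Explicitly close the session so that the GEOP destructor is called.
        self.sess.close()

Callers are responsible for invoking close() when training finishes, for example in a try/finally block.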