Enabling Iteration Offload In Keras Mode

On the Ascend platform, you can train directly with the native Keras API. However, each sess.run call then executes only one training iteration on the Ascend AI Processor. To reduce the number of interactions between the host and devices and shorten the training duration, convert the Keras model into an NPUEstimator object with the model_to_npu_estimator API, and set the number of iterations to run on the Ascend AI Processor per sess.run() call through the iterations_per_loop parameter of NPURunConfig.
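As a rough illustration of the saving, the arithmetic below uses the iterations_per_loop=10 and max_steps=1000 values from the ported example in this section; it is a back-of-envelope sketch, not part of any API:

```python
# Back-of-envelope sketch: each sess.run call executes iterations_per_loop
# training iterations on the device, so the number of host-device
# interactions shrinks proportionally.
max_steps = 1000          # total training steps (from est_resnet.train)
iterations_per_loop = 10  # from NPURunConfig in the ported example

# Without offload: one iteration per sess.run call.
runs_without_offload = max_steps
# With offload: iterations_per_loop iterations per sess.run call.
runs_with_offload = max_steps // iterations_per_loop

print(runs_without_offload, runs_with_offload)
```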

Original TensorFlow code:

from tensorflow.python.keras.layers import Input
from tensorflow.python.keras.applications.resnet50 import ResNet50

# This returns a tensor.
inputs = Input(shape=(224, 224, 3))
 
# This creates a ResNet50 model on top of the Input layer.
keras_model = ResNet50(input_tensor=inputs, weights=None, include_top=True)
keras_model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
 
keras_model.fit_generator(
        train_generator,
        steps_per_epoch=100,
        epochs=10)

Code after porting:
from npu_bridge.npu_init import *

run_config = NPURunConfig(save_checkpoints_steps=2,
                          model_dir=model_path,
                          iterations_per_loop=10)
# Convert the model constructed by using Keras to an NPUEstimator object.
est_resnet = keras_to_npu.model_to_npu_estimator(keras_model=keras_model, config=run_config)
# Perform training.
est_resnet.train(input_fn=lambda: input_fn(), max_steps=1000)

In addition, you need to port the Keras data preprocessing to the input_fn of the NPUEstimator. In the following example, Keras reads image data from a folder, labels the data automatically, applies data augmentation operations such as resizing, normalization, and horizontal flipping, and then outputs the data. In Estimator mode, data is preprocessed in the same way, except that you must build the file name list in advance and label each image yourself to produce the label list. The same augmentation operations (normalization, resizing, horizontal flipping) are then applied before the data is output.

Original TensorFlow code:

# Keras reads images from the folder.
from tensorflow.python.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255,
                                   horizontal_flip=True)
 
train_generator = train_datagen.flow_from_directory('data/',
                                                    target_size=(224, 224),
                                                    batch_size=32,
                                                    class_mode='sparse')

Code after porting:

# The function reads the image file corresponding to each file name, normalizes it, and resizes it to a uniform size.
def _parse_function(filename, label):
    image = tf.read_file(filename)
    # decode_jpeg (rather than decode_image) returns a tensor with a known rank, which resize_images requires.
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.image.resize_images(image, [224, 224])
    image = tf.image.random_flip_left_right(image)
    return image, label

def input_fn():
    # List of image files. The image list needs to be generated by yourself.
    filenames = tf.constant(["/data/image1.jpg", "/data/image2.jpg", ...])
    # labels[i] is the label of the image filenames[i]. The label list needs to be generated by yourself.
    labels = tf.constant([0, 5, ...])
    # Each element of the dataset is now (filename, label).
    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels)).repeat(10)
    # Each element of the dataset is now (image_resized, label).
    dataset = dataset.map(_parse_function)
    # Each element of the dataset is now (image_resized_batch, label_batch).
    dataset = dataset.shuffle(buffer_size=1000).batch(32)
    return dataset
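As noted above, the file name and label lists must be built by hand. The following is a minimal standard-library sketch of one way to do this, assuming the data directory follows the flow_from_directory layout (one subdirectory per class); the function name list_images_with_labels is illustrative, not part of any API:

```python
import os

def list_images_with_labels(data_dir):
    # Assign class indices by sorted subdirectory name, matching the
    # default behavior of Keras flow_from_directory.
    class_names = sorted(
        d for d in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, d))
    )
    filenames, labels = [], []
    for label, class_name in enumerate(class_names):
        class_dir = os.path.join(data_dir, class_name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith((".jpg", ".jpeg", ".png")):
                filenames.append(os.path.join(class_dir, fname))
                labels.append(label)
    return filenames, labels
```

The two returned lists can then be passed to tf.constant in input_fn in place of the hand-written literals.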

Note that Keras callback functions cannot be used after the model is converted to an NPUEstimator object.

Checking Whether iterations_per_loop Takes Effect

After iteration offload is enabled, check the host INFO log for the keyword "Insert op success" to determine whether iterations_per_loop has taken effect.

You can run the following command to set the log level on the host to INFO. The default output path of INFO logs is $HOME/ascend/log/run/plog/.

export ASCEND_GLOBAL_LOG_LEVEL=1
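One way to check for the keyword is to grep the plog directory. The sketch below wraps the grep in a hypothetical helper function (check_offload, not part of any tool) and simulates a log entry in a temporary directory so it runs without Ascend hardware; in practice, point it at $HOME/ascend/log/run/plog/:

```shell
# check_offload greps a log directory for the offload keyword,
# exactly as you would against $HOME/ascend/log/run/plog/.
check_offload() {
    if grep -rq "Insert op success" "$1" 2>/dev/null; then
        echo "iterations_per_loop took effect"
    else
        echo "offload keyword not found"
    fi
}

# Simulated plog entry (illustrative content only) so the sketch is
# self-contained.
log_dir=$(mktemp -d)
printf '[INFO] Insert op success\n' > "$log_dir/plog-sample.log"
check_offload "$log_dir"
rm -rf "$log_dir"
```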