Model Construction
The model is built in the same way as the original model; some code is modified to adapt it to the NPU and to improve compute performance. The sample code in this section shows the modifications.
Defining Model Functions
The following uses the model function built for the ImageNet dataset as an example. The related classes and APIs are as follows.
| Class or API | Description | Location |
|---|---|---|
| imagenet_model_fn() | Model function constructed based on ImageNet. | official/r1/resnet/imagenet_main.py |
| learning_rate_with_decay() | Learning rate function. While the number of global steps is below the configured threshold, the learning rate increases linearly; once it exceeds the threshold, the learning rate decreases in phases. | official/r1/resnet/resnet_run_loop.py |
| resnet_model_fn() | Constructs the EstimatorSpec class, which defines the model run by the Estimator. | official/r1/resnet/resnet_run_loop.py |
| ImagenetModel() | Inherits from Model in the resnet_model module. Specifies the network scale, version, number of classes, convolution parameters, and pooling parameters of the ImageNet-based ResNet model. | official/r1/resnet/imagenet_main.py |
| __call__() | Adds the operations that classify the input images: 1. converts NHWC to NCHW to accelerate GPU computing; 2. performs the first convolution; 3. decides whether to apply batch normalization based on the ResNet version; 4. performs the first pooling; 5. stacks the blocks; 6. computes the spatial mean of the feature maps (global average pooling); 7. adds the fully connected layer. | official/r1/resnet/resnet_model.py |
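To make the behavior of learning_rate_with_decay() concrete, here is a minimal plain-Python sketch of such a schedule: linear warm-up while the global step is below a threshold, then phase-wise decay. The function name, thresholds, and decay factor are hypothetical placeholders, not the values used in the official script.

# Hypothetical sketch of a warm-up-then-phase-decay schedule; the constants
# are placeholders, not those in official/r1/resnet/resnet_run_loop.py.
def lr_schedule(global_step, base_lr=0.1, warmup_steps=1000,
                boundaries=(3000, 6000, 9000)):
    if global_step < warmup_steps:
        # Below the threshold, the rate grows linearly from 0 to base_lr.
        return base_lr * global_step / warmup_steps
    # Past the threshold, the rate drops by a factor of 10 at each boundary.
    phase = sum(global_step >= b for b in boundaries)
    return base_lr * (0.1 ** phase)

print(lr_schedule(500))   # 0.05  (warm-up)
print(lr_schedule(2000))  # 0.1   (phase 0)
print(lr_schedule(7000))  # 0.001 (phase 2)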
Performance Improvement
- Add the following import to the official/r1/resnet/resnet_run_loop.py file:
from npu_bridge.hccl import hccl_ops
- Check the data type of the input features or images, and cast it when it differs from the data type used for computation.
Tweak: resnet_model_fn() in official/r1/resnet/resnet_run_loop.py (The modifications are enclosed between the npu modify comments.)
############# npu modify begin #############
# Check whether the data type of the input features or images is consistent
# with the data type used for computing.
if features.dtype != dtype:
    # Change the data type of the features to dtype.
    features = tf.cast(features, dtype)
############## npu modify end ###############
# The source code is as follows.
# assert features.dtype == dtype
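The cast above replaces the original assertion: instead of failing when the input pipeline delivers a different precision, the features are converted to the compute data type. A minimal self-contained sketch of the same guard, with a hypothetical float16 input feeding a float32 graph:

import tensorflow as tf

# Hypothetical input: float16 features arriving at a model that computes in float32.
features = tf.zeros([2, 224, 224, 3], dtype=tf.float16)
dtype = tf.float32
if features.dtype != dtype:
    features = tf.cast(features, dtype)  # cast instead of asserting
assert features.dtype == dtype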
- Use the float32 type for labels to improve accuracy, and allreduce the metrics across devices during distributed training.
Tweak: resnet_model_fn() in official/r1/resnet/resnet_run_loop.py
############## npu modify begin #############
# Use the float32 type for labels to improve accuracy.
accuracy = tf.compat.v1.metrics.accuracy(tf.cast(labels, tf.float32),
                                         predictions['classes'])
############## npu modify end ###############
# The accuracy computation code is as follows.
# accuracy = tf.compat.v1.metrics.accuracy(labels, predictions['classes'])
accuracy_top_5 = tf.compat.v1.metrics.mean(
    tf.nn.in_top_k(predictions=logits, targets=labels, k=5, name='top_5_op'))
############## npu modify begin #############
# Calculate the global accuracy during distributed training.
rank_size = int(os.getenv('RANK_SIZE'))
newaccuracy = (hccl_ops.allreduce(accuracy[0], "sum") / rank_size, accuracy[1])
newaccuracy_top_5 = (hccl_ops.allreduce(accuracy_top_5[0], "sum") / rank_size,
                     accuracy_top_5[1])
metrics = {'accuracy': newaccuracy,
           'accuracy_top_5': newaccuracy_top_5}
############## npu modify end ###############
# The metrics in the source code are as follows.
# metrics = {'accuracy': accuracy,
#            'accuracy_top_5': accuracy_top_5}
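The division by RANK_SIZE turns the allreduce result into an average: each device computes a local metric value, allreduce with "sum" returns the sum of those values on every device, and dividing by the device count yields the cross-device mean. A toy illustration in plain Python with made-up values:

# Toy illustration of the averaging performed above; the accuracies are made up.
rank_size = 4
local_accuracies = [0.74, 0.76, 0.75, 0.77]   # one local value per device
allreduced_sum = sum(local_accuracies)        # what allreduce(..., "sum") returns
global_accuracy = allreduced_sum / rank_size  # 0.755, the cross-device mean
print(global_accuracy)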
- Replace the max_pooling2d operator with max_pool_with_argmax for better compute performance.
Tweak: __call__() in official/r1/resnet/resnet_model.py
# Determine whether to perform the first pooling.
if self.first_pool_size:
    ############## npu modify begin #############
    # Replace max_pooling2d with max_pool_with_argmax for better performance.
    inputs, argmax = tf.compat.v1.nn.max_pool_with_argmax(
        input=inputs,
        ksize=(1, self.first_pool_size, self.first_pool_size, 1),
        strides=(1, self.first_pool_stride, self.first_pool_stride, 1),
        padding='SAME',
        data_format='NCHW' if self.data_format == 'channels_first' else 'NHWC')
    ############## npu modify end ###############
    # The source code uses the max_pooling2d() API for pooling.
    # inputs = tf.compat.v1.layers.max_pooling2d(
    #     inputs=inputs, pool_size=self.first_pool_size,
    #     strides=self.first_pool_stride, padding='SAME',
    #     data_format=self.data_format)
    inputs = tf.identity(inputs, 'initial_max_pool')
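The replacement produces the same pooled values as max_pooling2d; the additional argmax output is simply left unused. As a sanity check, the following sketch compares the two operators on random NHWC input, assuming TensorFlow 2.x running eagerly (on the NPU the call is placed in the graph as shown above):

import numpy as np
import tensorflow as tf

x = tf.constant(np.random.rand(1, 8, 8, 3), dtype=tf.float32)

# Reference pooling versus the argmax variant used in the modification.
ref = tf.nn.max_pool2d(x, ksize=2, strides=2, padding='SAME')
pooled, argmax = tf.nn.max_pool_with_argmax(
    x, ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1), padding='SAME')

# The pooled values match; argmax records where each maximum came from.
np.testing.assert_allclose(ref.numpy(), pooled.numpy())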
Configuring Distributed Training
- Add the following import to the official/r1/resnet/resnet_run_loop.py file:
from npu_bridge.estimator.npu.npu_optimizer import NPUDistributedOptimizer
- Add the distributed training optimizer NPUDistributedOptimizer.
Tweak: resnet_model_fn() in official/r1/resnet/resnet_run_loop.py
if flags.FLAGS.enable_lars:
    optimizer = tf.contrib.opt.LARSOptimizer(
        learning_rate,
        momentum=momentum,
        weight_decay=weight_decay,
        skip_list=['batch_normalization', 'bias'])
else:
    optimizer = tf.compat.v1.train.MomentumOptimizer(
        learning_rate=learning_rate,
        momentum=momentum
    )
############## npu modify begin #############
# Use the distributed training optimizer to encapsulate the single-server
# optimizer to support distributed training.
# Add the following line to the source code.
optimizer = NPUDistributedOptimizer(optimizer)
############## npu modify end ###############
fp16_implementation = getattr(flags.FLAGS, 'fp16_implementation', None)
if fp16_implementation == 'graph_rewrite':
    optimizer = (
        tf.compat.v1.train.experimental.enable_mixed_precision_graph_rewrite(
            optimizer, loss_scale=loss_scale))
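Only the single line optimizer = NPUDistributedOptimizer(optimizer) is added; the wrapper handles the cross-device gradient exchange. Conceptually, such a wrapper allreduces and averages the gradients before the wrapped optimizer applies them. The sketch below illustrates that idea only; the class name and constructor arguments are hypothetical, and this is not the actual npu_bridge implementation.

import tensorflow as tf

class AllreduceOptimizerSketch(tf.compat.v1.train.Optimizer):
    """Hypothetical sketch of a gradient-allreduce optimizer wrapper."""

    def __init__(self, optimizer, allreduce_fn, world_size,
                 name="AllreduceOptimizerSketch"):
        super().__init__(use_locking=False, name=name)
        self._optimizer = optimizer        # the wrapped single-device optimizer
        self._allreduce_fn = allreduce_fn  # e.g. hccl_ops.allreduce on the NPU
        self._world_size = world_size      # number of devices (RANK_SIZE)

    def compute_gradients(self, *args, **kwargs):
        grads_and_vars = self._optimizer.compute_gradients(*args, **kwargs)
        # Average every gradient across devices before the weight update.
        return [(self._allreduce_fn(grad, "sum") / self._world_size, var)
                for grad, var in grads_and_vars if grad is not None]

    def apply_gradients(self, *args, **kwargs):
        return self._optimizer.apply_gradients(*args, **kwargs)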