NPUOptimizer Constructor

Description

Constructs an object of class NPUOptimizer, which combines the functionality of the NPUDistributedOptimizer and NPULossScaleOptimizer optimizers. It provides the following functions:
  • Loss scaling: Loss scaling can be enabled during mixed precision training to mitigate the underflow caused by the narrow representable range of float16.
  • Distributed training: Wraps a single-server training optimizer into an NPU distributed optimizer so that computed gradients can be aggregated in single-server single-device, single-server multi-device, and multi-server multi-device networking modes.
  • Communication tailing optimization: By adjusting computation dependencies, computation operations that do not depend on the last AR (gradient aggregation) fragment are scheduled to run in parallel with it, reducing the communication tail time.
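The float16 underflow that loss scaling addresses can be demonstrated with a standalone NumPy sketch (the scale value 1024 is an arbitrary example, not a recommended setting):

```python
import numpy as np

# Gradients below float16's smallest positive subnormal (~6e-8) flush to zero.
tiny_grad = np.float16(1e-8)
print(tiny_grad)  # 0.0 -- the gradient is lost

# Multiplying the loss (and hence the gradients) by a scale factor keeps
# them representable; the optimizer divides the scale out before the update.
scale = 1024.0
scaled_grad = np.float16(1e-8 * scale)
print(scaled_grad)  # non-zero, so the update survives
```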

Prototype

def __init__(self, opt, loss_scale_manager=None, is_distributed=False, is_loss_scale=False, is_tailing_optimization=False, name=None)

Options

Option

Input/Output

Description

opt

Input

Single-server training optimizer for gradient calculation and weight update.

loss_scale_manager

Input

This option needs to be configured only when is_loss_scale is set to True, that is, when loss scaling is enabled. It determines whether the loss scale is updated statically or dynamically.

  • Before creating NPUOptimizer, you can instantiate a FixedLossScaleManager class to set the loss scaling with a static value. For details about the constructor of the FixedLossScaleManager class, see FixedLossScaleManager Constructor.
  • Before creating NPUOptimizer, you can instantiate an ExponentialUpdateLossScaleManager class to dynamically configure loss scaling. For details about the constructor of the ExponentialUpdateLossScaleManager class, see ExponentialUpdateLossScaleManager Constructor.
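The dynamic update mode can be sketched in plain Python. The class below is a simplified illustration of the exponential-update policy, not the actual npu_bridge implementation; only the parameter names mirror ExponentialUpdateLossScaleManager's constructor:

```python
# Illustrative sketch: the scale doubles after incr_every_n_steps consecutive
# finite steps and is multiplied by decr_ratio after decr_every_n_nan_or_inf
# consecutive overflow (NaN/inf) steps.
class DynamicLossScaleSketch:
    def __init__(self, init_loss_scale=2.0**32, incr_every_n_steps=1000,
                 decr_every_n_nan_or_inf=2, decr_ratio=0.5):
        self.loss_scale = init_loss_scale
        self.incr_every_n_steps = incr_every_n_steps
        self.decr_every_n_nan_or_inf = decr_every_n_nan_or_inf
        self.decr_ratio = decr_ratio
        self.good_steps = 0  # consecutive finite-gradient steps
        self.bad_steps = 0   # consecutive overflow steps

    def update(self, grads_finite):
        if grads_finite:
            self.good_steps += 1
            self.bad_steps = 0
            if self.good_steps >= self.incr_every_n_steps:
                self.loss_scale *= 2.0
                self.good_steps = 0
        else:
            self.bad_steps += 1
            self.good_steps = 0
            if self.bad_steps >= self.decr_every_n_nan_or_inf:
                self.loss_scale *= self.decr_ratio
                self.bad_steps = 0

mgr = DynamicLossScaleSketch(init_loss_scale=1024.0, incr_every_n_steps=2)
mgr.update(True)
mgr.update(True)
print(mgr.loss_scale)  # 2048.0 -- doubled after two finite steps
```

The static mode (FixedLossScaleManager) simply keeps loss_scale constant for the whole run.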

is_distributed

Input

Whether to enable distributed training.

  • True: enables AllReduce.
  • False (default): disables AllReduce.

is_loss_scale

Input

Whether to enable loss scaling.

  • True: enabled (recommended if mixed precision training is enabled). In this case, the value of loss_scale_manager cannot be None.
  • False (default): disabled.

is_tailing_optimization

Input

Whether to enable communication tailing optimization to improve training performance. This function takes effect only when is_distributed is set to True.

  • True: enabled.
  • False (default): disabled.

The value of this option must be the same as that set in NPURunConfig Constructor.

name

Input

Name of the optimizer.

Returns

An object of the NPUOptimizer class.

Example

import tensorflow as tf
from npu_bridge.npu_init import *

# Define a single-server optimizer.
optimizer = LAMBOptimizer(
    learning_rate=learning_rate,
    weight_decay_rate=0.01,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-6,
    exclude_from_weight_decay=["LayerNorm", "layer_norm", "bias"])

if tf.flags.FLAGS.npu_bert_loss_scale not in [None, -1]:
  # Enable loss scaling.
  if tf.flags.FLAGS.npu_bert_loss_scale == 0:
    # 0 selects dynamic loss scaling.
    loss_scale_manager = ExponentialUpdateLossScaleManager(
        init_loss_scale=tf.flags.FLAGS.init_loss_scale_value,
        incr_every_n_steps=1000,
        decr_every_n_nan_or_inf=2,
        decr_ratio=0.5)
  elif tf.flags.FLAGS.npu_bert_loss_scale >= 1:
    # A value >= 1 selects static loss scaling with that fixed scale.
    loss_scale_manager = FixedLossScaleManager(loss_scale=tf.flags.FLAGS.npu_bert_loss_scale)
  else:
    raise ValueError("Invalid loss scale: %d" % tf.flags.FLAGS.npu_bert_loss_scale)
  optimizer = NPUOptimizer(optimizer, loss_scale_manager,
                           is_distributed=tf.flags.FLAGS.distributed,
                           is_loss_scale=True,
                           is_tailing_optimization=True)
else:
  # Disable loss scaling.
  optimizer = NPUOptimizer(optimizer, is_distributed=tf.flags.FLAGS.distributed)