Replacing LossScaleOptimizer

Precautions

  • For Atlas A3 training products/Atlas A3 inference products and Atlas A2 training products/Atlas A2 inference products, floating-point overflow/underflow is handled in INF/NaN mode by default, so you can skip this step. If you have manually called the set_device_sat_mode API to switch to saturation mode, port your script as described in this section. Note that saturation mode is retained only for compatibility with earlier versions and will not be developed further, and computation in this mode may be inaccurate.
  • For Atlas training products, skip this step if your script does not use LossScaleOptimizer. Otherwise, port the script as described in this section.

Description

LossScaleOptimizer is typically used to prevent numeric underflow in mixed precision mode. In saturation mode, the NPU reports a floating-point range error as a global error instead of returning Inf or NaN. You are therefore advised to use npu.train.optimizer.NpuLossScaleOptimizer, which is provided for the NPU, to obtain correct overflow/underflow detection results.
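
To make the underflow problem concrete, the following minimal sketch rounds values to float16 using only the Python standard library. The gradient value and the scale factor 2**15 are illustrative assumptions, not values taken from this section, and the helper to_float16 is hypothetical.

```python
import struct


def to_float16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Illustrative values (assumptions): a tiny gradient and a power-of-two loss scale.
grad = 1e-8
loss_scale = 2.0 ** 15

unscaled_fp16 = to_float16(grad)             # too small for float16: rounds to zero
scaled_fp16 = to_float16(grad * loss_scale)  # scaled value is representable in float16
recovered = scaled_fp16 / loss_scale         # unscale in higher precision

print(unscaled_fp16, scaled_fp16, recovered)
```

Without scaling, the gradient is lost entirely. With scaling, it survives the float16 round trip and can be recovered by dividing by the same factor in higher precision.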

The usage of npu.train.optimizer.NpuLossScaleOptimizer is the same as that of tf.keras.mixed_precision.LossScaleOptimizer.
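
Dynamic loss scaling depends on a reliable overflow signal, which is why the detection result matters. The sketch below shows the commonly documented update rule (halve the scale and skip the step on overflow, double it after a run of clean steps). It is an illustration only: the class DynamicLossScale, its defaults, and the lower bound of 1 are assumptions, and the overflow flag is a plain boolean because this section does not describe how NpuLossScaleOptimizer detects overflow internally.

```python
from dataclasses import dataclass


@dataclass
class DynamicLossScale:
    """Hypothetical stand-in for the scale bookkeeping of a loss-scale optimizer."""

    scale: float = 2.0 ** 15   # assumed initial loss scale
    growth_steps: int = 2000   # assumed number of clean steps before the scale grows
    good_steps: int = 0        # consecutive steps without overflow

    def update(self, overflow: bool) -> bool:
        """Update the scale; return True if the optimizer step should be applied."""
        if overflow:
            # Gradients are unusable: shrink the scale and skip this step.
            self.scale = max(self.scale / 2.0, 1.0)
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps >= self.growth_steps:
            # A long enough clean run: try a larger scale.
            self.scale *= 2.0
            self.good_steps = 0
        return True


# Usage: one overflow among four steps, with a short growth interval.
state = DynamicLossScale(scale=1024.0, growth_steps=2)
applied = [state.update(flag) for flag in (False, True, False, False)]
print(applied, state.scale)
```

If an overflow is never reported as Inf or NaN, a check based on finite gradients would always see `overflow=False`, and the scale would keep growing.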

Replace every occurrence of tf.keras.mixed_precision.LossScaleOptimizer in your script with npu.train.optimizer.NpuLossScaleOptimizer. If your script uses a different form of LossScaleOptimizer, first migrate it to tf.keras.mixed_precision.LossScaleOptimizer and validate the functionality and quality before making the replacement.
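
The replacement can be sketched as a simple text rewrite of the training script. This is a hedged porting aid, not an official tool: the helper replace_loss_scale_optimizer is hypothetical, it assumes the script refers to the class by its fully qualified name, and it assumes the script already makes the NPU module available under the name npu (the import statement is not shown in this section).

```python
from typing import Tuple

OLD_NAME = "tf.keras.mixed_precision.LossScaleOptimizer"
NEW_NAME = "npu.train.optimizer.NpuLossScaleOptimizer"


def replace_loss_scale_optimizer(source: str) -> Tuple[str, int]:
    """Return the rewritten source and the number of occurrences replaced."""
    count = source.count(OLD_NAME)
    return source.replace(OLD_NAME, NEW_NAME), count


# Example script fragment before porting (illustrative).
before = (
    "optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)\n"
    "optimizer = tf.keras.mixed_precision.LossScaleOptimizer(optimizer)\n"
)

after, replaced = replace_loss_scale_optimizer(before)
print(after)
```

Because the two classes are used in the same way, only the class name changes and the wrapped optimizer is passed as before. Scripts that import LossScaleOptimizer under an alias would need to be adjusted by hand.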