ULQ Algorithm for Activation Quantization

This algorithm applies to quantization-aware training (QAT).

The universal linear quantization (ULQ) algorithm trains the quantization factors continuously during training to minimize performance and accuracy loss. The algorithm is sensitive to initialization: it quantizes the activations (including the clipping operations) in the initialization phase. In the quantization configuration, ULQuantize controls the ULQ algorithm. (The PyTorch framework is used as an example. For details about the parameters, see the parameter description.)
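The clip-then-quantize behavior described above can be sketched as follows. This is a minimal illustration of uniform linear quantization with a clipping range, not the AMCT implementation; the function name and signature are illustrative.

```python
def ulq_fake_quantize(x, clip_min, clip_max, num_bits=8):
    """Clip each value to [clip_min, clip_max], quantize it to
    2**num_bits - 1 uniform levels, then dequantize (fake quantization),
    as done in the forward pass during QAT."""
    levels = 2 ** num_bits - 1
    scale = (clip_max - clip_min) / levels
    # Clip activations to the (trainable) clipping range.
    clipped = [min(max(v, clip_min), clip_max) for v in x]
    # Round each clipped value to the nearest quantization level.
    return [round((v - clip_min) / scale) * scale + clip_min for v in clipped]
```

For example, with a clipping range of [0.0, 4.0], an input of -1.0 is clipped to 0.0 and an input of 5.0 is clipped to 4.0, while in-range values are snapped to the nearest of 255 levels. During QAT, the clipping bounds themselves would be trained; here they are fixed for illustration.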

  • clip_min and clip_max specify the initial clipping range: clip_min sets the lower clipping bound and clip_max sets the upper clipping bound. Set them based on the actual range of the activation values. If they are not set, AMCT uses its default initial clipping range.
  • fixed_min specifies whether to fix the lower clipping bound to 0 during training. For example, if the layer immediately upstream of the current layer is ReLU, whose output range is [0, +inf), set fixed_min to true for activation quantization at this layer to fix its lower clipping bound to 0.
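The options above might appear in a quantization configuration fragment along these lines. This is an illustrative sketch only: the layer name conv1 and the surrounding key structure are assumptions, and the exact schema should be checked against the AMCT configuration reference; only clip_min, clip_max, and fixed_min come from the description above.

```json
{
  "conv1": {
    "activation_quant_params": {
      "clip_min": 0.0,
      "clip_max": 6.0,
      "fixed_min": true
    }
  }
}
```

Here fixed_min is set to true because, in this hypothetical example, conv1 is fed by a ReLU, so the lower clipping bound can safely stay at 0 during training.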