ULQ Algorithm

This algorithm applies to quantization-aware training (QAT).

The universal linear quantization (ULQ) algorithm keeps training the quantization factors throughout the training process to minimize the accuracy loss introduced by quantization. The algorithm is sensitive to initialization: activation quantization, including the clipping range, is set up in the initialization phase, and these initial values affect the trained result. In the quantization configuration, ULQuantize controls the ULQ algorithm. PyTorch is used as an example here. For details about the parameters, see Table 1.
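To illustrate what "training the quantization factors" means, the following is a minimal sketch in plain Python. Linear fake quantization maps a value onto evenly spaced levels inside [clip_min, clip_max], and the clip bounds receive gradients like any other parameter. The gradient rule used here is the PACT-style straight-through rule, chosen as a stand-in. The functions `fake_quant`, `ste_grads`, and `sgd_step` and the reconstruction loss are hypothetical illustrations, not AMCT's API, and AMCT's actual ULQ gradient may differ.

```python
import math


def fake_quant(x, clip_min, clip_max, num_bits=8):
    """Quantize-dequantize x onto 2**num_bits evenly spaced levels in [clip_min, clip_max]."""
    levels = 2 ** num_bits - 1
    scale = (clip_max - clip_min) / levels
    clipped = min(max(x, clip_min), clip_max)
    q = math.floor((clipped - clip_min) / scale + 0.5)  # round half up to the nearest level
    return q * scale + clip_min


def ste_grads(x, grad_out, clip_min, clip_max):
    """Straight-through gradients (d/dx, d/dclip_min, d/dclip_max) for one element.

    Assumed PACT-style rule: values above clip_max send their gradient to clip_max,
    values below clip_min send it to clip_min, in-range values pass it through to x.
    """
    if x > clip_max:
        return 0.0, 0.0, grad_out
    if x < clip_min:
        return 0.0, grad_out, 0.0
    return grad_out, 0.0, 0.0


def sgd_step(xs, clip_min, clip_max, lr=0.1, num_bits=8):
    """One SGD step on the clip bounds for the toy loss 0.5 * sum((fake_quant(x) - x) ** 2)."""
    g_min = g_max = 0.0
    for x in xs:
        grad_out = fake_quant(x, clip_min, clip_max, num_bits) - x  # dLoss/dOutput
        _, d_min, d_max = ste_grads(x, grad_out, clip_min, clip_max)
        g_min += d_min
        g_max += d_max
    return clip_min - lr * g_min, clip_max - lr * g_max


# 8.0 lies outside the initial range [0, 4], so the update pushes clip_max upward.
new_min, new_max = sgd_step([0.5, 1.0, 3.0, 8.0], clip_min=0.0, clip_max=4.0)
print(new_min, new_max)
```

In real QAT the gradient comes from the network's task loss rather than this reconstruction loss. The point is only that the clip bounds are updated step by step during training instead of being fixed once.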

  • clip_max_min specifies the initial clipping range: clip_min is the lower clipping bound and clip_max is the upper clipping bound. Set them based on the actual range of the activation values. If they are not set, AMCT determines the initial clipping range itself.
  • fixed_min specifies whether to fix the lower clipping bound at 0 during training. For example, if the layer upstream of the current layer is ReLU, whose output range is [0, +inf), set fixed_min to true for the activation quantization at this layer so that its lower clipping bound stays at 0.
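The effect of these two settings can be sketched in plain Python. The sketch picks initial clip bounds from sample activations, which is one simple way to follow "set them based on the actual range of the activation values". It also shows that, with fixed_min set, a training update never moves the lower bound away from 0. The helpers `init_clip_range` and `update_bounds` are hypothetical illustrations, not part of AMCT; only the names clip_min, clip_max, and fixed_min come from the configuration described above.

```python
def init_clip_range(activations, fixed_min=False):
    """Hypothetical helper: pick an initial (clip_min, clip_max) from sample activations.

    With fixed_min=True the lower bound is pinned to 0, as for a layer fed by ReLU.
    """
    clip_max = max(activations)
    clip_min = 0.0 if fixed_min else min(activations)
    return clip_min, clip_max


def update_bounds(clip_min, clip_max, grad_min, grad_max, lr, fixed_min=False):
    """Hypothetical SGD update that leaves clip_min untouched when fixed_min is set."""
    if not fixed_min:
        clip_min -= lr * grad_min
    clip_max -= lr * grad_max
    return clip_min, clip_max


# Sample outputs of an upstream ReLU: every value lies in [0, +inf).
relu_out = [max(0.0, v) for v in [-1.2, 0.0, 0.7, 2.5, 5.1]]

lo, hi = init_clip_range(relu_out, fixed_min=True)
print(lo, hi)  # 0.0 5.1

# The same gradient moves clip_min below 0 only when fixed_min is False.
print(update_bounds(lo, hi, grad_min=0.3, grad_max=-0.2, lr=0.1, fixed_min=False))
print(update_bounds(lo, hi, grad_min=0.3, grad_max=-0.2, lr=0.1, fixed_min=True))
```

Pinning the lower bound spends none of the quantization levels on negative values that a ReLU can never produce.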