ARQ Algorithm

Adaptive range quantization (ARQ) is an algorithm for quantizing weights. It provides two quantization modes, channel-wise and non-channel-wise, which differ in whether a separate quantization factor is applied to each channel. PyTorch is used as an example. For details about the parameters, see Table 20. The following figures show the quantization principles.

Figure 1 Non-channel-wise quantization

Figure 2 Channel-wise quantization

The channel_wise parameter of ARQuantize selects the quantization mode. PyTorch is used as an example. For details about the parameters, see Table 1.

If channel_wise is set to False, the non-channel-wise mode shown in Figure 1 is selected. Data analysis is performed on all filters at once, and all channels in the same layer share one quantization factor.
If channel_wise is set to True, the channel-wise mode shown in Figure 2 is selected. Data analysis is performed on each filter separately, and each channel in the same layer has its own quantization factor.
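
The difference between the two modes can be illustrated with a minimal sketch. It is not the ARQ algorithm itself: it assumes plain max-abs symmetric int8 scaling in place of ARQ's range search, and the names max_abs_scale, weight_scales, and quantize are hypothetical helpers, not part of any tool's API. Each filter is represented as a flat list of weights for one output channel.

```python
# Illustrative only: max-abs symmetric scaling stands in for the ARQ range search.

def max_abs_scale(values, num_bits=8):
    """Return a scale that maps the largest magnitude in `values` to the int range."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    peak = max(abs(v) for v in values)
    return peak / qmax if peak > 0 else 1.0


def weight_scales(filters, channel_wise):
    """`filters` holds one flattened filter per output channel (hypothetical layout)."""
    if channel_wise:
        # Separate analysis per filter: one factor per channel.
        return [max_abs_scale(f) for f in filters]
    # One analysis over all filters: a single factor shared by every channel.
    shared = max_abs_scale([v for f in filters for v in f])
    return [shared] * len(filters)


def quantize(filters, scales, num_bits=8):
    """Round each weight to an integer level using its channel's scale, with clipping."""
    qmax = 2 ** (num_bits - 1) - 1
    return [
        [max(-qmax - 1, min(qmax, round(v / s))) for v in f]
        for f, s in zip(filters, scales)
    ]


# Two filters with very different value ranges.
filters = [[0.02, -0.01, 0.015], [1.27, -0.635, 0.3]]

per_layer = weight_scales(filters, channel_wise=False)
per_channel = weight_scales(filters, channel_wise=True)

print(quantize(filters, per_layer))
print(quantize(filters, per_channel))
```

With the shared factor, the small-valued first filter is squeezed into only a few integer levels, whereas with per-channel factors it spans the full integer range. This is one way to see why the channel-wise mode tends to preserve more accuracy when filters differ in range.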

The channel-wise mode (Figure 2) generally yields higher accuracy. However, the non-channel-wise mode (Figure 1) is recommended when each filter contains only a small amount of data.

Note that the InnerProduct and AVE Pooling layers are channel-irrelevant. If channel_wise is set to True for these layers, an error is reported.
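
A configuration script could guard against this case before quantization starts. The following is a hedged sketch only: check_channel_wise and CHANNEL_IRRELEVANT_TYPES are hypothetical names, the layer-type strings are taken from the note above, and the error raised here is an illustrative ValueError rather than the tool's actual message.

```python
# Hypothetical pre-check; not part of any tool's API.
CHANNEL_IRRELEVANT_TYPES = {"InnerProduct", "AVE Pooling"}


def check_channel_wise(layer_type, channel_wise):
    """Return channel_wise unchanged, or raise if it is True for a channel-irrelevant layer."""
    if channel_wise and layer_type in CHANNEL_IRRELEVANT_TYPES:
        raise ValueError(
            f"channel_wise=True is not supported for {layer_type} layers"
        )
    return channel_wise
```

For example, check_channel_wise("InnerProduct", False) passes, while the same call with True raises ValueError.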
