ARQ Algorithm

Adaptive range quantization (ARQ) is an algorithm for quantizing weights. It provides two quantization modes, channel-wise and non-channel-wise, which differ in whether a separate quantization factor is applied to each channel. The mode is selected with the channel_wise parameter (see Table channel_wise). The following figures show the quantization principles of the two modes.

Figure 1 Non-channel-wise quantization

Figure 2 Channel-wise quantization

In the quantization configuration, the channel_wise parameter of ARQuantize is used to select the quantization mode. (The PyTorch framework is used as an example. For details about the parameters, see.)

When channel_wise is set to False, non-channel-wise quantization is selected. The data distribution of all filters is analyzed together, and all channels in the layer share one quantization factor.
When channel_wise is set to True, channel-wise quantization is selected. Each filter is analyzed and quantized independently, so each channel in the layer has its own quantization factor.
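
The difference between the two settings can be illustrated with a minimal NumPy sketch. It is not the ARQ implementation: it assumes a plain symmetric max-abs scale instead of ARQ's adaptive range selection, and the helper name quant_scales and the (out_channels, in_channels, kh, kw) weight layout are assumptions made for illustration only.

```python
import numpy as np

def quant_scales(weights, channel_wise, num_bits=8):
    """Return symmetric max-abs scale factors for a weight tensor.

    Hypothetical helper, not part of any toolkit API. `weights` is assumed
    to be shaped (out_channels, in_channels, kh, kw), so each slice along
    axis 0 is one filter.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8-bit quantization
    if channel_wise:
        # One factor per filter: reduce over every axis except axis 0.
        max_abs = np.abs(weights).reshape(weights.shape[0], -1).max(axis=1)
    else:
        # One factor shared by all filters in the layer.
        max_abs = np.array([np.abs(weights).max()])
    # Guard against an all-zero filter producing a zero scale.
    return np.maximum(max_abs, 1e-12) / qmax

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3, 3, 3))  # 4 filters

shared = quant_scales(w, channel_wise=False)
per_filter = quant_scales(w, channel_wise=True)
print(shared.shape, per_filter.shape)  # (1,) (4,)
```

With channel_wise disabled the layer ends up with a single factor, and with it enabled there is one factor per filter. Under the max-abs assumption, the shared factor equals the largest of the per-filter factors.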

Generally, channel-wise quantization, in which each filter is quantized independently, yields higher quantization precision. However, if each filter contains only a small amount of data, channel-wise quantization performs poorly. In this case, non-channel-wise quantization is recommended.
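
The precision claim can be checked with a small experiment. The following is a sketch under stated assumptions, not the ARQ algorithm itself: it uses a simple symmetric max-abs scale, a hypothetical fake_quantize helper, and synthetic weights in which the two filters have very different value ranges. It does not model the small-data case described above.

```python
import numpy as np

def fake_quantize(weights, scale, num_bits=8):
    """Quantize to signed integers and map back to floats (hypothetical helper)."""
    qmax = 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
# Two synthetic filters: one with values in [-1, 1], one in [-0.01, 0.01].
w = np.stack([
    rng.uniform(-1.0, 1.0, size=(3, 3, 3)),
    rng.uniform(-0.01, 0.01, size=(3, 3, 3)),
])

qmax = 127
# Non-channel-wise: a single factor derived from the whole layer.
shared_scale = np.abs(w).max() / qmax
# Channel-wise: one factor per filter, shaped to broadcast over each filter.
per_filter_scale = (np.abs(w).reshape(2, -1).max(axis=1) / qmax).reshape(2, 1, 1, 1)

# Mean squared quantization error of each filter under both modes.
err_shared = np.mean((w - fake_quantize(w, shared_scale)) ** 2, axis=(1, 2, 3))
err_channel = np.mean((w - fake_quantize(w, per_filter_scale)) ** 2, axis=(1, 2, 3))

print("non-channel-wise error per filter:", err_shared)
print("channel-wise error per filter:    ", err_channel)
```

In this setup the small-range filter is quantized much more accurately with its own factor, because the shared factor is dominated by the large-range filter. The large-range filter sees the same error in both modes.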

Note that the InnerProduct and AVE Pooling layers are channel-irrelevant. If channel_wise is set to True for these layers, an error is reported.
