NUQ Algorithm
This algorithm applies to NUQ.
Non-uniform quantization (NUQ) is quantization in which the quantization levels are unequally spaced. NUQ is performed on top of uniform quantization (which quantizes weights from 32-bit precision to 8-bit precision): a subset of the uniform quantization steps is selected to represent the quantized values. Quantizing the weights of a model may cause an accuracy drop, so this algorithm searches for the quantization mode with the minimum accuracy loss. In the quantization configuration, NUQuantize controls the NUQ algorithm. PyTorch is used as an example here. For details about the parameters, see Table 1.
- num_steps indicates the number of quantization steps. A smaller value indicates a higher compression ratio and a larger accuracy drop.
- num_of_iteration indicates the number of iterations in the search process. Generally, a larger value gives higher accuracy, but the compute time increases exponentially. You are advised to retain the default value.
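The idea of selecting a subset of uniform quantization steps can be illustrated with a minimal, framework-free sketch. It is not the tool's implementation: the function names (uniform_quantize, select_steps, nuq), the symmetric 8-bit scaling, and the Lloyd-style (k-means-like) search are all assumptions chosen for illustration. Only the roles of num_steps and num_of_iteration follow the descriptions above.

```python
import random


def uniform_quantize(weights, num_bits=8):
    """Map float weights onto signed integer levels (symmetric scaling, assumed)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    levels = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return levels, scale


def select_steps(levels, num_steps, num_of_iteration):
    """Pick at most num_steps integer levels with a Lloyd-style search (assumed)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    lo, hi = min(levels), max(levels)
    # Start from evenly spaced integer steps across the occupied range.
    steps = sorted({round(lo + i * (hi - lo) / (num_steps - 1))
                    for i in range(num_steps)})
    for _ in range(num_of_iteration):
        # Assign every uniform level to its nearest candidate step.
        buckets = {s: [] for s in steps}
        for q in levels:
            nearest = min(steps, key=lambda s: abs(s - q))
            buckets[nearest].append(q)
        # Move each step to the rounded mean of its bucket, so every step
        # remains a valid uniform quantization level.
        new_steps = sorted({round(sum(b) / len(b)) if b else s
                            for s, b in buckets.items()})
        if new_steps == steps:
            break  # converged early
        steps = new_steps
    return steps


def nuq(weights, num_steps, num_of_iteration):
    """Uniformly quantize, then snap each level to the selected subset of steps."""
    levels, scale = uniform_quantize(weights)
    steps = select_steps(levels, num_steps, num_of_iteration)
    snapped = [min(steps, key=lambda s: abs(s - q)) for q in levels]
    return [s * scale for s in snapped], steps


# Toy usage on random weights; the parameter values here are arbitrary.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]
dequantized, steps = nuq(weights, num_steps=16, num_of_iteration=3)
mse = sum((w - d) ** 2 for w, d in zip(weights, dequantized)) / len(weights)
print(len(steps), "steps selected, mean squared error:", mse)
```

In this sketch, lowering num_steps leaves fewer distinct values to store, which is why it compresses more and loses more accuracy. Each extra iteration re-scans all weights, so cost here grows with both the weight count and the iteration count; the real search may scale differently.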