nuq_quantize: NUQ algorithm for weight quantization
This algorithm applies to non-uniform post-training quantization (PTQ).
Non-uniform quantization (NUQ) is quantization in which the quantization levels are not evenly spaced. NUQ is performed on top of uniform quantization, which quantizes weights from 32-bit to 8-bit precision: a subset of the 8-bit quantization steps is selected to represent the quantized values. Because quantizing a model's weights may reduce its accuracy, this algorithm searches for the quantization mode that causes the smallest accuracy loss. In the quantization configuration, the NUQuantize field controls the NUQ algorithm through the following parameters. (The PyTorch framework is used as an example. For details about the parameters, see.)
- num_steps indicates the number of quantization steps. A smaller value indicates a higher compression ratio and a larger accuracy drop.
- num_of_iteration indicates the number of iterations in the search process. A larger value generally yields higher accuracy, but the computation time increases exponentially. The default value is recommended.
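The sketch below shows how num_steps and num_of_iteration interact. It is a minimal illustration only, not the toolkit's implementation. It makes these assumptions: the 8-bit step is symmetric per-tensor quantization, and the level search is a Lloyd-style (k-means) refinement on the int8 grid. The search minimizes squared reconstruction error as a stand-in for accuracy loss, whereas the algorithm described above searches for the mode with the smallest accuracy drop. The function names `uniform_quantize_int8` and `nuq_select_levels` are hypothetical.

```python
import numpy as np


def uniform_quantize_int8(weights):
    """Hypothetical symmetric per-tensor uniform quantization: FP32 -> int8 grid."""
    scale = float(np.max(np.abs(weights))) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int32)
    return q, scale


def nuq_select_levels(q, num_steps, num_of_iteration):
    """Hypothetical NUQ search: keep at most num_steps of the int8 levels.

    Uses Lloyd-style refinement on squared error (an assumed proxy for
    accuracy loss). Returns the chosen levels and, for each weight, the
    index of the level that represents it.
    """
    q = q.ravel()
    # Start from evenly spaced levels over the observed range, snapped to integers.
    levels = np.unique(np.round(np.linspace(q.min(), q.max(), num_steps)))
    for _ in range(num_of_iteration):
        # Assign every quantized weight to its nearest level.
        idx = np.argmin(np.abs(q[:, None] - levels[None, :]), axis=1)
        new_levels = levels.copy()
        for k in range(len(levels)):
            members = q[idx == k]
            if members.size:
                # Move the level to the (integer-rounded) mean of its members,
                # so it stays on the int8 grid.
                new_levels[k] = np.round(members.mean())
        new_levels = np.unique(new_levels)
        if np.array_equal(new_levels, levels):
            break  # converged before using every iteration
        levels = new_levels
    idx = np.argmin(np.abs(q[:, None] - levels[None, :]), axis=1)
    return levels, idx


# Usage: quantize random "weights", then keep 16 of the 256 int8 levels.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=1000).astype(np.float32)
q, scale = uniform_quantize_int8(w)
levels, idx = nuq_select_levels(q, num_steps=16, num_of_iteration=10)
w_hat = levels[idx] * scale                         # dequantized weights
bits_per_weight = int(np.ceil(np.log2(len(levels))))  # index width after NUQ
mse = float(np.mean((w - w_hat) ** 2))
print(f"{len(levels)} levels, {bits_per_weight} bits/weight, MSE={mse:.2e}")
```

Storing a per-weight index into a small table of levels is what makes a smaller num_steps compress better: 16 levels need a 4-bit index instead of an 8-bit value, at the cost of coarser representation and therefore a larger error.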