Distillation Algorithm

The idea of distillation is to treat the quantized model as the student and the source model as the teacher. By guiding the quantized model to imitate the outputs of the floating-point model, better accuracy can be obtained. Distillation requires only a small amount of unlabeled data, so good accuracy can be achieved in a short quantization time.

Distillation steps:

  1. Quantize the source model to obtain a quantized model with the same structure as the floating-point model.
  2. Group cascaded quantization layers into distillation units.
  3. Use the output of the corresponding floating-point unit as the "soft label" to fine-tune each quantized distillation unit.
  4. Distill all units in turn to obtain a quantized model with better accuracy.

Figure 1 Distillation diagram
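
The steps above can be sketched in a toy NumPy example. Here a single matrix stands in for one distillation unit, weights are fake-quantized to a coarse 4-bit grid, and the unit is fine-tuned with a straight-through estimator so its outputs match the floating-point "soft labels" on a small unlabeled batch. All names, sizes, and hyperparameters are illustrative assumptions, not part of the algorithm's specification:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical float "teacher" layer (y = x @ W) standing in for one
# distillation unit of the source model; shapes are illustrative.
W = rng.normal(size=(8, 4))
scale = np.abs(W).max() / 7          # symmetric 4-bit quantization scale

def fake_quantize(w):
    # Round weights onto the 4-bit grid, clip, then dequantize.
    return np.clip(np.round(w / scale), -7, 7) * scale

# Small unlabeled calibration batch -- distillation needs no labels.
X = rng.normal(size=(64, 8))
soft_labels = X @ W                  # teacher outputs act as "soft labels"

def unit_mse(w_latent):
    # How far the quantized student's outputs are from the teacher's.
    return float(np.mean((X @ fake_quantize(w_latent) - soft_labels) ** 2))

# Fine-tune latent weights with a straight-through estimator so the
# quantized unit imitates the float unit on the same inputs.
W_latent = W.copy()
init_mse = unit_mse(W_latent)
best_mse = init_mse
for _ in range(300):
    err = X @ fake_quantize(W_latent) - soft_labels
    grad = X.T @ err / len(X)        # gradient w.r.t. quantized weights,
    W_latent -= 0.02 * grad          # passed straight through to latent ones
    best_mse = min(best_mse, unit_mse(W_latent))

print(f"distillation MSE: {init_mse:.6f} -> {best_mse:.6f}")
```

In a real model this loop would run once per distillation unit, in order, with each fine-tuned unit frozen before the next is distilled.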