HFMG for Activation Quantization

This algorithm applies to post-training quantization (PTQ).

Histogram Feature Map Glutton (HFMG) records the distribution of activations in a histogram and searches that histogram for the optimal quantization clipping bounds. It works as follows:

  1. Create a histogram based on the input activations.
  2. If there is more than one batch, create a histogram for each batch of activations and merge the histograms, as shown in Figure 1.
  3. Search for the activation clipping bounds based on the resultant histogram, as shown in Figure 2.
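The three steps above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the toolkit's implementation: the helper names (batch_histogram, merge_histograms, search_clip_bounds) are hypothetical, merging is approximated by re-binning each old bin's count at its center, and the search criterion (minimum mean squared quantization error over a coarse grid of candidate bounds) is an assumption, since the criterion is not specified here.

```python
import numpy as np


def batch_histogram(batch, num_bins):
    """Step 1: histogram of one batch over its own [min, max] range."""
    value_range = (float(batch.min()), float(batch.max()))
    hist, _ = np.histogram(batch, bins=num_bins, range=value_range)
    return hist.astype(np.float64), value_range


def merge_histograms(hist_a, range_a, hist_b, range_b):
    """Step 2: merge two equal-width histograms onto the union of their ranges."""
    num_bins = len(hist_a)
    lo, hi = min(range_a[0], range_b[0]), max(range_a[1], range_b[1])
    new_edges = np.linspace(lo, hi, num_bins + 1)
    merged = np.zeros(num_bins)
    for hist, (h_lo, h_hi) in ((hist_a, range_a), (hist_b, range_b)):
        # Approximation: move each old bin's count to the new bin holding its center.
        centers = h_lo + (np.arange(len(hist)) + 0.5) * (h_hi - h_lo) / len(hist)
        rebinned, _ = np.histogram(centers, bins=new_edges, weights=hist)
        merged += rebinned
    return merged, (lo, hi)


def search_clip_bounds(hist, value_range, num_bits=8, num_steps=32):
    """Step 3: grid-search clipping bounds that minimize the squared error
    estimated from the histogram (assumed criterion, for illustration only)."""
    lo, hi = value_range
    centers = lo + (np.arange(len(hist)) + 0.5) * (hi - lo) / len(hist)
    levels = 2 ** num_bits - 1
    fractions = np.linspace(1.0 / num_steps, 1.0, num_steps)
    best_err, best_bounds = np.inf, (lo, hi)
    # Candidates shrink each bound toward zero independently.
    for clip_lo in np.unique(min(lo, 0.0) * fractions):
        for clip_hi in np.unique(max(hi, 0.0) * fractions):
            if clip_hi <= clip_lo:
                continue
            scale = (clip_hi - clip_lo) / levels
            clipped = np.clip(centers, clip_lo, clip_hi)
            dequant = np.round((clipped - clip_lo) / scale) * scale + clip_lo
            err = float(np.sum(hist * (centers - dequant) ** 2))
            if err < best_err:
                best_err, best_bounds = err, (float(clip_lo), float(clip_hi))
    return best_bounds


# Usage: three synthetic batches stand in for calibration activations.
rng = np.random.default_rng(0)
batches = [rng.normal(size=1000) for _ in range(3)]
hist, value_range = batch_histogram(batches[0], num_bins=512)
for batch in batches[1:]:
    h, r = batch_histogram(batch, num_bins=512)
    hist, value_range = merge_histograms(hist, value_range, h, r)
clip_lo, clip_hi = search_clip_bounds(hist, value_range)
```

Only the merged histogram and its range are kept between batches, so the raw activations of earlier batches do not need to stay in memory.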

This algorithm reads data from memory and uses less memory than the IFMR activation quantization algorithm. HFMG and IFMR cannot both be configured for the same layer.

The create_quant_config API uses the IFMR activation quantization algorithm by default. HFMG can be selected only through the simplified configuration file, which is passed to create_quant_config in the config_defination parameter.

Figure 1 Histogram merging
Figure 2 Searching for the optimal clipping bounds

If the quantization result is unsatisfactory, adjust the number of histogram bins and keep the parameter set that gives the best result. In HFMG, the num_of_bins parameter sets the number of bins. For the parameter description, see HFMGQuantize under the simplified configuration file parameters for PTQ. A larger num_of_bins usually lets the histogram fit the activation distribution more closely and improves the quantization result, but it also lengthens PTQ.
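The effect of the bin count on distribution fitting can be illustrated with a small NumPy sketch. It does not call the quantization toolkit: histogram_fit_error is a hypothetical helper, and its num_of_bins argument only mirrors the name of the configuration parameter. The error measured here is the mean squared distance between each sample and the center of the bin it falls into, which is one assumed proxy for how well the histogram represents the data.

```python
import numpy as np


def histogram_fit_error(samples, num_of_bins):
    """Mean squared distance between each sample and its bin center."""
    lo, hi = samples.min(), samples.max()
    width = (hi - lo) / num_of_bins
    # Bin index of every sample; the maximum value goes into the last bin.
    idx = np.minimum(((samples - lo) / width).astype(int), num_of_bins - 1)
    centers = lo + (idx + 0.5) * width
    return float(np.mean((samples - centers) ** 2))


# Synthetic activations; real calibration data would replace this.
rng = np.random.default_rng(0)
samples = rng.normal(size=10000)
errors = {n: histogram_fit_error(samples, n) for n in (128, 512, 2048)}
for n, err in errors.items():
    print(f"num_of_bins={n}: fit error={err:.3e}")
```

On data like this the fit error falls as the bin count rises, while building and searching the histogram takes more work, which matches the trade-off described above.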