HFMG Algorithm

This algorithm applies to post-training quantization (PTQ).

Histogram Feature Map Glutton (HFMG) records activation distributions as histograms and searches them for the optimal quantization clipping bounds. It works as follows:

  1. Create a histogram based on the input activations.
  2. If there is more than one batch, create a histogram for each batch of activations and merge the histograms, as shown in Figure 1.
  3. Search for the activation clipping bounds based on the resultant histogram, as shown in Figure 2.
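
The three steps above can be illustrated with a minimal pure-Python sketch. It is not the AMCT implementation: the function names (build_histogram, rebin, merge_histograms, quant_error, search_clip), the re-binning rule used for merging, the mean-squared-error criterion, and the grid search are all assumptions chosen for illustration, as are the 256-bin and 256-level settings.

```python
import random

def build_histogram(values, num_of_bins):
    """Step 1: count values into equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    if hi <= lo:
        raise ValueError("need at least two distinct values")
    width = (hi - lo) / num_of_bins
    counts = [0] * num_of_bins
    for v in values:
        counts[min(int((v - lo) / width), num_of_bins - 1)] += 1
    return counts, lo, hi

def rebin(counts, lo, hi, new_lo, new_hi):
    """Move each bin's count to the bin of the wider range that holds its center."""
    n = len(counts)
    src_w, dst_w = (hi - lo) / n, (new_hi - new_lo) / n
    out = [0] * n
    for i, c in enumerate(counts):
        center = lo + (i + 0.5) * src_w
        out[min(int((center - new_lo) / dst_w), n - 1)] += c
    return out

def merge_histograms(a, b):
    """Step 2: merge two (counts, lo, hi) histograms onto their common range."""
    lo, hi = min(a[1], b[1]), max(a[2], b[2])
    merged = [x + y for x, y in zip(rebin(*a, lo, hi), rebin(*b, lo, hi))]
    return merged, lo, hi

def quant_error(hist, clip_lo, clip_hi, levels=256):
    """Squared error of clipping to [clip_lo, clip_hi] and rounding to `levels` values."""
    counts, lo, hi = hist
    width = (hi - lo) / len(counts)
    step = (clip_hi - clip_lo) / (levels - 1)
    err = 0.0
    for i, c in enumerate(counts):
        if c:
            x = lo + (i + 0.5) * width  # bin center stands in for the samples
            clipped = min(max(x, clip_lo), clip_hi)
            q = clip_lo + round((clipped - clip_lo) / step) * step
            err += c * (x - q) ** 2
    return err

def search_clip(hist, grid=20):
    """Step 3: grid-search the clipping bounds that minimize quant_error."""
    _, lo, hi = hist
    span = hi - lo
    best = None
    for a in range(grid):
        for b in range(grid):
            clip_lo = lo + 0.5 * span * a / grid
            clip_hi = hi - 0.5 * span * b / grid
            err = quant_error(hist, clip_lo, clip_hi)
            if best is None or err < best[0]:
                best = (err, clip_lo, clip_hi)
    return best

random.seed(0)
batches = [[random.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
batches[2].append(12.0)  # one outlier to widen the observed range

hist = build_histogram(batches[0], num_of_bins=256)
for batch in batches[1:]:
    hist = merge_histograms(hist, build_histogram(batch, num_of_bins=256))

best_err, clip_lo, clip_hi = search_clip(hist)
print(f"range=({hist[1]:.3f}, {hist[2]:.3f}) clip=({clip_lo:.3f}, {clip_hi:.3f})")
```

Only the merged histogram is carried from one batch to the next in this sketch, so the raw activations of earlier batches do not need to be kept.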

This algorithm reads data from memory and uses less memory than the IFMR algorithm. HFMG and IFMR cannot both be configured for the same layer.

The create_quant_config API uses the IFMR algorithm by default. To use the HFMG algorithm instead, specify it in the simplified configuration file and pass that file through the config_defination parameter of the create_quant_config API.

Figure 1 Histogram merging
Figure 2 Searching for the optimal clipping bounds

If the quantization result is not satisfactory, adjust the number of histogram bins and keep the parameter set that gives the best result. In the HFMG algorithm, the num_of_bins parameter sets the number of bins in the histogram. For the parameter description, see HFMGQuantize in the simplified PTQ configuration file. A larger num_of_bins value usually lets the histogram fit the activation distribution more closely and improves the quantization result, but it also lengthens PTQ time.
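
The effect of the bin count on distribution fitting can be illustrated with a small pure-Python sketch. It is an assumption-laden illustration, not AMCT code: the helper bin_center_error is hypothetical, the bin counts 64 to 4096 are arbitrary example values rather than the values that num_of_bins accepts, and the activations are synthetic Gaussian samples. The sketch measures how far each sample lies from the center of the bin that represents it.

```python
import random

def bin_center_error(values, num_of_bins):
    """Mean squared distance between each value and its histogram bin center."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_of_bins
    total = 0.0
    for v in values:
        idx = min(int((v - lo) / width), num_of_bins - 1)
        center = lo + (idx + 0.5) * width
        total += (v - center) ** 2
    return total / len(values)

random.seed(0)
activations = [random.gauss(0.0, 1.0) for _ in range(5000)]  # synthetic data

# Illustrative bin counts only; consult HFMGQuantize for the supported values.
errors = {n: bin_center_error(activations, n) for n in (64, 256, 1024, 4096)}
for n, e in errors.items():
    print(f"num_of_bins={n:5d}  mean squared bin error={e:.3e}")
```

Finer bins represent the samples more closely, while any search that visits every bin for each candidate pair of bounds does proportionally more work, which is consistent with the longer PTQ time noted above.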