Manual Tuning

If the accuracy after quantization does not meet the requirements, you can manually adjust the parameters in the config.json file. This section describes the tuning principles and the configuration parameters.

Tuning Workflow

If you find that the accuracy of the model quantized based on the initial config.json file generated by the create_quant_retrain_config API call is not as expected, you can tune the configuration parameters as follows until the accuracy meets your requirement.

  1. Run quantization based on the initial config.json file generated by the create_quant_retrain_config API call. If the accuracy of the quantized model is satisfactory, stop tuning the configuration parameters. Otherwise, go to the next step.
  2. Tweak the value of quant_enable to skip quantizing certain layers.

    quant_enable is the quantization switch of a specific layer. Set it to false to skip the layer during quantization, or to true to quantize it. Removing the layer's configuration entry also skips the layer.

    Quantizing a model can degrade its accuracy. Layers that are sensitive to quantization suffer significant error increases once quantized and should therefore be left unquantized. Identify these layers as follows:

    1. In a model, the input layer, the output layer, and layers with especially few parameters are likely to be quantization-sensitive.
    2. Use the Model Accuracy Analyzer to compare the output errors between the source model and the quantized model layer by layer (for example, flag layers whose cosine similarity falls below 0.99) to locate the layers that reduce accuracy the most.
  3. Run quantization based on the new configuration generated in 2. If the accuracy of the quantized model is satisfactory, stop tuning the configuration parameters. Otherwise, go to 4.
  4. Tweak the values of activation_quant_params and weight_quant_params to tune the quantization algorithms and parameters.

    For details, see IFMR Algorithm for Activation Quantization and ARQ Algorithm for Weight Quantization.

  5. Run quantization based on the new configuration generated in 4. If the accuracy of the quantized model is satisfactory, stop tuning the configuration parameters. Otherwise, the model is not suitable for quantization; remove the quantization configuration.
Figure 1 Configuration tuning workflow
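For example (step 2 above), skipping a quantization-sensitive layer only requires flipping its switch in config.json; the layer name below is taken from the example file later in this section:

```json
"MobilenetV2/Conv/Conv2D":{
    "quant_enable":false
}
```

The layer's activation and weight parameter blocks can stay in place; quant_enable alone controls whether the layer is quantized.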

Quantization Configuration File

The following is an example of the content of the initial config.json file generated by the create_quant_retrain_config API call.
{
    "version":1,
    "activation_offset":true,
    "do_fusion":true,
    "skip_fusion_layers":[],
    "MobilenetV2/Conv/Conv2D":{
        "quant_enable":true,
        "activation_quant_params":{
            "max_percentile":0.999999,
            "min_percentile":0.999999,
            "search_range":[
                0.7,
                1.3
            ],
            "search_step":0.01,
            "act_algo":"ifmr",
            "asymmetric":false
        },
        "weight_quant_params":{
            "wts_algo":"arq_quantize",
            "channel_wise":true
        }
    },
    "MobilenetV2/Conv_1/Conv2D":{
        "quant_enable":true,
        "activation_quant_params":{
            "max_percentile":0.999999,
            "min_percentile":0.999999,
            "search_range":[
                0.7,
                1.3
            ],
            "search_step":0.01,
            "act_algo":"ifmr",
            "asymmetric":false
        },
        "weight_quant_params":{
            "wts_algo":"arq_quantize",
            "channel_wise":true
        }
    }
}
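Rather than editing the file by hand, the layer switches can be toggled with a short script. The following is a minimal sketch using Python's standard json module; the configuration is inlined for illustration, and in practice you would read and write config.json on disk:

```python
import json

# Inlined subset of the example configuration shown above.
config_text = """
{
    "version": 1,
    "activation_offset": true,
    "do_fusion": true,
    "skip_fusion_layers": [],
    "MobilenetV2/Conv/Conv2D": {
        "quant_enable": true
    }
}
"""

config = json.loads(config_text)

# Skip quantization for a layer identified as quantization-sensitive.
config["MobilenetV2/Conv/Conv2D"]["quant_enable"] = False

# Serialize the updated configuration (write this back to config.json in practice).
updated = json.dumps(config, indent=4)
print(config["MobilenetV2/Conv/Conv2D"]["quant_enable"])
```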

Configuration Parameters

The following tables describe the parameters in the configuration file.

Table 1 version

Description

Version number of the quantization configuration file

Type

int

Value

1

Parameter Description

Currently, only version 1 is available.

Recommended Value

1

Required/Optional

Optional

Table 2 activation_offset

Description

Global switch that selects symmetric or asymmetric mode for activation quantization.

The asymmetric parameter takes precedence over the activation_offset parameter if both of them exist in the configuration file.

Type

bool

Value

true or false

Parameter Description

  • true: asymmetric quantization
  • false: symmetric quantization

Recommended Value

true

Required/Optional

Optional

Table 3 do_fusion

Description

Fusion switch

Type

bool

Value

true or false

Parameter Description

  • true: on
  • false: off

For the fusible layers and fusion patterns, see Fusion Support.

Recommended Value

true

Required/Optional

Optional

Table 4 skip_fusion_layers

Description

Layers to skip BN fusion

Type

A list of strings

Value

Must be names of fusible layers.

For the fusible layers and fusion patterns, see Fusion Support.

Parameter Description

Specifies the layers to skip during fusion.

Recommended Value

-

Required/Optional

Optional
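For example, to exclude a fusible layer from fusion, list its name in the global skip_fusion_layers array (the layer name below is taken from the example file above):

```json
"skip_fusion_layers":[
    "MobilenetV2/Conv/Conv2D"
]
```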

Table 5 layer_config

Description

Quantization configuration of a network layer

Type

object

Value

-

Parameter Description

Includes the following parameters:

  • quant_enable
  • activation_quant_params
  • weight_quant_params

Recommended Value

-

Required/Optional

Optional

Table 6 quant_enable

Description

Quantization switch of the layer

Type

bool

Value

true or false

Parameter Description

  • true: on
  • false: off

Recommended Value

true

Required/Optional

Optional

Table 7 activation_quant_params

Description

Activation quantization parameters

"Type"

object

The options are as follows:

-

Parameter Description

Includes the following parameters:
  • max_percentile
  • min_percentile
  • search_range
  • search_step
  • act_algo
  • asymmetric

Recommended Configuration

-

Required/Optional

This function is optional.

Table 8 weight_quant_params

Description

Weight quantization parameters

Type

object

Value

-

Parameter Description

Includes the following parameters in uniform quantization:

  • wts_algo
  • channel_wise

Recommended Value

-

Required/Optional

Optional

Table 9 act_algo

Description

Activation quantization algorithm

"Type"

string

The options are as follows:

ifmr

Parameters description:

Currently, only the IFMR activation quantization algorithm is supported.

Recommended Configuration

-

Required/Optional

This function is optional.

Table 10 asymmetric

Description

Selects symmetric or asymmetric quantization for the activations of this layer. It is the layer-wise counterpart of the global activation_offset parameter.

The asymmetric parameter takes precedence over the activation_offset parameter if both of them exist in the configuration file.

Type

bool

Value

true or false

Parameter Description

  • true: asymmetric quantization
  • false: symmetric quantization

Recommended Value

true

Required/Optional

Optional

Table 11 max_percentile

Description

Percentile upper bound for searching for the maximum value

Type

float

Value

(0.5, 1]

Parameter Description

For example, given 100 numeric values sorted in descending order, an upper bound of 1.0 means that the value at index 0 (100 – 100 x 1.0) is taken as the maximum.

A larger value moves the upper bound for clipping-based quantization closer to the maximum of the data to be quantized.

Recommended Value

0.999999

Required/Optional

Optional
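The index arithmetic in the example above can be sketched in a few lines of Python; clip_max is an illustrative helper, not part of the toolkit's API:

```python
# With n values sorted in descending order, a max_percentile of p selects the
# value at index n - n*p as the clipping upper bound.
def clip_max(values, max_percentile):
    ordered = sorted(values, reverse=True)
    index = int(len(ordered) - len(ordered) * max_percentile)
    return ordered[index]

data = list(range(1, 101))      # 100 activation values: 1..100
print(clip_max(data, 1.0))      # index 0 -> 100, the true maximum is kept
print(clip_max(data, 0.98))     # index 2 -> 98, the two largest outliers are clipped
```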

Table 12 min_percentile

Description

Percentile lower bound for searching for the minimum value

Type

float

Value

(0.5, 1]

Parameter Description

For example, given 100 numeric values sorted in ascending order, a lower bound of 1.0 means that the value at index 0 (100 – 100 x 1.0) is taken as the minimum.

A larger value moves the lower bound for clipping-based quantization closer to the minimum of the data to be quantized.

Recommended Value

0.999999

Required/Optional

Optional

Table 13 search_range

Description

Quantization factor search range: [search_range_start, search_range_end]

"Type"

A list of two floats

The options are as follows:

0<search_range_start<search_range_end

Parameters description:

Sets the quantization factor search range.

  • search_range_start: search start
  • search_range_end: search end

Recommended Configuration

[0.7,1.3]

Required/Optional

This function is optional.

Table 14 search_step

Description

Quantization factor search step

Type

float

Value

(0, (search_range_end-search_range_start)]

Parameter Description

Sets the step used when searching for the upper bound for clipping-based quantization. A smaller value gives a finer-grained search.

Recommended Value

0.01

Required/Optional

Optional
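The search grid implied by search_range and search_step can be sketched as follows; search_candidates is an illustrative helper, not part of the toolkit's API:

```python
# Enumerate the candidate scaling factors between search_range_start and
# search_range_end, spaced by search_step. The search tries each factor and
# keeps the one with the lowest quantization error.
def search_candidates(search_range, search_step):
    start, end = search_range
    factors = []
    f = start
    while f <= end + 1e-9:          # tolerate float rounding at the end point
        factors.append(round(f, 6))
        f += search_step
    return factors

factors = search_candidates([0.7, 1.3], 0.01)
print(len(factors))                 # 61 candidate factors: 0.70, 0.71, ..., 1.30
```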

Table 15 wts_algo

Description

Weight quantization algorithm

"Type"

string

The options are as follows:

arq_quantize

Parameters description:

arq_quantize: basic weight quantization

Recommended Configuration

-

Required/Optional

This function is optional.

Table 16 channel_wise

Description

Whether to use different quantization factors for each channel

"Type"

bool

The options are as follows:

true or false

Parameters description:

  • true: Channels are separately quantized using different quantization factors.
  • false: All channels are quantized altogether using the same quantization factors.

Recommended Configuration

true

Required/Optional

This function is optional.
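The effect of channel_wise can be sketched as follows. This assumes signed INT8 symmetric quantization (hence the 127 divisor); weight_scales is an illustrative helper, not part of the toolkit's API:

```python
# Per-channel quantization computes one scale per output channel from that
# channel's own value range; per-tensor quantization shares a single scale
# derived from the global maximum, which wastes precision on small channels.
def weight_scales(weights, channel_wise):
    # weights: list of per-output-channel weight lists
    if channel_wise:
        return [max(abs(w) for w in channel) / 127.0 for channel in weights]
    global_max = max(abs(w) for channel in weights for w in channel)
    return [global_max / 127.0] * len(weights)

weights = [[0.02, -0.05], [1.2, -0.9]]  # two output channels with very different ranges
print(weight_scales(weights, channel_wise=True))   # one scale per channel
print(weight_scales(weights, channel_wise=False))  # one shared scale for both
```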