Manual Tuning

If the QAT accuracy does not meet your requirements, you can manually adjust the parameters in the config.json file. This section explains the tuning principles and describes each parameter.

Workflow

If the model quantized with the initial config.json file generated by the create_quant_retrain_config API does not reach the expected accuracy, tune the configuration parameters in the following order until it does.

Run quantization based on the initial config.json file generated by the create_quant_retrain_config API. If the accuracy of the quantized model is satisfactory, stop tuning the configuration parameters. Otherwise, go to the next step.

If the INT8-quantized model shows an accuracy drop, skip quantization for selected layers by setting their retrain_enable field to false. The input and output layers are generally the most quantization-sensitive, so skip them first. You can also tune clip_max and clip_min in the quantization configuration file, as follows:

{
    "version":1,
    "batch_num":1,
    "layername1":{
        "retrain_enable":true,
        "retrain_data_config":{
            "algo":"ulq_quantize",
            "clip_max":3.0,
            "clip_min":-3.0
        },
        "retrain_weight_config":{
            "algo":"arq_retrain",
            "channel_wise":true
        }
    },
    "layername2":{
        "retrain_enable":true,
        "retrain_data_config":{
            "algo":"ulq_quantize",
            "clip_max":3.0,
            "clip_min":-3.0
        },
        "retrain_weight_config":{
            "algo":"arq_retrain",
            "channel_wise":true
        }
    }
}
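
Skipping layers can also be scripted rather than edited by hand. The following is a minimal sketch, assuming the configuration has the layout shown above; the helper name disable_layers is hypothetical and is not part of AMCT.

```python
import json


def disable_layers(config, layer_names):
    """Return a copy of config with retrain_enable set to False for layer_names."""
    # Deep copy through a JSON round trip so the caller's dict is left untouched.
    updated = json.loads(json.dumps(config))
    for name in layer_names:
        if not isinstance(updated.get(name), dict):
            raise KeyError(f"layer not found in config: {name}")
        updated[name]["retrain_enable"] = False
    return updated


# Hypothetical two-layer configuration mirroring the example above.
config = {
    "version": 1,
    "batch_num": 1,
    "layername1": {"retrain_enable": True},
    "layername2": {"retrain_enable": True},
}

tuned = disable_layers(config, ["layername1"])
print(json.dumps(tuned, indent=4))
```

Write the result back to config.json with json.dump before rerunning quantization.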

Run quantization based on the new configuration. If the accuracy of the quantized model is satisfactory, stop tuning the configuration parameters. Otherwise, the model is not suitable for QAT, and you should remove the QAT configuration.
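
The decision flow of this workflow can be sketched as follows. This is only an illustration: evaluate stands for whatever routine you use to run quantization and measure accuracy, and tune_config, fake_evaluate, and the layer names are all hypothetical.

```python
import copy


def tune_config(config, evaluate, target_accuracy, sensitive_layers):
    """Follow the workflow: initial config, then skip sensitive layers, then give up."""
    # Step 1: try the initial configuration.
    if evaluate(config) >= target_accuracy:
        return config
    # Step 2: skip quantization for the layers suspected to be sensitive.
    tuned = copy.deepcopy(config)
    for name in sensitive_layers:
        tuned[name]["retrain_enable"] = False
    if evaluate(tuned) >= target_accuracy:
        return tuned
    # Step 3: accuracy is still too low, so treat the model as unsuitable for QAT.
    return None


def fake_evaluate(cfg):
    """Stand-in for a real accuracy measurement: each quantized layer costs 0.05."""
    enabled = sum(
        1 for v in cfg.values() if isinstance(v, dict) and v.get("retrain_enable")
    )
    return 0.9 - 0.05 * enabled


config = {
    "version": 1,
    "input_conv": {"retrain_enable": True},
    "middle_conv": {"retrain_enable": True},
    "output_fc": {"retrain_enable": True},
}
result = tune_config(config, fake_evaluate, 0.8, ["input_conv", "output_fc"])
print(result)
```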

Quantization Configuration File

If inference with the config.json QAT configuration file generated by the create_quant_retrain_config call shows a significant accuracy drop, tune the config.json file as described in this section until the accuracy is as expected. For details about the content of the file, see Example. Each layer name must be unique in the file.
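
Python's json module silently keeps only the last of any duplicated keys, so a repeated layer name is easy to miss. The sketch below rejects duplicates while parsing; load_config_strict is a hypothetical helper, not an AMCT API.

```python
import json


def load_config_strict(text):
    """Parse a JSON config and raise ValueError if any object repeats a key."""

    def reject_duplicates(pairs):
        seen = {}
        for key, value in pairs:
            if key in seen:
                raise ValueError(f"duplicate key in config: {key}")
            seen[key] = value
        return seen

    # object_pairs_hook receives every object's (key, value) pairs in file order.
    return json.loads(text, object_pairs_hook=reject_duplicates)


ok = load_config_strict('{"version": 1, "conv1": {"retrain_enable": true}}')
print(ok)
```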

The following tables describe the parameters available in the configuration file. Note that the parameters in Table 8 to Table 10 are available only when you manually tune the quantization configuration file.

**Table 1** version
Description	Version number of the quantization configuration file
Type	Integer
Value	1
Command-Line Options	Currently, only version 1 is available.
Recommended Value	1
Required/Optional	Optional

**Table 2** batch_num
Description	Number of batches used in the inference phase of QAT
Type	Integer
Value	Greater than 0
Command-Line Options	Defaults to 1. You are advised to keep the calibration dataset size within 50 images. Calculate batch_num from batch_size as follows: batch_num × batch_size = calibration dataset size, where batch_size is the number of images per batch.
Recommended Value	1
Required/Optional	Optional
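
The batch_num relation can be worked out with a small helper. This is a sketch only; compute_batch_num is a hypothetical name, and rounding down when the dataset size is not a multiple of batch_size is an assumption made so that the batches never request more images than the dataset holds.

```python
def compute_batch_num(dataset_size, batch_size):
    """Solve batch_num * batch_size = dataset_size for batch_num, rounding down."""
    if dataset_size <= 0 or batch_size <= 0:
        raise ValueError("dataset_size and batch_size must be positive")
    batch_num = dataset_size // batch_size
    if batch_num == 0:
        raise ValueError("batch_size is larger than the calibration dataset")
    return batch_num


print(compute_batch_num(48, 16))  # 3
```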

**Table 3** retrain_enable
Description	QAT enable
Type	Boolean
Value	true or false
Command-Line Options	true: QAT is enabled for the layer. false: QAT is disabled for the layer.
Recommended Value	true
Required/Optional	Optional

**Table 4** retrain_data_config
Description	Activation quantization configuration
Type	Object
Value	-
Command-Line Options	Includes the following parameters. algo: quantization algorithm, ulq_quantize by default. clip_max: upper bound of clipping-based quantization, empty by default. clip_min: lower bound of clipping-based quantization, empty by default. fixed_min: whether to fix the minimum value of clipping-based quantization at 0, empty by default. dst_type: quantization bit width, INT8 or INT4, INT8 by default.
Recommended Value	-
Required/Optional	Optional

**Table 5** retrain_weight_config
Description	Weight quantization configuration
Type	Object
Value	-
Command-Line Options	Includes the following parameters. algo: quantization algorithm, arq_retrain by default. channel_wise: whether to use different quantization factors for each channel.
Recommended Value	-
Required/Optional	Optional

**Table 6** algo
Description	Quantization algorithm
Type	String
Value	-
Command-Line Options	ulq_quantize: ULQ clipping-based quantization algorithm. arq_retrain: ARQ algorithm.
Recommended Value	Set to ulq_quantize for activation quantization or arq_retrain for weight quantization.
Required/Optional	Optional

**Table 7** channel_wise
Description	Whether to use different quantization factors for each channel.
Type	Boolean
Value	true or false
Command-Line Options	true: Each channel is quantized separately with its own quantization factors. false: All channels are quantized together with the same quantization factors.
Recommended Value	true
Required/Optional	Optional
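
To illustrate what channel_wise changes, the sketch below computes INT8 scale factors with generic symmetric max-abs scaling. This is not the ARQ algorithm, whose details are not described here; symmetric_scales and the sample weights are hypothetical.

```python
def symmetric_scales(weights, channel_wise):
    """Return one scale per channel, or a single shared scale for all channels.

    weights is a list of channels, each a list of floats.
    """

    def scale(values):
        # Map the largest magnitude onto the INT8 limit 127.
        return max(abs(v) for v in values) / 127.0

    if channel_wise:
        return [scale(channel) for channel in weights]
    return [scale([v for channel in weights for v in channel])]


# One channel with small weights and one with large weights.
weights = [[0.1, -0.2], [5.0, -12.7]]
per_channel = symmetric_scales(weights, channel_wise=True)
shared = symmetric_scales(weights, channel_wise=False)
print(per_channel, shared)
```

With a shared scale, the small-valued channel is quantized with a step sized for the large-valued one, which is one reason the per-channel setting tends to preserve accuracy better.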

**Table 8** fixed_min
Description	Fixed lower bound switch for the activation quantization algorithm
Type	Boolean
Value	true or false
Command-Line Options	true: fixes the lower bound of the activation quantization algorithm at 0. false: does not fix the lower bound of the activation quantization algorithm. If this parameter is not included, AMCT automatically sets the lower bound of the activation quantization algorithm according to the graph structure. If this parameter is included, set this parameter for each layer to be quantized as follows: true if the upstream layer is ReLU; false otherwise.
Recommended Value	Do not include this parameter.
Required/Optional	Optional
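
If you do choose to include fixed_min, the rule above can be applied to every quantized layer in one pass. The sketch assumes you already have a mapping from each layer name to the type of its upstream layer; that mapping, the "ReLU" type string, and the helper apply_fixed_min are hypothetical and not provided by AMCT.

```python
def apply_fixed_min(config, upstream_type):
    """Set fixed_min on each quantized layer: True if its upstream layer is ReLU."""
    for name, layer in config.items():
        if not isinstance(layer, dict):
            continue  # skip top-level scalars such as version and batch_num
        if not layer.get("retrain_enable", True):
            continue  # the layer is not quantized, so leave it alone
        data_config = layer.setdefault("retrain_data_config", {})
        data_config["fixed_min"] = upstream_type.get(name) == "ReLU"
    return config


config = {
    "version": 1,
    "conv2": {"retrain_enable": True, "retrain_data_config": {"algo": "ulq_quantize"}},
    "conv3": {"retrain_enable": True, "retrain_data_config": {"algo": "ulq_quantize"}},
    "fc": {"retrain_enable": False},
}
# Hypothetical graph information: conv2 follows a ReLU, conv3 follows a pooling layer.
apply_fixed_min(config, {"conv2": "ReLU", "conv3": "Pooling"})
print(config)
```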

**Table 9** clip_max
Description	Upper bound for the activation quantization algorithm
Type	Float
Value	clip_max > 0. Find the maximum value (max) from the distribution of the activation values at each layer. The recommended value range is [0.3 × max, 1.7 × max].
Command-Line Options	If this parameter is included, the upper bound of the clipping-based activation quantization algorithm is fixed. If this parameter is not included, the upper bound is learned using the IFMR algorithm.
Recommended Value	Do not include this parameter.
Required/Optional	Optional

**Table 10** clip_min
Description	Lower bound for the activation quantization algorithm
Type	Float
Value	clip_min < 0. Find the minimum value (min) from the distribution of the activation values at each layer. The recommended value is between 0.3 × min and 1.7 × min, that is, in the range [1.7 × min, 0.3 × min] because min is negative.
Command-Line Options	If this parameter is included, the lower bound of the clipping-based activation quantization algorithm is fixed. If this parameter is not included, the lower bound is learned using the IFMR algorithm.
Recommended Value	Do not include this parameter.
Required/Optional	Optional
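
The recommended ranges for clip_max and clip_min can be derived from observed activation values as in the following sketch. recommended_clip_ranges is a hypothetical helper, and the sample activations are made up; how you collect activation values for a layer depends on your framework and is not shown.

```python
def recommended_clip_ranges(activations):
    """Return (low, high) search ranges for clip_max and clip_min.

    The ranges follow the 0.3x to 1.7x guidance for the observed maximum
    and minimum activation values of one layer.
    """
    act_max = max(activations)
    act_min = min(activations)
    if act_max <= 0 or act_min >= 0:
        raise ValueError("expected activations on both sides of zero")
    return {
        "clip_max": (0.3 * act_max, 1.7 * act_max),
        # act_min is negative, so 1.7 * act_min is the lower end of the range.
        "clip_min": (1.7 * act_min, 0.3 * act_min),
    }


ranges = recommended_clip_ranges([-2.0, 0.5, 4.0])
print(ranges)
```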

**Table 11** dst_type
Description	Quantization bit width
Type	String
Value	INT8 (default) or INT4. The current version supports only INT8 quantization.
Command-Line Options	Sets the quantization bit width, INT8 or INT4.
Recommended Value	-
Required/Optional	Optional
