Manual Tuning

If the QAT accuracy does not meet your requirements, you can manually adjust the parameters in the config.json file. This section describes the tuning principles and the configuration parameters.

Tuning Workflow

If the accuracy of the model quantized with the initial config.json file generated by the create_quant_retrain_config API call is not as expected, tune the configuration parameters as follows until the accuracy meets your requirements.
  1. Run quantization based on the initial config.json file generated by the create_quant_retrain_config API call. If the accuracy of the quantized model is satisfactory, stop tuning. Otherwise, go to step 2.
  2. If the INT8-quantized model shows an accuracy drop, skip quantizing selected layers by setting their retrain_enable field to false. The input and output layers are generally the most quantization-sensitive, so skip quantizing them first. You can also tune clip_max and clip_min in the quantization configuration file, for example:
    {
        "version":1,
        "batch_num":1,
        "inference/Conv2D":{
            "retrain_enable":true,
            "retrain_data_config":{
                "algo":"ulq_quantize",
                "clip_max":3.0,
                "clip_min":-3.0
            },
            "retrain_weight_config":{
                "algo":"arq_retrain",
                "channel_wise":true
            }
        },
        "inference/Conv2D_1":{
            "retrain_enable":true,
            "retrain_data_config":{
                "algo":"ulq_quantize",
                "clip_max":3.0,
                "clip_min":-3.0
            },
            "retrain_weight_config":{
                "algo":"arq_retrain",
                "channel_wise":true
            }
        }
    }
    
  3. Run quantization based on the new configuration. If the accuracy of the quantized model is satisfactory, stop tuning. Otherwise, your model is not suitable for QAT, and the QAT configuration should be removed.
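Step 2 above amounts to a mechanical edit of config.json. The following sketch applies it with only the standard library; the layer names and clip bounds are taken from the example above and should be replaced with the quantization-sensitive layers of your own model:

```python
import json

# Hypothetical layer names; replace with the quantization-sensitive
# layers of your own model (typically the input and output layers).
SKIP_LAYERS = ["inference/Conv2D"]
CLIP_OVERRIDES = {"inference/Conv2D_1": {"clip_max": 3.0, "clip_min": -3.0}}

def tune_config(path="config.json", out_path="config_tuned.json"):
    with open(path) as f:
        cfg = json.load(f)

    for layer in SKIP_LAYERS:
        if layer in cfg:
            # Skip quantizing this layer entirely.
            cfg[layer]["retrain_enable"] = False

    for layer, bounds in CLIP_OVERRIDES.items():
        if layer in cfg:
            # Fix the activation clipping bounds instead of learning them.
            cfg[layer]["retrain_data_config"].update(bounds)

    with open(out_path, "w") as f:
        json.dump(cfg, f, indent=4)
    return cfg
```

Rerun quantization with the written config_tuned.json and compare accuracy against the previous run.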

Quantization Configuration File

If inference based on the config.json QAT configuration file generated by the create_quant_retrain_config call shows a significant accuracy drop, tune the config.json file until the accuracy is as expected. The following is an example of the file content. Keep the layer names unique in the JSON file.
{
    "version":1,
    "batch_num":1,
    "conv1":{
        "retrain_enable":true,
        "retrain_data_config":{
            "algo":"ulq_quantize",
            "dst_type":"INT8"
        },
        "retrain_weight_config":{
            "algo":"arq_retrain",
            "channel_wise":true,
            "dst_type":"INT8"
        }
    },
    "conv2_1/expand":{
        "retrain_enable":true,
        "retrain_data_config":{
            "algo":"ulq_quantize",
            "dst_type":"INT8"
        },
        "retrain_weight_config":{
            "algo":"arq_retrain",
            "channel_wise":true,
            "dst_type":"INT8"
        }
    }
}
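Because json.load silently keeps only the last value when a key appears twice, a duplicate layer name would mask part of your configuration without any error. A small standard-library check (a sketch, not part of AMCT) can reject such files up front:

```python
import json

def load_config_checked(path):
    """Load config.json, raising if any layer name appears twice."""
    def reject_duplicates(pairs):
        seen = set()
        for key, _ in pairs:
            if key in seen:
                raise ValueError(f"duplicate layer name in config: {key}")
            seen.add(key)
        return dict(pairs)

    with open(path) as f:
        # object_pairs_hook sees every key/value pair before dict
        # construction collapses duplicates, so repeats are detectable.
        return json.load(f, object_pairs_hook=reject_duplicates)
```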

Configuration Parameters

The following tables describe the parameters in the configuration file.

Table 1 version

  • Description: Version number of the quantization configuration file
  • Type: int
  • Value: 1 (currently, only version 1 is available)
  • Recommended Value: 1
  • Required/Optional: Optional

Table 2 batch_num

  • Description: Number of batches used in the inference phase of quantization aware training
  • Type: int
  • Value: Greater than 0; defaults to 1
  • Parameter Description: You are advised to keep the calibration dataset size within 50 images. Calculate batch_num from batch_size (the number of images per batch) as follows: batch_num x batch_size = calibration dataset size.
  • Recommended Value: 1
  • Required/Optional: Optional
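The batch_num relationship above can be computed directly. A minimal sketch (the 48-image calibration set and batch size of 16 are illustrative values, not from the source):

```python
def batch_num_for(dataset_size, batch_size):
    """batch_num x batch_size = calibration dataset size."""
    if dataset_size % batch_size != 0:
        raise ValueError("calibration dataset size must be a multiple of batch_size")
    return dataset_size // batch_size

# Example: a 48-image calibration set fed in batches of 16 images.
print(batch_num_for(48, 16))  # 3
```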

Table 3 retrain_enable

  • Description: Whether to enable QAT for the layer
  • Type: bool
  • Value: true or false
  • Parameter Description: true: enabled; false: disabled
  • Recommended Value: true
  • Required/Optional: Optional

Table 4 retrain_data_config

  • Description: Activation quantization configuration
  • Type: object
  • Parameter Description: Includes the following parameters:
      • algo: quantization algorithm; defaults to ulq_quantize
      • clip_max: upper bound for clipping-based quantization; empty by default
      • clip_min: lower bound for clipping-based quantization; empty by default
      • fixed_min: whether to fix the minimum value of clipping-based quantization to 0; not included by default
      • dst_type: quantization bit width, INT8 or INT4; defaults to INT8
  • Required/Optional: Optional

Table 5 retrain_weight_config

  • Description: Weight quantization configuration
  • Type: object
  • Parameter Description: Includes the following parameters:
      • algo: quantization algorithm; defaults to arq_retrain
      • channel_wise: whether to use a separate quantization factor for each channel
  • Required/Optional: Optional

Table 6 algo

  • Description: Quantization algorithm
  • Type: string
  • Value: ulq_quantize or arq_retrain
  • Parameter Description:
      • ulq_quantize: ULQ clipping-based quantization algorithm
      • arq_retrain: ARQ quantization algorithm
  • Recommended Value: ulq_quantize for activation quantization; arq_retrain for weight quantization
  • Required/Optional: Optional

Table 7 channel_wise

  • Description: Whether to use a separate quantization factor for each channel
  • Type: bool
  • Value: true or false
  • Parameter Description: true: each channel is quantized with its own quantization factor; false: all channels are quantized with the same quantization factor
  • Recommended Value: true
  • Required/Optional: Optional

Table 8 fixed_min

  • Description: Whether to fix the lower bound of the activation quantization algorithm
  • Type: bool
  • Value: true or false
  • Parameter Description: true: fixes the lower bound of the activation quantization algorithm at 0; false: does not fix the lower bound. If this parameter is not included, AMCT automatically sets the lower bound according to the graph structure. If it is included, set it for each layer to be quantized as follows: true if the upstream layer is ReLU; false otherwise.
  • Recommended Value: Do not include this parameter.
  • Required/Optional: Optional
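The per-layer rule above (fixed_min true if and only if the upstream layer is ReLU) can be applied mechanically once you know each quantized layer's upstream operator. A minimal sketch; the upstream-operator mapping is an assumption you would extract from your own model graph:

```python
# Hypothetical mapping from quantized layer name to its upstream
# operator type, extracted from your own model graph.
UPSTREAM_OP = {
    "conv1": "ReLU",
    "conv2_1/expand": "BiasAdd",
}

def apply_fixed_min(cfg):
    """Set fixed_min per layer: true iff the upstream layer is ReLU."""
    for layer, upstream in UPSTREAM_OP.items():
        if layer in cfg:
            cfg[layer]["retrain_data_config"]["fixed_min"] = (upstream == "ReLU")
    return cfg
```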

Table 9 clip_max

  • Description: Upper bound for the activation quantization algorithm
  • Type: float
  • Value: clip_max > 0. Choose the upper bound based on the distribution of the activation values at each layer; the recommended range is 0.3 x max to 1.7 x max, where max is the maximum activation value.
  • Parameter Description: If this parameter is included, the upper bound of the clipping-based activation quantization algorithm is fixed. If it is not included, the upper bound is learned using the IFMR algorithm.
  • Recommended Value: Do not include this parameter.
  • Required/Optional: Optional

Table 10 clip_min

  • Description: Lower bound for the activation quantization algorithm
  • Type: float
  • Value: clip_min < 0. Choose the lower bound based on the distribution of the activation values at each layer; the recommended range is 0.3 x min to 1.7 x min, where min is the minimum activation value.
  • Parameter Description: If this parameter is included, the lower bound of the clipping-based activation quantization algorithm is fixed. If it is not included, the lower bound is learned using the IFMR algorithm.
  • Recommended Value: Do not include this parameter.
  • Required/Optional: Optional
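When you do fix the clip bounds, candidate values spanning the recommended 0.3x to 1.7x range can be generated from the observed per-layer activation statistics. A sketch; the observed max/min values and the intermediate factors are illustrative:

```python
def clip_candidates(observed, factors=(0.3, 0.7, 1.0, 1.3, 1.7)):
    """Candidate clip values spanning the recommended 0.3x-1.7x range
    around an observed per-layer activation extremum."""
    return [round(f * observed, 4) for f in factors]

# Illustrative per-layer activation statistics.
act_max, act_min = 6.2, -4.8
print(clip_candidates(act_max))  # candidates for clip_max (all > 0)
print(clip_candidates(act_min))  # candidates for clip_min (all < 0)
```

Try each candidate pair in the quantization configuration and keep the one with the best validation accuracy.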

Table 11 dst_type

  • Description: Quantization bit width
  • Type: string
  • Value: INT8 (default) or INT4
  • Parameter Description: Sets the quantization bit width. Currently, only INT8 quantization is supported.
  • Required/Optional: Optional