Manual Tuning

If the PTQ accuracy does not meet the requirements, you can manually adjust the parameters in the config.json file. This section describes the tuning principles and the configuration parameters.

Tuning Workflow

If you find that the accuracy of the model quantized based on the initial config.json file generated by the create_quant_config API call is not as expected, you can tune the configuration parameters until the accuracy meets your requirement. The workflow for manually tuning the parameters in the PTQ configuration file config.json goes through the following three phases:

  1. Tune the amount of data used for calibration.
  2. Skip quantizing certain layers.
  3. Tune the quantization algorithm and parameters.

Specifically,

  1. Run quantization based on the initial config.json file generated by the create_quant_config API call. If the accuracy of the quantized model is satisfactory, stop tuning the configuration parameters. Otherwise, go to 2.
  2. Tweak the value of batch_num to tune the amount of data used for calibration.

    batch_num controls the number of data batches used for calibration. Tune it based on the batch size and the dataset size.

    Generally, a larger value of batch_num means more data samples are used for calibration and a smaller accuracy drop in the quantized model. However, excessive data does not necessarily improve accuracy, while it consumes more memory and slows down quantization, and can exhaust memory, video RAM, and thread resources. A good tradeoff is usually reached when the product of batch_num and batch_size (the number of images per batch) is 16 or 32 (see the sketch after this workflow).

  3. Run quantization based on the new configuration generated in 2. If the accuracy of the quantized model is satisfactory, stop tuning the configuration parameters. Otherwise, go to 4.
  4. Tweak the value of quant_enable to skip quantizing certain layers.

    quant_enable is the quantization switch for a specific layer. The value false indicates that the layer is skipped during quantization; true means the layer is quantized. Removing the layer's configuration from the file also skips the layer.

    Quantization can degrade accuracy. Layers that are sensitive to quantization suffer significant error increases once quantized and should therefore be left unquantized. Identify these layers as follows:

    1. In a model, the input layer, the output layer, and layers with very few parameters are likely to be quantization-sensitive.
    2. Use the Model Accuracy Analyzer to compare the per-layer output errors between the original model and the quantized model (using, for example, a cosine similarity threshold of 0.99) to locate the layers that reduce accuracy the most (see the sensitivity sketch after this workflow).
  5. Run quantization based on the new configuration generated in 4. If the accuracy of the quantized model is satisfactory, stop tuning the configuration parameters. Otherwise, go to 6.
  6. Tweak the values of activation_quant_params and weight_quant_params to tune the quantization algorithms and their parameters.

    For details about the algorithm parameters, see Configuration Parameters. For details about the algorithms, see PTQ Algorithms.

  7. Run quantization based on the new configuration generated in 6. If the accuracy of the quantized model is satisfactory, stop tuning the configuration parameters. Otherwise, it indicates that your model is not suitable for quantization and the quantization configuration should be removed.
Figure 1 Configuration tuning workflow
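
The three tuning levers above boil down to editing fields in config.json. The following is a minimal Python sketch of those edits using only the standard json module; the file name and the layer names (taken from the examples below) are assumptions, and the modified file still has to be passed to your usual quantization run.

    import json

    # Load the initial PTQ configuration generated by create_quant_config
    # (file name assumed).
    with open("config.json", "r", encoding="utf-8") as f:
        cfg = json.load(f)

    # Phase 1: use more calibration data, aiming for
    # batch_num x batch_size of about 16 or 32.
    batch_size = 16                       # images per calibration batch (example value)
    cfg["batch_num"] = 32 // batch_size   # -> 2

    # Phase 2: skip a quantization-sensitive layer entirely
    # ("layer_name1" is a placeholder).
    cfg["layer_name1"]["quant_enable"] = False

    # Phase 3: switch another layer's activation quantization algorithm
    # from IFMR to HFMG ("layer_name2" is a placeholder).
    cfg["layer_name2"]["activation_quant_params"] = {
        "num_bits": 8,
        "act_algo": "hfmg",
        "num_of_bins": 4096,
        "asymmetric": False,
    }

    with open("config.json", "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=4)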
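
For step 4, the per-layer comparison can be done with a cosine-similarity check. The sketch below assumes you have already dumped each layer's output from the original and the quantized model into dictionaries keyed by layer name (how to dump them depends on your framework and tooling); it is an illustration of the check, not part of the Model Accuracy Analyzer itself.

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity between two flattened tensors."""
        a = a.astype(np.float64).ravel()
        b = b.astype(np.float64).ravel()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(np.dot(a, b) / denom) if denom else 1.0

    def find_sensitive_layers(fp32_outputs, quant_outputs, threshold=0.99):
        """Return the names of layers whose quantized output drifts the most.

        fp32_outputs / quant_outputs: dict mapping layer name -> ndarray.
        """
        return [name for name, ref in fp32_outputs.items()
                if cosine_similarity(ref, quant_outputs[name]) < threshold]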

Quantization Configuration File

If the inference accuracy of the model quantized with the config.json configuration file generated by the create_quant_config call does not meet the requirements, tune the config.json file until the accuracy is as expected. The following are examples of the file content.

  • Uniform quantization configuration file (activation quantization using the IFMR algorithm, act_algo set to ifmr)
    {
        "version":1,
        "batch_num":2,
        "activation_offset":true,
        "joint_quant":false,
        "do_fusion":true,
        "skip_fusion_layers":[],
        "tensor_quantize":[
          {
             "layer_name": "maxpool_ld_default",
             "input_index":0,
             "activation_quant_params":{
                 "num_bits":8,
                 "max_percentile":0.999999,
                 "min_percentile":0.999999,
                 "search_range":[
                     0.7,
                    1.3
                 ],
                 "search_step":0.01,
                 "act_algo":"ifmr",
                 "asymmetric":false
              }
           }
        ],
        "layer_name1":{
            "quant_enable":true,
            "dmq_balancer_param":0.5,
            "activation_quant_params":{
                "num_bits":8,
                "max_percentile":0.999999,
                "min_percentile":0.999999,
                "search_range":[
                    0.7,
                    1.3
                ],
                "search_step":0.01,
                "act_algo":"ifmr",
                "asymmetric":false
            },
            "weight_quant_params":{
                "num_bits":8,
                "wts_algo":"arq_quantize",
                "channel_wise":true
            }
        },
        "layer_name2":{
            "quant_enable":true,
            "dmq_balancer_param":0.5,
            "activation_quant_params":{
                "num_bits":8,
                "max_percentile":0.999999,
                "min_percentile":0.999999,
                "search_range":[
                    0.7,
                    1.3
                ],
                "search_step":0.01,
                "act_algo":"ifmr",
                "asymmetric":false
            },
            "weight_quant_params":{
                "num_bits":8,
                "wts_algo":"arq_quantize",
                "channel_wise":false
            }
        }
    }
    
  • Uniform quantization configuration file (activation quantization using the HFMG algorithm, act_algo set to hfmg)
    {
        "version":1,
        "batch_num":2,
        "activation_offset":true,
        "do_fusion":true,
        "skip_fusion_layers":[],
        "tensor_quantize":[
          {
             "layer_name": "maxpool_ld_default",
             "input_index":0,
             "activation_quant_params":{
                 "num_bits":8,
                 "max_percentile":0.999999,
                 "min_percentile":0.999999,
                 "search_range":[
                     0.7,
                    1.3
                 ],
                 "search_step":0.01
                 "act_algo":"hfmg"
                 "asymmetric":false
              }
           }
        ],
        "layer_name1":{
            "quant_enable":true,
            "dmq_balancer_param":0.5,
            "activation_quant_params":{
                "num_bits":8,
                "act_algo":"hfmg",
                "num_of_bins":4096
                "asymmetric":false
            },
            "weight_quant_params":{
                "num_bits":8,
                "wts_algo":"arq_quantize",
                "channel_wise":true
            }
        }
    }
    

Configuration Parameters

The following tables describe the parameters in the config.json configuration file.

Table 1 version

  • Description: Version number of the quantization configuration file
  • Type: int
  • Value: 1
  • Notes: Currently, only version 1 is available.
  • Recommended value: 1
  • Required/Optional: Optional

Table 2 batch_num

  • Description: Number of data batches used for calibration
  • Type: int
  • Value: Greater than 0
  • Notes: Defaults to 1. You are advised to keep the calibration dataset within 50 images. Calculate batch_num from batch_size (the number of images per batch) as follows: batch_num x batch_size = calibration dataset size.
  • Recommended value: 1
  • Required/Optional: Optional
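
As a worked example of the relationship above (values assumed):

    # A 32-image calibration set processed in batches of 16 images
    # needs batch_num = 2.
    calibration_set_size = 32
    batch_size = 16
    batch_num = calibration_set_size // batch_size   # -> 2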

Table 3 activation_offset

  • Description: Selects symmetric or asymmetric mode for activation quantization. This is a global setting; a per-layer asymmetric parameter takes precedence over activation_offset if both are present in the configuration file.
  • Type: bool
  • Value: true or false
  • Notes: true: asymmetric quantization; false: symmetric quantization
  • Recommended value: true
  • Required/Optional: Optional

Table 4 joint_quant

  • Description: Eltwise joint quantization switch
  • Type: bool
  • Value: true or false
  • Notes: true: enabled; false: disabled
  • Recommended value: false
  • Required/Optional: Optional

Table 5 do_fusion

  • Description: Fusion switch
  • Type: bool
  • Value: true or false
  • Notes: true: enabled; false: disabled. Currently, only Conv+BN fusion is supported.
  • Recommended value: true
  • Required/Optional: Optional

Table 6 skip_fusion_layers

  • Description: Layers for which fusion is skipped
  • Type: list of strings
  • Value: Names of fusible layers. Currently, only Conv+BN fusion is supported.
  • Notes: Specifies the layers to exclude from fusion.
  • Recommended value: -
  • Required/Optional: Optional
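
As a sketch, skipping fusion for a single Conv+BN pair only requires listing the convolution layer's name; the layer name "conv1" below is hypothetical.

    import json

    with open("config.json", "r", encoding="utf-8") as f:
        cfg = json.load(f)
    # Keep the Conv+BN pair starting at "conv1" unfused (layer name assumed).
    cfg["skip_fusion_layers"] = ["conv1"]
    with open("config.json", "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=4)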

Table 7 layer_config

  • Description: Quantization configuration of a network layer
  • Type: object
  • Value: -
  • Notes: Contains the following parameters:
    • quant_enable
    • activation_quant_params
    • weight_quant_params
  • Recommended value: -
  • Required/Optional: Optional

Table 8 quant_enable

  • Description: Per-layer quantization switch
  • Type: bool
  • Value: true or false
  • Notes: true: the layer is quantized; false: the layer is skipped during quantization
  • Recommended value: true
  • Required/Optional: Optional

Table 9 dmq_balancer_param

  • Description: Migration strength of the DMQ balancer
  • Type: float
  • Value: [0.2, 0.8]
  • Notes: Degree to which the quantization difficulty of activations is migrated to the weights. Use a smaller value if the activation distribution contains many outliers.
  • Recommended value: 0.5
  • Required/Optional: Optional

Table 10 activation_quant_params

  • Description: Activation quantization parameters
  • Type: object
  • Value: -
  • Notes: Contains the following parameters. (The IFMR parameters and the HFMG parameters are mutually exclusive within the same layer.)
    • IFMR algorithm parameters: max_percentile, min_percentile, search_range, search_step, act_algo, num_bits, asymmetric
    • HFMG algorithm parameters: act_algo, num_of_bins, num_bits, asymmetric
  • Recommended value: -
  • Required/Optional: Optional

Table 11 weight_quant_params

  • Description: Weight quantization parameters
  • Type: object
  • Value: -
  • Notes: In uniform quantization, contains the following parameters: num_bits, wts_algo, channel_wise
  • Recommended value: -
  • Required/Optional: Optional

Table 12 num_bits

  • Description: Quantization bit width
  • Type: int
  • Value: 8 or 16
  • Notes: Must be 8, indicating INT8 quantization.
  • Recommended value: -
  • Required/Optional: Required

Table 13 act_algo

  • Description: Activation quantization algorithm
  • Type: string
  • Value: ifmr or hfmg
  • Notes: ifmr: IFMR algorithm for activation quantization; hfmg: HFMG algorithm for activation quantization
  • Recommended value: -
  • Required/Optional: Optional

Table 14 asymmetric

  • Description: Selects symmetric or asymmetric quantization for activations at the layer level. The asymmetric parameter takes precedence over the global activation_offset parameter if both are present in the configuration file.
  • Type: bool
  • Value: true or false
  • Notes: true: asymmetric quantization; false: symmetric quantization
  • Recommended value: true
  • Required/Optional: Optional

Table 15 max_percentile

  • Description: Search upper bound for the maximum value in the IFMR activation quantization algorithm
  • Type: float
  • Value: (0.5, 1]
  • Notes: For example, given 100 values sorted in descending order, an upper bound of 1.0 means the value at index 0 (100 – 100 x 1.0) is taken as the maximum. A larger value moves the upper clipping bound closer to the maximum of the data to be quantized.
  • Recommended value: 0.999999
  • Required/Optional: Optional

Table 16 min_percentile

  • Description: Search lower bound for the minimum value in the IFMR activation quantization algorithm
  • Type: float
  • Value: (0.5, 1]
  • Notes: For example, given 100 values sorted in ascending order, a lower bound of 1.0 means the value at index 0 (100 – 100 x 1.0) is taken as the minimum. A larger value moves the lower clipping bound closer to the minimum of the data to be quantized.
  • Recommended value: 0.999999
  • Required/Optional: Optional
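
The indexing rule described in Tables 15 and 16 can be sketched as follows. This is an illustrative interpretation of the description above, not the exact IFMR implementation.

    import numpy as np

    def percentile_bound(data: np.ndarray, percentile: float, largest: bool = True):
        """Pick the value at index round(N - N * percentile) of the sorted data."""
        flat = np.sort(data.ravel())                         # ascending order
        idx = int(round(flat.size - flat.size * percentile))
        idx = min(max(idx, 0), flat.size - 1)
        # Upper bound: count from the largest values; lower bound: from the smallest.
        return flat[-(idx + 1)] if largest else flat[idx]

    # With 100 values and max_percentile = 1.0, idx = 0, so the largest value
    # itself is taken as the starting point for the upper-bound search.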

Table 17 search_range

  • Description: Quantization factor search range ([search_range_start, search_range_end]) of the IFMR activation quantization algorithm
  • Type: list of two floats
  • Value: 0 < search_range_start < search_range_end
  • Notes: Sets the quantization factor search range. search_range_start is the search start; search_range_end is the search end.
  • Recommended value: [0.7, 1.3]
  • Required/Optional: Optional

Table 18 search_step

  • Description: Quantization factor search step of the IFMR activation quantization algorithm
  • Type: float
  • Value: (0, search_range_end – search_range_start]
  • Notes: Sets the step at which the upper clipping bound is varied. A smaller value means a finer quantization factor search.
  • Recommended value: 0.01
  • Required/Optional: Optional
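
Taken together, search_range and search_step define the grid of candidate clipping factors that IFMR sweeps over. A sketch of that grid with the recommended values (the exact search procedure is internal to the algorithm and assumed here):

    # Candidate clipping factors implied by the recommended settings.
    search_range = [0.7, 1.3]
    search_step = 0.01

    candidates = []
    value = search_range[0]
    while value <= search_range[1] + 1e-9:   # tolerate float rounding
        candidates.append(round(value, 10))
        value += search_step
    # -> 0.70, 0.71, ..., 1.30 (61 candidates)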

Table 19 num_of_bins

  • Description: Number of bins (the minimum unit of a histogram) used by the HFMG activation quantization algorithm
  • Type: unsigned int
  • Value: 1024, 2048, 4096, or 8192
  • Notes: A larger num_of_bins fits the data distribution better and generally improves quantization quality, but increases PTQ time.
  • Recommended value: 4096
  • Required/Optional: Optional for quantization using the HFMG algorithm

Table 20 wts_algo

  • Description: Weight quantization algorithm
  • Type: string
  • Value: arq_quantize
  • Notes: arq_quantize: ARQ algorithm for weight quantization
  • Recommended value: -
  • Required/Optional: Optional

Table 21 channel_wise

  • Description: Whether the arq_quantize algorithm uses a separate quantization factor for each channel
  • Type: bool
  • Value: true or false
  • Notes: true: each channel is quantized with its own quantization factors; false: all channels are quantized with the same quantization factors
  • Recommended value: true
  • Required/Optional: Optional
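
A minimal sketch of what channel_wise changes, assuming a symmetric INT8 scale computed from the absolute weight maxima of a [Cout, Cin, Kh, Kw] convolution kernel; the actual ARQ algorithm may compute the factors differently.

    import numpy as np

    def weight_scales(weights: np.ndarray, channel_wise: bool = True, num_bits: int = 8):
        """Return the quantization scale(s) for a [Cout, Cin, Kh, Kw] weight tensor."""
        qmax = 2 ** (num_bits - 1) - 1                    # 127 for INT8
        if channel_wise:
            # One scale per output channel.
            per_channel_max = np.abs(weights).reshape(weights.shape[0], -1).max(axis=1)
            return per_channel_max / qmax
        # A single scale shared by all channels.
        return np.abs(weights).max() / qmax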

Table 22 tensor_quantize

  • Description: Quantization configuration of the input tensors of a node
  • Type: object
  • Value: -
  • Notes: Contains the following parameters:
    • layer_name
    • input_index
    • activation_quant_params
  • Recommended value: -
  • Required/Optional: Optional

Table 23 layer_name

  • Description: Name of the node whose input tensors are to be post-training quantized
  • Type: string
  • Value: -
  • Notes: Currently, only the MaxPool operator is supported.
  • Recommended value: -
  • Required/Optional: Required

Table 24 input_index

  • Description: Input index of the node whose input tensors are to be post-training quantized
  • Type: uint32
  • Value: -
  • Notes: Index of the node input.
  • Recommended value: -
  • Required/Optional: Required