Simplified PTQ Configuration File

Table 1 describes the fields in the calibration_config_onnx.proto file. Find the file in /amct_onnx/proto/calibration_config_onnx.proto under the AMCT installation directory.

Table 1 calibration_config_onnx.proto

Parameter

Required/Optional

Type

Field

Description

AMCTConfig

-

-

-

Simplified PTQ configuration of AMCT.

Optional

UInt32

batch_num

Number of batches used for quantization.

Optional

Boolean

activation_offset

Whether to quantize activations with offset. It is a global configuration parameter.

  • true: with offset. Activations are asymmetrically quantized.
  • false: without offset. Activations are symmetrically quantized.
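
To illustrate the difference between the two activation_offset settings, the following is a minimal Python sketch of generic symmetric and asymmetric INT8 quantization. The function names and the exact rounding and clamping rules are assumptions made for illustration. They are not taken from AMCT, whose actual formulas may differ.

```python
def quantize_symmetric(values, num_bits=8):
    """Symmetric quantization (no offset): the range is centred on zero."""
    qmax = 2 ** (num_bits - 1) - 1           # 127 for INT8
    qmin = -qmax - 1                          # -128 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [min(max(round(v / scale), qmin), qmax) for v in values]
    return quantized, scale, 0                # offset is always 0


def quantize_asymmetric(values, num_bits=8):
    """Asymmetric quantization (with offset): [min, max] maps onto the full integer range."""
    qmin = -(2 ** (num_bits - 1))
    qmax = 2 ** (num_bits - 1) - 1
    # Extend the range to include 0.0 so that zero stays exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    offset = round(qmin - lo / scale)
    quantized = [min(max(round(v / scale) + offset, qmin), qmax) for v in values]
    return quantized, scale, offset


# Hypothetical non-negative activations, such as the output of a ReLU layer.
activations = [0.0, 0.5, 1.0, 2.0]
sym_q, sym_scale, sym_offset = quantize_symmetric(activations)
asym_q, asym_scale, asym_offset = quantize_asymmetric(activations)
```

For one-sided data like this, the symmetric variant uses only the non-negative half of the integer range, whereas the asymmetric variant spreads the same values over the whole range.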

Repeated

String

skip_layers

Layers to be skipped during quantization.

Repeated

String

skip_layer_types

Types of layers to be skipped during quantization.

Optional

NuqConfig

nuq_config

NUQ configuration.

Optional

Boolean

joint_quant

Eltwise joint quantization switch. Defaults to false, indicating that joint quantization is disabled.

If true, the network performance may improve but the precision may be compromised.

Optional

FakequantPrecisionMode

fakequant_precision_mode

Precision mode of the scale_d value of the quantization custom operator in the fake-quantized model.

  • FORCE_FP16_QUANT: The scale_d value is converted to fp16 precision.
  • Not configured (default): The scale_d value retains fp32 precision.

Optional

CalibrationConfig

common_config

Common quantization configuration, which is a global parameter. Use this configuration if a layer is not overridden by override_layer_types or override_layer_configs.

Parameter priority: override_layer_configs > override_layer_types > common_config

Repeated

OverrideLayerType

override_layer_types

Layer types whose quantization configurations are to be overridden. Use this parameter to apply a different quantization configuration to all layers of the specified types.

For example, you can change the quantization factor search step from 0.01 to 0.02 for layers of a given type.

Parameter priority: override_layer_configs > override_layer_types > common_config

Repeated

OverrideLayer

override_layer_configs

Layers whose quantization configurations are to be overridden. Use this parameter to apply a different quantization configuration to the specified layers.

For example, you can change the quantization factor search step from 0.01 to 0.02 for a given layer.

Parameter priority: override_layer_configs > override_layer_types > common_config
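
The parameter priority above can be sketched in Python as a simple lookup. This is only an illustration: the function name, the dictionary layout, and the layer names are hypothetical, and the sketch assumes that an override replaces the whole calibration configuration rather than being merged field by field, which this document does not specify.

```python
def resolve_layer_config(layer_name, layer_type, common_config,
                         override_layer_types=None, override_layer_configs=None):
    """Return the calibration configuration that applies to one layer.

    Priority: override_layer_configs > override_layer_types > common_config.
    """
    override_layer_types = override_layer_types or {}
    override_layer_configs = override_layer_configs or {}
    if layer_name in override_layer_configs:      # most specific: by layer name
        return override_layer_configs[layer_name]
    if layer_type in override_layer_types:        # next: by layer type
        return override_layer_types[layer_type]
    return common_config                          # fallback: global setting


# Hypothetical settings keyed by layer type and by layer name.
common = {"search_step": 0.01}
by_type = {"Conv": {"search_step": 0.02}}
by_name = {"conv1": {"search_step": 0.05}}

conv1_cfg = resolve_layer_config("conv1", "Conv", common, by_type, by_name)
conv2_cfg = resolve_layer_config("conv2", "Conv", common, by_type, by_name)
fc1_cfg = resolve_layer_config("fc1", "Gemm", common, by_type, by_name)
```

Here conv1 takes the per-layer override, conv2 falls back to the per-type override, and fc1 uses common_config.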

Optional

Boolean

do_fusion

BN fusion switch. Defaults to true, indicating BN fusion enabled.

Repeated

String

skip_fusion_layers

Layers to skip BN fusion.

Repeated

TensorQuantize

tensor_quantize

Whether to perform PTQ on the input tensors of the specified node in the network model to improve data transfer efficiency in inference.

Currently, tensor quantization can be performed only on the MaxPool/Add operator.

NuqConfig

-

-

-

NUQ configuration.

Required

String

mapping_file

JSON file of the quantized model, which is obtained by converting the deployable model after uniform quantization into an offline model with ATC.

Optional

NUQuantize

nuq_quantize

NUQ configuration.

OverrideLayerType

-

-

-

Quantization configuration to override by layer type.

Required

String

layer_type

Quantizable layer type.

Required

CalibrationConfig

calibration_config

Quantization configuration to override.

OverrideLayer

-

-

-

Quantization configuration to override by layer.

Required

String

layer_name

Name of the layer to override.

Required

CalibrationConfig

calibration_config

Quantization configuration to override.

TensorQuantize

-

-

-

Configuration for input tensors to be post-training quantized.

Required

String

layer_name

Name of the node where input tensors need to be post-training quantized. Currently, only the MaxPool or Add operator is supported.

Required

UInt32

input_index

Input index of the node where input tensors need to be post-training quantized.

-

FMRQuantize

ifmr_quantize

Activation quantization algorithm configuration.

ifmr_quantize: IFMR algorithm configuration. The IFMR quantization algorithm is used by default.

-

HFMGQuantize

hfmg_quantize

Activation quantization algorithm configuration.

hfmg_quantize: HFMG algorithm configuration.

CalibrationConfig

-

-

-

Calibration-based quantization configuration.

-

ARQuantize

arq_quantize

Weight quantization algorithm configuration.

arq_quantize: ARQ algorithm configuration.

-

NUQuantize

nuq_quantize

Weight quantization algorithm configuration.

nuq_quantize: NUQ algorithm configuration.

-

FMRQuantize

ifmr_quantize

Activation quantization algorithm configuration.

ifmr_quantize: IFMR algorithm configuration.

-

HFMGQuantize

hfmg_quantize

Activation quantization algorithm configuration.

hfmg_quantize: HFMG algorithm configuration.

-

DMQBalancer

dmq_balancer

Balanced quantization algorithm configuration.

dmq_balancer: DMQ Balancer configuration.

ARQuantize

-

-

-

ARQ algorithm configuration. For details about the algorithm, see ARQ Algorithm.

This algorithm cannot be configured together with the NUQuantize algorithm. If they are configured together, the quantization algorithm configured last in the configuration file is used.

Optional

Boolean

channel_wise

Whether to use different quantization factors for each channel.

  • true: Channels are separately quantized using different quantization factors.
  • false: All channels are quantized altogether using the same quantization factors.

Optional

UInt32

quant_bits

Weight quantization bit width. The value can be 6, 7, or 8, corresponding to INT6, INT7, or INT8 quantization.

INT8 quantization is used by default.

If this field is set to INT6 or INT7, only the Conv2d operator (a Conv operator whose weight has four dimensions) is supported.

If quant_bits is set to INT6 or INT7 in common_config, the setting takes effect only for the Conv2d operator, and INT8 is used for other operators. If quant_bits is set to INT6 or INT7 for Conv operators in override_layer_types, the setting takes effect only when the weight dimension is 4.
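
The common_config rule above can be expressed as a small Python helper. This is a sketch of the documented fallback only. The function name and its arguments are hypothetical, and it does not model how AMCT reacts when an INT6 or INT7 setting in override_layer_types meets a Conv operator whose weight is not four-dimensional.

```python
def effective_quant_bits(requested_bits, op_type, weight_ndim):
    """Bit width applied to a layer when quant_bits is set in common_config.

    INT6 and INT7 take effect only for Conv operators with 4-D weights
    (Conv2d); every other operator falls back to INT8.
    """
    if requested_bits not in (6, 7, 8):
        raise ValueError("quant_bits must be 6, 7, or 8")
    if requested_bits == 8:
        return 8
    if op_type == "Conv" and weight_ndim == 4:
        return requested_bits
    return 8


conv2d_bits = effective_quant_bits(7, "Conv", 4)   # Conv2d keeps INT7
conv1d_bits = effective_quant_bits(7, "Conv", 3)   # non-2D Conv uses INT8
gemm_bits = effective_quant_bits(6, "Gemm", 2)     # other operators use INT8
```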

FMRQuantize

-

-

-

IFMR quantization algorithm configuration. For details about the algorithm, see IFMR Algorithm.

This algorithm cannot be configured together with the HFMGQuantize algorithm. If they are configured together, the quantization algorithm configured last in the configuration file is used.

Optional

Float

search_range_start

Quantization factor search start.

Optional

Float

search_range_end

Quantization factor search end.

Optional

Float

search_step

Quantization factor search step.

Optional

Float

max_percentile

Upper bound of the search, that is, the percentile position used to search for the maximum value.

Optional

Float

min_percentile

Lower bound of the search, that is, the percentile position used to search for the minimum value.

Optional

Boolean

asymmetric

Whether to perform asymmetric quantization. It is used to select the layer-wise quantization algorithm.

  • true: asymmetric quantization
  • false: symmetric quantization

If this parameter is set in override_layer_configs, override_layer_types, and common_config, or if the activation_offset parameter is set, the priority is as follows:

override_layer_configs > override_layer_types > common_config > activation_offset

Optional

CalibrationDataType

dst_type

Quantization bit width for activation quantization, either INT8 (default) or INT16 quantization. The current version supports only INT8 quantization.
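
The search_range_start, search_range_end, and search_step fields of FMRQuantize can be pictured as defining a grid of candidate factors. The Python sketch below only enumerates such a grid. It assumes an evenly spaced, inclusive grid and does not implement the IFMR algorithm itself. The function name is hypothetical.

```python
def search_candidates(search_range_start, search_range_end, search_step):
    """List the candidates start, start + step, ... up to and including end."""
    if search_step <= 0 or search_range_end < search_range_start:
        raise ValueError("expected search_step > 0 and end >= start")
    # The small epsilon guards against floating-point error in the division.
    count = int((search_range_end - search_range_start) / search_step + 1e-9) + 1
    return [round(search_range_start + i * search_step, 6) for i in range(count)]


fine_grid = search_candidates(0.7, 1.3, 0.01)     # values used in common_config examples
coarse_grid = search_candidates(0.8, 1.2, 0.02)   # values used in override examples
```

Under this assumption, a narrower range with a larger step (0.8 to 1.2 in steps of 0.02) yields 21 candidates instead of 61, which reduces the search effort for the overridden layers.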

HFMGQuantize

-

-

-

HFMG algorithm for activation quantization. For details about the algorithm, see HFMG Algorithm.

This algorithm cannot be configured together with the FMRQuantize algorithm. If they are configured together, the quantization algorithm configured last in the configuration file is used.

Optional

UInt32

num_of_bins

Number of bins (the minimum unit in a histogram). Value range: {1024, 2048, 4096, 8192}.

Defaults to 4096.

Optional

Boolean

asymmetric

Whether to perform asymmetric quantization. It is used to select the layer-wise quantization algorithm.

  • true: asymmetric quantization
  • false: symmetric quantization

If this parameter is set in override_layer_configs, override_layer_types, and common_config, or if the activation_offset parameter is set, the priority is as follows:

override_layer_configs > override_layer_types > common_config > activation_offset

Optional

CalibrationDataType

dst_type

Quantization bit width for activation quantization, either INT8 (default) or INT16 quantization. The current version supports only INT8 quantization.
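
The priority rule stated for the asymmetric field (in both FMRQuantize and HFMGQuantize) can be sketched in Python as follows. The function and argument names are hypothetical. Each optional argument stands for the asymmetric value set at that level, with None meaning that the level does not set it.

```python
def resolve_asymmetric(activation_offset, common=None, by_type=None, by_name=None):
    """Pick the effective asymmetric flag for one layer.

    Priority: override_layer_configs (by_name) > override_layer_types (by_type)
              > common_config (common) > activation_offset.
    """
    for value in (by_name, by_type, common):
        if value is not None:
            return value
    return activation_offset


only_global = resolve_asymmetric(True)
common_wins = resolve_asymmetric(True, common=False)
type_wins = resolve_asymmetric(True, common=False, by_type=True)
name_wins = resolve_asymmetric(False, common=True, by_type=True, by_name=False)
```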

NUQuantize

-

-

-

NUQ algorithm configuration. For details about the algorithm, see NUQ Algorithm.

This algorithm cannot be configured together with the ARQuantize algorithm. If they are configured together, the quantization algorithm configured last in the configuration file is used.

Optional

UInt32

num_steps

Number of steps for NUQ.

Optional

UInt32

num_of_iteration

Number of iterations for NUQ optimization.

DMQBalancer

-

-

-

DMQ Balancer algorithm configuration. For details about the algorithm, see DMQ Balancer Algorithm.

Optional

Float

migration_strength

Migration strength, indicating the degree to which the quantization difficulty of activations is migrated to weights. The value range is [0.2, 0.8]. The default value is 0.5. Set the migration strength to a small value if there are many outliers in the activation distribution.
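
Several fields in Table 1 have restricted value sets. As a convenience, they could be checked before a configuration file is written, for example with the Python sketch below. The helper is hypothetical and not part of AMCT. It only encodes the value ranges stated in the table for num_of_bins, quant_bits, and migration_strength.

```python
NUM_OF_BINS_CHOICES = {1024, 2048, 4096, 8192}
QUANT_BITS_CHOICES = {6, 7, 8}
MIGRATION_STRENGTH_RANGE = (0.2, 0.8)


def check_values(num_of_bins=None, quant_bits=None, migration_strength=None):
    """Return a list of problems; the list is empty when all given values are valid."""
    problems = []
    if num_of_bins is not None and num_of_bins not in NUM_OF_BINS_CHOICES:
        problems.append(f"num_of_bins {num_of_bins} not in {sorted(NUM_OF_BINS_CHOICES)}")
    if quant_bits is not None and quant_bits not in QUANT_BITS_CHOICES:
        problems.append(f"quant_bits {quant_bits} not in {sorted(QUANT_BITS_CHOICES)}")
    if migration_strength is not None:
        low, high = MIGRATION_STRENGTH_RANGE
        if not low <= migration_strength <= high:
            problems.append(f"migration_strength {migration_strength} outside [{low}, {high}]")
    return problems


valid = check_values(num_of_bins=4096, quant_bits=7, migration_strength=0.5)
invalid = check_values(num_of_bins=1000, quant_bits=5, migration_strength=0.9)
```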

  • The following is an example of the simplified configuration file (quant.cfg) for uniform quantization:
    # global quantize parameter
    batch_num : 2
    activation_offset : true
    joint_quant : false
    skip_layers : "Opname"
    skip_layer_types : "Optype"
    do_fusion : true
    skip_fusion_layers : "Opname"
    common_config : {
        arq_quantize : {
            channel_wise : true
            quant_bits : 7
        }
        ifmr_quantize : {
            search_range_start : 0.7
            search_range_end : 1.3
            search_step : 0.01
            max_percentile : 0.999999
            min_percentile : 0.999999
            asymmetric : true
        }
    }
     
    override_layer_types : {
        layer_type : "Conv2d"
        calibration_config : {
            arq_quantize : {
                channel_wise : false
                quant_bits : 6
            }
            ifmr_quantize : {
                search_range_start : 0.8
                search_range_end : 1.2
                search_step : 0.02
                max_percentile : 0.999999
                min_percentile : 0.999999
                asymmetric : false
            }
        }
    }
     
    override_layer_configs : {
        layer_name : "Opname"
        calibration_config : {
            arq_quantize : {
                channel_wise : true
            }
            ifmr_quantize : {
                search_range_start : 0.8
                search_range_end : 1.2
                search_step : 0.02
                max_percentile : 0.999999
                min_percentile : 0.999999
                asymmetric : false
            }
        }
    }
    tensor_quantize {
        layer_name: "Opname"
        input_index: 0
        ifmr_quantize: {
            search_range_start : 0.7
            search_range_end : 1.3
            search_step : 0.01
            min_percentile : 0.999999
            asymmetric : false
        }
    }
    tensor_quantize {
        layer_name: "Opname"
        input_index: 0
    }

    If the HFMG algorithm is used for activation quantization, replace the corresponding activation quantization settings (the ifmr_quantize blocks) in the preceding configuration file with the following ones. (The following configuration file is only an example. Modify it as required.)

    # global quantize parameter
    activation_offset : true
    batch_num : 1
    ...
    common_config : {
        hfmg_quantize : {
            num_of_bins : 4096
            asymmetric : false
        }
    ...
    }
  • The following is an example of the simplified configuration file (quant.cfg) for NUQ:
    # global quantize parameter
    activation_offset : true
    joint_quant : false
    batch_num : 2
    nuq_config {
        mapping_file : "./nuq_files/resnet101_quantized.json"
        nuq_quantize : {
            num_steps : 32
            num_of_iteration : 0
        }
    }
    
    common_config : {
        arq_quantize : {
            channel_wise : true
        }
        ifmr_quantize : {
            search_range_start : 0.7
            search_range_end : 1.3
            search_step : 0.01
            max_percentile : 0.999999
            min_percentile : 0.999999
            asymmetric : true
        }
    }
    
    override_layer_types : {
        layer_type : "Optype"
        calibration_config : {
            arq_quantize : {
                channel_wise : false
            }
            ifmr_quantize : {
                search_range_start : 0.7
                search_range_end : 1.3
                search_step : 0.01
                max_percentile : 0.999999
                min_percentile : 0.999999
                asymmetric : false
            }
        }
    }
    tensor_quantize {
        layer_name: "Opname"
        input_index: 0
        ifmr_quantize: {
            search_range_start : 0.7
            search_range_end : 1.3
            search_step : 0.01
            min_percentile : 0.999999
            asymmetric : false
        }
    }
    tensor_quantize {
        layer_name: "Opname"
        input_index: 0
    }

    If the HFMG algorithm is used for activation quantization, replace the corresponding activation quantization settings (the ifmr_quantize blocks) in the preceding configuration file with the following ones. (The following configuration file is only an example. Modify it as required.)

    # global quantize parameter
    activation_offset : true
    batch_num : 1
    ...
    common_config : {
        hfmg_quantize : {
            num_of_bins : 4096
            asymmetric : false
        }
    ...
    }
  • The following is an example of the simplified configuration file (dmq_balancer.cfg) for activation quantization balance preprocessing:
    # global quantize parameter
    batch_num : 2
    activation_offset : true
    joint_quant : false
    skip_layers : "Opname"
    skip_layer_types : "Optype"
    do_fusion : true
    skip_fusion_layers : "Opname"
    common_config : {
        arq_quantize : {
            channel_wise : true
        }
        ifmr_quantize : {
            search_range_start : 0.7
            search_range_end : 1.3
            search_step : 0.01
            max_percentile : 0.999999
            min_percentile : 0.999999
            asymmetric : true
        }
        dmq_balancer : {
            migration_strength : 0.5
        }
    }
     
    override_layer_types : {
        layer_type : "Optype"
        calibration_config : {
            arq_quantize : {
                channel_wise : false
            }
            ifmr_quantize : {
                search_range_start : 0.8
                search_range_end : 1.2
                search_step : 0.02
                max_percentile : 0.999999
                min_percentile : 0.999999
                asymmetric : false
            }
            dmq_balancer : {
                migration_strength : 0.5
            }
        }
    }
     
    override_layer_configs : {
        layer_name : "Opname"
        calibration_config : {
            arq_quantize : {
                channel_wise : true
            }
            ifmr_quantize : {
                search_range_start : 0.8
                search_range_end : 1.2
                search_step : 0.02
                max_percentile : 0.999999
                min_percentile : 0.999999
                asymmetric : false
            }
            dmq_balancer : {
                migration_strength : 0.5
            }
        }
    }