Simplified PTQ Configuration File
Table 1 describes the fields in the calibration_config_tf.proto file, located at /amct_tensorflow/proto/calibration_config_tf.proto under the AMCT installation directory.
| Message | Required | Type | Parameter | Description |
|---|---|---|---|---|
| AMCTConfig | - | - | - | Simplified PTQ configuration of AMCT. |
| | optional | uint32 | batch_num | Batch count used for quantization. |
| | optional | bool | activation_offset | Whether to quantize activations with offset. This is a global configuration parameter. |
| | optional | bool | joint_quant | Eltwise joint quantization switch. Defaults to false, indicating that joint quantization is disabled. If true, network performance may improve at the cost of precision. |
| | repeated | string | skip_layers | Layers to skip during quantization. |
| | repeated | string | skip_layer_types | Layer types to skip during quantization. |
| | repeated | string | skip_approximation_layers | Layers to skip during calibrated approximation. This version does not support calibrated approximation. |
| | optional | FakequantPrecisionMode | fakequant_precision_mode | Numerical precision mode of scale_d for the quant custom operator in the fakequant model. |
| | optional | NuqConfig | nuq_config | NUQ configuration. |
| | optional | CalibrationConfig | common_config | Common quantization configuration; a global parameter. Applied to a layer unless the layer is overridden by override_layer_types or override_layer_configs. Priority: override_layer_configs > override_layer_types > common_config. |
| | repeated | OverrideLayerType | override_layer_types | Overrides the quantization configuration for all layers of the specified types. Use it to apply differentiated quantization to certain layer types, for example, changing the quantization factor search step from 0.01 to 0.02. Priority: override_layer_configs > override_layer_types > common_config. |
| | repeated | OverrideLayer | override_layer_configs | Overrides the quantization configuration for individual layers. Use it to apply differentiated quantization to specific layers, for example, changing the quantization factor search step from 0.01 to 0.02. Priority: override_layer_configs > override_layer_types > common_config. |
| | optional | bool | do_fusion | BN fusion switch. Defaults to true, indicating that BN fusion is enabled. |
| | repeated | string | skip_fusion_layers | Layers to skip during BN fusion. |
| | repeated | TensorQuantize | tensor_quantize | Performs PTQ on the input tensors of the specified node to improve data transfer efficiency during inference. Currently, tensor quantization is supported only for the MaxPool and Add operators. |
| NuqConfig | - | - | - | NUQ configuration. |
| | required | string | mapping_file | JSON file obtained based on the offline model converted by the ATC tool from the deployable model after uniform quantization. |
| | optional | NUQuantize | nuq_quantize | NUQ algorithm configuration. |
| OverrideLayerType | - | - | - | Quantization configuration override by layer type. |
| | required | string | layer_type | Quantizable layer type. |
| | required | CalibrationConfig | calibration_config | Quantization configuration to apply. |
| OverrideLayer | - | - | - | Quantization configuration override by layer. |
| | required | string | layer_name | Layer to override. |
| | required | CalibrationConfig | calibration_config | Quantization configuration to apply. |
| TensorQuantize | - | - | - | Configuration for input tensors to be post-training quantized. |
| | required | string | layer_name | Name of the node whose input tensors are to be post-training quantized. Currently, only the MaxPool and Add operators are supported. |
| | required | uint32 | input_index | Input index of the node whose input tensors are to be post-training quantized. |
| | - | FMRQuantize | ifmr_quantize | Activation quantization algorithm: IFMR algorithm configuration. IFMR is used by default. |
| | - | HFMGQuantize | hfmg_quantize | Activation quantization algorithm: HFMG algorithm configuration. |
| CalibrationConfig | - | - | - | Calibration-based quantization configuration. |
| | - | ARQuantize | arq_quantize | Weight quantization algorithm: ARQ algorithm configuration. |
| | - | NUQuantize | nuq_quantize | Weight quantization algorithm: non-uniform quantization (NUQ) algorithm configuration. |
| | - | FMRQuantize | ifmr_quantize | Activation quantization algorithm: IFMR algorithm configuration. |
| | - | HFMGQuantize | hfmg_quantize | Activation quantization algorithm: HFMG algorithm configuration. |
| | - | DMQBalancer | dmq_balancer | Balancing algorithm configuration: DMQ Balancer configuration. |
| ARQuantize | - | - | - | ARQ algorithm for weight quantization. For details about the algorithm, see ARQ Algorithm. |
| | optional | bool | channel_wise | Whether to use a separate quantization factor for each channel. |
| | optional | uint32 | quant_bits | Weight quantization bit width: INT6, INT7, or INT8. INT8 quantization is used by default. INT6 and INT7 are supported only for Conv2D operators; if quant_bits is set to INT6 or INT7 in common_config, the setting takes effect only for Conv2D operators, and other operators use the default INT8. |
| FMRQuantize | - | - | - | IFMR algorithm for activation quantization. For details about the algorithm, see ifmr: IFMR algorithm for activation quantization. Mutually exclusive with HFMGQuantize. |
| | optional | float | search_range_start | Start of the quantization factor search range. |
| | optional | float | search_range_end | End of the quantization factor search range. |
| | optional | float | search_step | Quantization factor search step. |
| | optional | float | max_percentile | Percentile upper bound used when searching for the maximum value. |
| | optional | float | min_percentile | Percentile lower bound used when searching for the minimum value. |
| | optional | bool | asymmetric | Whether to perform asymmetric quantization; selects the layer-wise quantization algorithm. If this parameter is set in override_layer_configs, override_layer_types, or common_config, or if activation_offset is set, the priority is: override_layer_configs > override_layer_types > common_config > activation_offset. |
| | optional | CalibrationDataType | dst_type | Activation quantization bit width: INT8 (default) or INT16. Currently, only INT8 quantization is supported. |
| HFMGQuantize | - | - | - | HFMG algorithm for activation quantization. For details about the algorithm, see HFMG for Activation Quantization. Mutually exclusive with FMRQuantize. |
| | optional | uint32 | num_of_bins | Number of bins (the minimum unit of a histogram). Value range: {1024, 2048, 4096, 8192}. Defaults to 4096. |
| | optional | bool | asymmetric | Whether to perform asymmetric quantization; selects the layer-wise quantization algorithm. If this parameter is set in override_layer_configs, override_layer_types, or common_config, or if activation_offset is set, the priority is: override_layer_configs > override_layer_types > common_config > activation_offset. |
| | optional | CalibrationDataType | dst_type | Activation quantization bit width: INT8 (default) or INT16. Currently, only INT8 quantization is supported. |
| NUQuantize | - | - | - | Non-uniform (NUQ) weight quantization algorithm. For details about the algorithm, see nuq_quantize: NUQ algorithm for weight quantization. |
| | optional | uint32 | num_steps | Number of quantization steps for NUQ. Currently, only 16 and 32 are supported. |
| | optional | uint32 | num_of_iteration | Number of iterations for NUQ optimization. Value range: {0, 1, 2, 3, 4, 5}; 0 indicates no iteration. |
| DMQBalancer | - | - | - | DMQ Balancer algorithm configuration. For details about the algorithm, see DMQ Balancer Algorithm. |
| | optional | float | migration_strength | Migration strength: the degree to which the quantization difficulty of activations is migrated to weights. Value range: [0.2, 0.8]; defaults to 0.5. Use a small value if the activation distribution has many outliers. |
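The priority rule stated for common_config, override_layer_types, and override_layer_configs can be sketched in plain Python, purely to illustrate how a layer's effective configuration is resolved. The function and dictionary names below are illustrative only and are not part of the AMCT API:

```python
def resolve_config(layer_name, layer_type,
                   common_config,
                   override_layer_types,
                   override_layer_configs):
    """Illustrates the documented priority:
    override_layer_configs > override_layer_types > common_config."""
    # A per-layer override (by name) beats everything else.
    if layer_name in override_layer_configs:
        return override_layer_configs[layer_name]
    # A per-type override beats the global common_config.
    if layer_type in override_layer_types:
        return override_layer_types[layer_type]
    # Otherwise the global configuration applies.
    return common_config

common = {"search_step": 0.01}
by_type = {"Conv2D": {"search_step": 0.02}}
by_name = {"conv1": {"search_step": 0.05}}

resolve_config("conv1", "Conv2D", common, by_type, by_name)  # by-name override wins
resolve_config("conv2", "Conv2D", common, by_type, by_name)  # by-type override wins
resolve_config("fc1", "MatMul", common, by_type, by_name)    # falls back to common_config
```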
- The following is an example of the simplified configuration file (quant.cfg) for uniform quantization:

  ```
  # global quantize parameter
  batch_num : 2
  activation_offset : true
  joint_quant : false
  skip_layers : "Opname"
  skip_layer_types : "Optype"
  do_fusion : true
  skip_fusion_layers : "Opname"
  common_config : {
    arq_quantize : {
      channel_wise : true
      quant_bits : 7
    }
    ifmr_quantize : {
      search_range_start : 0.7
      search_range_end : 1.3
      search_step : 0.01
      max_percentile : 0.999999
      min_percentile : 0.999999
      asymmetric : true
    }
  }
  override_layer_types : {
    layer_type : "Conv2D"
    calibration_config : {
      arq_quantize : {
        channel_wise : false
        quant_bits : 6
      }
      ifmr_quantize : {
        search_range_start : 0.8
        search_range_end : 1.2
        search_step : 0.02
        max_percentile : 0.999999
        min_percentile : 0.999999
        asymmetric : false
      }
    }
  }
  override_layer_configs : {
    layer_name : "Opname"
    calibration_config : {
      arq_quantize : {
        channel_wise : true
      }
      ifmr_quantize : {
        search_range_start : 0.8
        search_range_end : 1.2
        search_step : 0.02
        max_percentile : 0.999999
        min_percentile : 0.999999
        asymmetric : false
      }
    }
  }
  tensor_quantize {
    layer_name : "Opname"
    input_index : 0
    ifmr_quantize : {
      search_range_start : 0.7
      search_range_end : 1.3
      search_step : 0.01
      min_percentile : 0.999999
      asymmetric : false
    }
  }
  tensor_quantize {
    layer_name : "Opname"
    input_index : 0
  }
  ```

  If the HFMG algorithm is used for activation quantization, replace the ifmr_quantize blocks in the preceding configuration file with an hfmg_quantize block as shown below. (The following configuration file is only an example. Modify it as required.)
  ```
  # global quantize parameter
  activation_offset : true
  batch_num : 1
  ...
  common_config : {
    hfmg_quantize : {
      num_of_bins : 4096
      asymmetric : false
    }
    ...
  }
  ```

- The following is an example of the simplified configuration file (quant.cfg) for NUQ:
  ```
  # global quantize parameter
  activation_offset : true
  joint_quant : false
  batch_num : 2
  nuq_config {
    mapping_file : "nuq_files/resnet50_quantized.json"
    nuq_quantize : {
      num_steps : 32
      num_of_iteration : 0
    }
  }
  common_config : {
    arq_quantize : {
      channel_wise : true
    }
    ifmr_quantize : {
      search_range_start : 0.7
      search_range_end : 1.3
      search_step : 0.01
      max_percentile : 0.999999
      min_percentile : 0.999999
      asymmetric : true
    }
  }
  override_layer_types : {
    layer_type : "Optype"
    calibration_config : {
      arq_quantize : {
        channel_wise : false
      }
      ifmr_quantize : {
        search_range_start : 0.7
        search_range_end : 1.3
        search_step : 0.01
        max_percentile : 0.999999
        min_percentile : 0.999999
        asymmetric : false
      }
    }
  }
  tensor_quantize {
    layer_name : "Opname"
    input_index : 0
    ifmr_quantize : {
      search_range_start : 0.7
      search_range_end : 1.3
      search_step : 0.01
      min_percentile : 0.999999
      asymmetric : false
    }
  }
  tensor_quantize {
    layer_name : "Opname"
    input_index : 0
  }
  ```

  If the HFMG algorithm is used for activation quantization, replace the ifmr_quantize blocks in the preceding configuration file with an hfmg_quantize block as shown below. (The following configuration file is only an example. Modify it as required.)
  ```
  # global quantize parameter
  activation_offset : true
  batch_num : 1
  ...
  common_config : {
    hfmg_quantize : {
      num_of_bins : 4096
      asymmetric : false
    }
    ...
  }
  ```

- The following is an example of the simplified configuration file dmq_balancer.cfg for activation quantization balance preprocessing:
  ```
  # global quantize parameter
  activation_offset : true
  batch_num : 1
  ...
  common_config : {
    dmq_balancer : {
      migration_strength : 0.5
    }
    ...
  }
  ```
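For intuition about the IFMR-style parameters used throughout these examples (search_range_start, search_range_end, search_step, max_percentile, min_percentile), the following self-contained sketch shows the general idea of percentile-based clipping followed by a scale-ratio search. This is a simplified illustration only, not the IFMR algorithm as implemented by AMCT:

```python
import math

def ifmr_like_calibrate(data, search_range=(0.7, 1.3), search_step=0.01,
                        max_percentile=0.999999, min_percentile=0.999999):
    """Simplified sketch: clip at percentiles, then search a range of
    scale ratios and keep the one with the lowest quantization error."""
    vals = sorted(data)
    n = len(vals)
    # Percentile-based initial clipping bounds.
    hi = vals[min(n - 1, int(max_percentile * (n - 1)))]
    lo = vals[max(0, int((1.0 - min_percentile) * (n - 1)))]
    best_ratio, best_err = None, math.inf
    steps = int(round((search_range[1] - search_range[0]) / search_step))
    for i in range(steps + 1):
        ratio = search_range[0] + i * search_step
        bound = max(abs(hi), abs(lo)) * ratio
        if bound == 0:
            continue
        scale = bound / 127.0  # symmetric INT8 quantization step
        # Mean squared error between the data and its fake-quantized version.
        err = 0.0
        for x in vals:
            q = max(-128, min(127, round(x / scale))) * scale
            err += (q - x) ** 2
        if err < best_err:
            best_err, best_ratio = err, ratio
    return best_ratio
```

A smaller search_step explores more candidate clipping ranges at higher calibration cost, which is why the examples above tighten or loosen it per layer via the override mechanisms.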