Simplified PTQ Configuration File

Table 1 describes the fields in the calibration_config_tf.proto file. The file is located at /amct_tensorflow/proto/calibration_config_tf.proto under the AMCT installation directory.

**Table 1** calibration_config_tf.proto
Parameter	Required/Optional	Type	Field	Description
AMCTConfig	-	-	-	Simplified PTQ configuration of AMCT.
	Optional	UInt32	batch_num	Number of batches used for quantization.
	Optional	Boolean	activation_offset	Whether to quantize activations with offset. It is a global configuration parameter. true: with offset. Activations are asymmetrically quantized. false: without offset. Activations are symmetrically quantized.
	Optional	Boolean	joint_quant	Eltwise joint quantization switch. Defaults to false, indicating that joint quantization is disabled. If true, the network performance may improve but the precision may be compromised.
	Repeated	String	skip_layers	Layers to be skipped during quantization.
	Repeated	String	skip_layer_types	Layer types to be skipped during quantization.
	Repeated	String	skip_approximation_layers	Layers to be skipped during calibrated approximation. This feature applies only to Atlas inference series products.
	Optional	FakequantPrecisionMode	fakequant_precision_mode	Precision mode of the scale_d value of the custom quantization operator in the fake-quantized model. FORCE_FP16_QUANT: The scale_d value is converted to float16 precision (the data type remains float32). Empty (default, that is, not configured): The scale_d value has float32 precision.
	Optional	NuqConfig	nuq_config	NUQ configuration.
	Optional	CalibrationConfig	common_config	Common quantization configuration, which is a global parameter. Use this configuration if a layer is not overridden by override_layer_types or override_layer_configs. Parameter priority: override_layer_configs > override_layer_types > common_config
	Repeated	OverrideLayerType	override_layer_types	Quantization configurations to override for certain layer types, so that layers of those types are quantized differently from the rest of the network. For example, you can change the quantization factor search step from 0.01 to 0.02 for the specified layer types. Parameter priority: override_layer_configs > override_layer_types > common_config
	Repeated	OverrideLayer	override_layer_configs	Quantization configurations to override for specific layers, so that those layers are quantized differently from the rest of the network. For example, you can change the quantization factor search step from 0.01 to 0.02 for the specified layers. Parameter priority: override_layer_configs > override_layer_types > common_config
	Optional	Boolean	do_fusion	BN fusion switch. Defaults to true, indicating BN fusion enabled.
	Repeated	String	skip_fusion_layers	Layers to skip BN fusion.
	Repeated	TensorQuantize	tensor_quantize	Whether to perform PTQ on the input tensors of the specified node in the network model to improve data transfer efficiency in inference. Currently, tensor quantization can be performed only on the MaxPool/Add operator.
NuqConfig	-	-	-	NUQ configuration.
	Required	String	mapping_file	JSON file of the quantized model, which is obtained by converting the deployable model after uniform quantization into an offline model with ATC.
	Optional	NUQuantize	nuq_quantize	NUQ configuration.
OverrideLayerType	-	-	-	Quantization configuration to override by layer type.
	Required	String	layer_type	Quantizable layer type.
	Required	CalibrationConfig	calibration_config	Quantization configuration to override.
OverrideLayer	-	-	-	Quantization configuration to override by layer.
	Required	String	layer_name	Name of the layer to override.
	Required	CalibrationConfig	calibration_config	Quantization configuration to override.
TensorQuantize	-	-	-	Configuration for input tensors to be post-training quantized.
	Required	String	layer_name	Name of the node where input tensors need to be post-training quantized. Currently, only the MaxPool or Add operator is supported.
	Required	UInt32	input_index	Input index of the node where input tensors need to be post-training quantized.
	-	FMRQuantize	ifmr_quantize	Activation quantization algorithm configuration. ifmr_quantize: IFMR algorithm configuration. The IFMR quantization algorithm is used by default.
	-	HFMGQuantize	hfmg_quantize	Activation quantization algorithm configuration. hfmg_quantize: HFMG algorithm configuration.
CalibrationConfig	-	-	-	Calibration-based quantization configuration.
	-	ARQuantize	arq_quantize	Weight quantization algorithm configuration. arq_quantize: ARQ algorithm configuration.
	-	NUQuantize	nuq_quantize	Weight quantization algorithm configuration. nuq_quantize: non-uniform quantization algorithm configuration.
	-	FMRQuantize	ifmr_quantize	Activation quantization algorithm configuration. ifmr_quantize: IFMR algorithm configuration.
	-	HFMGQuantize	hfmg_quantize	Activation quantization algorithm configuration. hfmg_quantize: HFMG algorithm configuration.
	-	DMQBalancer	dmq_balancer	Balancer algorithm configuration. dmq_balancer: DMQ Balancer configuration.
ARQuantize	-	-	-	ARQ algorithm configuration. For details about the algorithm, see ARQ Algorithm. This algorithm cannot be configured together with the NUQuantize algorithm. If they are configured together, the quantization algorithm configured last in the configuration file is used.
	Optional	Boolean	channel_wise	Whether to use different quantization factors for each channel. true: Channels are separately quantized using different quantization factors. false: All channels are quantized altogether using the same quantization factors.
	Optional	UInt32	quant_bits	Weight quantization bit width. The value can be 6, 7, or 8, corresponding to INT6, INT7, and INT8 quantization. INT8 quantization is used by default. This field can be set to INT6 or INT7 only for Conv2d operators. If quant_bits is set to INT6 or INT7 in common_config, the setting takes effect only for Conv2d operators. For other operators, the default value INT8 is used.
FMRQuantize	-	-	-	IFMR algorithm configuration for activation quantization. For details about the algorithm, see IFMR Algorithm. This algorithm cannot be configured together with the HFMGQuantize algorithm. If they are configured together, the quantization algorithm configured last in the configuration file is used.
	Optional	Float	search_range_start	Quantization factor search start.
	Optional	Float	search_range_end	Quantization factor search end.
	Optional	Float	search_step	Quantization factor search step.
	Optional	Float	max_percentile	Upper bound (percentile) for searching for the maximum value.
	Optional	Float	min_percentile	Lower bound (percentile) for searching for the minimum value.
	Optional	Boolean	asymmetric	Whether to perform asymmetric quantization. It is used to select the layer-wise quantization algorithm. true: asymmetric quantization. false: symmetric quantization. If this parameter is set for override_layer_configs, override_layer_types, and common_config, or if the activation_offset parameter is set, the priority is as follows: override_layer_configs > override_layer_types > common_config > activation_offset
	Optional	CalibrationDataType	dst_type	Quantization bit width for activation quantization, either INT8 (default) or INT16 quantization. The current version supports only INT8 quantization.
HFMGQuantize	-	-	-	HFMG algorithm for activation quantization. For details about the algorithm, see HFMG Algorithm. This algorithm cannot be configured together with the FMRQuantize algorithm. If they are configured together, the quantization algorithm configured last in the configuration file is used.
	Optional	UInt32	num_of_bins	Number of bins (the minimum unit in a histogram). Value range: {1024, 2048, 4096, 8192}. Defaults to 4096.
	Optional	Boolean	asymmetric	Whether to perform asymmetric quantization. It is used to select the layer-wise quantization algorithm. true: asymmetric quantization. false: symmetric quantization. If this parameter is set for override_layer_configs, override_layer_types, and common_config, or if the activation_offset parameter is set, the priority is as follows: override_layer_configs > override_layer_types > common_config > activation_offset
	Optional	CalibrationDataType	dst_type	Quantization bit width for activation quantization, either INT8 (default) or INT16 quantization. The current version supports only INT8 quantization.
NUQuantize	-	-	-	Non-uniform weight quantization algorithm configuration. For details about the algorithm, see NUQ Algorithm. This algorithm cannot be configured together with the ARQuantize algorithm. If they are configured together, the quantization algorithm configured last in the configuration file is used.
	Optional	UInt32	num_steps	Number of steps for NUQ. Currently, only 16 and 32 are supported.
	Optional	UInt32	num_of_iteration	Number of iterations for NUQ optimization. Value range: {0, 1, 2, 3, 4, 5}. The value 0 indicates no iteration.
DMQBalancer	-	-	-	DMQ Balancer algorithm configuration. For details about the algorithm, see DMQ Balancer Algorithm.
	Optional	Float	migration_strength	Migration strength, indicating the degree to which the quantization difficulty of activations is migrated to weights. The value range is [0.2, 0.8]. The default value is 0.5. Set the migration strength to a small value if there are many outliers in the activation distribution.
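
The priority rules in Table 1 can be illustrated with a short Python sketch. It is not AMCT code: the configuration is modelled as plain dictionaries, the helper names pick_calibration_config and resolve_asymmetric and the layer names conv1, conv2, and fc are hypothetical, and it assumes that the selected configuration replaces lower-priority ones as a whole rather than being merged field by field.

```python
# Hypothetical helpers; the configuration is a plain dict, not parsed by AMCT.

def _matching_overrides(layer_name, layer_type, cfg):
    """Yield override configs for a layer, highest priority first."""
    for entry in cfg.get("override_layer_configs", []):
        if entry["layer_name"] == layer_name:
            yield entry["calibration_config"]
    for entry in cfg.get("override_layer_types", []):
        if entry["layer_type"] == layer_type:
            yield entry["calibration_config"]


def pick_calibration_config(layer_name, layer_type, cfg):
    """override_layer_configs > override_layer_types > common_config.

    Assumption: the chosen config is used as a whole (no field-level merge).
    """
    for config in _matching_overrides(layer_name, layer_type, cfg):
        return config
    return cfg.get("common_config", {})


def resolve_asymmetric(layer_name, layer_type, cfg):
    """override_layer_configs > override_layer_types > common_config
    > activation_offset. Returns None if nothing is set at any level."""
    levels = list(_matching_overrides(layer_name, layer_type, cfg))
    levels.append(cfg.get("common_config", {}))
    for level in levels:
        for algo in ("ifmr_quantize", "hfmg_quantize"):
            value = level.get(algo, {}).get("asymmetric")
            if value is not None:
                return value
    return cfg.get("activation_offset")


example = {
    "activation_offset": True,
    "common_config": {"ifmr_quantize": {"search_step": 0.01}},
    "override_layer_types": [
        {"layer_type": "Conv2D",
         "calibration_config": {
             "ifmr_quantize": {"search_step": 0.02, "asymmetric": False}}},
    ],
    "override_layer_configs": [
        {"layer_name": "conv1",
         "calibration_config": {
             "ifmr_quantize": {"search_step": 0.02, "asymmetric": True}}},
    ],
}

# conv1 is overridden by name, conv2 by type, fc falls back to common_config.
print(resolve_asymmetric("conv1", "Conv2D", example))  # True
print(resolve_asymmetric("conv2", "Conv2D", example))  # False
print(resolve_asymmetric("fc", "MatMul", example))     # True (activation_offset)
```

In this sketch, fc has no asymmetric setting at any configuration level, so the global activation_offset value decides.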

The following is an example of the simplified configuration file (quant.cfg) for uniform quantization:

# global quantize parameter
batch_num : 2
activation_offset : true
joint_quant : false
skip_layers : "Opname"
skip_layer_types : "Optype"
do_fusion: true
skip_fusion_layers : "Opname"
common_config : {
    arq_quantize : {
        channel_wise : true
        quant_bits : 7
    }
    ifmr_quantize : {
        search_range_start : 0.7
        search_range_end : 1.3
        search_step : 0.01
        max_percentile : 0.999999
        min_percentile : 0.999999
        asymmetric : true
    }
}
 
override_layer_types : {
    layer_type : "Conv2D"
    calibration_config : {
        arq_quantize : {
            channel_wise : false
            quant_bits : 6
        }
        ifmr_quantize : {
            search_range_start : 0.8
            search_range_end : 1.2
            search_step : 0.02
            max_percentile : 0.999999
            min_percentile : 0.999999
            asymmetric : false
        }
    }
}
 
override_layer_configs : {
    layer_name : "Opname"
    calibration_config : {
        arq_quantize : {
            channel_wise : true
        }
        ifmr_quantize : {
            search_range_start : 0.8
            search_range_end : 1.2
            search_step : 0.02
            max_percentile : 0.999999
            min_percentile : 0.999999
            asymmetric : false
        }
    }
}
tensor_quantize {
    layer_name: "Opname"
    input_index: 0
    ifmr_quantize: {
        search_range_start : 0.7
        search_range_end : 1.3
        search_step : 0.01
        min_percentile : 0.999999
        asymmetric : false
    }
}
tensor_quantize {
    layer_name: "Opname"
    input_index: 0
}
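
The example above sets quant_bits : 7 in common_config. According to Table 1, a 6-bit or 7-bit setting takes effect only for Conv2d operators, and other operators use the default of 8 bits. The following Python sketch illustrates that rule. The function name effective_quant_bits is hypothetical, and matching the layer type against the string "Conv2D" (as written in the example configuration) is an assumption, not AMCT behavior confirmed here.

```python
SUPPORTED_BITS = (6, 7, 8)   # INT6, INT7, INT8
DEFAULT_BITS = 8             # INT8 is the documented default


def effective_quant_bits(layer_type, quant_bits=DEFAULT_BITS):
    """Return the weight bit width that would apply to a layer type.

    Assumption: the Conv2d operator is identified by the type string "Conv2D".
    """
    if quant_bits not in SUPPORTED_BITS:
        raise ValueError(f"quant_bits must be one of {SUPPORTED_BITS}")
    if quant_bits < DEFAULT_BITS and layer_type != "Conv2D":
        # INT6/INT7 is limited to Conv2d; other operators fall back to INT8.
        return DEFAULT_BITS
    return quant_bits


print(effective_quant_bits("Conv2D", 7))  # 7
print(effective_quant_bits("MatMul", 7))  # 8
```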

If the HFMG algorithm is used for activation quantization, replace the corresponding lines in the preceding configuration file (for example, the ifmr_quantize block in common_config) with the following ones. (The following configuration file is only an example. Modify it as required.)

# global quantize parameter
activation_offset : true
batch_num : 1
...
common_config : {
    hfmg_quantize : {
        num_of_bins : 4096
        asymmetric : false
    }
...
}

The following is an example of the simplified configuration file (quant.cfg) for NUQ:

# global quantize parameter
activation_offset : true
joint_quant : false
batch_num : 2
nuq_config {
    mapping_file : "nuq_files/resnet50_quantized.json"
    nuq_quantize : {
        num_steps : 32
        num_of_iteration : 0
    }
}

common_config : {
    arq_quantize : {
        channel_wise : true
    }
    ifmr_quantize : {
        search_range_start : 0.7
        search_range_end : 1.3
        search_step : 0.01
        max_percentile : 0.999999
        min_percentile : 0.999999
        asymmetric : true
    }
}

override_layer_types : {
    layer_type : "Optype"
    calibration_config : {
        arq_quantize : {
            channel_wise : false
        }
        ifmr_quantize : {
            search_range_start : 0.7
            search_range_end : 1.3
            search_step : 0.01
            max_percentile : 0.999999
            min_percentile : 0.999999
            asymmetric : false
        }
    }
}
tensor_quantize {
    layer_name: "Opname"
    input_index: 0
    ifmr_quantize: {
        search_range_start : 0.7
        search_range_end : 1.3
        search_step : 0.01
        min_percentile : 0.999999
        asymmetric : false
    }
}
tensor_quantize {
    layer_name: "Opname"
    input_index: 0
}
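
To get a feel for how search_range_start, search_range_end, and search_step interact, the following Python sketch enumerates the candidate quantization factors that such a range would contain. It only illustrates the size of the search grid. How IFMR actually walks the range, and whether the end point is included, are assumptions here, and search_factors is a hypothetical helper.

```python
def search_factors(start, end, step):
    """List candidate factors from start to end (inclusive) in increments of step.

    Assumption: the end point is part of the search grid.
    """
    if step <= 0 or end < start:
        raise ValueError("expected step > 0 and end >= start")
    count = round((end - start) / step) + 1
    # Rounding keeps the printed values free of floating-point noise.
    return [round(start + i * step, 10) for i in range(count)]


wide = search_factors(0.7, 1.3, 0.01)    # values used in common_config above
narrow = search_factors(0.8, 1.2, 0.02)  # a narrower range with a coarser step
print(len(wide), len(narrow))  # 61 21
```

A coarser step or a narrower range reduces the number of candidates to evaluate, at the cost of a less fine-grained search.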

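If the HFMG algorithm is used for activation quantization, replace the corresponding lines in the preceding configuration file (for example, the ifmr_quantize block in common_config) with the following ones. (The following configuration file is only an example. Modify it as required.)

# global quantize parameter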
activation_offset : true
batch_num : 1
...
common_config : {
    hfmg_quantize : {
        num_of_bins : 4096
        asymmetric : false
    }
...
}

The following is an example of the simplified configuration file (dmq_balancer.cfg) for activation quantization balance preprocessing:

# global quantize parameter
activation_offset : true
batch_num : 1
...
common_config : {
    dmq_balancer : {
        migration_strength : 0.5
    }
...
}

The following is an example of the calibrated approximation configuration file (approximate.cfg):
```
batch_num: 1
skip_approximate_layers: "Softmax_1"
```
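
Several fields in Table 1 accept only a restricted set of values. The following Python sketch checks those documented ranges before a configuration file is written. It is a standalone illustration, not part of AMCT: the function validate_ranges is hypothetical and takes the field values as a flat dictionary.

```python
# Value constraints as documented in Table 1.
RANGE_CHECKS = {
    "num_of_bins": lambda v: v in (1024, 2048, 4096, 8192),
    "num_steps": lambda v: v in (16, 32),
    "num_of_iteration": lambda v: v in (0, 1, 2, 3, 4, 5),
    "migration_strength": lambda v: 0.2 <= v <= 0.8,
}


def validate_ranges(fields):
    """Return a list of error messages for fields outside their documented range.

    Fields without a documented range in RANGE_CHECKS are ignored.
    """
    errors = []
    for name, value in fields.items():
        check = RANGE_CHECKS.get(name)
        if check is not None and not check(value):
            errors.append(f"{name}: value {value!r} is out of range")
    return errors


print(validate_ranges({"num_of_bins": 4096, "migration_strength": 0.5}))  # []
print(validate_ranges({"num_steps": 64}))
```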
