Simplified Configuration File

To control the quantization process, for example, to specify which layers are quantized and which quantization algorithm is used, you can use a .cfg configuration file.

Table 1 calibration_config.proto description

Each message below is listed with its fields in the form "field (rule, type): description", where rule is Required, Optional, or Repeated.

AMCTConfig: Simplified PTQ configuration of AMCT.

  • activation_offset (Optional, bool): Whether to quantize activations with offset. It is a global configuration parameter.
    - With offset: activations are asymmetrically quantized.
    - Without offset: activations are symmetrically quantized.
  • joint_quant (Optional, bool): Eltwise joint quantization switch. Defaults to false, indicating that joint quantization is disabled. If true, the network performance may improve but the precision may be compromised.
  • skip_layers (Repeated, string): Layers to skip quantization.
  • skip_layer_types (Repeated, string): Types of layers to skip quantization.
  • version (Optional, int32): Version of the simplified configuration file.

  • common_config (Optional, CalibrationConfig): Common quantization configuration, which is a global parameter. This configuration is used for a layer unless the layer is overridden by override_layer_types or override_layer_configs. Parameter priority: override_layer_configs > override_layer_types > common_config (see the sketch following this field list).
  • override_layer_types (Repeated, OverrideLayerType): Overrides the quantization configuration for specific layer types. Use this parameter to quantize some types of layers differently, for example, to change the quantization factor search step from 0.01 to 0.02. Parameter priority: override_layer_configs > override_layer_types > common_config.
  • override_layer_configs (Repeated, OverrideLayer): Overrides the quantization configuration for specific layers. Use this parameter to quantize some layers differently, for example, to change the quantization factor search step from 0.01 to 0.02. Parameter priority: override_layer_configs > override_layer_types > common_config.

  • do_fusion (Optional, bool): BN fusion switch. Defaults to true, indicating that BN fusion is enabled.
  • skip_fusion_layers (Repeated, string): Layers to skip BN fusion.
  • tensor_quantize (Repeated, TensorQuantize): Performs PTQ on the input tensors of the specified node in the network model to improve data transfer efficiency in inference. Currently, tensor quantization can be performed only on the MaxPool/Add/eltwise operators.

  • enable_auto_nuq (Optional, bool): Automatic non-uniform weight quantization (NUQ) switch. Defaults to false, indicating that automatic non-uniform weight quantization is disabled. Due to hardware restrictions, you are not advised to perform NUQ in this version; otherwise, performance benefits cannot be obtained. If this function is enabled, quantization layers that have been explicitly configured (by override_layer_configs in the simplified configuration file) are not affected; among the remaining uniform quantization layers, only those whose large weights cause performance bottlenecks are automatically searched for and quantized non-uniformly to improve the weight compression ratio, which reduces bandwidth and improves performance. A layer configured for weight-only quantization (weight_compress_only set to true) is excluded from this search.
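
The priority rule used by common_config, override_layer_types, and override_layer_configs can be pictured with a short Python sketch. This is only an illustration of the documented priority order, not AMCT code; the layer names, types, and configuration dictionaries are hypothetical.

    # Illustrative only: resolve the effective configuration for one layer under the
    # documented priority override_layer_configs > override_layer_types > common_config.
    def resolve_config(layer_name, layer_type, common_config,
                       override_layer_types, override_layer_configs):
        if layer_name in override_layer_configs:   # highest priority: per-layer override
            return override_layer_configs[layer_name]
        if layer_type in override_layer_types:     # next: per-type override
            return override_layer_types[layer_type]
        return common_config                       # global fallback

    # Hypothetical example: "conv1" has its own override, other Conv2D layers use the
    # type-level override, and all remaining layers fall back to common_config.
    common = {"ifmr_quantize": {"search_step": 0.01}}
    by_type = {"Conv2D": {"ifmr_quantize": {"search_step": 0.02}}}
    by_name = {"conv1": {"ifmr_quantize": {"search_step": 0.02, "asymmetric": False}}}
    print(resolve_config("conv1", "Conv2D", common, by_type, by_name))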

OverrideLayerType: Quantization configuration override by layer type.

  • layer_type (Required, string): Quantizable layer type.
  • calibration_config (Required, CalibrationConfig): Quantization configuration to apply.

OverrideLayer: Quantization configuration override by layer.

  • layer_name (Required, string): Layer to override.
  • calibration_config (Required, CalibrationConfig): Quantization configuration to apply.

CalibrationConfig: Calibration-based quantization configuration.

  • arq_quantize (ARQuantize): Weight quantization algorithm (ARQ algorithm configuration).
  • ifmr_quantize (FMRQuantize): Activation quantization algorithm (IFMR algorithm configuration).
  • nuq_quantize (NUQuantize): Non-uniform weight quantization algorithm (NUQ algorithm configuration).
  • weight_compress_only (Optional, bool): Weight-only quantization. The data type must be float32 or float16.
    - true: weight quantization only.
    - false: weight and activation quantization. Default value: false.
    When weight-only quantization is used, IFMR activation quantization and NUQ cannot be configured at the same time.

ARQuantize: ARQ algorithm for weight quantization.

  • channel_wise (Optional, bool): Whether to use different quantization factors for each channel.
    - true: channels are quantized separately, each using its own quantization factors.
    - false: all channels are quantized together, using the same quantization factors.
  • asymmetric (Optional, bool): Asymmetric weight quantization. It is used to select the layer-wise quantization algorithm. This parameter is valid only when weight_compress_only is set to true; if weight_compress_only is set to false, asymmetric can only be set to false.
    - true: asymmetric weight quantization is used (offset is not 0).
    - false: symmetric weight quantization is used (offset is 0). The default value is false.
    If this parameter is set in override_layer_configs, override_layer_types, and common_config, the priority is: override_layer_configs > override_layer_types > common_config.
  • quant_bits (Optional, uint32): Weight quantization bit width. The value can be 6, 7, or 8, corresponding to INT6, INT7, or INT8 quantization; INT8 is used by default. INT6 and INT7 can be configured only for Conv2d operators. If quant_bits is set to 6 or 7 in common_config, the setting takes effect only for Conv2d operators; other operators use the default INT8. For an ONNX network model, if quant_bits of the Conv operator is set to 6 or 7 in override_layer_types, the setting takes effect only when the weight has 4 dimensions.
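
How channel_wise, asymmetric, and quant_bits interact can be shown with a minimal numpy sketch of min/max fake quantization. This is an illustration only, not the ARQ implementation; the tensor shape and values are hypothetical.

    import numpy as np

    # Simplified weight fake-quantization sketch (not the ARQ algorithm): quant_bits sets the
    # integer range, channel_wise selects per-channel or per-tensor factors, asymmetric adds a
    # non-zero offset (in AMCT this is allowed only with weight_compress_only set to true).
    def fake_weight_quant(w, quant_bits=8, channel_wise=True, asymmetric=False):
        qmin, qmax = -(2 ** (quant_bits - 1)), 2 ** (quant_bits - 1) - 1   # INT8: [-128, 127]
        axis = tuple(range(1, w.ndim)) if channel_wise else None           # per output channel or whole tensor
        w_min = w.min(axis=axis, keepdims=True)
        w_max = w.max(axis=axis, keepdims=True)
        if asymmetric:
            scale = np.maximum((w_max - w_min) / (qmax - qmin), 1e-9)
            offset = np.round(qmin - w_min / scale)
        else:
            scale = np.maximum(np.maximum(np.abs(w_min), np.abs(w_max)) / qmax, 1e-9)
            offset = np.zeros_like(scale)
        q = np.clip(np.round(w / scale + offset), qmin, qmax)
        return (q - offset) * scale   # dequantized ("fake-quantized") weights

    w = np.random.randn(64, 3, 3, 3).astype(np.float32)             # hypothetical Conv2d weight [Cout, Cin, Kh, Kw]
    w_int7 = fake_weight_quant(w, quant_bits=7, channel_wise=True)  # per-channel symmetric INT7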

FMRQuantize: IFMR algorithm for activation quantization.

  • search_range_start (Optional, float): Start of the quantization factor search range.
  • search_range_end (Optional, float): End of the quantization factor search range.
  • search_step (Optional, float): Quantization factor search step.
  • max_percentile (Optional, float): Percentile used as the upper bound when searching for the maximum value.
  • min_percentile (Optional, float): Percentile used as the lower bound when searching for the minimum value.
  • asymmetric (Optional, bool): Asymmetric activation quantization. It is used to select the layer-wise quantization algorithm.
    - true: asymmetric quantization.
    - false: symmetric quantization.
    If this parameter is set in override_layer_configs, override_layer_types, and common_config, or if the activation_offset parameter is set, the priority is: override_layer_configs > override_layer_types > common_config > activation_offset.
  • dst_type (Optional, CalibrationDataType): Quantization bit width for activation quantization, either INT8 (default) or INT16. The current version supports only INT8 quantization.
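
The FMRQuantize fields fit together roughly as follows: the percentiles select candidate minimum and maximum activation values from the calibration data, and the clipping range is then scaled by factors between search_range_start and search_range_end in increments of search_step, keeping the factor with the lowest quantization error. The Python sketch below illustrates this interaction only; it is not the actual IFMR algorithm, and the calibration data is hypothetical.

    import numpy as np

    # Illustrative INT8 clipping-range search (not the actual IFMR implementation).
    def search_activation_range(acts, search_range_start=0.7, search_range_end=1.3,
                                search_step=0.01, max_percentile=0.999999,
                                min_percentile=0.999999, asymmetric=True):
        hi = np.quantile(acts, max_percentile)         # candidate maximum (upper percentile)
        lo = np.quantile(acts, 1.0 - min_percentile)   # candidate minimum (lower percentile)
        if not asymmetric:                             # symmetric: range centred on zero
            hi = max(abs(hi), abs(lo))
            lo = -hi
        best_range, best_err = (lo, hi), np.inf
        for ratio in np.arange(search_range_start, search_range_end + 1e-9, search_step):
            cmin, cmax = lo * ratio, hi * ratio
            scale = max((cmax - cmin) / 255.0, 1e-9)   # 256 INT8 levels
            q = np.clip(np.round((acts - cmin) / scale), 0, 255)
            err = np.mean((q * scale + cmin - acts) ** 2)
            if err < best_err:
                best_range, best_err = (cmin, cmax), err
        return best_range

    acts = np.random.randn(10000).astype(np.float32)   # hypothetical calibration activations
    print(search_activation_range(acts))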

TensorQuantize: Configuration for input tensors to be post-training quantized.

  • layer_name (Required, string): Name of the node whose input tensors need to be post-training quantized. Currently, only the MaxPool operator is supported.
  • input_index (Required, uint32): Input index of the node whose input tensors need to be post-training quantized.
  • ifmr_quantize (FMRQuantize): Activation quantization algorithm (IFMR algorithm configuration). The IFMR quantization algorithm is used by default.

NUQuantize: Non-uniform weight quantization (NUQ) algorithm.

  • num_steps (Optional, uint32): Number of quantization steps for NUQ. Currently, only 16 and 32 are supported.
  • num_of_iteration (Optional, uint32): Number of iterations for NUQ optimization. Value range: {0, 1, 2, 3, 4, 5}. The value 0 indicates no iteration.
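
Conceptually, non-uniform quantization replaces the evenly spaced integer grid with num_steps learned weight levels that are refined over num_of_iteration rounds. The sketch below illustrates that idea with a quantile-initialized, k-means-style refinement; it is not AMCT's NUQ algorithm, and the weight tensor is hypothetical.

    import numpy as np

    # Conceptual non-uniform quantization sketch (not AMCT's NUQ algorithm).
    def nonuniform_levels(w, num_steps=32, num_of_iteration=1):
        flat = w.ravel()
        # Start from evenly spaced quantiles, then refine with k-means-style updates;
        # num_of_iteration = 0 keeps the initial levels (no iteration).
        levels = np.quantile(flat, np.linspace(0.0, 1.0, num_steps))
        for _ in range(num_of_iteration):
            idx = np.abs(flat[:, None] - levels[None, :]).argmin(axis=1)   # nearest level per weight
            for k in range(num_steps):
                if np.any(idx == k):
                    levels[k] = flat[idx == k].mean()
        return levels

    w = np.random.randn(64, 128).astype(np.float32)                  # hypothetical weight tensor
    levels = nonuniform_levels(w, num_steps=32, num_of_iteration=1)
    w_nuq = levels[np.abs(w[..., None] - levels).argmin(axis=-1)]    # weights snapped to the 32 levels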

  • The following is an example of the simplified configuration file (quant.cfg) for uniform quantization. Set Optype to an Ascend IR–defined operator type. For details about the mapping, see Layers That Support Quantization and Restrictions.
    # global quantize parameter
    activation_offset : true
    joint_quant : false
    enable_auto_nuq : false
    version : 1
    skip_layers : "Opname"
    skip_layer_types : "Optype"
    do_fusion: true
    skip_fusion_layers : "Opname"
    common_config : {
        arq_quantize : {
            channel_wise : true
            quant_bits : 7
        }
        ifmr_quantize : {
            search_range_start : 0.7
            search_range_end : 1.3
            search_step : 0.01
            max_percentile : 0.999999
            min_percentile : 0.999999
            asymmetric : true
        }
    }
     
    override_layer_types : {
        layer_type : "Optype"
        calibration_config : {
            arq_quantize : {
                channel_wise : false
            }
            ifmr_quantize : {
                search_range_start : 0.8
                search_range_end : 1.2
                search_step : 0.02
                max_percentile : 0.999999
                min_percentile : 0.999999
                asymmetric : false
            }
        }
    }
     
    override_layer_configs : {
        layer_name : "Opname"
        calibration_config : {
            arq_quantize : {
                channel_wise : true
            }
            ifmr_quantize : {
                search_range_start : 0.8
                search_range_end : 1.2
                search_step : 0.02
                max_percentile : 0.999999
                min_percentile : 0.999999
                asymmetric : false
            }
        }
    }
    tensor_quantize {
        layer_name: "Opname"
        input_index: 0
        ifmr_quantize: {
            search_range_start : 0.7
            search_range_end : 1.3
            search_step : 0.01
            min_percentile : 0.999999
            asymmetric : false
        }
    }
    tensor_quantize {
        layer_name: "Opname"
        input_index: 0
    }
  • The following is an example of the simplified configuration file (quant.cfg) for weight-only quantization:
    activation_offset : true
    joint_quant : false
    version : 1
    do_fusion: true
    common_config : {
        weight_compress_only : true
        arq_quantize : {
            channel_wise : true
            asymmetric : false
        }
    }
     
    override_layer_types : {
        layer_type : "Optype"
        calibration_config : {
            weight_compress_only : true
            arq_quantize : {
                channel_wise : true
                asymmetric : true
                quant_bits : 6
            }
        }
    }
     
    override_layer_configs : {
        layer_name : "Opname"
        calibration_config : {
            weight_compress_only : true
            arq_quantize : {
                channel_wise : true
                asymmetric : true
            }
        }
    }
  • The following is an example of the simplified configuration file (quant.cfg) for NUQ:
    # global quantize parameter
    activation_offset : true
    joint_quant : false
    enable_auto_nuq : false
    
    common_config : {
        arq_quantize : {
            channel_wise : true
        }
        ifmr_quantize : {
            search_range_start : 0.7
            search_range_end : 1.3
            search_step : 0.01
            max_percentile : 0.999999
            min_percentile : 0.999999
            asymmetric : true
        }
    }
    
    override_layer_types : {
        layer_type : "Optype"
        calibration_config : {
            arq_quantize : {
                channel_wise : false
            }
            ifmr_quantize : {
                search_range_start : 0.7
                search_range_end : 1.3
                search_step : 0.01
                max_percentile : 0.999999
                min_percentile : 0.999999
                asymmetric : false
            }
        }
    }
    override_layer_configs : {
        layer_name : "Opname"
        calibration_config : {
            nuq_quantize : {
                num_steps : 32
                num_of_iteration : 1
            }
            ifmr_quantize : {
                search_range_start : 0.8
                search_range_end : 1.2
                search_step : 0.02
                max_percentile : 0.999999
                min_percentile : 0.999999
                asymmetric : false
            }
        }
    }
    tensor_quantize {
        layer_name: "Opname"
        input_index: 0
        ifmr_quantize: {
            search_range_start : 0.7
            search_range_end : 1.3
            search_step : 0.01
            min_percentile : 0.999999
            asymmetric : false
        }
    }
    tensor_quantize {
        layer_name: "Opname"
        input_index: 0
    }
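
When many layers need the same per-layer override, the simplified configuration file can also be generated by a script instead of being written by hand. The Python sketch below writes a quant.cfg in the format shown above; the override block mirrors the override_layer_configs example, and the layer names are hypothetical placeholders for operator names in your own network.

    # Sketch: generate a simplified configuration file (quant.cfg) programmatically.
    def override_lines(layer_name):
        # One override_layer_configs block, mirroring the example above.
        return [
            'override_layer_configs : {',
            '    layer_name : "%s"' % layer_name,
            '    calibration_config : {',
            '        arq_quantize : {',
            '            channel_wise : true',
            '        }',
            '        ifmr_quantize : {',
            '            search_range_start : 0.8',
            '            search_range_end : 1.2',
            '            search_step : 0.02',
            '            max_percentile : 0.999999',
            '            min_percentile : 0.999999',
            '            asymmetric : false',
            '        }',
            '    }',
            '}',
        ]

    def write_quant_cfg(path, override_layers):
        lines = [
            'activation_offset : true',
            'joint_quant : false',
            'version : 1',
            'do_fusion : true',
            'common_config : {',
            '    arq_quantize : {',
            '        channel_wise : true',
            '    }',
            '    ifmr_quantize : {',
            '        search_range_start : 0.7',
            '        search_range_end : 1.3',
            '        search_step : 0.01',
            '        max_percentile : 0.999999',
            '        min_percentile : 0.999999',
            '        asymmetric : true',
            '    }',
            '}',
        ]
        for name in override_layers:
            lines += override_lines(name)
        with open(path, 'w') as f:
            f.write('\n'.join(lines) + '\n')

    write_quant_cfg('quant.cfg', ['conv1', 'conv2_1'])   # hypothetical layer names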