Simplified PTQ Configuration File
Table 1 describes the fields in the calibration_config_pytorch.proto file. The file is located at /amct_pytorch/proto/calibration_config_pytorch.proto under the AMCT installation directory.
| Parameter | Required/Optional | Type | Field | Description |
|---|---|---|---|---|
| AMCTConfig | - | - | - | Simplified PTQ configuration of AMCT. |
| | Optional | UInt32 | batch_num | Number of batches used for quantization. |
| | Optional | Boolean | activation_offset | Whether to quantize activations with offset. It is a global configuration parameter. |
| | Repeated | String | skip_layers | Layers to skip during quantization. |
| | Repeated | String | skip_layer_types | Types of layers to skip during quantization. |
| | Optional | NuqConfig | nuq_config | NUQ configuration. |
| | Optional | FakequantPrecisionMode | fakequant_precision_mode | Precision mode of the scale_d value of the custom quantization operator in the fake-quantized model. |
| | Optional | CalibrationConfig | common_config | Common quantization configuration, which is a global parameter. It applies to every layer that is not overridden by override_layer_types or override_layer_configs. Parameter priority: override_layer_configs > override_layer_types > common_config |
| | Repeated | OverrideLayerType | override_layer_types | Quantization configuration overrides by layer type. It specifies the layer types to be quantized with settings that differ from common_config, for example, to change the quantization factor search step from 0.01 to 0.02. Parameter priority: override_layer_configs > override_layer_types > common_config |
| | Repeated | OverrideLayer | override_layer_configs | Quantization configuration overrides by layer. It specifies the individual layers to be quantized with settings that differ from common_config, for example, to change the quantization factor search step from 0.01 to 0.02. Parameter priority: override_layer_configs > override_layer_types > common_config |
| | Optional | Boolean | do_fusion | BN fusion switch. Defaults to true, indicating that BN fusion is enabled. |
| | Repeated | String | skip_fusion_layers | Layers to skip during BN fusion. |
| NuqConfig | - | - | - | NUQ configuration. |
| | Required | String | mapping_file | JSON file of the quantized model, which is obtained by converting the deployable model after uniform quantization into an offline model with ATC. |
| | Optional | NUQuantize | nuq_quantize | NUQ configuration. |
| OverrideLayerType | - | - | - | Quantization configuration to override by layer type. |
| | Required | String | layer_type | Quantizable layer type. |
| | Required | CalibrationConfig | calibration_config | Quantization configuration to override. |
| OverrideLayer | - | - | - | Quantization configuration to override by layer. |
| | Required | String | layer_name | Name of the layer to override. |
| | Required | CalibrationConfig | calibration_config | Quantization configuration to override. |
| CalibrationConfig | - | - | - | Calibration-based quantization configuration. |
| | - | ARQuantize | arq_quantize | Weight quantization algorithm configuration. arq_quantize: ARQ algorithm configuration. |
| | - | NUQuantize | nuq_quantize | Weight quantization algorithm configuration. nuq_quantize: NUQ algorithm configuration. |
| | - | ADAquantize | ada_quantize | Weight quantization algorithm configuration. ada_quantize: Adaptive Rounding (AdaRound) algorithm configuration. |
| | - | FMRQuantize | ifmr_quantize | Activation quantization algorithm configuration. ifmr_quantize: IFMR algorithm configuration. |
| | - | HFMGQuantize | hfmg_quantize | Activation quantization algorithm configuration. hfmg_quantize: HFMG algorithm configuration. |
| | - | DMQBalancer | dmq_balancer | Balanced quantization algorithm configuration. dmq_balancer: DMQ Balancer configuration. |
| ARQuantize | - | - | - | ARQ algorithm configuration. For details about the algorithm, see ARQ Algorithm. This algorithm cannot be configured together with the ADAquantize and NUQuantize algorithms. If they are configured together, the quantization algorithm configured last in the configuration file is used. |
| | Optional | Boolean | channel_wise | Whether to use different quantization factors for each channel. |
| | Optional | UInt32 | quant_bits | Weight quantization bit width. The value can be INT6, INT7, or INT8. INT8 quantization is used by default. INT6 and INT7 are supported only for Conv2d operators. If quant_bits is set to INT6 or INT7 in common_config, the setting takes effect only for Conv2d operators; all other operators use the default value INT8. |
| ADAquantize | - | - | - | Adaptive Rounding (AdaRound) algorithm configuration. For details about the algorithm, see ADA Algorithm. For a quantization example, see AdaRound quantization calibration. This algorithm cannot be configured together with the ARQuantize and NUQuantize algorithms. If they are configured together, the quantization algorithm configured last in the configuration file is used. The algorithm supports the following activation functions: ReLU, RReLU, LeakyReLU, PReLU, GELU, ReLU6, Sigmoid, and Tanh. The corresponding ONNX operators are, in the same order, Relu, LeakyRelu, LeakyRelu, PRelu, Gelu, Clip, Sigmoid, and Tanh. AMCT obtains the quantizable layers (torch.nn.Linear, torch.nn.Conv2d, and torch.nn.ConvTranspose2d) from the quantization configuration (for details about the restrictions, see Uniform Quantization). The quantizable modules are obtained based on the topological order of the exported ONNX model. If one of the preceding activation functions follows a quantizable module, the module and the activation function are treated as a whole. Note: To export the preceding activation functions as ONNX operators, the torch.onnx.export API of torch 2.1 is required, the opset version of the Gelu operator must be v20, and the opset version of the other operators must be v17. |
| | Optional | UInt32 | num_iteration | Number of iterations. The value must be greater than or equal to 0. The default value is 10000. |
| | Optional | Float | reg_param | Regularization parameter. The value range is (0,1). The default value is 0.01. |
| | Optional | Float | beta_range_start | Beta start parameter. The default value is 20. The values must satisfy beta_range_start > beta_range_end > 0. |
| | Optional | Float | beta_range_end | Beta end parameter. The default value is 2. |
| | Optional | Float | warm_start | Warm-up factor. The value range is (0,1). The default value is 0.2. |
| | Optional | Boolean | channel_wise | Whether to use different quantization factors for each channel. |
| DMQBalancer | - | - | - | DMQ Balancer algorithm configuration. For details about the algorithm, see DMQ Balancer Algorithm. |
| | Optional | Float | migration_strength | Migration strength, indicating the degree to which the quantization difficulty of activations is migrated to weights. The value range is [0.2, 0.8]. The default value is 0.5. Set the migration strength to a small value if there are many outliers in the activation distribution. |
| FMRQuantize | - | - | - | IFMR algorithm for activation quantization. For details about the algorithm, see IFMR Algorithm. This algorithm cannot be configured together with the HFMGQuantize algorithm. If they are configured together, the quantization algorithm configured last in the configuration file is used. |
| | Optional | Float | search_range_start | Start of the quantization factor search range. |
| | Optional | Float | search_range_end | End of the quantization factor search range. |
| | Optional | Float | search_step | Step of the quantization factor search. |
| | Optional | Float | max_percentile | Upper bound used when searching for the maximum value. |
| | Optional | Float | min_percentile | Lower bound used when searching for the minimum value. |
| | Optional | Boolean | asymmetric | Whether to perform asymmetric quantization. It controls the selection of the quantization algorithm layer by layer. If this parameter is set in override_layer_configs, override_layer_types, and common_config, or if the activation_offset parameter is set, the priority is as follows: override_layer_configs > override_layer_types > common_config > activation_offset |
| | Optional | CalibrationDataType | dst_type | Quantization bit width for activation quantization, either INT8 (default) or INT16. The current version supports only INT8 quantization. |
| HFMGQuantize | - | - | - | HFMG algorithm for activation quantization. For details about the algorithm, see HFMG Algorithm. This algorithm cannot be configured together with the FMRQuantize algorithm. If they are configured together, the quantization algorithm configured last in the configuration file is used. |
| | Optional | UInt32 | num_of_bins | Number of histogram bins (a bin is the minimum unit of a histogram). Value range: {1024, 2048, 4096, 8192}. Defaults to 4096. |
| | Optional | Boolean | asymmetric | Whether to perform asymmetric quantization. It controls the selection of the quantization algorithm layer by layer. If this parameter is set in override_layer_configs, override_layer_types, and common_config, or if the activation_offset parameter is set, the priority is as follows: override_layer_configs > override_layer_types > common_config > activation_offset |
| | Optional | CalibrationDataType | dst_type | Quantization bit width for activation quantization, either INT8 (default) or INT16. The current version supports only INT8 quantization. |
| NUQuantize | - | - | - | NUQ configuration. For details about the algorithm, see NUQ Algorithm. This algorithm cannot be configured together with the ARQuantize and ADAquantize algorithms. If they are configured together, the quantization algorithm configured last in the configuration file is used. |
| | Optional | UInt32 | num_steps | Number of steps for NUQ. |
| | Optional | UInt32 | num_of_iteration | Number of iterations for NUQ optimization. |

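
The priority rule stated for common_config, override_layer_types, and override_layer_configs can be illustrated with a short Python sketch. This is not AMCT code: the function name resolve_layer_config and the dictionary representation of the configuration are hypothetical, and the sketch assumes that the highest-priority block matching a layer is used as a whole. How AMCT combines partially specified blocks is not stated in Table 1.

```python
from typing import Dict


def resolve_layer_config(
    layer_name: str,
    layer_type: str,
    common_config: dict,
    override_layer_types: Dict[str, dict],
    override_layer_configs: Dict[str, dict],
) -> dict:
    """Pick the calibration config for one layer (hypothetical helper).

    Priority: override_layer_configs > override_layer_types > common_config.
    Assumption: the winning block is taken as a whole, without merging.
    """
    if layer_name in override_layer_configs:
        return override_layer_configs[layer_name]
    if layer_type in override_layer_types:
        return override_layer_types[layer_type]
    return common_config


# Illustrative values only; the layer names are made up.
common = {"ifmr_quantize": {"search_step": 0.01}}
by_type = {"Conv2d": {"ifmr_quantize": {"search_step": 0.02}}}
by_name = {"conv1": {"ifmr_quantize": {"search_step": 0.05}}}

conv1_cfg = resolve_layer_config("conv1", "Conv2d", common, by_type, by_name)
conv2_cfg = resolve_layer_config("conv2", "Conv2d", common, by_type, by_name)
fc_cfg = resolve_layer_config("fc", "Linear", common, by_type, by_name)
```

In this sketch, conv1 takes its per-layer override, conv2 falls back to the Conv2d type override, and fc uses common_config.
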
- The following is an example of the simplified configuration file (quant.cfg) for uniform quantization:

      # global quantize parameter
      batch_num : 2
      activation_offset : true
      skip_layers : "Opname"
      skip_layer_types : "Optype"
      do_fusion : true
      skip_fusion_layers : "Opname"
      common_config : {
          arq_quantize : {
              channel_wise : true
              quant_bits : 7
          }
          ifmr_quantize : {
              search_range_start : 0.7
              search_range_end : 1.3
              search_step : 0.01
              max_percentile : 0.999999
              min_percentile : 0.999999
              asymmetric : true
          }
          dmq_balancer : {
              migration_strength : 0.5
          }
      }
      override_layer_types : {
          layer_type : "Conv2d"
          calibration_config : {
              arq_quantize : {
                  channel_wise : false
                  quant_bits : 6
              }
              ifmr_quantize : {
                  search_range_start : 0.8
                  search_range_end : 1.2
                  search_step : 0.02
                  max_percentile : 0.999999
                  min_percentile : 0.999999
                  asymmetric : false
              }
              dmq_balancer : {
                  migration_strength : 0.5
              }
          }
      }
      override_layer_configs : {
          layer_name : "Opname"
          calibration_config : {
              arq_quantize : {
                  channel_wise : true
              }
              ifmr_quantize : {
                  search_range_start : 0.8
                  search_range_end : 1.2
                  search_step : 0.02
                  max_percentile : 0.999999
                  min_percentile : 0.999999
                  asymmetric : false
              }
              dmq_balancer : {
                  migration_strength : 0.5
              }
          }
      }

  If the HFMG algorithm is used for activation quantization, replace the ifmr_quantize settings in the preceding configuration file with hfmg_quantize settings such as the following. (The following configuration file is only an example. Modify it as required.)

      # global quantize parameter
      activation_offset : true
      batch_num : 2
      ...
      common_config : {
          hfmg_quantize : {
              num_of_bins : 4096
              asymmetric : false
          }
          ...
      }

- The following is an example of the simplified configuration file (ada_round.cfg) for adaptive rounding:

      common_config : {
          ada_quantize : {
              num_iteration : 10000
              warm_start : 0.2
              reg_param : 0.01
              beta_range_start : 20
              beta_range_end : 2
              channel_wise : false
          }
      }

- The following is an example of the simplified configuration file (dmq_balancer.cfg) for activation quantization balance preprocessing:

      # global quantize parameter
      batch_num : 2
      activation_offset : true
      skip_layers : "Opname"
      skip_layer_types : "Optype"
      do_fusion : true
      skip_fusion_layers : "Opname"
      common_config : {
          arq_quantize : {
              channel_wise : true
          }
          ifmr_quantize : {
              search_range_start : 0.7
              search_range_end : 1.3
              search_step : 0.01
              max_percentile : 0.999999
              min_percentile : 0.999999
              asymmetric : true
          }
          dmq_balancer : {
              migration_strength : 0.5
          }
      }
      override_layer_types : {
          layer_type : "Optype"
          calibration_config : {
              arq_quantize : {
                  channel_wise : false
              }
              ifmr_quantize : {
                  search_range_start : 0.8
                  search_range_end : 1.2
                  search_step : 0.02
                  max_percentile : 0.999999
                  min_percentile : 0.999999
                  asymmetric : false
              }
              dmq_balancer : {
                  migration_strength : 0.5
              }
          }
      }
      override_layer_configs : {
          layer_name : "Opname"
          calibration_config : {
              arq_quantize : {
                  channel_wise : true
              }
              ifmr_quantize : {
                  search_range_start : 0.8
                  search_range_end : 1.2
                  search_step : 0.02
                  max_percentile : 0.999999
                  min_percentile : 0.999999
                  asymmetric : false
              }
              dmq_balancer : {
                  migration_strength : 0.5
              }
          }
      }
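
Several fields in Table 1 have documented value ranges (reg_param, warm_start, beta_range_start and beta_range_end, migration_strength, num_of_bins). As a rough illustration, the following Python sketch checks those ranges on a configuration held as a plain dictionary. The function check_value_ranges and the dictionary layout are hypothetical and are not part of AMCT. The sketch only restates the constraints listed in the table and falls back to the documented defaults when a field is absent.

```python
from typing import Dict, List

# Allowed histogram sizes for hfmg_quantize, as listed in Table 1.
HFMG_BINS = {1024, 2048, 4096, 8192}


def check_value_ranges(calibration_config: Dict[str, dict]) -> List[str]:
    """Return a list of range violations; the list is empty if none are found."""
    errors: List[str] = []

    ada = calibration_config.get("ada_quantize")
    if ada is not None:
        if ada.get("num_iteration", 10000) < 0:
            errors.append("num_iteration must be >= 0")
        if not 0 < ada.get("reg_param", 0.01) < 1:
            errors.append("reg_param must be in (0, 1)")
        if not 0 < ada.get("warm_start", 0.2) < 1:
            errors.append("warm_start must be in (0, 1)")
        start = ada.get("beta_range_start", 20)
        end = ada.get("beta_range_end", 2)
        if not start > end > 0:
            errors.append("need beta_range_start > beta_range_end > 0")

    dmq = calibration_config.get("dmq_balancer")
    if dmq is not None and not 0.2 <= dmq.get("migration_strength", 0.5) <= 0.8:
        errors.append("migration_strength must be in [0.2, 0.8]")

    hfmg = calibration_config.get("hfmg_quantize")
    if hfmg is not None and hfmg.get("num_of_bins", 4096) not in HFMG_BINS:
        errors.append("num_of_bins must be 1024, 2048, 4096, or 8192")

    return errors


# The common_config block of the ada_round.cfg example, written as a dictionary.
ada_round_cfg = {
    "ada_quantize": {
        "num_iteration": 10000,
        "warm_start": 0.2,
        "reg_param": 0.01,
        "beta_range_start": 20,
        "beta_range_end": 2,
        "channel_wise": False,
    }
}
print(check_value_ranges(ada_round_cfg))  # prints []
```

A check like this could be run on each calibration_config block (common_config and every override) before calibration starts; it does not replace whatever validation AMCT itself performs.
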