Simplified QAT Configuration File
Table 1 describes the fields in the retrain_config_pytorch.proto file. The file is located at /amct_pytorch/proto/retrain_config_pytorch.proto under the AMCT installation directory.
The fields defined in this file are used to write the simplified configuration files for QAT, for sparsity, and for compression combination. Configure only the fields that apply to your scenario.
| Parameter | Required/Optional | Type | Field | Description |
|---|---|---|---|---|
| AMCTRetrainConfig | - | - | - | Simplified QAT configuration of AMCT. |
| | Repeated | String | skip_layers | Names of layers to skip during compression. This field takes effect globally, for all compression features. If it is set, you do not need to set the feature-specific quant_skip_layers and regular_prune_skip_layers. If skip_layers is set together with quant_skip_layers or regular_prune_skip_layers, the union of the lists is used. |
| | Repeated | String | skip_layer_types | Types of layers to skip during compression. This field takes effect globally, for all compression features. If it is set, you do not need to set the feature-specific quant_skip_types and regular_prune_skip_types. If skip_layer_types is set together with quant_skip_types or regular_prune_skip_types, the union of the lists is used. |
| | Repeated | RetrainOverrideLayer | override_layer_configs | Per-layer configurations that override the global configuration, used to quantize specific layers differently. For example, if the global quantization bit width is INT8, this field could configure INT4 quantization for some layers. However, the current version supports only INT8 quantization. Parameter priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config |
| | Repeated | RetrainOverrideLayerType | override_layer_types | Per-type configurations that override the global configuration, used to quantize layers of specific types differently. For example, if the global quantization bit width is INT8, this field could configure INT4 quantization for some layer types. However, the current version supports only INT8 quantization. Parameter priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config |
| | Optional | FakequantPrecisionMode | fakequant_precision_mode | Precision mode of the scale_d value of the quantization custom operator in the fake-quantized model. |
| | Optional | UInt32 | batch_num | Number of batches used for quantization. |
| | Required | RetrainDataQuantConfig | retrain_data_quant_config | Global activation quantization configuration for QAT. Parameter priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config |
| | Required | RetrainWeightQuantConfig | retrain_weight_quant_config | Global weight quantization configuration for QAT. Parameter priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config |
| | Repeated | String | quant_skip_layers | Names of layers to skip during quantization. Applies to quantization only. If both skip_layers and quant_skip_layers are set, their union is used. |
| | Repeated | String | quant_skip_types | Types of layers to skip during quantization. Applies to quantization only. If both skip_layer_types and quant_skip_types are set, their union is used. |
| | Optional | PruneConfig | prune_config | Global sparsity configuration. Parameter priority: override_layer_configs > override_layer_types > prune_config |
| | Repeated | String | regular_prune_skip_layers | Names of layers to skip during structured sparsity. Applies to sparsity only. If both skip_layers and regular_prune_skip_layers are set, their union is used. |
| | Repeated | String | regular_prune_skip_types | Types of layers to skip during structured sparsity. Applies to sparsity only. If both skip_layer_types and regular_prune_skip_types are set, their union is used. |
| RetrainDataQuantConfig | - | - | - | Activation quantization configuration for QAT. |
| | - | ActULQquantize | ulq_quantize | Activation quantization algorithm. Currently, only ULQ is supported. |
| ActULQquantize | - | - | - | ULQ algorithm configuration for activation quantization. For details about the algorithm, see ULQ Algorithm. |
| | Optional | DataType | dst_type | Activation quantization bit width. The options are as follows. The current version supports only INT8 quantization. |
| | Optional | ClipMaxMin | clip_max_min | Initial upper and lower bounds. By default, they are initialized with IFMR. |
| | Optional | Boolean | fixed_min | Whether to fix the lower bound at 0. Set it to true for ReLU and to false otherwise. |
| ClipMaxMin | - | - | - | Initial upper and lower bounds, used to compute the scale factor. Training and tuning start from these initial values. Set them to match the upper and lower bounds of the actual inference data; otherwise, accuracy after quantization may be low. |
| | Required | Float | clip_max | Initial upper bound. |
| | Required | Float | clip_min | Initial lower bound. |
| RetrainWeightQuantConfig | - | - | - | Weight quantization configuration for QAT. |
| | - | ARQRetrain | arq_retrain | ARQ algorithm. |
| | - | WtsULQRetrain | ulq_retrain | ULQ algorithm for weight quantization. |
| ARQRetrain | - | - | - | ARQ algorithm configuration. For details about the algorithm, see ARQ Algorithm. |
| | Optional | DataType | dst_type | Quantization bit width: INT8 or INT4. Defaults to INT8. The current version supports only INT8 quantization. |
| | Optional | Boolean | channel_wise | Whether to enable channel-wise ARQ. |
| WtsULQRetrain | - | - | - | ULQ algorithm configuration for weight quantization. For details about the algorithm, see ULQ Algorithm. |
| | Optional | DataType | dst_type | Quantization bit width: INT8 or INT4. Defaults to INT8. The current version supports only INT8 quantization. |
| | Optional | Boolean | channel_wise | Whether to enable channel-wise ULQ. |
| RetrainOverrideLayer | - | - | - | Configuration that overrides the global configuration for a specific layer. |
| | Required | String | layer_name | Layer name. |
| | Required | RetrainDataQuantConfig | retrain_data_quant_config | Activation quantization configuration that overrides the global one. |
| | Required | RetrainWeightQuantConfig | retrain_weight_quant_config | Weight quantization configuration that overrides the global one. |
| | Optional | PruneConfig | prune_config | Sparsity configuration that overrides the global one. |
| RetrainOverrideLayerType | - | - | - | Configuration that overrides the global configuration for a specific layer type. |
| | Required | String | layer_type | Layer type. |
| | Required | RetrainDataQuantConfig | retrain_data_quant_config | Activation quantization configuration that overrides the global one. |
| | Required | RetrainWeightQuantConfig | retrain_weight_quant_config | Weight quantization configuration that overrides the global one. |
| | Optional | PruneConfig | prune_config | Sparsity configuration that overrides the global one. |
| PruneConfig | - | - | - | Sparsity configuration. |
| | - | FilterPruner | filter_pruner | Filter-level (output-channel-level) sparsity configuration. |
| | - | NOutOfMPruner | n_out_of_m_pruner | 2:4 structured sparsity configuration. Due to hardware restrictions, the |
| FilterPruner | - | - | - | Filter-level sparsity configuration. |
| | - | BalancedL2NormFilterPruner | balanced_l2_norm_filter_prune | BalancedL2NormFilterPruner algorithm (BCP). For details about the algorithm, see Manual Channel Pruning Algorithm. |
| BalancedL2NormFilterPruner | - | - | - | BalancedL2NormFilterPruner algorithm configuration. |
| | Required | Float | prune_ratio | Sparsity ratio, that is, the number of sparsified filters divided by the total number of filters. The recommended value is 0.2, which sparsifies 20% of the filters. |
| | Optional | Boolean | ascend_optimized | Whether to adapt the sparsity to Ascend platforms. If the sparsified model will be deployed on the Ascend AI Processor, set this parameter to true. |
| NOutOfMPruner | - | - | - | 2:4 structured sparsity configuration. |
| | - | L1SelectivePruner | l1_selective_prune | L1SelectivePrune algorithm. For details about the algorithm, see 2:4 Structured Sparsity Algorithm. |
| L1SelectivePruner | - | - | - | L1SelectivePrune algorithm configuration. |
| | Optional | NOutOfMType | n_out_of_m_type | Sparsity pattern. Currently, only M4N2 is supported: two out of every four consecutive weights are retained. |
| | Optional | UInt32 | update_freq | Interval, in batches, for updating the 2:4 sparsity selection. If update_freq is 0, the selection is updated only in the first batch. If it is 2, the selection is updated every two batches, and so on. Defaults to 0. |
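The ClipMaxMin bounds are used to compute the scale factor, which is why they should match the range of the real inference data. The exact ULQ formula is not given on this page, so the following Python sketch assumes a conventional asymmetric uniform INT8 mapping (256 levels) to show how clip_max, clip_min, and fixed_min could determine a scale and offset. The helpers int8_scale_offset, quantize, and dequantize are hypothetical and are not part of AMCT.

```python
def int8_scale_offset(clip_min: float, clip_max: float, fixed_min: bool = False):
    """Hypothetical helper: scale and offset for an asymmetric INT8 grid.

    Assumes q = clamp(round(x / scale) + offset, -128, 127). This only
    illustrates the role of the bounds; it is not the AMCT ULQ implementation.
    """
    if fixed_min:
        # fixed_min pins the lower bound at 0, as suggested for ReLU outputs.
        clip_min = 0.0
    # Keep 0.0 inside the range so that zero is exactly representable.
    clip_min = min(clip_min, 0.0)
    clip_max = max(clip_max, 0.0)
    if clip_max <= clip_min:
        raise ValueError("clip_max must be greater than clip_min")
    scale = (clip_max - clip_min) / 255.0  # 256 levels span the clip range
    offset = -128 - round(clip_min / scale)
    return scale, offset


def quantize(x: float, scale: float, offset: int) -> int:
    # Values outside [clip_min, clip_max] saturate at the INT8 limits.
    return max(-128, min(127, round(x / scale) + offset))


def dequantize(q: int, scale: float, offset: int) -> float:
    return (q - offset) * scale


# Bounds wider than the real data waste INT8 levels; narrower bounds clip values.
scale, offset = int8_scale_offset(clip_min=-6.0, clip_max=6.0, fixed_min=True)
print(scale, offset, quantize(2.0, scale, offset), quantize(9.0, scale, offset))
```

In this model, the value 9.0 saturates at 127 because it lies outside the initial bounds. This is the accuracy loss the ClipMaxMin description warns about when the bounds do not match the data.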
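The channel_wise switch of ARQRetrain and WtsULQRetrain controls whether the weights of a layer share one quantization scale or get one scale per channel. The following Python sketch uses a simple symmetric max-abs INT8 rule to show why per-channel scales help channels with small weights. The rule and the helper names are illustrative assumptions; this is not the ARQ or ULQ algorithm.

```python
from typing import Iterable, List


def symmetric_scale(values: Iterable[float]) -> float:
    """Max-abs scale for a symmetric INT8 grid [-127, 127] (assumed rule)."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def weight_scales(weights: List[List[float]], channel_wise: bool) -> List[float]:
    """Return one scale per output channel; weights[c] holds channel c."""
    if channel_wise:
        return [symmetric_scale(channel) for channel in weights]
    shared = symmetric_scale(v for channel in weights for v in channel)
    return [shared] * len(weights)


def quantize_channel(channel: List[float], scale: float) -> List[int]:
    return [max(-127, min(127, round(v / scale))) for v in channel]


weights = [[0.5, -0.25], [0.01, -0.02]]  # channel 1 has much smaller weights
for channel_wise in (False, True):
    scales = weight_scales(weights, channel_wise)
    print(channel_wise, [quantize_channel(ch, s) for ch, s in zip(weights, scales)])
```

With a shared scale, the second channel uses only a handful of INT8 levels. With channel_wise enabled, that channel spans the full INT8 range.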
- The following is an example of the simplified configuration file (quant.cfg) for QAT:

```
# global quantize parameter
retrain_data_quant_config: {
    ulq_quantize: {
        clip_max_min: {
            clip_max: 6.0
            clip_min: -6.0
        }
        fixed_min: true
        dst_type: INT8
    }
}
retrain_weight_quant_config: {
    arq_retrain: {
        channel_wise: true
        dst_type: INT8
    }
}
skip_layers: "Opname"
skip_layer_types: "Optype"
override_layer_types: {
    layer_type: "Optype"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
}
override_layer_configs: {
    layer_name: "Opname"
    retrain_data_quant_config: {
        ulq_quantize: {
            clip_max_min: {
                clip_max: 3.0
                clip_min: -3.0
            }
            dst_type: INT8
        }
    }
}
```

- The following is an example of the simplified configuration file (prune.cfg) for filter-level sparsity:

```
# global prune parameter
prune_config {
    filter_pruner {
        balanced_l2_norm_filter_prune {
            prune_ratio: 0.3
            ascend_optimized: True
        }
    }
}
# skip layers
regular_prune_skip_layers: "Opname"
regular_prune_skip_layers: "Opname"
# override specific layers
override_layer_configs: {
    layer_name: "Opname"
    prune_config: {
        filter_pruner: {
            balanced_l2_norm_filter_prune: {
                prune_ratio: 0.5
                ascend_optimized: True
            }
        }
    }
}
```

- The following is an example of the simplified configuration file (selective_prune.cfg) for 2:4 structured sparsity:

```
# global prune parameter
prune_config {
    n_out_of_m_pruner {
        l1_selective_prune {
            n_out_of_m_type: M4N2
            update_freq: 0
        }
    }
}
# skip layers
regular_prune_skip_layers: "Opname"
regular_prune_skip_layers: "Opname"
# override specific layers
override_layer_configs: {
    layer_name: "Opname"
    prune_config: {
        n_out_of_m_pruner: {
            l1_selective_prune: {
                n_out_of_m_type: M4N2
                update_freq: 1
            }
        }
    }
}
```

- The following is an example of the simplified configuration file (compressed1.cfg) for compression combination (filter-level sparsity + INT8 quantization):

```
prune_config: {
    filter_pruner: {
        balanced_l2_norm_filter_prune: {
            prune_ratio: 0.3
            ascend_optimized: True
        }
    }
}
# skip_layers: "skip_layers_name_0"
skip_layer_types: "Optype"
quant_skip_layers: "Opname"
quant_skip_types: "Optype"
retrain_weight_quant_config: {
    arq_retrain: {
        channel_wise: true
        dst_type: INT8
    }
}
override_layer_types: {
    layer_type: "Optype"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
    retrain_data_quant_config: {
        ulq_quantize: {
            clip_max_min: {
                clip_max: 6.0
                clip_min: -6.0
            }
        }
    }
    prune_config: {
        filter_pruner: {
            balanced_l2_norm_filter_prune: {
                prune_ratio: 0.5
                ascend_optimized: True
            }
        }
    }
}
```

- The following is an example of the simplified configuration file (compressed2.cfg) for compression combination (2:4 structured sparsity + INT8 quantization):

```
prune_config {
    n_out_of_m_pruner {
        l1_selective_prune {
            n_out_of_m_type: M4N2
            update_freq: 0
        }
    }
}
# skip_layers: "skip_layers_name_0"
skip_layer_types: "Optype"
quant_skip_layers: "quant_skip_layers_name_0"
quant_skip_types: "Optype"
retrain_weight_quant_config: {
    arq_retrain: {
        channel_wise: true
        dst_type: INT8
    }
}
override_layer_types: {
    layer_type: "Optype"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
    retrain_data_quant_config: {
        ulq_quantize: {
            clip_max_min: {
                clip_max: 6.0
                clip_min: -6.0
            }
        }
    }
    prune_config {
        n_out_of_m_pruner {
            l1_selective_prune {
                n_out_of_m_type: M4N2
                update_freq: 1
            }
        }
    }
}
```
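The configuration examples above rely on two combination rules from Table 1. First, the global skip fields are merged with the feature-specific ones by union. Second, a layer's settings come from override_layer_configs first, then override_layer_types, then the global configuration. The following Python sketch models both rules with plain dictionaries. Several details are hypothetical: the dictionary layout, the layer names and types, the helper names, and the assumption that a sub-configuration omitted by an override falls back to the next level. The sketch does not call any AMCT API.

```python
from typing import Dict, Iterable, Optional, Set


def effective_skips(global_skips: Iterable[str], feature_skips: Iterable[str]) -> Set[str]:
    """Union of skip_layers (or skip_layer_types) and the feature-specific list."""
    return set(global_skips) | set(feature_skips)


def is_quant_skipped(name: str, layer_type: str, cfg: Dict) -> bool:
    """A layer is skipped if its name or its type is in the merged skip sets."""
    names = effective_skips(cfg.get("skip_layers", []), cfg.get("quant_skip_layers", []))
    types = effective_skips(cfg.get("skip_layer_types", []), cfg.get("quant_skip_types", []))
    return name in names or layer_type in types


def resolve(name: str, layer_type: str, key: str, cfg: Dict) -> Optional[Dict]:
    """Look up one sub-configuration (e.g. "prune_config") by priority:
    override_layer_configs > override_layer_types > global setting."""
    for override in cfg.get("override_layer_configs", []):
        if override["layer_name"] == name and key in override:
            return override[key]
    for override in cfg.get("override_layer_types", []):
        if override["layer_type"] == layer_type and key in override:
            return override[key]
    return cfg.get(key)


# Hypothetical configuration; layer names and types are illustrative only.
cfg = {
    "skip_layers": ["fc_out"],
    "quant_skip_layers": ["conv1"],
    "prune_config": {"prune_ratio": 0.3},
    "override_layer_types": [
        {"layer_type": "Linear", "prune_config": {"prune_ratio": 0.5}},
    ],
    "override_layer_configs": [
        {"layer_name": "fc1", "prune_config": {"prune_ratio": 0.2}},
    ],
}
print(is_quant_skipped("conv1", "Conv2d", cfg))
print(resolve("fc1", "Linear", "prune_config", cfg))
print(resolve("fc2", "Linear", "prune_config", cfg))
```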
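prune_ratio sets the fraction of filters to sparsify: 0.3 globally and 0.5 for the overridden layer in prune.cfg. The following Python sketch ranks a layer's filters by plain L2 norm and selects the lowest-norm fraction. It is a simplified stand-in. It does not reproduce the balancing or the Ascend adaptation (ascend_optimized) of BalancedL2NormFilterPruner. Rounding the filter count down is an assumption, and filters_to_prune is a hypothetical helper.

```python
import math
from typing import List


def filters_to_prune(filters: List[List[float]], prune_ratio: float) -> List[int]:
    """Return the indices of the filters with the smallest L2 norms.

    filters[i] holds the flattened weights of output channel i.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    num_prune = int(len(filters) * prune_ratio)  # assumption: round down
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    weakest_first = sorted(range(len(filters)), key=lambda i: norms[i])
    return sorted(weakest_first[:num_prune])


# Ten single-weight filters with weights 0.1, 0.2, ..., 1.0 (hypothetical data).
filters = [[0.1 * (i + 1)] for i in range(10)]
print(filters_to_prune(filters, 0.2))
```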
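For 2:4 structured sparsity, n_out_of_m_type: M4N2 retains two of every four consecutive weights. update_freq decides how often that selection is recomputed: 0 in the global setting of selective_prune.cfg and 1 for its overridden layer. The following Python sketch keeps the two largest-magnitude weights in each group of four, in the spirit of L1-based selection, and schedules mask updates by batch index. Several details are assumptions: counting batches from 0, breaking ties by position, and the helper names. This is not the L1SelectivePrune implementation.

```python
from typing import List


def m4n2_mask(weights: List[float]) -> List[int]:
    """1 = keep, 0 = sparsify; keeps the 2 largest |w| in every group of 4."""
    if len(weights) % 4 != 0:
        raise ValueError("the number of weights must be a multiple of 4")
    mask = [0] * len(weights)
    for start in range(0, len(weights), 4):
        group = range(start, start + 4)
        # sorted() is stable (also with reverse=True), so ties keep the earlier index.
        for i in sorted(group, key=lambda j: abs(weights[j]), reverse=True)[:2]:
            mask[i] = 1
    return mask


def should_update_mask(batch_idx: int, update_freq: int) -> bool:
    """update_freq = 0: first batch only; N > 0: every N batches."""
    if update_freq == 0:
        return batch_idx == 0
    return batch_idx % update_freq == 0


weights = [0.1, -0.9, 0.3, 0.05, 1.0, 0.0, -2.0, 0.5]
mask = None
for batch_idx in range(4):
    if should_update_mask(batch_idx, update_freq=2):
        mask = m4n2_mask(weights)  # in real training the weights change per batch
    sparse = [w * m for w, m in zip(weights, mask)]
print(mask, sparse)
```

A larger update_freq recomputes the selection less often. This reduces overhead, but the mask may lag behind the weights as training changes them.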