Simplified QAT Configuration File
Table 1 describes the fields of the retrain_config_tf.proto file, located at /amct_tensorflow/proto/retrain_config_tf.proto under the AMCT installation directory.
Based on this file, you can write a simplified configuration file for quantization-aware training (QAT), for sparsity, or for combined compression, depending on your scenario.
Table 1 Fields in the retrain_config_tf.proto file

| Message | Required | Type | Parameter | Description |
|---|---|---|---|---|
| AMCTRetrainConfig | - | - | - | Simplified QAT configuration of AMCT. |
| | repeated | string | skip_layers | Specifies, by layer name, the layers to skip during compression. This field takes effect globally and covers the same functionality as the feature-specific fields, so if it is set you can omit quant_skip_layers and regular_prune_skip_layers. If both skip_layers and quant_skip_layers (or regular_prune_skip_layers) are configured, their union is used. |
| | repeated | string | skip_layer_types | Specifies, by layer type, the layers to skip during compression. This field takes effect globally and covers the same functionality as the feature-specific fields, so if it is set you can omit quant_skip_types and regular_prune_skip_types. If both skip_layer_types and quant_skip_types (or regular_prune_skip_types) are configured, their union is used. |
| | repeated | RetrainOverrideLayer | override_layer_configs | Overrides the configuration of individual layers by layer name, that is, applies differentiated compression to those layers. For example, if the global quantization bit width is INT8, you can use this parameter to configure INT4 quantization for specific layers. Currently, only INT8 quantization is supported. Parameter priority: override_layer_configs > override_layer_types > global configuration parameters. |
| | repeated | RetrainOverrideLayerType | override_layer_types | Overrides the configuration of layers by layer type, that is, applies differentiated compression to those layers. For example, if the global quantization bit width is INT8, you can use this parameter to configure INT4 quantization for specific layer types. Currently, only INT8 quantization is supported. Parameter priority: override_layer_configs > override_layer_types > global configuration parameters. |
| | optional | FakequantPrecisionMode | fakequant_precision_mode | Numerical precision mode of scale_d for the quant custom operator in the fake-quant model. |
| | optional | uint32 | batch_num | Number of batches used for quantization. |
| | required | RetrainDataQuantConfig | retrain_data_quant_config | Global data (activation) quantization configuration for QAT. Parameter priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config. |
| | required | RetrainWeightQuantConfig | retrain_weight_quant_config | Global weight quantization configuration for QAT. Parameter priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config. |
| | repeated | string | quant_skip_layers | Specifies, by layer name, the layers to skip during quantization. Used in the quantization scenario. If both skip_layers and quant_skip_layers are set, their union is used. |
| | repeated | string | quant_skip_types | Specifies, by layer type, the layers to skip during quantization. Used in the quantization scenario. If both skip_layer_types and quant_skip_types are set, their union is used. |
| | optional | PruneConfig | prune_config | Global sparsity configuration. Parameter priority: override_layer_configs > override_layer_types > prune_config. |
| | repeated | string | regular_prune_skip_layers | Specifies, by layer name, the layers to skip during sparsity. Used in the sparsity scenario. If both skip_layers and regular_prune_skip_layers are set, their union is used. |
| | repeated | string | regular_prune_skip_types | Specifies, by layer type, the layers to skip during structured sparsity. Used in the sparsity scenario. If both skip_layer_types and regular_prune_skip_types are set, their union is used. |
| RetrainDataQuantConfig | - | - | - | Data quantization configuration for QAT. |
| | - | ActULQquantize | ulq_quantize | Activation quantization algorithm. Currently, only ULQ is supported. |
| ActULQquantize | - | - | - | ULQ parameters for activation quantization. For details about the algorithm, see ULQ Algorithm for Activation Quantization. |
| | optional | DataType | dst_type | Quantization bit width, INT8 (default) or INT4. Currently, only INT8 quantization is supported. |
| | optional | ClipMaxMin | clip_max_min | Initial upper and lower clipping bounds. By default, IFMR is used for initialization. |
| | optional | bool | fixed_min | Whether to fix the lower bound at 0. Set to true for ReLU activations and false otherwise. |
| ClipMaxMin | - | - | - | Initial upper and lower clipping bounds. |
| | required | float | clip_max | Initial upper bound. |
| | required | float | clip_min | Initial lower bound. |
| RetrainWeightQuantConfig | - | - | - | Weight quantization configuration for QAT. |
| | - | ARQRetrain | arq_retrain | ARQ algorithm for weight quantization. |
| | - | WtsULQRetrain | ulq_retrain | ULQ algorithm for weight quantization. |
| ARQRetrain | - | - | - | ARQ parameters for weight quantization. For details about the algorithm, see ARQ Algorithm. |
| | optional | DataType | dst_type | Quantization bit width, INT8 (default) or INT4. Currently, only INT8 quantization is supported. |
| | optional | bool | channel_wise | Whether to enable channel-wise ARQ quantization. |
| WtsULQRetrain | - | - | - | ULQ parameters for weight quantization. For details about the algorithm, see ULQ Algorithm for Activation Quantization. |
| | optional | DataType | dst_type | Quantization bit width, INT8 (default) or INT4. Currently, only INT8 quantization is supported. |
| | optional | bool | channel_wise | Whether to enable channel-wise ULQ quantization. |
| RetrainOverrideLayer | - | - | - | Per-layer override configuration, keyed by layer name. |
| | required | string | layer_name | Layer name. |
| | required | RetrainDataQuantConfig | retrain_data_quant_config | Activation quantization configuration to apply. |
| | required | RetrainWeightQuantConfig | retrain_weight_quant_config | Weight quantization configuration to apply. |
| | optional | PruneConfig | prune_config | Sparsity configuration to apply. |
| RetrainOverrideLayerType | - | - | - | Per-type override configuration, keyed by layer type. |
| | required | string | layer_type | Layer type. |
| | required | RetrainDataQuantConfig | retrain_data_quant_config | Activation quantization configuration to apply. |
| | required | RetrainWeightQuantConfig | retrain_weight_quant_config | Weight quantization configuration to apply. |
| | optional | PruneConfig | prune_config | Sparsity configuration to apply. |
| PruneConfig | - | - | - | Sparsity configuration. |
| | - | FilterPruner | filter_pruner | Filter-level (output channel) sparsity configuration. |
| | - | NOutOfMPruner | n_out_of_m_pruner | Configuration of 2:4 structured sparsity. Due to hardware restrictions, only the 2-out-of-4 (M4N2) pattern is supported. |
| FilterPruner | - | - | - | Filter-level sparsity configuration. |
| | - | BalancedL2NormFilterPruner | balanced_l2_norm_filter_prune | BalancedL2Norm algorithm. For details about the algorithm, see Manual Channel Pruning Algorithm. |
| BalancedL2NormFilterPruner | - | - | - | BalancedL2Norm algorithm configuration. |
| | required | float | prune_ratio | Sparsity ratio, that is, the ratio of pruned filters to total filters. The recommended value is 0.2, meaning 20% of the output channels are pruned. |
| | optional | bool | ascend_optimized | Whether to adapt the pruning to Ascend platforms. If the pruned model is to be deployed on an Ascend AI Processor, you are advised to set this parameter to true. |
| NOutOfMPruner | - | - | - | Configuration of 2:4 structured sparsity. |
| | - | L1SelectivePruner | l1_selective_prune | L1SelectivePrune algorithm. For details about the algorithm, see 2:4 Structured Sparsity Algorithm. |
| L1SelectivePruner | - | - | - | L1SelectivePrune algorithm configuration. |
| | optional | NOutOfMType | n_out_of_m_type | Currently, only M4N2 is supported, that is, two weights are retained in every four consecutive weights. |
| | optional | uint32 | update_freq | Interval, in batches, at which the 2:4 sparsity mask is updated. When update_freq is 0, the mask is updated only in the first batch; when it is 2, the mask is updated every two batches, and so on. Defaults to 0. |
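To make the ClipMaxMin fields concrete before the examples, the following is a minimal sketch of uniform INT8 fake quantization with fixed clipping bounds. It only illustrates the role of clip_max and clip_min; AMCT's actual ULQ operator additionally refines these initial bounds during retraining, so treat this as an approximation rather than the shipped algorithm.

```python
import numpy as np

def fake_quant(x: np.ndarray, clip_min: float, clip_max: float,
               num_bits: int = 8) -> np.ndarray:
    """Minimal uniform fake quantization with fixed clipping bounds.
    Illustrates what clip_max/clip_min in ClipMaxMin control; AMCT's ULQ
    additionally refines these initial bounds during retraining."""
    steps = 2 ** num_bits - 1                  # 255 quantization steps for INT8
    scale = (clip_max - clip_min) / steps
    x = np.clip(x, clip_min, clip_max)         # saturate to the clip range
    q = np.round((x - clip_min) / scale)       # snap to the integer grid
    return q * scale + clip_min                # back to float ("fake" quant)

x = np.array([-8.0, -1.0, 0.0, 2.5, 7.3])
print(fake_quant(x, clip_min=-6.0, clip_max=6.0))  # out-of-range values saturate at +/-6
```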
- The following is an example simplified QAT configuration file (quant.cfg):
```
# global quantize parameter
retrain_data_quant_config: {
    ulq_quantize: {
        clip_max_min: {
            clip_max: 6.0
            clip_min: -6.0
        }
        dst_type: INT8
    }
}
retrain_weight_quant_config: {
    arq_retrain: {
        channel_wise: true
        dst_type: INT8
    }
}
skip_layers: "conv_1"
override_layer_types: {
    layer_type: "Optype"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
}
override_layer_configs: {
    layer_name: "Opname"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
}
```
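Because these files use the protobuf text format, a misspelled field name is easy to miss until AMCT rejects the file. One way to catch mistakes early is to parse the file against Python bindings generated from retrain_config_tf.proto. The snippet below is a sketch under two assumptions: the proto has been compiled with `protoc --python_out`, and the generated module is therefore named `retrain_config_tf_pb2` (standard protoc naming); the AMCTRetrainConfig message name comes from Table 1.

```python
# Hypothetical sanity check: parse quant.cfg against the generated bindings.
# Assumes: protoc --python_out=. retrain_config_tf.proto
# which produces retrain_config_tf_pb2.py per protoc naming conventions.
from google.protobuf import text_format
import retrain_config_tf_pb2

config = retrain_config_tf_pb2.AMCTRetrainConfig()
with open("quant.cfg") as f:
    text_format.Merge(f.read(), config)  # raises ParseError on unknown fields

print(config.skip_layers)                                            # ['conv_1']
print(config.retrain_weight_quant_config.arq_retrain.channel_wise)   # True
```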
- The following is an example simplified filter-level sparsity configuration file (prune.cfg):

```
# global prune parameter
prune_config {
    filter_pruner {
        balanced_l2_norm_filter_prune {
            prune_ratio: 0.3
            ascend_optimized: True
        }
    }
}
# skip layers
regular_prune_skip_layers: "Opname"
regular_prune_skip_layers: "Opname"
# override specific layers
override_layer_configs: {
    layer_name: "Opname"
    prune_config: {
        filter_pruner: {
            balanced_l2_norm_filter_prune: {
                prune_ratio: 0.5
                ascend_optimized: True
            }
        }
    }
}
```
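The effect of prune_ratio can be pictured with a few lines of NumPy: rank each output channel (filter) by the L2 norm of its weights and drop the weakest fraction. This is a simplified stand-in for the idea behind balanced_l2_norm_filter_prune; the actual BalancedL2Norm algorithm (and its ascend_optimized adaptation) may select filters differently.

```python
import numpy as np

def l2_filter_mask(weights: np.ndarray, prune_ratio: float) -> np.ndarray:
    """Illustrative filter selection: rank the output channels of a conv
    kernel (H, W, C_in, C_out layout, as in TensorFlow) by L2 norm and mark
    the weakest `prune_ratio` fraction for pruning. A simplified sketch,
    not AMCT's exact BalancedL2Norm algorithm."""
    c_out = weights.shape[-1]
    norms = np.linalg.norm(weights.reshape(-1, c_out), axis=0)  # one norm per filter
    n_prune = int(c_out * prune_ratio)
    keep = np.ones(c_out, dtype=bool)
    keep[np.argsort(norms)[:n_prune]] = False  # drop the lowest-norm filters
    return keep

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3, 64, 128))
mask = l2_filter_mask(kernel, prune_ratio=0.3)
print(mask.sum(), "of", mask.size, "filters kept")  # 90 of 128 filters kept
```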
- The following is an example of the simplified configuration file selective_prune.cfg for 2:4 structured sparsity:

```
# global prune parameter
prune_config {
    n_out_of_m_pruner {
        l1_selective_prune {
            n_out_of_m_type: M4N2
            update_freq: 0
        }
    }
}
# skip layers
regular_prune_skip_layers: "Opname"
regular_prune_skip_layers: "Opname"
# override specific layers
override_layer_configs: {
    layer_name: "Opname"
    prune_config: {
        n_out_of_m_pruner: {
            l1_selective_prune: {
                n_out_of_m_type: M4N2
                update_freq: 1
            }
        }
    }
}
```
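The M4N2 pattern itself is easy to state in code: within every four consecutive weights, keep the two with the largest magnitude and zero the rest. The sketch below illustrates that selection rule only; it is not AMCT's l1_selective_prune implementation.

```python
import numpy as np

def m4n2_mask(weights: np.ndarray) -> np.ndarray:
    """Illustrative 2-out-of-4 (M4N2) mask: in every group of 4 consecutive
    weights, keep the 2 with the largest magnitude, matching the selection
    rule described for l1_selective_prune. A sketch, not AMCT internals."""
    flat = weights.reshape(-1, 4)                 # assumes size divisible by 4
    order = np.argsort(np.abs(flat), axis=1)      # ascending by magnitude
    mask = np.ones_like(flat, dtype=bool)
    np.put_along_axis(mask, order[:, :2], False, axis=1)  # zero the 2 smallest
    return mask.reshape(weights.shape)

w = np.array([0.9, -0.1, 0.05, -1.2, 0.3, 0.2, -0.7, 0.01])
print(w * m4n2_mask(w))  # [ 0.9 -0.   0.  -1.2  0.3  0.  -0.7  0. ]
```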
- The following is an example of the simplified configuration file compressed1.cfg for combined compression (channel sparsity + INT8 quantization):

```
prune_config: {
    filter_pruner: {
        balanced_l2_norm_filter_prune: {
            prune_ratio: 0.3
            ascend_optimized: True
        }
    }
}
# skip_layers: "skip_layers_name_0"
skip_layer_types: "Optype"
quant_skip_layers: "Opname"
quant_skip_types: "Optype"
retrain_weight_quant_config: {
    arq_retrain: {
        channel_wise: true
        dst_type: INT8
    }
}
override_layer_types: {
    layer_type: "Optype"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
    retrain_data_quant_config: {
        ulq_quantize: {
            clip_max_min: {
                clip_max: 6.0
                clip_min: -6.0
            }
        }
    }
    prune_config: {
        filter_pruner: {
            balanced_l2_norm_filter_prune: {
                prune_ratio: 0.5
                ascend_optimized: True
            }
        }
    }
}
```
- The following is an example of the simplified compression configuration file compressed2.cfg (2:4 structured sparsity + INT8 quantization):

```
prune_config {
    n_out_of_m_pruner {
        l1_selective_prune {
            n_out_of_m_type: M4N2
            update_freq: 0
        }
    }
}
# skip_layers: "skip_layers_name_0"
skip_layer_types: "Optype"
quant_skip_layers: "Opname"
quant_skip_types: "Optype"
retrain_weight_quant_config: {
    arq_retrain: {
        channel_wise: true
        dst_type: INT8
    }
}
override_layer_types: {
    layer_type: "Optype"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
    retrain_data_quant_config: {
        ulq_quantize: {
            clip_max_min: {
                clip_max: 6.0
                clip_min: -6.0
            }
        }
    }
    prune_config {
        n_out_of_m_pruner {
            l1_selective_prune {
                n_out_of_m_type: M4N2
                update_freq: 1
            }
        }
    }
}
```
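Finally, the documented semantics of update_freq can be summarized in a small helper: 0 means the 2:4 mask is computed once on the first batch, while N greater than 0 refreshes it every N batches. This is an illustrative reading of the table entry (including whether the first batch counts as an update), not AMCT internals.

```python
def should_update_mask(batch_idx: int, update_freq: int) -> bool:
    """Illustrative reading of update_freq: 0 -> refresh the 2:4 mask only
    on the first batch; N > 0 -> refresh every N batches. A sketch of the
    documented semantics, not AMCT's implementation."""
    if update_freq == 0:
        return batch_idx == 0
    return batch_idx % update_freq == 0

print([b for b in range(6) if should_update_mask(b, 0)])  # [0]
print([b for b in range(6) if should_update_mask(b, 2)])  # [0, 2, 4]
```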