Simplified QAT Configuration File
Table 1 describes the fields of the retrain_config_tf.proto file, located at /amct_tensorflow/proto/retrain_config_tf.proto under the AMCT installation directory.
Based on this file, you can write a simplified configuration file for quantization-aware training (QAT), for sparsity, or for combined compression, depending on your scenario.
Table 1 Fields in the retrain_config_tf.proto file

| Message | Required | Type | Parameter | Description |
|---|---|---|---|---|
| AMCTRetrainConfig | - | - | - | Simplified QAT configuration of AMCT. |
| | repeated | string | skip_layers | Specifies, by layer name, the layers to skip during compression. This field takes effect globally and covers the same functionality as the feature-specific fields, so if it is set you can omit quant_skip_layers and regular_prune_skip_layers. If both skip_layers and quant_skip_layers (or regular_prune_skip_layers) are configured, their union is used. |
| | repeated | string | skip_layer_types | Specifies, by layer type, the layers to skip during compression. This field takes effect globally and covers the same functionality as the feature-specific fields, so if it is set you can omit quant_skip_types and regular_prune_skip_types. If both skip_layer_types and quant_skip_types (or regular_prune_skip_types) are configured, their union is used. |
| | repeated | RetrainOverrideLayer | override_layer_configs | Overrides the configuration of individual layers by layer name, that is, applies differentiated compression to those layers. For example, if the global quantization bit width is INT8, you can use this parameter to configure INT4 quantization for specific layers. Currently, only INT8 quantization is supported. Parameter priority: override_layer_configs > override_layer_types > global configuration parameters. |
| | repeated | RetrainOverrideLayerType | override_layer_types | Overrides the configuration of layers by layer type, that is, applies differentiated compression to those layers. For example, if the global quantization bit width is INT8, you can use this parameter to configure INT4 quantization for specific layer types. Currently, only INT8 quantization is supported. Parameter priority: override_layer_configs > override_layer_types > global configuration parameters. |
| | optional | FakequantPrecisionMode | fakequant_precision_mode | Numerical precision mode of scale_d for the quant custom operator in the fake-quant model. |
| | optional | uint32 | batch_num | Number of batches used for quantization. |
| | required | RetrainDataQuantConfig | retrain_data_quant_config | Global data (activation) quantization configuration for QAT. Parameter priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config. |
| | required | RetrainWeightQuantConfig | retrain_weight_quant_config | Global weight quantization configuration for QAT. Parameter priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config. |
| | repeated | string | quant_skip_layers | Specifies, by layer name, the layers to skip during quantization. Used in the quantization scenario. If both skip_layers and quant_skip_layers are set, their union is used. |
| | repeated | string | quant_skip_types | Specifies, by layer type, the layers to skip during quantization. Used in the quantization scenario. If both skip_layer_types and quant_skip_types are set, their union is used. |
| | optional | PruneConfig | prune_config | Global sparsity configuration. Parameter priority: override_layer_configs > override_layer_types > prune_config. |
| | repeated | string | regular_prune_skip_layers | Specifies, by layer name, the layers to skip during sparsity. Used in the sparsity scenario. If both skip_layers and regular_prune_skip_layers are set, their union is used. |
| | repeated | string | regular_prune_skip_types | Specifies, by layer type, the layers to skip during structured sparsity. Used in the sparsity scenario. If both skip_layer_types and regular_prune_skip_types are set, their union is used. |
| RetrainDataQuantConfig | - | - | - | Data quantization configuration for QAT. |
| | - | ActULQquantize | ulq_quantize | Activation quantization algorithm. Currently, only ULQ is supported. |
| ActULQquantize | - | - | - | ULQ parameters for activation quantization. For details about the algorithm, see ULQ Algorithm for Activation Quantization. |
| | optional | DataType | dst_type | Quantization bit width, INT8 (default) or INT4. Currently, only INT8 quantization is supported. |
| | optional | ClipMaxMin | clip_max_min | Initial upper and lower clipping bounds. By default, IFMR is used for initialization. |
| | optional | bool | fixed_min | Whether to fix the lower bound at 0. Set to true for ReLU activations and false otherwise. |
| ClipMaxMin | - | - | - | Initial upper and lower clipping bounds. |
| | required | float | clip_max | Initial upper bound. |
| | required | float | clip_min | Initial lower bound. |
| RetrainWeightQuantConfig | - | - | - | Weight quantization configuration for QAT. |
| | - | ARQRetrain | arq_retrain | ARQ algorithm for weight quantization. |
| | - | WtsULQRetrain | ulq_retrain | ULQ algorithm for weight quantization. |
| ARQRetrain | - | - | - | ARQ parameters for weight quantization. For details about the algorithm, see ARQ Algorithm. |
| | optional | DataType | dst_type | Quantization bit width, INT8 (default) or INT4. Currently, only INT8 quantization is supported. |
| | optional | bool | channel_wise | Whether to enable channel-wise ARQ quantization. |
| WtsULQRetrain | - | - | - | ULQ parameters for weight quantization. For details about the algorithm, see ULQ Algorithm for Activation Quantization. |
| | optional | DataType | dst_type | Quantization bit width, INT8 (default) or INT4. Currently, only INT8 quantization is supported. |
| | optional | bool | channel_wise | Whether to enable channel-wise ULQ quantization. |
| RetrainOverrideLayer | - | - | - | Per-layer override configuration, keyed by layer name. |
| | required | string | layer_name | Layer name. |
| | required | RetrainDataQuantConfig | retrain_data_quant_config | Activation quantization configuration to apply. |
| | required | RetrainWeightQuantConfig | retrain_weight_quant_config | Weight quantization configuration to apply. |
| | optional | PruneConfig | prune_config | Sparsity configuration to apply. |
| RetrainOverrideLayerType | - | - | - | Per-type override configuration, keyed by layer type. |
| | required | string | layer_type | Layer type. |
| | required | RetrainDataQuantConfig | retrain_data_quant_config | Activation quantization configuration to apply. |
| | required | RetrainWeightQuantConfig | retrain_weight_quant_config | Weight quantization configuration to apply. |
| | optional | PruneConfig | prune_config | Sparsity configuration to apply. |
| PruneConfig | - | - | - | Sparsity configuration. |
| | - | FilterPruner | filter_pruner | Filter-level (output channel) sparsity configuration. |
| | - | NOutOfMPruner | n_out_of_m_pruner | Configuration of 2:4 structured sparsity. Due to hardware restrictions, only the 2-out-of-4 (M4N2) pattern is supported. |
| FilterPruner | - | - | - | Filter-level sparsity configuration. |
| | - | BalancedL2NormFilterPruner | balanced_l2_norm_filter_prune | BalancedL2Norm algorithm. For details about the algorithm, see Manual Channel Pruning Algorithm. |
| BalancedL2NormFilterPruner | - | - | - | BalancedL2Norm algorithm configuration. |
| | required | float | prune_ratio | Sparsity ratio, that is, the ratio of pruned filters to total filters. The recommended value is 0.2, meaning 20% of the output channels are pruned. |
| | optional | bool | ascend_optimized | Whether to adapt the pruning to Ascend platforms. If the pruned model is to be deployed on an Ascend AI Processor, you are advised to set this parameter to true. |
| NOutOfMPruner | - | - | - | Configuration of 2:4 structured sparsity. |
| | - | L1SelectivePruner | l1_selective_prune | L1SelectivePrune algorithm. For details about the algorithm, see 2:4 Structured Sparsity Algorithm. |
| L1SelectivePruner | - | - | - | L1SelectivePrune algorithm configuration. |
| | optional | NOutOfMType | n_out_of_m_type | Currently, only M4N2 is supported, that is, two weights are retained in every four consecutive weights. |
| | optional | uint32 | update_freq | Interval, in batches, at which the 2:4 sparsity mask is updated. When update_freq is 0, the mask is updated only in the first batch; when it is 2, the mask is updated every two batches, and so on. Defaults to 0. |
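To make the ClipMaxMin fields concrete before the examples, the following is a minimal sketch of uniform INT8 fake quantization with fixed clipping bounds. It only illustrates the role of clip_max and clip_min; AMCT's actual ULQ operator additionally refines these initial bounds during retraining, so treat this as an approximation rather than the shipped algorithm.

```python
import numpy as np

def fake_quant(x: np.ndarray, clip_min: float, clip_max: float,
               num_bits: int = 8) -> np.ndarray:
    """Minimal uniform fake quantization with fixed clipping bounds.
    Illustrates what clip_max/clip_min in ClipMaxMin control; AMCT's ULQ
    additionally refines these initial bounds during retraining."""
    steps = 2 ** num_bits - 1                  # 255 quantization steps for INT8
    scale = (clip_max - clip_min) / steps
    x = np.clip(x, clip_min, clip_max)         # saturate to the clip range
    q = np.round((x - clip_min) / scale)       # snap to the integer grid
    return q * scale + clip_min                # back to float ("fake" quant)

x = np.array([-8.0, -1.0, 0.0, 2.5, 7.3])
print(fake_quant(x, clip_min=-6.0, clip_max=6.0))  # out-of-range values saturate at +/-6
```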
- The following is an example simplified QAT configuration file (quant.cfg):
```
# global quantize parameter
retrain_data_quant_config: {
    ulq_quantize: {
        clip_max_min: {
            clip_max: 6.0
            clip_min: -6.0
        }
        dst_type: INT8
    }
}
retrain_weight_quant_config: {
    arq_retrain: {
        channel_wise: true
        dst_type: INT8
    }
}
skip_layers: "conv_1"
override_layer_types: {
    layer_type: "Optype"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
}
override_layer_configs: {
    layer_name: "Opname"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
}
```
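Because these files use the protobuf text format, a misspelled field name is easy to miss until AMCT rejects the file. One way to catch mistakes early is to parse the file against Python bindings generated from retrain_config_tf.proto. The snippet below is a sketch under two assumptions: the proto has been compiled with `protoc --python_out`, and the generated module is therefore named `retrain_config_tf_pb2` (standard protoc naming); the AMCTRetrainConfig message name comes from Table 1.

```python
# Hypothetical sanity check: parse quant.cfg against the generated bindings.
# Assumes: protoc --python_out=. retrain_config_tf.proto
# which produces retrain_config_tf_pb2.py per protoc naming conventions.
from google.protobuf import text_format
import retrain_config_tf_pb2

config = retrain_config_tf_pb2.AMCTRetrainConfig()
with open("quant.cfg") as f:
    text_format.Merge(f.read(), config)  # raises ParseError on unknown fields

print(config.skip_layers)                                            # ['conv_1']
print(config.retrain_weight_quant_config.arq_retrain.channel_wise)   # True
```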
- The following is an example simplified filter-level sparsity configuration file (prune.cfg):

```
# global prune parameter
prune_config {
    filter_pruner {
        balanced_l2_norm_filter_prune {
            prune_ratio: 0.3
            ascend_optimized: True
        }
    }
}
# skip layers
regular_prune_skip_layers: "Opname"
regular_prune_skip_layers: "Opname"
# override specific layers
override_layer_configs: {
    layer_name: "Opname"
    prune_config: {
        filter_pruner: {
            balanced_l2_norm_filter_prune: {
                prune_ratio: 0.5
                ascend_optimized: True
            }
        }
    }
}
```
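The effect of prune_ratio can be pictured with a few lines of NumPy: rank each output channel (filter) by the L2 norm of its weights and drop the weakest fraction. This is a simplified stand-in for the idea behind balanced_l2_norm_filter_prune; the actual BalancedL2Norm algorithm (and its ascend_optimized adaptation) may select filters differently.

```python
import numpy as np

def l2_filter_mask(weights: np.ndarray, prune_ratio: float) -> np.ndarray:
    """Illustrative filter selection: rank the output channels of a conv
    kernel (H, W, C_in, C_out layout, as in TensorFlow) by L2 norm and mark
    the weakest `prune_ratio` fraction for pruning. A simplified sketch,
    not AMCT's exact BalancedL2Norm algorithm."""
    c_out = weights.shape[-1]
    norms = np.linalg.norm(weights.reshape(-1, c_out), axis=0)  # one norm per filter
    n_prune = int(c_out * prune_ratio)
    keep = np.ones(c_out, dtype=bool)
    keep[np.argsort(norms)[:n_prune]] = False  # drop the lowest-norm filters
    return keep

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3, 64, 128))
mask = l2_filter_mask(kernel, prune_ratio=0.3)
print(mask.sum(), "of", mask.size, "filters kept")  # 90 of 128 filters kept
```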
- The following is an example of the simplified configuration file selective_prune.cfg for 2:4 structured sparsity:

```
# global prune parameter
prune_config {
    n_out_of_m_pruner {
        l1_selective_prune {
            n_out_of_m_type: M4N2
            update_freq: 0
        }
    }
}
# skip layers
regular_prune_skip_layers: "Opname"
regular_prune_skip_layers: "Opname"
# override specific layers
override_layer_configs: {
    layer_name: "Opname"
    prune_config: {
        n_out_of_m_pruner: {
            l1_selective_prune: {
                n_out_of_m_type: M4N2
                update_freq: 1
            }
        }
    }
}
```
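The M4N2 pattern itself is easy to state in code: within every four consecutive weights, keep the two with the largest magnitude and zero the rest. The sketch below illustrates that selection rule only; it is not AMCT's l1_selective_prune implementation.

```python
import numpy as np

def m4n2_mask(weights: np.ndarray) -> np.ndarray:
    """Illustrative 2-out-of-4 (M4N2) mask: in every group of 4 consecutive
    weights, keep the 2 with the largest magnitude, matching the selection
    rule described for l1_selective_prune. A sketch, not AMCT internals."""
    flat = weights.reshape(-1, 4)                 # assumes size divisible by 4
    order = np.argsort(np.abs(flat), axis=1)      # ascending by magnitude
    mask = np.ones_like(flat, dtype=bool)
    np.put_along_axis(mask, order[:, :2], False, axis=1)  # zero the 2 smallest
    return mask.reshape(weights.shape)

w = np.array([0.9, -0.1, 0.05, -1.2, 0.3, 0.2, -0.7, 0.01])
print(w * m4n2_mask(w))  # [ 0.9 -0.   0.  -1.2  0.3  0.  -0.7  0. ]
```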
- The following is an example of the simplified configuration file compressed1.cfg for combined compression (channel sparsity + INT8 quantization):

```
prune_config: {
    filter_pruner: {
        balanced_l2_norm_filter_prune: {
            prune_ratio: 0.3
            ascend_optimized: True
        }
    }
}
# skip_layers: "skip_layers_name_0"
skip_layer_types: "Optype"
quant_skip_layers: "Opname"
quant_skip_types: "Optype"
retrain_weight_quant_config: {
    arq_retrain: {
        channel_wise: true
        dst_type: INT8
    }
}
override_layer_types: {
    layer_type: "Optype"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
    retrain_data_quant_config: {
        ulq_quantize: {
            clip_max_min: {
                clip_max: 6.0
                clip_min: -6.0
            }
        }
    }
    prune_config: {
        filter_pruner: {
            balanced_l2_norm_filter_prune: {
                prune_ratio: 0.5
                ascend_optimized: True
            }
        }
    }
}
```
- The following is an example of the simplified compression configuration file compressed2.cfg (2:4 structured sparsity + INT8 quantization):

```
prune_config {
    n_out_of_m_pruner {
        l1_selective_prune {
            n_out_of_m_type: M4N2
            update_freq: 0
        }
    }
}
# skip_layers: "skip_layers_name_0"
skip_layer_types: "Optype"
quant_skip_layers: "Opname"
quant_skip_types: "Optype"
retrain_weight_quant_config: {
    arq_retrain: {
        channel_wise: true
        dst_type: INT8
    }
}
override_layer_types: {
    layer_type: "Optype"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
    retrain_data_quant_config: {
        ulq_quantize: {
            clip_max_min: {
                clip_max: 6.0
                clip_min: -6.0
            }
        }
    }
    prune_config {
        n_out_of_m_pruner {
            l1_selective_prune {
                n_out_of_m_type: M4N2
                update_freq: 1
            }
        }
    }
}
```
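Finally, the documented semantics of update_freq can be summarized in a small helper: 0 means the 2:4 mask is computed once on the first batch, while N greater than 0 refreshes it every N batches. This is an illustrative reading of the table entry (including whether the first batch counts as an update), not AMCT internals.

```python
def should_update_mask(batch_idx: int, update_freq: int) -> bool:
    """Illustrative reading of update_freq: 0 -> refresh the 2:4 mask only
    on the first batch; N > 0 -> refresh every N batches. A sketch of the
    documented semantics, not AMCT's implementation."""
    if update_freq == 0:
        return batch_idx == 0
    return batch_idx % update_freq == 0

print([b for b in range(6) if should_update_mask(b, 0)])  # [0]
print([b for b in range(6) if should_update_mask(b, 2)])  # [0, 2, 4]
```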