Simplified QAT Configuration File

Table 1 describes the fields defined in retrain_config_tf.proto. The file is located at /amct_tensorflow/proto/retrain_config_tf.proto under the AMCT installation directory.

Use these fields to write the simplified QAT configuration file, the simplified sparsity configuration file, or the simplified compression combination configuration file.

**Table 1** retrain_config_tf.proto
Message	Required/Optional/Repeated	Type	Field	Description
AMCTRetrainConfig	-	-	-	Simplified QAT configuration of AMCT.
	Repeated	String	skip_layers	Layers to skip during compression. This field applies globally, to both quantization and sparsity, so setting it makes the feature-specific quant_skip_layers and regular_prune_skip_layers unnecessary. If skip_layers is set together with quant_skip_layers or regular_prune_skip_layers, the union of the two lists is used.
	Repeated	String	skip_layer_types	Types of layers to skip during compression. This field applies globally, to both quantization and sparsity, so setting it makes the feature-specific quant_skip_types and regular_prune_skip_types unnecessary. If skip_layer_types is set together with quant_skip_types or regular_prune_skip_types, the union of the two lists is used.
	Repeated	RetrainOverrideLayer	override_layer_configs	Layers to override. Specifies the layers whose settings differ from the global configuration. For example, you can use it to switch selected layers from the global INT8 quantization to INT4 quantization. The current version supports only INT8 quantization. Parameter priority in the quantization scenario: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config. Parameter priority in the sparsity scenario: override_layer_configs > override_layer_types > prune_config.
	Repeated	RetrainOverrideLayerType	override_layer_types	Types of layers to override. Specifies the layer types whose settings differ from the global configuration. For example, you can use it to switch layers of selected types from the global INT8 quantization to INT4 quantization. The current version supports only INT8 quantization. Parameter priority in the quantization scenario: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config. Parameter priority in the sparsity scenario: override_layer_configs > override_layer_types > prune_config.
	Optional	FakequantPrecisionMode	fakequant_precision_mode	Precision mode of the scale_d value used by the custom quantization operator in the fake-quantized model. FORCE_FP16_QUANT: scale_d is converted to float16 precision (the data type remains float32). Not configured (default): scale_d keeps float32 precision.
	Optional	UInt32	batch_num	Number of batches used for quantization.
	Required	RetrainDataQuantConfig	retrain_data_quant_config	Activation quantization configuration parameter for QAT. It is a global quantization configuration parameter. Parameter priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config
	Required	RetrainWeightQuantConfig	retrain_weight_quant_config	Weight quantization configuration parameter for QAT. It is a global quantization configuration parameter. Parameter priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config
	Repeated	String	quant_skip_layers	Layers to skip quantization. Applicable to quantization. If both skip_layers and quant_skip_layers are set, their union is used.
	Repeated	String	quant_skip_types	Types of layers to skip quantization. Applicable to quantization. If both skip_layer_types and quant_skip_types are set, their union is used.
	Optional	PruneConfig	prune_config	Sparsity configuration. It is a global sparsity configuration parameter. Parameter priority: override_layer_configs > override_layer_types > prune_config
	Repeated	String	regular_prune_skip_layers	Layers to skip sparsity. Applicable to sparsity. If both skip_layers and regular_prune_skip_layers are set, their union is used.
	Repeated	String	regular_prune_skip_types	Types of layers to skip sparsity. Applicable to sparsity. If both skip_layer_types and regular_prune_skip_types are set, their union is used.
RetrainDataQuantConfig	-	-	-	Activation quantization configuration for QAT.
	-	ActULQquantize	ulq_quantize	Activation quantization algorithm. Currently, only ULQ is supported.
ActULQquantize	-	-	-	ULQ parameters for activation quantization. For details about the algorithm, see ULQ Algorithm.
	Optional	DataType	dst_type	Activation quantization bit width. Options: INT4, INT8 (default), and INT16. The current version supports only INT8 quantization.
	Optional	ClipMaxMin	clip_max_min	Initial upper and lower bounds. IFMR is used for initialization by default.
	Optional	Boolean	fixed_min	Whether to fix the lower bound at 0. Set this field to true for ReLU and to false in other cases.
ClipMaxMin	-	-	-	Initial upper and lower bounds.
	Required	Float	clip_max	Initial upper bound.
	Required	Float	clip_min	Initial lower bound.
RetrainWeightQuantConfig	-	-	-	Weight quantization configuration for QAT.
	-	ARQRetrain	arq_retrain	ARQ algorithm.
	-	WtsULQRetrain	ulq_retrain	ULQ algorithm for weight quantization.
ARQRetrain	-	-	-	ARQ algorithm parameters. For details about the algorithm, see ARQ Algorithm.
	Optional	DataType	dst_type	Quantization bit width: INT8 (default) or INT4. The current version supports only INT8 quantization.
	Optional	Boolean	channel_wise	Whether to enable channel-wise ARQ.
WtsULQRetrain	-	-	-	ULQ parameters for weight quantization. For details about the algorithm, see ULQ Algorithm.
	Optional	DataType	dst_type	Quantization bit width: INT8 (default) or INT4. The current version supports only INT8 quantization.
	Optional	Boolean	channel_wise	Whether to enable channel-wise ULQ.
RetrainOverrideLayer	-	-	-	Layer overriding configuration.
	Required	String	layer_name	Layer name.
	Required	RetrainDataQuantConfig	retrain_data_quant_config	Activation quantization configuration to override.
	Required	RetrainWeightQuantConfig	retrain_weight_quant_config	Weight quantization configuration to override.
	Optional	PruneConfig	prune_config	Sparsity configuration to override.
RetrainOverrideLayerType	-	-	-	Type of the layer to override.
	Required	String	layer_type	Layer type.
	Required	RetrainDataQuantConfig	retrain_data_quant_config	Activation quantization configuration to override.
	Required	RetrainWeightQuantConfig	retrain_weight_quant_config	Weight quantization configuration to override.
	Optional	PruneConfig	prune_config	Sparsity configuration to override.
PruneConfig	-	-	-	Sparsity configuration.
	-	FilterPruner	filter_pruner	Filter-level (output channel) sparsity configuration.
	-	NOutOfMPruner	n_out_of_m_pruner	Configuration of 2:4 structured sparsity. Due to hardware restrictions, the Atlas inference series products and Atlas training products do not support the 2:4 structured sparsity feature.
FilterPruner	-	-	-	Filter-level sparsity configuration.
	-	BalancedL2NormFilterPruner	balanced_l2_norm_filter_prune	BalancedL2Norm algorithm. For details about the algorithm, see Manual Channel Pruning Algorithm.
BalancedL2NormFilterPruner	-	-	-	BalancedL2Norm algorithm configuration.
	Required	Float	prune_ratio	Sparsity ratio, that is, the ratio of the number of sparsified filters to the total number of filters. The recommended value is 0.2, indicating that 20% of the filters will be sparsified.
	Optional	Boolean	ascend_optimized	Whether to perform adaptation to Ascend platforms. If the sparsified model is to be deployed on the Ascend AI Processor, you are advised to set this parameter to true.
NOutOfMPruner	-	-	-	Configuration of 2:4 structured sparsity.
	-	L1SelectivePruner	l1_selective_prune	L1SelectivePrune algorithm. For details about the algorithm, see 2:4 Structured Sparsity Algorithm.
L1SelectivePruner	-	-	-	Configuration of the L1SelectivePrune algorithm.
	Optional	NOutOfMType	n_out_of_m_type	N:M sparsity type. Currently, only M4N2 is supported, that is, two of every four consecutive weights are retained.
	Optional	UInt32	update_freq	Interval, in batches, for updating the 2:4 sparsity selection. If update_freq is 0, the selection is updated only in the first batch. If update_freq is 2, the selection is updated every two batches, and so on for other values. The default value is 0.
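
The M4N2 rule in the table (two of every four consecutive weights are retained) can be pictured with a small Python sketch. This is only an illustration, not AMCT code: the function name m4n2_mask is hypothetical, and keeping the two weights with the largest absolute value is an assumption suggested by the name L1SelectivePrune rather than something this table states.

```python
from typing import List


def m4n2_mask(weights: List[float]) -> List[int]:
    """Return a 0/1 mask that keeps 2 weights in every 4 consecutive ones.

    Assumption (not from the AMCT docs): the two weights with the largest
    absolute value in each group of four are the ones that are kept.
    """
    if len(weights) % 4 != 0:
        raise ValueError("this sketch expects a length that is a multiple of 4")
    mask: List[int] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Positions of the two largest magnitudes inside this group of four.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        mask.extend(1 if i in keep else 0 for i in range(4))
    return mask


weights = [0.1, -0.9, 0.4, 0.05, 2.0, -0.3, 0.0, 1.5]
mask = m4n2_mask(weights)
# Zero out the weights that were not selected.
pruned = [w * m for w, m in zip(weights, mask)]
```

Every group of four in mask contains exactly two ones, which is the structural property that 2:4 sparsity requires.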

The following is an example of the simplified configuration file (quant.cfg) for QAT:

# global quantize parameter
retrain_data_quant_config: {
    ulq_quantize: {
        clip_max_min: {
            clip_max: 6.0
            clip_min: -6.0
        }
        dst_type: INT8
    }
}

retrain_weight_quant_config: {
    arq_retrain: {
        channel_wise: true
        dst_type: INT8
    }
}

skip_layers: "conv_1"

override_layer_types : {
    layer_type: "Optype"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
}

override_layer_configs : {
    layer_name: "Opname"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
}
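
The priority order from Table 1 (override_layer_configs > override_layer_types > global configuration) can be sketched in Python as a simple lookup. This is a hedged illustration and not AMCT's implementation: resolve_config and the dictionary layout are hypothetical, "conv_2" and "Conv2D" are placeholder names, and the sketch assumes an override replaces the global setting as a whole rather than being merged field by field.

```python
from typing import Dict

Config = Dict[str, object]


def resolve_config(layer_name: str, layer_type: str, global_cfg: Config,
                   type_overrides: Dict[str, Config],
                   layer_overrides: Dict[str, Config]) -> Config:
    """Pick the weight-quantization config for one layer, highest priority first."""
    if layer_name in layer_overrides:   # override_layer_configs
        return layer_overrides[layer_name]
    if layer_type in type_overrides:    # override_layer_types
        return type_overrides[layer_type]
    return global_cfg                   # retrain_weight_quant_config


# Values mirror the quant.cfg example; the layer names and types are placeholders.
global_cfg = {"algo": "arq_retrain", "channel_wise": True, "dst_type": "INT8"}
type_overrides = {
    "Optype": {"algo": "arq_retrain", "channel_wise": False, "dst_type": "INT8"},
}
layer_overrides = {
    "Opname": {"algo": "arq_retrain", "channel_wise": False, "dst_type": "INT8"},
}

by_name = resolve_config("Opname", "Optype", global_cfg, type_overrides, layer_overrides)
by_type = resolve_config("conv_2", "Optype", global_cfg, type_overrides, layer_overrides)
by_default = resolve_config("conv_2", "Conv2D", global_cfg, type_overrides, layer_overrides)
```

A layer matched by name takes its per-layer entry even when its type also has an override, and a layer matched by neither falls back to the global channel-wise setting.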

The following is an example of the simplified configuration file (prune.cfg) for filter-level sparsity:

# global prune parameter
prune_config{
    filter_pruner {
        balanced_l2_norm_filter_prune {
            prune_ratio: 0.3
            ascend_optimized: True
        }
    }
}

# skip layers
regular_prune_skip_layers: "Opname"
regular_prune_skip_layers: "Opname"

# override specific layers
override_layer_configs: {
    layer_name: "Opname"
    prune_config : {
        filter_pruner: {
            balanced_l2_norm_filter_prune: {
                prune_ratio: 0.5
                ascend_optimized: True
            }
        }
    }
}
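
To make prune_ratio concrete, the following Python sketch ranks the filters of one layer by L2 norm and marks the lowest-ranked fraction for pruning. It is an assumption-laden illustration: it uses plain L2-norm ranking rather than the BalancedL2Norm algorithm itself, it rounds the filter count down, and the function name filters_to_prune is hypothetical.

```python
import math
from typing import List


def filters_to_prune(filters: List[List[float]], prune_ratio: float) -> List[int]:
    """Return the indices of the filters a given prune_ratio would remove.

    Assumptions (not from the AMCT docs): filters are ranked by plain L2 norm,
    and the number of pruned filters is rounded down.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    num_pruned = int(len(filters) * prune_ratio)
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    # Filters with the smallest norm are treated as the least important.
    order = sorted(range(len(filters)), key=lambda i: norms[i])
    return sorted(order[:num_pruned])


# Four toy filters with L2 norms 5.0, about 0.14, 1.0, and 2.0.
filters = [[3.0, 4.0], [0.1, 0.1], [1.0, 0.0], [0.0, 2.0]]
pruned_indices = filters_to_prune(filters, prune_ratio=0.5)
```

With prune_ratio set to 0.5, half of the four filters are selected, namely the two with the smallest norms.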

The following is an example of the simplified configuration file (selective_prune.cfg) for 2:4 structured sparsity:

# global prune parameter
prune_config{
    n_out_of_m_pruner {
        l1_selective_prune {
            n_out_of_m_type: M4N2
            update_freq: 0
        }
    }
}

# skip layers
regular_prune_skip_layers: "Opname"
regular_prune_skip_layers: "Opname"

# override specific layers
override_layer_configs: {
    layer_name: "Opname"
    prune_config : {
        n_out_of_m_pruner: {
            l1_selective_prune: {
                n_out_of_m_type: M4N2
                update_freq: 1
            }
        }
    }
}
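
The effect of update_freq can be sketched as a per-batch predicate. This is a minimal illustration of the behavior described in Table 1, not AMCT code: the function name updates_selection is hypothetical, and counting batches from zero with an update on the first batch is an assumption.

```python
def updates_selection(batch_index: int, update_freq: int) -> bool:
    """Tell whether the 2:4 selection is refreshed on this batch.

    Assumption (not from the AMCT docs): batches are counted from 0 and the
    first batch always triggers an update.
    """
    if update_freq == 0:
        # Selection is fixed after the first batch.
        return batch_index == 0
    return batch_index % update_freq == 0


first_only = [i for i in range(6) if updates_selection(i, 0)]
every_batch = [i for i in range(6) if updates_selection(i, 1)]
every_two = [i for i in range(6) if updates_selection(i, 2)]
```

Under these assumptions, the global setting update_freq: 0 in the example fixes the selection after the first batch, while the per-layer override update_freq: 1 refreshes it on every batch.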

The following is an example of the simplified configuration file (compressed1.cfg) for compression combination (filter-level sparsity + INT8 quantization):

prune_config : {
    filter_pruner : {
        balanced_l2_norm_filter_prune : {
            prune_ratio : 0.3
            ascend_optimized: True
        }
    }
}
# skip_layers: "skip_layers_name_0"
skip_layer_types: "Optype"

quant_skip_layers: "Opname"
quant_skip_types: "Optype"

retrain_weight_quant_config: {
    arq_retrain: {
        channel_wise: true
        dst_type: INT8
    }
}

override_layer_types : {
    layer_type: "Optype"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
    retrain_data_quant_config : {
        ulq_quantize : {
            clip_max_min : {
                clip_max : 6.0
                clip_min : -6.0
            }
        }
    }
    prune_config : {
        filter_pruner : {
            balanced_l2_norm_filter_prune : {
                prune_ratio : 0.5
                ascend_optimized: True
            }
        }
    }
}
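
When a combined configuration sets both the global skip fields and the feature-specific ones, Table 1 states that the union is used. The Python sketch below shows that rule for skip lists. It is illustrative only: effective_skips is a hypothetical helper, and the names "conv_1", "fc_1", "Conv2D", and "MatMul" are placeholders rather than values from the example.

```python
from typing import Dict, Iterable, Set


def effective_skips(common: Iterable[str], quant_only: Iterable[str],
                    prune_only: Iterable[str]) -> Dict[str, Set[str]]:
    """Combine a global skip list with the feature-specific skip lists."""
    shared = set(common)
    return {
        # skip_layers + quant_skip_layers, or skip_layer_types + quant_skip_types
        "quant": shared | set(quant_only),
        # skip_layers + regular_prune_skip_layers, or the *_types equivalents
        "prune": shared | set(prune_only),
    }


# Layer names: one globally skipped layer plus one skipped only for quantization.
layer_skips = effective_skips(["conv_1"], ["fc_1"], [])
# Layer types: the same type listed globally and for quantization collapses to one entry.
type_skips = effective_skips(["Conv2D"], ["Conv2D", "MatMul"], [])
```

Listing the same name in both the global and the feature-specific field, as compressed1.cfg does with "Optype", is harmless because the union removes the duplicate.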

The following is an example of the simplified configuration file (compressed2.cfg) for compression combination (2:4 structured sparsity + INT8 quantization):

prune_config{
    n_out_of_m_pruner {
        l1_selective_prune {
            n_out_of_m_type: M4N2
            update_freq: 0
        }
    }
}
# skip_layers: "skip_layers_name_0"
skip_layer_types: "Optype"

quant_skip_layers: "Opname"
quant_skip_types: "Optype"

retrain_weight_quant_config: {
    arq_retrain: {
        channel_wise: true
        dst_type: INT8
    }
}

override_layer_types : {
    layer_type: "Optype"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
    retrain_data_quant_config : {
        ulq_quantize : {
            clip_max_min : {
                clip_max : 6.0
                clip_min : -6.0
            }
        }
    }
    prune_config{
        n_out_of_m_pruner {
            l1_selective_prune {
                n_out_of_m_type: M4N2
                update_freq: 1
            }
        }
    }
}
