Simplified QAT Configuration File

Table 1 describes the fields defined in retrain_config_pytorch.proto. The file is located at /amct_pytorch/proto/retrain_config_pytorch.proto under the AMCT installation directory.

These fields are used to write the simplified configuration file for QAT, for sparsity, or for compression combination (sparsity plus quantization). Configure the fields that your scenario requires.

**Table 1** retrain_config_pytorch.proto
Parameter	Required/Optional	Type	Field	Description
AMCTRetrainConfig	-	-	-	Simplified QAT configuration of AMCT.
	Repeated	String	skip_layers	Names of layers to exclude from compression. This field applies globally to every compression feature, so setting it removes the need to set the feature-specific quant_skip_layers and regular_prune_skip_layers fields. If skip_layers is configured together with quant_skip_layers or regular_prune_skip_layers, the union of the two is used.
	Repeated	String	skip_layer_types	Types of layers to exclude from compression. This field applies globally to every compression feature, so setting it removes the need to set the feature-specific quant_skip_types and regular_prune_skip_types fields. If skip_layer_types is configured together with quant_skip_types or regular_prune_skip_types, the union of the two is used.
	Repeated	RetrainOverrideLayer	override_layer_configs	Per-layer overrides of the global configuration, selected by layer name. Use this field to quantize specific layers differently from the global settings. For example, if the global quantization bit width is INT8, this field is where a different bit width such as INT4 would be configured for selected layers. Note, however, that the current version supports only INT8 quantization. Parameter priority: Quantization scenario: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config. Sparsity scenario: override_layer_configs > override_layer_types > prune_config.
	Repeated	RetrainOverrideLayerType	override_layer_types	Per-type overrides of the global configuration, selected by layer type. Use this field to quantize all layers of a given type differently from the global settings. For example, if the global quantization bit width is INT8, this field is where a different bit width such as INT4 would be configured for selected layer types. Note, however, that the current version supports only INT8 quantization. Parameter priority: Quantization scenario: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config. Sparsity scenario: override_layer_configs > override_layer_types > prune_config.
	Optional	FakequantPrecisionMode	fakequant_precision_mode	Precision mode of the scale_d value of the custom quantization operator in the fake-quantized model. FORCE_FP16_QUANT: converts scale_d from float32 to float16. Not configured (default): scale_d remains float32.
	Optional	UInt32	batch_num	Number of batches used for quantization.
	Required	RetrainDataQuantConfig	retrain_data_quant_config	Global activation (data) quantization configuration for QAT. Parameter priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config
	Required	RetrainWeightQuantConfig	retrain_weight_quant_config	Global weight quantization configuration for QAT. Parameter priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config
	Repeated	String	quant_skip_layers	Layers to skip quantization. Applicable to quantization. If both skip_layers and quant_skip_layers are set, their union is used.
	Repeated	String	quant_skip_types	Types of layers to skip quantization. Applicable to quantization. If both skip_layer_types and quant_skip_types are set, their union is used.
	Optional	PruneConfig	prune_config	Sparsity configuration. It is a global sparsity configuration parameter. Parameter priority: override_layer_configs > override_layer_types > prune_config
	Repeated	String	regular_prune_skip_layers	Layers to skip structured sparsity. Applicable to sparsity. If both skip_layers and regular_prune_skip_layers are set, their union is used.
	Repeated	String	regular_prune_skip_types	Types of layers to skip structured sparsity. Applicable to sparsity. If both skip_layer_types and regular_prune_skip_types are set, their union is used.
RetrainDataQuantConfig	-	-	-	Activation quantization configuration for QAT.
	-	ActULQquantize	ulq_quantize	Activation quantization algorithm. Currently, only ULQ is supported.
ActULQquantize	-	-	-	ULQ algorithm for activation quantization. For details about the algorithm, see ULQ Algorithm.
	Optional	DataType	dst_type	Activation quantization bit width. Options: INT4, INT8 (default), INT16. The current version supports only INT8 quantization.
	Optional	ClipMaxMin	clip_max_min	Initial upper and lower clipping bounds. If this field is not set, the bounds are initialized using IFMR.
	Optional	Boolean	fixed_min	Whether to fix the lower bound at 0. Set to true for ReLU and false in other cases.
ClipMaxMin	-	-	-	Initial upper and lower bounds. This parameter is used to compute the scale factor. Training and tuning are performed based on the initial value. It is recommended that the initial value be the same as the upper and lower bounds of the actual inference data. Otherwise, the accuracy may be low after quantization.
	Required	Float	clip_max	Initial upper bound.
	Required	Float	clip_min	Initial lower bound.
RetrainWeightQuantConfig	-	-	-	Weight quantization configuration for QAT.
	-	ARQRetrain	arq_retrain	ARQ algorithm.
	-	WtsULQRetrain	ulq_retrain	ULQ algorithm for weight quantization.
ARQRetrain	-	-	-	ARQ algorithm configuration. For details about the algorithm, see ARQ Algorithm.
	Optional	DataType	dst_type	Quantization bit width: INT8 (default) or INT4. The current version supports only INT8 quantization.
	Optional	Boolean	channel_wise	Whether to apply ARQ channel-wise.
WtsULQRetrain	-	-	-	ULQ algorithm for weight quantization. For details about the algorithm, see ULQ Algorithm.
	Optional	DataType	dst_type	Quantization bit width: INT8 (default) or INT4. The current version supports only INT8 quantization.
	Optional	Boolean	channel_wise	Whether to apply ULQ channel-wise.
RetrainOverrideLayer	-	-	-	Layer overriding configuration.
	Required	String	layer_name	Layer name.
	Required	RetrainDataQuantConfig	retrain_data_quant_config	Activation quantization configuration to override.
	Required	RetrainWeightQuantConfig	retrain_weight_quant_config	Weight quantization configuration to override.
	Optional	PruneConfig	prune_config	Sparsity configuration to override.
RetrainOverrideLayerType	-	-	-	Types of layers to override.
	Required	String	layer_type	Layer type.
	Required	RetrainDataQuantConfig	retrain_data_quant_config	Activation quantization configuration to override.
	Required	RetrainWeightQuantConfig	retrain_weight_quant_config	Weight quantization configuration to override.
	Optional	PruneConfig	prune_config	Sparsity configuration to override.
PruneConfig	-	-	-	Sparsity configuration.
	-	FilterPruner	filter_pruner	Filter-level (output channel-level) sparsity configuration.
	-	NOutOfMPruner	n_out_of_m_pruner	Configuration of 2:4 structured sparsity. Due to hardware restrictions, the Atlas inference series products and Atlas training products do not support the 2:4 structured sparsity feature.
FilterPruner	-	-	-	Filter-level sparsity configuration.
	-	BalancedL2NormFilterPruner	balanced_l2_norm_filter_prune	BalancedL2NormFilterPruner algorithm (BCP). For details about the algorithm, see Manual Channel Pruning Algorithm.
BalancedL2NormFilterPruner	-	-	-	BalancedL2NormFilterPruner algorithm configuration.
	Required	Float	prune_ratio	Sparsity ratio, that is, the ratio of the number of sparsified filters to the total number of filters. The recommended value is 0.2, indicating that 20% of the filters will be sparsified.
	Optional	Boolean	ascend_optimized	Whether to perform adaptation to Ascend platforms. If the sparsified model is to be deployed on the Ascend AI Processor, you are advised to set this parameter to true.
NOutOfMPruner	-	-	-	Configuration of 2:4 structured sparsity.
	-	L1SelectivePruner	l1_selective_prune	L1SelectivePrune algorithm. For details about the algorithm, see 2:4 Structured Sparsity Algorithm.
L1SelectivePruner	-	-	-	Configuration of the L1SelectivePrune algorithm.
	Optional	NOutOfMType	n_out_of_m_type	N-out-of-M sparsity type. Currently, only M4N2 is supported, meaning that two of every four consecutive weights are retained.
	Optional	UInt32	update_freq	Interval, in batches, at which the 2:4 sparsity selection is updated. If update_freq is set to 0, the selection is updated only in the first batch. If update_freq is set to 2, the selection is updated every two batches, and so on for larger values. The default value is 0.
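
The union rule that Table 1 describes for skip_layers and the feature-specific skip fields can be illustrated with a minimal Python sketch. The function name and the layer names are hypothetical and are not part of AMCT; the sketch only shows how a global list and a feature-specific list combine.

```python
def effective_skip_layers(global_skip, feature_skip):
    """Return the union of two skip lists, keeping first-seen order."""
    merged = []
    for name in list(global_skip) + list(feature_skip):
        if name not in merged:
            merged.append(name)
    return merged


# Hypothetical layer names: "conv1" appears in both lists but is skipped once.
skip_layers = ["conv1"]
quant_skip_layers = ["conv1", "fc"]
print(effective_skip_layers(skip_layers, quant_skip_layers))
```

The same reasoning applies to skip_layer_types combined with quant_skip_types or regular_prune_skip_types.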

The following is an example of the simplified configuration file (quant.cfg) for QAT:

# global quantize parameter
retrain_data_quant_config: {
    ulq_quantize: {
        clip_max_min: {
            clip_max: 6.0
            clip_min: -6.0
        }
        fixed_min: true
        dst_type: INT8
    }
}

retrain_weight_quant_config: {
    arq_retrain: {
        channel_wise: true
        dst_type: INT8
    }
}

skip_layers: "Opname"
skip_layer_types: "Optype"

override_layer_types : {
    layer_type: "Optype"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
}

override_layer_configs : {
    layer_name: "Opname"
    retrain_data_quant_config: {
        ulq_quantize: {
            clip_max_min: {
                clip_max: 3.0
                clip_min: -3.0
            }
            dst_type: INT8
        }
    }
}
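
The parameter priority (override_layer_configs > override_layer_types > global configuration) can be sketched in Python as a lookup that stops at the first source defining a setting. The dictionaries, the resolve function, and the layer names below are hypothetical stand-ins and not AMCT data structures. The sketch also assumes that priority is applied separately to the data and weight configurations, which this page does not state explicitly.

```python
# Hypothetical stand-ins for a parsed configuration (not AMCT objects).
GLOBAL = {
    "retrain_data_quant_config": {"clip_max": 6.0, "clip_min": -6.0},
    "retrain_weight_quant_config": {"channel_wise": True},
}
TYPE_OVERRIDES = {
    "Conv2d": {"retrain_weight_quant_config": {"channel_wise": False}},
}
LAYER_OVERRIDES = {
    "conv1": {"retrain_data_quant_config": {"clip_max": 3.0, "clip_min": -3.0}},
}


def resolve(layer_name, layer_type, key):
    """Return the setting for `key` from the highest-priority source that has it."""
    sources = (
        LAYER_OVERRIDES.get(layer_name, {}),  # highest priority: by layer name
        TYPE_OVERRIDES.get(layer_type, {}),   # next: by layer type
        GLOBAL,                               # lowest: global configuration
    )
    for source in sources:
        if key in source:
            return source[key]
    raise KeyError(key)


# "conv1" takes its data config from the layer override and its
# weight config from the type override.
print(resolve("conv1", "Conv2d", "retrain_data_quant_config"))
print(resolve("conv1", "Conv2d", "retrain_weight_quant_config"))
```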

The following is an example of the simplified configuration file (prune.cfg) for filter-level sparsity:

# global prune parameter
prune_config{
    filter_pruner {
        balanced_l2_norm_filter_prune {
            prune_ratio: 0.3
            ascend_optimized: True
        }
    }
}

# skip layers
regular_prune_skip_layers: "Opname"
regular_prune_skip_layers: "Opname"

# override specific layers
override_layer_configs: {
    layer_name: "Opname"
    prune_config : {
        filter_pruner: {
            balanced_l2_norm_filter_prune: {
                prune_ratio: 0.5
                ascend_optimized: True
            }
        }
    }
}
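
As a rough illustration of what prune_ratio means, the following Python sketch computes how many filters a given ratio would sparsify in a layer. The function name and the filter count are hypothetical, and rounding down is an assumption: the exact rounding used by AMCT, and any adjustment made when ascend_optimized is enabled, are not specified on this page.

```python
import math


def filters_to_prune(total_filters, prune_ratio):
    """Number of filters sparsified for a ratio, assuming rounding down."""
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    return math.floor(total_filters * prune_ratio)


# Hypothetical layer with 64 filters: global ratio vs. per-layer override.
print(filters_to_prune(64, 0.3))
print(filters_to_prune(64, 0.5))
```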

The following is an example of the simplified configuration file (selective_prune.cfg) for 2:4 structured sparsity:

# global prune parameter
prune_config{
    n_out_of_m_pruner {
        l1_selective_prune {
            n_out_of_m_type: M4N2
            update_freq: 0
        }
    }
}

# skip layers
regular_prune_skip_layers: "Opname"
regular_prune_skip_layers: "Opname"

# override specific layers
override_layer_configs: {
    layer_name: "Opname"
    prune_config : {
        n_out_of_m_pruner: {
            l1_selective_prune: {
                n_out_of_m_type: M4N2
                update_freq: 1
            }
        }
    }
}
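
The M4N2 pattern and the update_freq schedule can be sketched in plain Python. This is a sketch under stated assumptions, not the AMCT implementation: it assumes that the two retained weights in each group of four are those with the largest absolute value (suggested by the name L1SelectivePrune but not confirmed here), and that batches are counted from 0. Both function names are hypothetical.

```python
def mask_2_of_4(weights):
    """Return a 0/1 mask keeping the two largest-magnitude weights per group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("number of weights must be a multiple of 4")
    mask = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Positions of the two largest absolute values within this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        mask.extend(1 if i in keep else 0 for i in range(4))
    return mask


def should_update(batch_idx, update_freq):
    """Decide whether the 2:4 selection is refreshed on this batch (0-based)."""
    if update_freq == 0:
        return batch_idx == 0  # only the first batch
    return batch_idx % update_freq == 0


print(mask_2_of_4([0.1, -0.9, 0.5, 0.05, 1.0, 0.2, -0.3, 0.0]))
print([b for b in range(6) if should_update(b, 2)])
```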

The following is an example of the simplified configuration file (compressed1.cfg) for compression combination (filter-level sparsity + INT8 quantization):

prune_config : {
    filter_pruner : {
        balanced_l2_norm_filter_prune : {
            prune_ratio : 0.3
            ascend_optimized: True
        }
    }
}

# skip_layers: "skip_layers_name_0"
skip_layer_types: "Optype"

quant_skip_layers: "Opname"
quant_skip_types: "Optype"

retrain_weight_quant_config: {
    arq_retrain: {
        channel_wise: true
        dst_type: INT8
    }
}

override_layer_types : {
    layer_type: "Optype"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
    retrain_data_quant_config : {
        ulq_quantize : {
            clip_max_min : {
                clip_max : 6.0
                clip_min : -6.0
            }
        }
    }
    prune_config : {
        filter_pruner : {
            balanced_l2_norm_filter_prune : {
                prune_ratio : 0.5
                ascend_optimized: True
            }
        }
    }
}

The following is an example of the simplified configuration file (compressed2.cfg) for compression combination (2:4 structured sparsity + INT8 quantization):

prune_config{
    n_out_of_m_pruner {
        l1_selective_prune {
            n_out_of_m_type: M4N2
            update_freq: 0
        }
    }
}
# skip_layers: "skip_layers_name_0"
skip_layer_types: "Optype"

quant_skip_layers: "quant_skip_layers_name_0"
quant_skip_types: "Optype"

retrain_weight_quant_config: {
    arq_retrain: {
        channel_wise: true
        dst_type: INT8
    }
}

override_layer_types : {
    layer_type: "Optype"
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: false
            dst_type: INT8
        }
    }
    retrain_data_quant_config : {
        ulq_quantize : {
            clip_max_min : {
                clip_max : 6.0
                clip_min : -6.0
            }
        }
    }
    prune_config{
        n_out_of_m_pruner {
            l1_selective_prune {
                n_out_of_m_type: M4N2
                update_freq: 1
            }
        }
    }
}
