Simplified QAT Configuration File

Table 1 describes the fields in the retrain_config_tf.proto file. The file is located at /amct_tensorflow/proto/retrain_config_tf.proto under the AMCT installation directory.

Use the fields in this file to write the simplified QAT configuration file, the simplified sparsity configuration file, or the simplified compression combination configuration file.

Table 1 retrain_config_tf.proto

Parameter

Required/Optional

Type

Field

Description

AMCTRetrainConfig

-

-

-

Simplified QAT configuration of AMCT.

Repeated

String

skip_layers

Names of the layers to exclude from compression. This field takes effect globally, covering both quantization and sparsity. If it is set, you do not need to set the feature-specific quant_skip_layers and regular_prune_skip_layers fields.

If skip_layers is set together with quant_skip_layers or regular_prune_skip_layers, their union is used.

Repeated

String

skip_layer_types

Types of the layers to exclude from compression. This field takes effect globally, covering both quantization and sparsity. If it is set, you do not need to set the feature-specific quant_skip_types and regular_prune_skip_types fields.

If skip_layer_types is set together with quant_skip_types or regular_prune_skip_types, their union is used.

Repeated

RetrainOverrideLayer

override_layer_configs

Layers to override. Specifies, by layer name, the layers that use a configuration different from the global one.

For example, this parameter could switch specific layers from the global INT8 quantization to INT4 quantization. However, the current version supports only INT8 quantization.

Parameter priority:

  • Quantization scenario: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config
  • Sparsity scenario: override_layer_configs > override_layer_types > prune_config

Repeated

RetrainOverrideLayerType

override_layer_types

Types of layers to override. Specifies, by layer type, the layers that use a configuration different from the global one.

For example, this parameter could switch layers of a specific type from the global INT8 quantization to INT4 quantization. However, the current version supports only INT8 quantization.

Parameter priority:

  • Quantization scenario: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config
  • Sparsity scenario: override_layer_configs > override_layer_types > prune_config

Optional

FakequantPrecisionMode

fakequant_precision_mode

Precision mode of the scale_d value of the custom quantization operator in the fake-quantized model.

  • FORCE_FP16_QUANT: The scale_d value is converted to float16 precision (the data type remains float32).
  • Not configured (default): The scale_d value has float32 precision.

Optional

UInt32

batch_num

Number of batches used for quantization.

Required

RetrainDataQuantConfig

retrain_data_quant_config

Activation quantization configuration parameter for QAT. It is a global quantization configuration parameter.

Parameter priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config

Required

RetrainWeightQuantConfig

retrain_weight_quant_config

Weight quantization configuration parameter for QAT. It is a global quantization configuration parameter.

Parameter priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config

Repeated

String

quant_skip_layers

Names of the layers to exclude from quantization. Applies to quantization only.

If both skip_layers and quant_skip_layers are set, their union is used.

Repeated

String

quant_skip_types

Types of the layers to exclude from quantization. Applies to quantization only.

If both skip_layer_types and quant_skip_types are set, their union is used.

Optional

PruneConfig

prune_config

Sparsity configuration. It is a global sparsity configuration parameter.

Parameter priority: override_layer_configs > override_layer_types > prune_config

Repeated

String

regular_prune_skip_layers

Names of the layers to exclude from sparsity. Applies to sparsity only.

If both skip_layers and regular_prune_skip_layers are set, their union is used.

Repeated

String

regular_prune_skip_types

Types of the layers to exclude from sparsity. Applies to sparsity only.

If both skip_layer_types and regular_prune_skip_types are set, their union is used.

RetrainDataQuantConfig

-

-

-

Activation quantization configuration for QAT.

-

ActULQquantize

ulq_quantize

Activation quantization algorithm. Currently, only ULQ is supported.

ActULQquantize

-

-

-

ULQ parameters for activation quantization. For details about the algorithm, see ULQ Algorithm.

Optional

DataType

dst_type

Activation quantization bit width. The current version supports only INT8 quantization. The options are as follows:

  • INT4
  • INT8 (default)
  • INT16

Optional

ClipMaxMin

clip_max_min

Initial upper and lower bounds. IFMR is used for initialization by default.

Optional

Boolean

fixed_min

Whether to fix the lower bound at 0. Set this parameter to true for ReLU, or to false in other cases.

ClipMaxMin

-

-

-

Initial upper and lower bounds.

Required

Float

clip_max

Initial upper bound.

Required

Float

clip_min

Initial lower bound.

RetrainWeightQuantConfig

-

-

-

Weight quantization configuration for QAT.

-

ARQRetrain

arq_retrain

ARQ algorithm for weight quantization.

-

WtsULQRetrain

ulq_retrain

ULQ algorithm for weight quantization.

ARQRetrain

-

-

-

ARQ algorithm parameters. For details about the algorithm, see ARQ Algorithm.

Optional

DataType

dst_type

Quantization bit width, either INT8 (default) or INT4 quantization. The current version supports only INT8 quantization.

Optional

Boolean

channel_wise

Whether to enable channel-wise ARQ.

WtsULQRetrain

-

-

-

ULQ parameters for weight quantization. For details about the algorithm, see ULQ Algorithm.

Optional

DataType

dst_type

Quantization bit width, either INT8 (default) or INT4 quantization. The current version supports only INT8 quantization.

Optional

Boolean

channel_wise

Whether to enable channel-wise ULQ.

RetrainOverrideLayer

-

-

-

Layer overriding configuration.

Required

String

layer_name

Layer name.

Required

RetrainDataQuantConfig

retrain_data_quant_config

Activation quantization configuration to override.

Required

RetrainWeightQuantConfig

retrain_weight_quant_config

Weight quantization configuration to override.

Optional

PruneConfig

prune_config

Sparsity configuration to override.

RetrainOverrideLayerType

-

-

-

Layer-type overriding configuration.

Required

String

layer_type

Layer type.

Required

RetrainDataQuantConfig

retrain_data_quant_config

Activation quantization configuration to override.

Required

RetrainWeightQuantConfig

retrain_weight_quant_config

Weight quantization configuration to override.

Optional

PruneConfig

prune_config

Sparsity configuration to override.

PruneConfig

-

-

-

Sparsity configuration.

-

FilterPruner

filter_pruner

Filter-level (output channel) sparsity configuration.

-

NOutOfMPruner

n_out_of_m_pruner

Configuration of 2:4 structured sparsity.

Due to hardware restrictions, the Atlas inference series products and Atlas training products do not support the 2:4 structured sparsity feature.

FilterPruner

-

-

-

Filter-level sparsity configuration.

-

BalancedL2NormFilterPruner

balanced_l2_norm_filter_prune

BalancedL2Norm algorithm. For details about the algorithm, see Manual Channel Pruning Algorithm.

BalancedL2NormFilterPruner

-

-

-

BalancedL2Norm algorithm configuration.

Required

Float

prune_ratio

Sparsity ratio, that is, the ratio of the number of sparsified filters to the total number of filters. The recommended value is 0.2, indicating that 20% of the filters will be sparsified.

Optional

Boolean

ascend_optimized

Whether to adapt the sparsity to Ascend platforms. If the sparsified model is to be deployed on the Ascend AI Processor, you are advised to set this parameter to true.

NOutOfMPruner

-

-

-

Configuration of 2:4 structured sparsity.

-

L1SelectivePruner

l1_selective_prune

L1SelectivePrune algorithm. For details about the algorithm, see 2:4 Structured Sparsity Algorithm.

L1SelectivePruner

-

-

-

Configuration of the L1SelectivePrune algorithm.

Optional

NOutOfMType

n_out_of_m_type

N-out-of-M sparsity type. Currently, only M4N2 is supported, that is, two weights in every four consecutive weights are retained.

Optional

UInt32

update_freq

Interval for updating the 2:4 sparsity selection. If update_freq is set to 0, the selection is updated only in the first batch. If update_freq is set to 2, the selection is updated every two batches. Other values follow the same pattern. The default value is 0.
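
The M4N2 pattern and the update_freq setting described for L1SelectivePruner can be illustrated with a small standalone sketch. It is not AMCT code. The function names m4n2_mask and should_update are hypothetical, and two points are assumptions rather than documented behavior: that the two weights retained in each group of four are the ones with the largest absolute value, and that a non-zero update_freq refreshes the selection on batches whose 0-based index is a multiple of update_freq.

```python
def m4n2_mask(weights):
    """Return a 0/1 mask that keeps 2 of every 4 consecutive weights.

    Assumption: the two weights with the largest absolute value in each
    group are retained. The table only states that two of four are kept.
    """
    if len(weights) % 4 != 0:
        raise ValueError("number of weights must be a multiple of 4")
    mask = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Rank positions by magnitude; ties go to the earlier position.
        keep = sorted(range(4), key=lambda i: (-abs(group[i]), i))[:2]
        mask.extend(1 if i in keep else 0 for i in range(4))
    return mask


def should_update(batch_index, update_freq):
    """Decide whether the 2:4 selection is refreshed at this batch.

    batch_index is 0-based. update_freq == 0 means first batch only.
    A non-zero update_freq is modeled (assumption) as every k-th batch.
    """
    if update_freq == 0:
        return batch_index == 0
    return batch_index % update_freq == 0


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.1]
mask = m4n2_mask(weights)
sparse_weights = [w if m else 0.0 for w, m in zip(weights, mask)]

# Batches (out of the first six) on which the selection would be refreshed.
first_batch_only = [b for b in range(6) if should_update(b, 0)]
every_two_batches = [b for b in range(6) if should_update(b, 2)]
```

With these inputs, each group of four keeps exactly two non-zero weights, and the two schedules differ only in how often the mask would be recomputed during retraining.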

  • The following is an example of the simplified configuration file (quant.cfg) for QAT:
    # global quantize parameter
    retrain_data_quant_config: {
        ulq_quantize: {
            clip_max_min: {
                clip_max: 6.0
                clip_min: -6.0
            }
            dst_type: INT8
        }
    }

    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: true
            dst_type: INT8
        }
    }

    skip_layers: "conv_1"

    override_layer_types: {
        layer_type: "Optype"
        retrain_weight_quant_config: {
            arq_retrain: {
                channel_wise: false
                dst_type: INT8
            }
        }
    }

    override_layer_configs: {
        layer_name: "Opname"
        retrain_weight_quant_config: {
            arq_retrain: {
                channel_wise: false
                dst_type: INT8
            }
        }
    }
  • The following is an example of the simplified configuration file (prune.cfg) for filter-level sparsity:
    # global prune parameter
    prune_config{
        filter_pruner {
            balanced_l2_norm_filter_prune {
                prune_ratio: 0.3
                ascend_optimized: True
            }
        }
    }
    
    # skip layers
    regular_prune_skip_layers: "Opname"
    regular_prune_skip_layers: "Opname"
    
    # override specific layers
    override_layer_configs: {
        layer_name: "Opname"
        prune_config : {
            filter_pruner: {
                balanced_l2_norm_filter_prune: {
                    prune_ratio: 0.5
                    ascend_optimized: True
                }
            }
        }
    }
  • The following is an example of the simplified configuration file (selective_prune.cfg) for 2:4 structured sparsity:
    # global prune parameter
    prune_config{
        n_out_of_m_pruner {
            l1_selective_prune {
                n_out_of_m_type: M4N2
                update_freq: 0
            }
        }
    }
    
    # skip layers
    regular_prune_skip_layers: "Opname"
    regular_prune_skip_layers: "Opname"
    
    # override specific layers
    override_layer_configs: {
        layer_name: "Opname"
        prune_config : {
            n_out_of_m_pruner: {
                l1_selective_prune: {
                    n_out_of_m_type: M4N2
                    update_freq: 1
                }
            }
        }
    }
  • The following is an example of the simplified configuration file (compressed1.cfg) for compression combination (filter-level sparsity + INT8 quantization):
    prune_config : {
        filter_pruner : {
            balanced_l2_norm_filter_prune : {
                prune_ratio : 0.3
                ascend_optimized: True
            }
        }
    }
    # skip_layers: "skip_layers_name_0"
    skip_layer_types: "Optype"
    
    quant_skip_layers: "Opname"
    quant_skip_types: "Optype"
    
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: true
            dst_type: INT8
        }
    }
    
    override_layer_types : {
        layer_type: "Optype"
        retrain_weight_quant_config: {
            arq_retrain: {
                channel_wise: false
                dst_type: INT8
            }
        }
        retrain_data_quant_config : {
            ulq_quantize : {
                clip_max_min : {
                    clip_max : 6.0
                    clip_min : -6.0
                }
            }
        }
        prune_config : {
            filter_pruner : {
                balanced_l2_norm_filter_prune : {
                    prune_ratio : 0.5
                    ascend_optimized: True
                }
            }
        }
    }
  • The following is an example of the simplified configuration file (compressed2.cfg) for compression combination (2:4 structured sparsity + INT8 quantization):
    prune_config{
        n_out_of_m_pruner {
            l1_selective_prune {
                n_out_of_m_type: M4N2
                update_freq: 0
            }
        }
    }
    # skip_layers: "skip_layers_name_0"
    skip_layer_types: "Optype"
    
    quant_skip_layers: "Opname"
    quant_skip_types: "Optype"
    
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: true
            dst_type: INT8
        }
    }
    
    override_layer_types : {
        layer_type: "Optype"
        retrain_weight_quant_config: {
            arq_retrain: {
                channel_wise: false
                dst_type: INT8
            }
        }
        retrain_data_quant_config : {
            ulq_quantize : {
                clip_max_min : {
                    clip_max : 6.0
                    clip_min : -6.0
                }
            }
        }
        prune_config{
            n_out_of_m_pruner {
                l1_selective_prune {
                    n_out_of_m_type: M4N2
                    update_freq: 1
                }
            }
        }
    }
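
The union rule for skip fields and the parameter priority listed in Table 1 (override_layer_configs > override_layer_types > global configuration) can be sketched in a few lines of Python. This is only an illustration of the documented rules, not how AMCT implements them. The dictionary layout, the layer names and types (conv_1, fc_2, MatMul, and so on), and the function names prune_skipped and effective_prune_ratio are all hypothetical. Table 1 does not say how a skip entry interacts with an override for the same layer, so the sketch keeps the two rules separate.

```python
# Hypothetical in-memory view of a simplified sparsity configuration.
config = {
    "skip_layers": ["conv_1"],                 # global skip list
    "regular_prune_skip_layers": ["conv_2"],   # sparsity-specific skip list
    "skip_layer_types": [],
    "regular_prune_skip_types": [],
    "prune_ratio": 0.3,                        # global prune_config
    "override_layer_types": {"MatMul": 0.5},   # keyed by layer_type
    "override_layer_configs": {"fc_2": 0.2},   # keyed by layer_name
}


def prune_skipped(name, op_type, cfg):
    """A layer is skipped if it is in the union of the global and the
    sparsity-specific skip lists, matched by name or by type."""
    names = set(cfg["skip_layers"]) | set(cfg["regular_prune_skip_layers"])
    types = set(cfg["skip_layer_types"]) | set(cfg["regular_prune_skip_types"])
    return name in names or op_type in types


def effective_prune_ratio(name, op_type, cfg):
    """Apply the documented priority for the sparsity scenario:
    override_layer_configs > override_layer_types > prune_config."""
    if name in cfg["override_layer_configs"]:
        return cfg["override_layer_configs"][name]
    if op_type in cfg["override_layer_types"]:
        return cfg["override_layer_types"][op_type]
    return cfg["prune_ratio"]


skipped = [
    name
    for name, op_type in [("conv_1", "Conv2D"), ("conv_2", "Conv2D"), ("conv_3", "Conv2D")]
    if prune_skipped(name, op_type, config)
]
ratios = {
    name: effective_prune_ratio(name, op_type, config)
    for name, op_type in [("conv_3", "Conv2D"), ("fc_1", "MatMul"), ("fc_2", "MatMul")]
}
```

Here conv_1 and conv_2 are both skipped because the two skip lists are merged, fc_2 takes its per-layer override, fc_1 falls back to the override for its type, and conv_3 uses the global value.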