Simplified QAT Configuration File

Table 1 describes the fields defined in the retrain_config_pytorch.proto file, located at /amct_pytorch/proto/retrain_config_pytorch.proto under the AMCT installation directory.

This proto defines the simplified configuration files for quantization-aware training (QAT), for sparsity, and for combined compression. Configure the file according to your scenario.
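
Because these .cfg files use the protobuf text format, a configuration can be checked against the proto definition before training. The sketch below assumes Python bindings generated from retrain_config_pytorch.proto with protoc; the module name retrain_config_pytorch_pb2 follows protoc's default naming and is an assumption, not part of AMCT's documented API.

    # Sketch: validate a simplified config against the proto definition.
    # Assumes bindings generated with:
    #   protoc --python_out=. retrain_config_pytorch.proto
    from google.protobuf import text_format

    import retrain_config_pytorch_pb2  # generated module; name is an assumption


    def load_config(path):
        """Parse a text-format .cfg file into an AMCTRetrainConfig message."""
        config = retrain_config_pytorch_pb2.AMCTRetrainConfig()
        with open(path) as f:
            text_format.Parse(f.read(), config)  # raises ParseError on bad fields
        return config


    cfg = load_config("quant.cfg")
    print(cfg.retrain_weight_quant_config.arq_retrain.channel_wise)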

Table 1 Parameter description

| Message | Required | Type | Parameter | Description |
|---|---|---|---|---|
| AMCTRetrainConfig | - | - | - | Simplified QAT configuration of AMCT. |
|  | repeated | string | skip_layers | Layers to skip, by name. Takes effect globally and covers the same functionality as the scenario-specific skip fields: if this field is set, you can omit quant_skip_layers and regular_prune_skip_layers. If skip_layers is configured together with quant_skip_layers or regular_prune_skip_layers, the union of the lists is used. |
|  | repeated | string | skip_layer_types | Types of layers to skip. Takes effect globally and covers the same functionality as the scenario-specific skip fields: if this field is set, you can omit quant_skip_types and regular_prune_skip_types. If skip_layer_types is configured together with quant_skip_types or regular_prune_skip_types, the union of the lists is used. |
|  | repeated | RetrainOverrideLayer | override_layer_configs | Overrides layers by layer name, that is, applies differentiated compression to the named layers. For example, if the global quantization configuration specifies INT8, this parameter could configure INT4 quantization for individual layers; note, however, that the current version supports only INT8 quantization. Priority in the quantization scenario: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config. Priority in the sparsity scenario: override_layer_configs > override_layer_types > prune_config. |
|  | repeated | RetrainOverrideLayerType | override_layer_types | Overrides layers by layer type, that is, applies differentiated compression to all layers of the given type. The example and priority rules are the same as for override_layer_configs above. |
|  | optional | FakequantPrecisionMode | fakequant_precision_mode | Numerical precision mode of scale_d in the quant custom operator of the fake-quant model. FORCE_FP16_QUANT: rounds the scale_d value to float16 precision (the stored type remains float32). If left empty (default), scale_d keeps float32 precision. |
|  | optional | uint32 | batch_num | Number of batches used for quantization. |
|  | required | RetrainDataQuantConfig | retrain_data_quant_config | Global data (activation) quantization configuration for QAT. Priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config. |
|  | required | RetrainWeightQuantConfig | retrain_weight_quant_config | Global weight quantization configuration for QAT. Priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config. |
|  | repeated | string | quant_skip_layers | Layers to skip during quantization, by name. Used in the quantization scenario. If both skip_layers and quant_skip_layers are set, their union is used. |
|  | repeated | string | quant_skip_types | Types of layers to skip during quantization. Used in the quantization scenario. If both skip_layer_types and quant_skip_types are set, their union is used. |
|  | optional | PruneConfig | prune_config | Global sparsity configuration. Priority: override_layer_configs > override_layer_types > prune_config. |
|  | repeated | string | regular_prune_skip_layers | Layers to skip during structured sparsity, by name. Used in the sparsity scenario. If both skip_layers and regular_prune_skip_layers are set, their union is used. |
|  | repeated | string | regular_prune_skip_types | Types of layers to skip during structured sparsity. Used in the sparsity scenario. If both skip_layer_types and regular_prune_skip_types are set, their union is used. |
| RetrainDataQuantConfig | - | - | - | Data quantization configuration for QAT. |
|  | - | ActULQquantize | ulq_quantize | Activation quantization algorithm. Currently, only ULQ is supported. |
| ActULQquantize | - | - | - | ULQ algorithm for activation quantization. For details, see ULQ Algorithm for Activation Quantization. |
|  | optional | DataType | dst_type | Data quantization bit width: INT4, INT8, or INT16. The default is INT8, and the current version supports only INT8 quantization. |
|  | optional | ClipMaxMin | clip_max_min | Initial upper and lower clip bounds. If not set, IFMR is used for initialization by default. |
|  | optional | bool | fixed_min | Whether to fix the lower bound at 0. Set to true for ReLU activations and false for other activation functions. |
| ClipMaxMin | - | - | - | Initial upper and lower clip bounds. |
|  | required | float | clip_max | Initial upper bound. |
|  | required | float | clip_min | Initial lower bound. |
| RetrainWeightQuantConfig | - | - | - | Weight quantization configuration for QAT. |
|  | - | ARQRetrain | arq_retrain | ARQ algorithm for weight quantization. |
|  | - | WtsULQRetrain | ulq_retrain | ULQ algorithm for weight quantization. |
| ARQRetrain | - | - | - | ARQ algorithm for weight quantization. For details, see ARQ Algorithm. |
|  | optional | DataType | dst_type | Quantization bit width, INT8 or INT4. The default is INT8; the current version supports only INT8 quantization. |
|  | optional | bool | channel_wise | Whether to enable channel-wise ARQ quantization. |
| WtsULQRetrain | - | - | - | ULQ algorithm for weight quantization. For details, see ULQ Algorithm for Activation Quantization. |
|  | optional | DataType | dst_type | Quantization bit width, INT8 or INT4. The default is INT8; the current version supports only INT8 quantization. |
|  | optional | bool | channel_wise | Whether to enable channel-wise ULQ quantization. |
| RetrainOverrideLayer | - | - | - | Layer override configuration, keyed by layer name. |
|  | required | string | layer_name | Layer name. |
|  | required | RetrainDataQuantConfig | retrain_data_quant_config | Activation quantization configuration to apply. |
|  | required | RetrainWeightQuantConfig | retrain_weight_quant_config | Weight quantization configuration to apply. |
|  | optional | PruneConfig | prune_config | Sparsity configuration to apply. |
| RetrainOverrideLayerType | - | - | - | Layer override configuration, keyed by layer type. |
|  | required | string | layer_type | Layer type. |
|  | required | RetrainDataQuantConfig | retrain_data_quant_config | Activation quantization configuration to apply. |
|  | required | RetrainWeightQuantConfig | retrain_weight_quant_config | Weight quantization configuration to apply. |
|  | optional | PruneConfig | prune_config | Sparsity configuration to apply. |
| PruneConfig | - | - | - | Sparsity configuration. |
|  | - | FilterPruner | filter_pruner | Filter-level (output channel) sparsity configuration. |
|  | - | NOutOfMPruner | n_out_of_m_pruner | Configuration of 2:4 structured sparsity. Due to hardware restrictions, the Atlas 200/300/500 Inference Product and the Atlas Training Series Product do not support 2:4 structured sparsity; enabling the feature for them yields little performance benefit. |
| FilterPruner | - | - | - | Filter-level sparsity configuration. |
|  | - | BalancedL2NormFilterPruner | balanced_l2_norm_filter_prune | BalancedL2NormFilterPruner algorithm. For details, see Manual Channel Pruning Algorithm. |
| BalancedL2NormFilterPruner | - | - | - | BalancedL2NormFilterPruner algorithm configuration. |
|  | required | float | prune_ratio | Sparsity ratio, that is, the ratio of the number of pruned filters to the total number of filters. The recommended value is 0.2, which prunes 20% of the output channels. |
|  | optional | bool | ascend_optimized | Whether to adapt the pruning result to Ascend platforms. If the pruned model is to be deployed on an Ascend AI Processor, you are advised to set this parameter to true. |
| NOutOfMPruner | - | - | - | Configuration of 2:4 structured sparsity. |
|  | - | L1SelectivePruner | l1_selective_prune | L1SelectivePrune algorithm. For details, see 2:4 Structured Sparsity Algorithm. |
| L1SelectivePruner | - | - | - | L1SelectivePrune algorithm configuration. |
|  | optional | NOutOfMType | n_out_of_m_type | Currently, only M4N2 is supported, that is, two weights are retained in every four consecutive weights. |
|  | optional | uint32 | update_freq | Interval, in batches, for updating the 2:4 sparsity mask. 0 (default): the mask is updated only in the first batch. 2: the mask is updated every two batches, and so on. |
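
The priority rule that recurs in the table (override_layer_configs > override_layer_types > global configuration) means the most specific matching entry wins. A minimal sketch of that resolution logic, operating on the AMCTRetrainConfig message parsed above (illustrative only, not AMCT's internal implementation):

    # Sketch: resolve the effective weight-quant config for one layer,
    # following the documented priority order.
    def effective_weight_quant_config(cfg, layer_name, layer_type):
        for ov in cfg.override_layer_configs:    # 1. override by layer name
            if ov.layer_name == layer_name:
                return ov.retrain_weight_quant_config
        for ov in cfg.override_layer_types:      # 2. override by layer type
            if ov.layer_type == layer_type:
                return ov.retrain_weight_quant_config
        return cfg.retrain_weight_quant_config   # 3. global configuration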

  • The following is an example simplified QAT configuration file (quant.cfg):
    # global quantize parameter
    retrain_data_quant_config: {
        ulq_quantize: {
            clip_max_min: {
                clip_max: 6.0
                clip_min: -6.0
            }
            fixed_min: true
            dst_type: INT8
        }
    }
    
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: true
            dst_type: INT8
        }
    }
    
    skip_layers: "Opname"
    skip_layer_types: "Optype"
    
    override_layer_types: {
        layer_type: "Optype"
        retrain_weight_quant_config: {
            arq_retrain: {
                channel_wise: false
                dst_type: INT8
            }
        }
    }
    
    override_layer_configs: {
        layer_name: "Opname"
        retrain_data_quant_config: {
            ulq_quantize: {
                clip_max_min: {
                    clip_max: 3.0
                    clip_min: -3.0
                }
                dst_type: INT8
            }
        }
    }
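
    For intuition about what clip_max, clip_min, fixed_min, and dst_type control, the following PyTorch sketch shows the generic clamp-scale-round-dequantize pattern of fake quantization with clip bounds. It illustrates the idea only and is not AMCT's ULQ implementation; see ULQ Algorithm for Activation Quantization for the actual algorithm.

        import torch

        def fake_quantize(x, clip_min, clip_max, num_bits=8, fixed_min=False):
            # Illustrative fake quantization; not AMCT's ULQ implementation.
            if fixed_min:                    # e.g. after ReLU, lower bound is 0
                clip_min = 0.0
            steps = 2 ** num_bits - 1        # 255 levels between the INT8 bounds
            scale = (clip_max - clip_min) / steps
            x = x.clamp(clip_min, clip_max)
            q = torch.round((x - clip_min) / scale)   # map to the integer grid
            return q * scale + clip_min               # dequantize back to float

        x = torch.randn(4) * 10
        print(fake_quantize(x, clip_min=-6.0, clip_max=6.0))  # bounds from quant.cfg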
  • The following is an example simplified filter-level sparsity configuration file (prune.cfg):
    # global prune parameter
    prune_config: {
        filter_pruner: {
            balanced_l2_norm_filter_prune: {
                prune_ratio: 0.3
                ascend_optimized: true
            }
        }
    }
    
    # skip layers
    regular_prune_skip_layers: "Opname"
    regular_prune_skip_layers: "Opname"
    
    # override specific layers
    override_layer_configs: {
        layer_name: "Opname"
        prune_config: {
            filter_pruner: {
                balanced_l2_norm_filter_prune: {
                    prune_ratio: 0.5
                    ascend_optimized: true
                }
            }
        }
    }
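
    The prune_ratio of 0.3 above removes 30% of each matched layer's output channels. The following sketch shows the basic L2-norm filter-selection idea; the actual BalancedL2NormFilterPruner additionally balances the selection for Ascend platforms (see Manual Channel Pruning Algorithm).

        import torch

        def filters_to_prune(weight, prune_ratio=0.3):
            # weight: conv kernel of shape (out_channels, in_channels, kH, kW).
            # Illustrative selection only; the real algorithm balances the choice.
            norms = weight.flatten(1).norm(p=2, dim=1)       # L2 norm per filter
            num_pruned = int(weight.shape[0] * prune_ratio)  # 30% of the filters
            return torch.argsort(norms)[:num_pruned]         # weakest filters first

        w = torch.randn(16, 8, 3, 3)
        print(filters_to_prune(w))  # indices of the 4 filters that would be removed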
  • The following is an example simplified 2:4 structured sparsity configuration file (selective_prune.cfg):
    # global prune parameter
    prune_config: {
        n_out_of_m_pruner: {
            l1_selective_prune: {
                n_out_of_m_type: M4N2
                update_freq: 0
            }
        }
    }
    
    # skip layers
    regular_prune_skip_layers: "Opname"
    regular_prune_skip_layers: "Opname"
    
    # override specific layers
    override_layer_configs: {
        layer_name: "Opname"
        prune_config: {
            n_out_of_m_pruner: {
                l1_selective_prune: {
                    n_out_of_m_type: M4N2
                    update_freq: 1
                }
            }
        }
    }
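
    M4N2 keeps the two largest-magnitude weights in every group of four consecutive weights, which is the selection the L1 in L1SelectivePrune refers to; with update_freq: 1, the mask is recomputed every batch. A minimal illustrative sketch of the mask construction (not AMCT's implementation):

        import torch

        def m4n2_mask(weight):
            # Keep the 2 largest-|w| entries in each group of 4 consecutive weights.
            # Assumes the flattened weight length is a multiple of 4.
            groups = weight.abs().reshape(-1, 4)
            keep = groups.topk(k=2, dim=1).indices          # 2 survivors per group
            mask = torch.zeros_like(groups).scatter_(1, keep, 1.0)
            return mask.reshape(weight.shape)

        w = torch.randn(2, 8)
        print(w * m4n2_mask(w))  # exactly 2 of every 4 consecutive weights survive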
  • The following is an example simplified combined compression configuration file (compressed1.cfg), which applies channel sparsity plus INT8 quantization:
    prune_config: {
        filter_pruner: {
            balanced_l2_norm_filter_prune: {
                prune_ratio: 0.3
                ascend_optimized: true
            }
        }
    }
    
    # skip_layers: "skip_layers_name_0"
    skip_layer_types: "Optype"
    
    quant_skip_layers: "Opname"
    quant_skip_types: "Optype"
    
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: true
            dst_type: INT8
        }
    }
    
    override_layer_types: {
        layer_type: "Optype"
        retrain_weight_quant_config: {
            arq_retrain: {
                channel_wise: false
                dst_type: INT8
            }
        }
        retrain_data_quant_config: {
            ulq_quantize: {
                clip_max_min: {
                    clip_max: 6.0
                    clip_min: -6.0
                }
            }
        }
        prune_config: {
            filter_pruner: {
                balanced_l2_norm_filter_prune: {
                    prune_ratio: 0.5
                    ascend_optimized: true
                }
            }
        }
    }
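
    Note how this file mixes the global skip_layer_types with the quantization-specific quant_skip_layers and quant_skip_types; per Table 1, overlapping skip lists are merged by union, as the following snippet illustrates (layer names are hypothetical):

        # Union semantics from Table 1: a layer is skipped for quantization if it
        # appears in either skip list.
        skip_layers = {"conv1", "fc"}           # global skip list
        quant_skip_layers = {"conv2", "fc"}     # quantization-scenario skip list
        print(skip_layers | quant_skip_layers)  # union: {'conv1', 'conv2', 'fc'}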
  • The following is an example simplified combined compression configuration file (compressed2.cfg), which applies 2:4 structured sparsity plus INT8 quantization:
    prune_config: {
        n_out_of_m_pruner: {
            l1_selective_prune: {
                n_out_of_m_type: M4N2
                update_freq: 0
            }
        }
    }

    # skip_layers: "skip_layers_name_0"
    skip_layer_types: "Optype"
    
    quant_skip_layers: "quant_skip_layers_name_0"
    quant_skip_types: "Optype"
    
    retrain_weight_quant_config: {
        arq_retrain: {
            channel_wise: true
            dst_type: INT8
        }
    }
    
    override_layer_types: {
        layer_type: "Optype"
        retrain_weight_quant_config: {
            arq_retrain: {
                channel_wise: false
                dst_type: INT8
            }
        }
        retrain_data_quant_config: {
            ulq_quantize: {
                clip_max_min: {
                    clip_max: 6.0
                    clip_min: -6.0
                }
            }
        }
        prune_config: {
            n_out_of_m_pruner: {
                l1_selective_prune: {
                    n_out_of_m_type: M4N2
                    update_freq: 1
                }
            }
        }
    }