Overview

Compression combination is used to both quantize and sparsify a network model. This section describes the supported compression combinations.

Compression Combination

The following compression combination modes are currently supported. When AMCT is used for compression, only one compression combination mode can be selected for each compressible operator at a time. Figure 1 shows the process.

  • Filter-Level Sparsity + Quantization Aware Training for INT8 quantization
  • 2:4 Structured Sparsity + Quantization Aware Training for INT8 quantization

Currently, the compression combination feature requires manual compression configuration (hence the name static compression combination): you set the global quantization bit width and either the sparsity ratio (the ratio of the number of pruned filters to the total number of filters, for filter-level sparsity) or update_freq (the interval for updating 2:4 sparsity), and the model is then compressed automatically. For details about the compression configuration file, see Simplified QAT Configuration File. For a compression combination example, see Sample List.
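As a concrete illustration, a static compression combination of network-wide INT8 quantization plus 2:4 structured sparsity could be expressed along the following lines. Only the parameter names retrain_data_quant_config, retrain_weight_quant_config, prune_config, and update_freq come from this section; the surrounding JSON layout and the field values (including dst_type) are illustrative assumptions, not the authoritative schema — see Simplified QAT Configuration File for the actual format.

```json
{
  "retrain_data_quant_config": {
    "dst_type": "INT8"
  },
  "retrain_weight_quant_config": {
    "dst_type": "INT8"
  },
  "prune_config": {
    "update_freq": 1000
  }
}
```

Because no override_layer_configs or override_layer_types entries appear, every compressible layer would follow the global settings.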

For the layers that support compression combination and related restrictions, see Table Layers that support filter-level sparsity as well as their restrictions in Manual Sparsity, Table Layers that support filter-level sparsity as well as their restrictions in 2:4 Structured Sparsity, or Table Layers that support QAT as well as their restrictions in Quantization Aware Training.

Figure 1 Process of compression combination

Introduction to Combined Compression Scenarios

Currently, compression combination is supported in the following scenarios. In practice, you configure parameters in the simplified configuration file to control the compression combination. In these scenarios, network-wide quantization refers to performing quantization aware training on the entire network. The related parameters are as follows:

  • Network-wide quantization:
    • Network-wide (global) quantization configuration parameter: retrain_data_quant_config/retrain_weight_quant_config
    • Differentiated quantization parameters for some layers: override_layer_configs or override_layer_types

    Parameter priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config

  • Network-wide sparsity: includes filter-level sparsity and 2:4 structured sparsity; use either one.
    • Network-wide (global) sparsity configuration parameter: prune_config
    • Differentiated sparsity parameters for some layers: override_layer_configs or override_layer_types

    Parameter priority: override_layer_configs > override_layer_types > prune_config

For details about the preceding parameters, see Simplified QAT Configuration File.
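The priority rule above (override_layer_configs > override_layer_types > global configuration) can be sketched as a simple dictionary merge. This is a hypothetical illustration of the resolution order only, not AMCT code; every function and variable name here is invented.

```python
# Hypothetical sketch of AMCT-style override priority resolution for one layer:
# override_layer_configs > override_layer_types > global configuration.
# All names are illustrative, not the real AMCT API.

def resolve_layer_config(layer_name, layer_type, global_config,
                         override_layer_types=None, override_layer_configs=None):
    """Return the effective configuration dict for a single layer."""
    effective = dict(global_config)  # lowest priority: the global configuration
    if override_layer_types and layer_type in override_layer_types:
        # middle priority: per-layer-type override
        effective.update(override_layer_types[layer_type])
    if override_layer_configs and layer_name in override_layer_configs:
        # highest priority: per-layer override
        effective.update(override_layer_configs[layer_name])
    return effective

# Example: "conv1" gets a per-layer override, other Conv2D layers
# only pick up the per-type override on top of the global setting.
global_quant = {"quant_bits": 8}
per_type = {"Conv2D": {"channel_wise": True}}
per_layer = {"conv1": {"quant_bits": 16}}

print(resolve_layer_config("conv1", "Conv2D", global_quant, per_type, per_layer))
print(resolve_layer_config("conv2", "Conv2D", global_quant, per_type, per_layer))
```

Later entries overwrite earlier ones, which is exactly the stated priority order: a per-layer entry beats a per-type entry, which beats the global value.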

Table 1 Introduction to Combined Compression Scenarios

| Combined compression scenario | Parameters | Description |
| --- | --- | --- |
| Network-wide quantization + Network-wide sparsity | Quantization: retrain_data_quant_config/retrain_weight_quant_config. Sparsity: prune_config. Do not configure override_layer_configs or override_layer_types. | - |
| Differentiated quantization at some layers + Network-wide sparsity | Quantization: retrain_data_quant_config/retrain_weight_quant_config + override_layer_configs or override_layer_types. Sparsity: prune_config. | Specify the global configuration before setting the override parameters. Otherwise, the feature is disabled. |
| Network-wide quantization + Differentiated sparsity of some layers | Quantization: retrain_data_quant_config/retrain_weight_quant_config. Sparsity: prune_config + override_layer_configs or override_layer_types. | Specify the global configuration before setting the override parameters. Otherwise, the feature is disabled. |
| Differentiated quantization at some layers + Differentiated sparsity of some layers | Quantization: retrain_data_quant_config/retrain_weight_quant_config + override_layer_configs or override_layer_types. Sparsity: prune_config + override_layer_configs or override_layer_types. | Specify the global configuration before setting the override parameters. Otherwise, the feature is disabled. |
| Network-wide quantization | retrain_data_quant_config/retrain_weight_quant_config. Do not configure override_layer_configs or override_layer_types. | - |
| Differentiated quantization at some layers | retrain_data_quant_config/retrain_weight_quant_config + override_layer_configs or override_layer_types | Specify the global configuration before setting the override parameters. Otherwise, the feature is disabled. |
| Network-wide sparsity | prune_config. Do not configure override_layer_configs or override_layer_types. | - |
| Differentiated sparsity of some layers | prune_config + override_layer_configs or override_layer_types | Specify the global configuration before setting the override parameters. Otherwise, the feature is disabled. |
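For the differentiated-sparsity scenario, the global prune_config must be present, with the per-layer override placed alongside it. The sketch below is illustrative only: the layer name conv1, the JSON layout, and the override field are assumptions, not the authoritative schema (see Simplified QAT Configuration File).

```json
{
  "prune_config": {
    "update_freq": 1000
  },
  "override_layer_configs": {
    "conv1": {
      "update_freq": 500
    }
  }
}
```

If prune_config were omitted here, the sparsity feature would be disabled and the conv1 override would have no effect, as noted in the table above.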