Overview

Compression combination applies both quantization and sparsity to a network model. This section describes the supported compression combinations.

Compression Combination Modes

The following compression combination modes are currently supported. When AMCT is used for compression, only one compression combination mode can be selected for each compressible operator at a time. Figure 1 shows the process.

- Filter-Level Sparsity + Quantization Aware Training for INT8 quantization
- 2:4 Structured Sparsity + Quantization Aware Training for INT8 quantization

Currently, compression combination requires manual compression configuration and is therefore called static compression combination. You set the global quantization bit width together with either the sparsity ratio (the number of sparsified filters divided by the total number of filters) for filter-level sparsity, or update_freq (the interval for updating 2:4 sparsity) for 2:4 structured sparsity. The model is then compressed automatically according to this configuration. For details about the compression configuration file, see Simplified QAT Configuration File. For a compression combination example, see Sample List.

For the layers that support compression combination and related restrictions, see Table 1 in Filter-Level Sparsity, Table 1 in 2:4 Structured Sparsity, or Table 1 in Quantization Aware Training.
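To make the two sparsity settings concrete, the following minimal Python sketch computes a filter sparsity ratio as defined above and builds a 2:4 mask, in which two of every four consecutive weights are kept. The function names are hypothetical, and keeping the largest-magnitude weights is an assumption made for illustration; the selection criterion used by AMCT may differ.

```python
def filter_sparsity_ratio(num_sparsified_filters, num_filters):
    """Number of sparsified filters divided by the total number of filters."""
    if num_filters <= 0:
        raise ValueError("num_filters must be positive")
    return num_sparsified_filters / num_filters


def two_four_mask(weights):
    """Return a 0/1 mask that keeps 2 values in each group of 4 consecutive weights.

    Assumption: the two largest-magnitude values in each group are kept.
    This is an illustrative rule, not necessarily the one AMCT applies.
    """
    if len(weights) % 4 != 0:
        raise ValueError("number of weights must be a multiple of 4")
    mask = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Positions of the two largest absolute values within this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        mask.extend(1 if i in keep else 0 for i in range(4))
    return mask


ratio = filter_sparsity_ratio(16, 64)
mask = two_four_mask([0.1, -0.9, 0.5, 0.05, 1.0, 0.0, -2.0, 0.3])
```

In this sketch, sparsifying 16 of 64 filters gives a ratio of 0.25, and every group of four entries in the mask contains exactly two ones.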

Figure 1 Process of compression combination

Introduction to Compression Combination Scenarios

This section introduces the compression combination scenarios that are currently supported. You configure them by setting parameters in the simplified configuration file. In these scenarios, network-wide quantization refers to Quantization Aware Training. The relevant parameters are as follows:

Network-wide quantization:
- Network-wide (global) quantization configuration parameter: retrain_data_quant_config or retrain_weight_quant_config
- Differentiated configuration parameter for some layers: override_layer_configs or override_layer_types
- Parameter priority: override_layer_configs > override_layer_types > retrain_data_quant_config/retrain_weight_quant_config

Network-wide sparsity (filter-level sparsity or 2:4 structured sparsity; use only one of them):
- Network-wide (global) sparsity configuration parameter: prune_config
- Differentiated sparsity parameter for some layers: override_layer_configs or override_layer_types
- Parameter priority: override_layer_configs > override_layer_types > prune_config

For details about the preceding parameters, see Simplified QAT Configuration File.
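The parameter priority described above can be sketched as a simple lookup. This is only an illustration: the function name and the dictionary shapes are hypothetical and do not reflect the actual structure of the simplified configuration file or how AMCT resolves it internally.

```python
def resolve_layer_config(layer_name, layer_type, global_config,
                         override_layer_types=None, override_layer_configs=None):
    """Pick the effective configuration for one layer.

    Priority: override_layer_configs > override_layer_types > global config.
    All arguments are plain dicts used for illustration (hypothetical shapes).
    """
    override_layer_types = override_layer_types or {}
    override_layer_configs = override_layer_configs or {}
    if layer_name in override_layer_configs:
        # A per-layer override wins over everything else.
        return override_layer_configs[layer_name]
    if layer_type in override_layer_types:
        # A per-type override wins over the global setting.
        return override_layer_types[layer_type]
    return global_config


# Hypothetical values, for illustration only.
global_cfg = {"source": "global"}
by_type = {"Conv2D": {"source": "override_layer_types"}}
by_name = {"conv1": {"source": "override_layer_configs"}}

conv1 = resolve_layer_config("conv1", "Conv2D", global_cfg, by_type, by_name)
conv2 = resolve_layer_config("conv2", "Conv2D", global_cfg, by_type, by_name)
fc1 = resolve_layer_config("fc1", "MatMul", global_cfg, by_type, by_name)
```

Here conv1 takes its per-layer override, conv2 falls back to the per-type override, and fc1 uses the global configuration.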

**Table 1** Introduction to compression combination scenarios
Scenario	Parameter	Description
Network-wide quantization + Network-wide sparsity	Quantization parameter: retrain_data_quant_config/retrain_weight_quant_config. Sparsity parameter: prune_config. Do not set override_layer_configs or override_layer_types.	-
Differentiated quantization of some layers + Network-wide sparsity	Quantization parameters: retrain_data_quant_config/retrain_weight_quant_config + override_layer_configs or override_layer_types. Sparsity parameter: prune_config.	Specify the global configuration before setting the differentiated parameters. Otherwise, the feature is disabled.
Network-wide quantization + Differentiated sparsity of some layers	Quantization parameter: retrain_data_quant_config/retrain_weight_quant_config. Sparsity parameters: prune_config + override_layer_configs or override_layer_types.	Specify the global configuration before setting the differentiated parameters. Otherwise, the feature is disabled.
Differentiated quantization of some layers + Differentiated sparsity of some layers	Quantization parameters: retrain_data_quant_config/retrain_weight_quant_config + override_layer_configs or override_layer_types. Sparsity parameters: prune_config + override_layer_configs or override_layer_types.	Specify the global configuration before setting the differentiated parameters. Otherwise, the feature is disabled.
Network-wide quantization	retrain_data_quant_config/retrain_weight_quant_config. Do not set override_layer_configs or override_layer_types.	-
Differentiated quantization of some layers	retrain_data_quant_config/retrain_weight_quant_config + override_layer_configs or override_layer_types.	Specify the global configuration before setting the differentiated parameters. Otherwise, the feature is disabled.
Network-wide sparsity	prune_config. Do not set override_layer_configs or override_layer_types.	-
Differentiated sparsity of some layers	prune_config + override_layer_configs or override_layer_types.	Specify the global configuration before setting the differentiated parameters. Otherwise, the feature is disabled.
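As a rough aid for reading the table, the sketch below maps which parameters are set to a scenario name. The function and its boolean flags are hypothetical. It follows one reading of the Description column, namely that quantization or sparsity takes effect only when its global configuration is specified; the actual AMCT behavior should be confirmed against the tool.

```python
def classify_scenario(quant_global=False, quant_override=False,
                      prune_global=False, prune_override=False):
    """Name the scenario that matches the configured parameters.

    quant_global:   retrain_data_quant_config/retrain_weight_quant_config is set
    quant_override: override_layer_configs/override_layer_types carry quantization settings
    prune_global:   prune_config is set
    prune_override: override_layer_configs/override_layer_types carry sparsity settings
    """
    def describe(feature, has_global, has_override):
        if not has_global:
            # Assumption: without the global configuration the feature is disabled.
            return None
        if has_override:
            return f"Differentiated {feature} of some layers"
        return f"Network-wide {feature}"

    parts = [
        describe("quantization", quant_global, quant_override),
        describe("sparsity", prune_global, prune_override),
    ]
    parts = [p for p in parts if p is not None]
    return " + ".join(parts) if parts else "No compression"


both = classify_scenario(quant_global=True, prune_global=True)
mixed = classify_scenario(quant_global=True, quant_override=True, prune_global=True)
disabled = classify_scenario(prune_override=True)
```

In this sketch, setting only override parameters for sparsity without prune_config yields "No compression", which mirrors the note that the feature is disabled when the global configuration is missing.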
