Overview

Channel pruning currently supports pruning different layers at different sparsity ratios. In practice, however, it is difficult for users to choose a suitable sparsity ratio (that is, prune_ratio in the configuration) for each layer. In addition, each manually configured model must be retrained, which is time-consuming. To address these issues, the auto channel pruning search feature calculates the sparsity sensitivity (which affects accuracy) and the sparsity gain (which affects performance) of each channel based on the user model. A search policy then searches for the layer-wise channel sparsity ratios that best balance accuracy and performance.

In the current version, this feature is a trial feature and cannot be used in commercial products.

Sparsity sensitivity: the estimated impact that pruning the current channel has on the accuracy of the entire network. A higher sparsity sensitivity indicates a greater accuracy loss. The default algorithm calculates the channel sensitivity from a Taylor expansion of loss(w - w_i). Users can also customize the calculation method. After the ith channel is pruned, the sparsity sensitivity is calculated as follows:
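$\text{sensitivity}_i = \left| \mathrm{loss}(w - w_i) - \mathrm{loss}(w) \right| \approx \left| \frac{\partial\, \mathrm{loss}}{\partial w_i} \cdot w_i \right|$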

A Taylor expansion of loss(w - w_i) is used as an approximation. Currently, only the first-order term is used in the calculation.
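The first-order estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not the built-in implementation: the function name taylor_channel_sensitivity is hypothetical, each channel is represented as a flat list of weights with a matching list of loss gradients, and taking the absolute value of the first-order term is an assumption.

```python
def taylor_channel_sensitivity(weights, grads):
    """Estimate |loss(w - w_i) - loss(w)| for each channel i.

    weights[i] and grads[i] are flat lists holding the weights of channel i
    and the gradient of the loss with respect to those weights.
    Hypothetical helper for illustration only.
    """
    if len(weights) != len(grads):
        raise ValueError("weights and grads must describe the same channels")
    scores = []
    for w_i, g_i in zip(weights, grads):
        if len(w_i) != len(g_i):
            raise ValueError("each channel needs one gradient per weight")
        # First-order Taylor term: sum of gradient * weight over the channel.
        first_order = sum(g * w for g, w in zip(g_i, w_i))
        scores.append(abs(first_order))
    return scores


# Toy layer with three channels and two weights per channel (made-up values).
w = [[1.0, 2.0], [0.5, -0.5], [3.0, 0.0]]
g = [[0.25, 0.125], [0.5, 0.5], [-1.0, 5.0]]
sensitivity = taylor_channel_sensitivity(w, g)
```

In this toy example the second channel has the smallest estimated impact on the loss, so it would be the first candidate for pruning.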
Sparsity gain: the sparsity gain of the current channel is represented by its bit complexity, which is the product of the Flops of the channel and the bit widths used for the activation data and the weights.
$\text{bit complexity} = \text{Flops} \times \text{act\_bit} \times \text{wts\_bit}$

Flops is the number of floating-point operations, act_bit is the bit width (data precision) of the activation data, and wts_bit is the bit width (data precision) of the weights.
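As a small worked example of the bit complexity described above, the following Python sketch multiplies the three quantities named in the text. The function name bit_complexity and the sample values are assumptions made for illustration.

```python
def bit_complexity(flops, act_bit, wts_bit):
    """Bit complexity as the product of Flops, act_bit, and wts_bit.

    Hypothetical helper for illustration only.
    """
    if flops < 0 or act_bit <= 0 or wts_bit <= 0:
        raise ValueError("flops must be >= 0 and bit widths must be > 0")
    return flops * act_bit * wts_bit


# A made-up channel that costs 1,000,000 Flops, at two data precisions.
gain_16bit = bit_complexity(1_000_000, act_bit=16, wts_bit=16)
gain_8bit = bit_complexity(1_000_000, act_bit=8, wts_bit=8)
```

Under this definition, halving both bit widths reduces the bit complexity of the same channel by a factor of four.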

Figure 1 shows the auto channel pruning search process. For details about the layers that support pruning as well as their specifications, see Table 1.

Figure 1 Auto channel pruning search process

The process is described as follows:

Initialization: Parse the user model and the optional sparsity configuration, identify the layers that support channel-level sparsity and their channel-level sparsity configuration (that is, whether the user has specified a sparsity ratio), and generate the search space from this configuration. Also parse the target compression ratio configuration.
- Search space: the layers that support channel sparsity but whose sparsity ratio is not configured through override_layer_configs or override_layer_types.
- Compression ratio: It is defined as the ratio of the bit complexity of the original model to the bit complexity of the sparse model.
Sensitivity calculation: Calculate the sparsity sensitivity of each channel. The built-in method is based on loss estimation: it uses the Taylor expansion to estimate the change in network loss after the channel is pruned. The sensitivity calculation method can be customized.
Bit complexity calculation: Calculate the bit complexity of each channel, which is considered as the sparsity gain of the channel.
Channel pruning ratio configuration search: By default, the Auto Channel Pruning Search Algorithm is used to search for the optimal channel pruning ratio configuration that satisfies the compression ratio specified by the user. The solver can be customized.
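To make the interaction between sensitivity, bit complexity, and the target compression ratio concrete, the steps above can be sketched with a simple greedy heuristic. This is not the Auto Channel Pruning Search Algorithm, whose details are not described here. The function greedy_prune_search, the tuple layout, and the sample values are all hypothetical, and the sketch assumes that per-channel bit complexities simply add up and that at least one channel is kept in every layer.

```python
from collections import defaultdict


def greedy_prune_search(channels, target_ratio):
    """Pick channels to prune until the target compression ratio is reached.

    channels: list of (layer_name, sensitivity, bit_complexity) tuples.
    Returns (prune ratio per layer, achieved compression ratio).
    Hypothetical greedy stand-in for a real search policy.
    """
    if not channels:
        raise ValueError("channels must not be empty")
    if target_ratio < 1.0:
        raise ValueError("target_ratio must be >= 1")
    if any(bits <= 0 for _, _, bits in channels):
        raise ValueError("bit complexity must be positive")

    total_bits = sum(bits for _, _, bits in channels)
    remaining_bits = total_bits
    layer_size = defaultdict(int)
    for layer, _, _ in channels:
        layer_size[layer] += 1

    pruned = defaultdict(int)
    # Prune first where the accuracy cost per unit of saved bits is lowest.
    for layer, _, bits in sorted(channels, key=lambda c: c[1] / c[2]):
        # Compression ratio = original bit complexity / sparse bit complexity.
        if total_bits / remaining_bits >= target_ratio:
            break
        if pruned[layer] + 1 >= layer_size[layer]:
            continue  # assumption: never remove the last channel of a layer
        pruned[layer] += 1
        remaining_bits -= bits

    ratios = {layer: pruned[layer] / n for layer, n in layer_size.items()}
    return ratios, total_bits / remaining_bits


# Made-up search space: two layers with four channels each.
channels = [
    ("conv1", 0.9, 100), ("conv1", 0.1, 100),
    ("conv1", 0.2, 100), ("conv1", 0.8, 100),
    ("conv2", 0.05, 200), ("conv2", 0.7, 200),
    ("conv2", 0.6, 200), ("conv2", 0.3, 200),
]
prune_ratios, achieved = greedy_prune_search(channels, target_ratio=1.5)
```

For this toy input the heuristic prunes one of four channels in conv1 and two of four in conv2, which is the kind of layer-wise ratio that the search is expected to produce. A real solver would also need to account for the effect that pruning one layer has on the layers that consume its output.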

The auto channel pruning search feature generates only a simplified configuration file for channel pruning. To obtain the final sparse model, perform the Manual Sparsity procedure and pass the generated simplified configuration file to channel pruning as an input parameter.
