Overview

Currently, filter-level sparsity allows different layers to be pruned with different sparsity ratios. However, it is difficult for users to set layer-wise sparsity ratios and to select a suitable sparsity ratio (that is, prune_ratio in the configuration) for a given layer. In addition, each manual configuration requires the model to be retrained, which is time-consuming. To address these issues, the auto channel pruning search feature calculates the sparsity sensitivity (which affects accuracy) and the sparsity gain (which affects performance) of each channel based on the user model. A search policy then searches for the layer-wise channel sparsity ratios that best balance accuracy and performance.

In the current version, this feature is a trial feature and cannot be used in commercial products.

  • Sparsity sensitivity: The estimated impact on the accuracy of the entire network of pruning the current channel. A higher sparsity sensitivity indicates a greater accuracy drop. By default, the channel sensitivity is calculated from a Taylor expansion of loss(w - wi), where wi denotes the weights of the ith channel. The calculation method can be customized.

    When the ith channel is pruned, loss(w - wi) is estimated approximately by a Taylor expansion. Currently, only the first-order term is calculated.

  • Sparsity gain: The sparsity gain of the current channel is represented by its bit complexity, which is the product of the Flops after quantization and the bit widths, that is, Flops × act_bit × wts_bit.

    Flops indicates the number of floating-point operations, act_bit indicates the activation precision (bit width), and wts_bit indicates the weight precision (bit width).
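As a rough illustration of the two metrics above, the following Python sketch computes a first-order Taylor sensitivity and a bit complexity for a single channel. It is not the tool's implementation. The function names, the use of the absolute value as the sensitivity score, and the convention of counting one multiply-accumulate as one operation are all assumptions made for this example.

```python
def taylor_sensitivity(weights, grads):
    """First-order Taylor estimate of the loss change when a channel is pruned.

    loss(w - wi) - loss(w) is approximated by -sum(grad * wi) over the
    channel's weights. Assumption: the magnitude is used as the score.
    """
    return abs(sum(g * w for g, w in zip(grads, weights)))


def conv_channel_flops(in_channels, kernel_h, kernel_w, out_h, out_w):
    """Operation count for one output channel of a convolution.

    Assumption: one multiply-accumulate is counted as one operation.
    """
    return in_channels * kernel_h * kernel_w * out_h * out_w


def bit_complexity(flops, act_bit, wts_bit):
    """Bit complexity = Flops x act_bit x wts_bit."""
    return flops * act_bit * wts_bit


# Hypothetical channel: three weights and the loss gradients at those weights.
sensitivity = taylor_sensitivity([0.5, -0.2, 0.1], [0.4, 0.1, -0.3])

# Hypothetical 3x3 convolution, 16 input channels, 8x8 output, 8-bit precision.
flops = conv_channel_flops(16, 3, 3, 8, 8)
gain = bit_complexity(flops, act_bit=8, wts_bit=8)
```

A channel with a low sensitivity and a high gain is a good pruning candidate, because removing it saves a lot of bit complexity at a small estimated accuracy cost.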

Figure 1 shows the auto channel pruning search process. For details about the layers that support pruning as well as their specifications, see Table 1.

Figure 1 Auto channel pruning search process

The main steps are as follows:

  1. Initialization: Parse the user model and the optional pruning configuration, identify the layers in the network that support channel pruning and their channel pruning configuration (whether prune_ratio is specified by the user), and generate the search space. Then, parse the target compression ratio configuration.
    • Search space: The layers that support channel pruning and have no sparsity ratio configured through override_layer_configs or override_layer_types.
    • Compression ratio: It is defined as the ratio of the bit complexity of the original model to the bit complexity of the sparsified model.
  2. Sensitivity calculation: Calculate the sparsity sensitivity of each channel. The built-in method, which is based on loss estimation, estimates the change in network loss after the channel is pruned. The sensitivity calculation method can be customized.
  3. Bit complexity calculation: Calculate the bit complexity of each channel, which is considered as the sparsity gain of a channel.
  4. Channel pruning ratio configuration search: Search for the optimal channel pruning ratio configuration that satisfies the user-specified compression ratio. By default, the algorithm described in Auto Channel Pruning Search Algorithm is used. The solver can be customized.

    The auto channel pruning search feature generates only a simplified configuration file for channel pruning. To obtain the final sparse model, follow the procedure in Manual Sparsity and pass the generated simplified configuration file as an input parameter to channel pruning.
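The search step can be pictured with the following Python sketch. It uses a simple greedy heuristic, not the algorithm described in Auto Channel Pruning Search Algorithm: channels are pruned in ascending order of sensitivity per unit of gain until the target compression ratio is reached. The function name, the input format, and the max_layer_ratio cap are hypothetical. The sketch also ignores the fact that pruning the output channels of one layer reduces the computation of the following layer.

```python
from collections import defaultdict


def greedy_prune_search(channels, total_bits, target_ratio, max_layer_ratio=0.9):
    """Return ({layer: prune_ratio}, reached_compression_ratio).

    channels: list of (layer_name, sensitivity, gain) tuples, one per channel,
    where gain is the (positive) bit complexity of the channel.
    total_bits: bit complexity of the original model.
    """
    layer_size = defaultdict(int)
    for layer, _, _ in channels:
        layer_size[layer] += 1

    pruned = defaultdict(int)
    remaining = total_bits
    # Prune first the channels whose accuracy cost per saved bit is lowest.
    for layer, _, gain in sorted(channels, key=lambda c: c[1] / c[2]):
        # Compression ratio = original bit complexity / sparsified bit complexity.
        if total_bits / remaining >= target_ratio:
            break
        # Keep some channels in every layer.
        if (pruned[layer] + 1) / layer_size[layer] > max_layer_ratio:
            continue
        pruned[layer] += 1
        remaining -= gain

    ratios = {layer: pruned[layer] / n for layer, n in layer_size.items()}
    return ratios, total_bits / remaining


# Hypothetical search space: two layers with four channels each.
channels = [
    ("conv1", 0.30, 100), ("conv1", 0.05, 100),
    ("conv1", 0.20, 100), ("conv1", 0.01, 100),
    ("conv2", 0.03, 200), ("conv2", 0.50, 200),
    ("conv2", 0.06, 200), ("conv2", 0.90, 200),
]
ratios, reached = greedy_prune_search(channels, total_bits=1200, target_ratio=2.0)
print(ratios, reached)  # {'conv1': 0.5, 'conv2': 0.5} 2.0
```

If the returned compression ratio is lower than the target, the search space could not satisfy the request under the per-layer cap, and the caller would need to relax one of the two.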