Manual Channel Pruning Algorithm

AMCT uses the BalancedL2Norm algorithm for filter-level sparsity. The algorithm computes the L2 norm (the square root of the sum of the squares of all elements) of each weight filter (output channel), sorts the output channels by importance based on this norm, and prunes the least important channels first. The following figure shows the principles.

Figure 1 Principle of the channel sparsity algorithm
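
The ranking step described above can be sketched in plain Python. This is only an illustrative sketch, not AMCT's implementation: the function names filter_l2_norms and select_pruned_channels are hypothetical, the weight is assumed to be a nested list whose first dimension is the output channel, and the number of pruned channels is assumed to be rounded down. Any balancing that BalancedL2Norm performs beyond plain L2 ranking is not modeled here.

```python
import math


def filter_l2_norms(weight):
    """Return one L2 norm per output channel.

    `weight` is assumed to be a nested list shaped [out_channels][...].
    """
    def flatten(x):
        # Walk an arbitrarily nested list and yield its scalar elements.
        if isinstance(x, (int, float)):
            yield x
        else:
            for item in x:
                yield from flatten(item)

    # L2 norm = square root of the sum of squared elements of each filter.
    return [math.sqrt(sum(v * v for v in flatten(f))) for f in weight]


def select_pruned_channels(weight, prune_ratio):
    """Return the indices of the output channels with the smallest L2 norms.

    Assumption: the pruned-channel count is floor(out_channels * prune_ratio).
    """
    norms = filter_l2_norms(weight)
    num_pruned = int(len(norms) * prune_ratio)
    # Sort channel indices from least to most important (smallest norm first).
    order = sorted(range(len(norms)), key=lambda i: norms[i])
    return sorted(order[:num_pruned])


# Four filters, each with a single 1x2 kernel; norms are 5, 1, sqrt(2), 10.
weight = [[[3, 4]], [[0, 1]], [[1, 1]], [[6, 8]]]
pruned = select_pruned_channels(weight, 0.5)
print(pruned)
```

In this toy example the two filters with the smallest norms (indices 1 and 2) are selected for pruning, while the filters with norms 5 and 10 are kept.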

In the sparse configuration file, the BalancedL2NormFilterPruner field is used to control the BalancedL2Norm algorithm. (The PyTorch framework is used as an example. For details about the parameters, see.)

  • prune_ratio: sparsity ratio, that is, the number of pruned filters divided by the total number of filters. Set this ratio to control the degree of sparsity. For example, a sparsity ratio of 0.3 means that 30% of the output channels are pruned.
  • ascend_optimized: whether to adapt the result to Ascend platforms. Currently, this means 16-aligned optimization: when it is enabled, the number of retained (non-pruned) channels is aligned to a multiple of 16. For example, for a layer with 20 channels and a sparsity ratio of 0.25, 15 channels are retained when 16-aligned optimization is disabled, and 16 channels are retained when it is enabled. Enabling this option can improve the inference performance of the sparse model on Ascend AI Processors.
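
The arithmetic behind the two parameters can be checked with a short Python sketch. The helper retained_channels is hypothetical and is not part of AMCT. It assumes that the pruned-channel count is rounded down and that 16-aligned optimization rounds the retained count up to the next multiple of 16, capped at the original channel count. The source only states that the retained count is aligned to a multiple of 16, so the exact rounding rule used by AMCT may differ.

```python
def retained_channels(total, prune_ratio, ascend_optimized=False, align=16):
    """Estimate how many output channels remain after pruning.

    Assumptions (not confirmed by the AMCT documentation):
    - the pruned count is floor(total * prune_ratio);
    - alignment rounds the retained count UP to a multiple of `align`,
      and never exceeds the original channel count.
    """
    pruned = int(total * prune_ratio)
    kept = total - pruned
    if ascend_optimized:
        # Ceiling division, then scale back to a multiple of `align`.
        kept = -(-kept // align) * align
        kept = min(kept, total)
    return kept


# The example from the text: 20 channels, sparsity ratio 0.25.
unaligned = retained_channels(20, 0.25, ascend_optimized=False)
aligned = retained_channels(20, 0.25, ascend_optimized=True)
print(unaligned, aligned)
```

With these assumptions the sketch reproduces the figures given above: 15 retained channels without alignment and 16 with it, so the effective sparsity of that layer becomes 4/20 = 0.2 rather than the configured 0.25.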