Manual Channel Pruning Algorithm

AMCT uses the BalancedL2Norm algorithm for filter-level sparsity. The algorithm computes the L2 norm (the square root of the sum of the squares of all elements) of each weight filter (output channel), ranks the output channels by importance based on this norm, and prunes the least important channels first. The following figure shows the principle.

Figure 1 Principles of the filter-level sparsity algorithm
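The ranking step described above can be illustrated with a small pure-Python sketch. It is not AMCT code: the helper names `filter_l2_norms` and `select_pruned_channels` are hypothetical, each filter is assumed to be flattened into a list of weights, a smaller norm is assumed to mean lower importance, and the rule for turning the ratio into a channel count (rounding to the nearest integer) is an assumption. The sketch covers only the plain L2-norm ranking, not any balancing that BalancedL2Norm may apply on top of it.

```python
import math

def filter_l2_norms(filters):
    """L2 norm of each filter: square root of the sum of squared elements."""
    return [math.sqrt(sum(w * w for w in f)) for f in filters]

def select_pruned_channels(filters, prune_ratio):
    """Return the indices of the output channels with the smallest L2 norms."""
    norms = filter_l2_norms(filters)
    # Assumption: the number of pruned channels is rounded to the nearest integer.
    num_pruned = int(round(len(filters) * prune_ratio))
    # Sort channel indices by ascending norm; sorted() is stable, so ties keep index order.
    order = sorted(range(len(filters)), key=lambda i: norms[i])
    return sorted(order[:num_pruned])

# Four toy filters (output channels), each flattened to a list of weights.
filters = [
    [3.0, 4.0],  # norm 5.0
    [0.1, 0.2],  # norm about 0.22
    [1.0, 1.0],  # norm about 1.41
    [6.0, 8.0],  # norm 10.0
]
pruned = select_pruned_channels(filters, prune_ratio=0.5)
print(pruned)  # [1, 2]
```

With a ratio of 0.5, the two channels with the smallest norms (indices 1 and 2) are selected for pruning, and channels 0 and 3 are retained.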

The BalancedL2NormFilterPruner field in the sparsity configuration file controls the BalancedL2Norm algorithm. The description below uses PyTorch as an example. For details about the field, see Simplified QAT Configuration File. The field has the following parameters:

prune_ratio: sparsity ratio, that is, the ratio of the number of sparsified filters to the total number of filters. Set this value to control the degree of sparsity. For example, a sparsity ratio of 0.3 means that 30% of the output channels are sparsified.
ascend_optimized: optimization for Ascend platforms. Currently, this is 16-aligned optimization: when it is enabled, the number of non-sparsified channels is aligned to a multiple of 16. For example, for 20 channels with a sparsity ratio of 0.25, 15 channels remain non-sparsified when the optimization is disabled, and 16 remain when it is enabled. Enabling this option can improve the inference performance of the sparse model on the Ascend AI Processor.
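The arithmetic in the 20-channel example can be reproduced with a short sketch. The function `kept_channels` is hypothetical and not part of AMCT. The text above states only that the retained count is aligned to a multiple of 16, so rounding up to the next multiple and capping the result at the total channel count are assumptions made here for illustration.

```python
def kept_channels(total, prune_ratio, ascend_optimized, align=16):
    """Number of non-sparsified output channels under the stated assumptions."""
    # Assumption: the pruned count is rounded to the nearest integer.
    kept = total - int(round(total * prune_ratio))
    if ascend_optimized:
        # Assumption: round the kept count up to the next multiple of `align`,
        # but never keep more channels than the layer has.
        aligned = -(-kept // align) * align  # ceiling division, then scale back
        kept = min(aligned, total)
    return kept

without_alignment = kept_channels(20, 0.25, ascend_optimized=False)
with_alignment = kept_channels(20, 0.25, ascend_optimized=True)
print(without_alignment)  # 15
print(with_alignment)     # 16
```

Under these assumptions, enabling the optimization keeps one more channel than the configured ratio alone would, so the effective sparsity of a layer can be lower than prune_ratio.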
