Tensor Decomposition

Tensor decomposition reduces inference overhead by decomposing a convolution kernel, replacing a single convolution with a stack of two smaller ones. If the model involves heavy convolution workloads and most convolution kernels have shapes larger than (64, 64, 3, 3), tensor decomposition is recommended. In other cases, skip this step and proceed to quantization.
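To illustrate the idea, the sketch below splits one convolution kernel into two smaller ones with a truncated SVD: a rank-r convolution of the original spatial size followed by a 1x1 convolution. This is only a minimal NumPy illustration of the principle, not AMCT's actual decomposition algorithm; the shapes and rank are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
c_out, c_in, kh, kw, rank = 64, 64, 3, 3, 16  # illustrative sizes

# Original convolution kernel of shape (c_out, c_in, kh, kw).
W = rng.standard_normal((c_out, c_in, kh, kw))

# Matricize to (c_out, c_in*kh*kw) and take a truncated SVD.
M = W.reshape(c_out, -1)
U, S, Vt = np.linalg.svd(M, full_matrices=False)

# First (smaller) conv: rank filters of the original spatial size.
W1 = (S[:rank, None] * Vt[:rank]).reshape(rank, c_in, kh, kw)
# Second (smaller) conv: 1x1 kernel mixing the rank channels back to c_out.
W2 = U[:, :rank].reshape(c_out, rank, 1, 1)

# Applying the two convolutions in sequence approximates the original one.
approx = (W2.reshape(c_out, rank) @ W1.reshape(rank, -1)).reshape(W.shape)
rel_err = np.linalg.norm(W - approx) / np.linalg.norm(W)
print(W1.shape, W2.shape, rel_err)
```

Choosing a smaller rank shrinks both kernels further at the cost of a larger approximation error, which is why the decomposed model is fine-tuned afterwards.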

Currently, tensor decomposition is supported under the following conditions:

  • group = 1, dilation = (1,1), stride < 3
  • kernel_h > 2, kernel_w > 2

A Convolution layer in the source Caffe model can be decomposed into two smaller Convolution layers only if it meets the preceding conditions. You can then use AMCT to convert the source Caffe model into a quantizable model deployable on the Ascend AI Processor for better inference performance.
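The eligibility conditions above can be sketched as a small helper. This is a hypothetical check, not part of the AMCT API; the parameter names merely mirror the usual Caffe ConvolutionParameter fields.

```python
def is_decomposable(group=1, dilation=(1, 1), stride=(1, 1), kernel=(3, 3)):
    """Return True if a Convolution layer meets the decomposition conditions:
    group == 1, dilation == (1, 1), each stride < 3, and both kernel dims > 2."""
    return (group == 1
            and tuple(dilation) == (1, 1)
            and all(s < 3 for s in stride)
            and all(k > 2 for k in kernel))

# A 5x5 kernel with the default stride qualifies:
print(is_decomposable(kernel=(5, 5)))   # True
# A 1x1 convolution does not:
print(is_decomposable(kernel=(1, 1)))   # False
```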

This step is optional.

Restrictions

For Conv2D layers with large shapes, the decomposition process may be time-consuming or may terminate abnormally (for example, due to insufficient memory). To avoid this problem, review the following reference data before starting decomposition:

  • Reference performance data of the decomposition tool:
    • CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
    • Memory: 512 GB

    Approximate time taken to decompose a single convolutional layer:

    • About 25s for shape (512, 512, 5, 5)
    • About 16s for shape (1024, 1024, 3, 3)
    • About 78s for shape (1024, 1024, 5, 5)
    • About 63s for shape (2048, 2048, 3, 3)
    • About 430s for shape (2048, 2048, 5, 5)
  • Risk of exceeding available memory:

    Decomposing a convolution kernel with shape (2048, 2048, 5, 5) takes about 32 GB of memory.

API Call Sequence

Figure 1 shows the API call sequence. For the tensor decomposition sample, see Additional Samples.

Figure 1 API call sequence for tensor decomposition

The procedure is as follows:

  1. Call auto_decomposition to perform tensor decomposition on the source Caffe model, generating a new model file and a new weight file.
  2. Fine-tune the decomposed model. Optionally quantize the fine-tuned model (see Post-Training Quantization or Quantization Aware Training for details).

Figure 2 shows the resnet_v2_50 model before and after decomposition.

Figure 2 Model before and after decomposition

Examples

# Import the required modules.
from amct_caffe.tensor_decompose import auto_decomposition

# Source model file
model_file = 'src_path/xxx.prototxt'    
# Source weight file
weights_file = 'src_path/xxx.caffemodel'  
# Result model file
new_model_file = 'decomposed_path/xxx.prototxt'   
# Result weight file
new_weights_file = 'decomposed_path/xxx.caffemodel' 

# Perform tensor decomposition.
auto_decomposition(model_file=model_file, weights_file=weights_file,
                   new_model_file=new_model_file, new_weights_file=new_weights_file)