Introduction
Tensor decomposition replaces a convolution with a stack of two smaller convolutions by decomposing its kernel, which reduces inference overhead. If the user model involves heavy convolution workloads and most of the convolution kernels have shapes larger than (64, 64, 3, 3), tensor decomposition is recommended. In other cases, skip this step and proceed to quantization.
Currently, tensor decomposition is supported under the following conditions:
- group = 1, dilation = (1,1), stride < 3
- kernel_h > 2, kernel_w > 2
The preceding are the basic conditions for tensor decomposition. AMCT performs a final check on your model. For example, a torch.nn.Conv2d layer in the original PyTorch model can be decomposed into two smaller Conv2D layers only if it meets the preceding conditions.
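The conditions above can be expressed as a simple pre-check before handing the model to the toolkit. The helper below is an illustrative sketch, not part of the AMCT API; it takes the attributes a torch.nn.Conv2d layer would expose, and the authoritative check is still performed by AMCT itself:

```python
def is_decomposable(groups, dilation, stride, kernel_size):
    """Illustrative pre-check mirroring the documented conditions:
    group = 1, dilation = (1, 1), stride < 3, kernel_h > 2, kernel_w > 2."""
    kh, kw = kernel_size
    return (
        groups == 1
        and tuple(dilation) == (1, 1)
        and max(stride) < 3          # stride < 3 in both dimensions
        and kh > 2 and kw > 2
    )

# A standard 3x3 convolution with default stride/dilation qualifies:
print(is_decomposable(1, (1, 1), (1, 1), (3, 3)))   # True
# A grouped (e.g. depthwise) convolution does not:
print(is_decomposable(64, (1, 1), (1, 1), (3, 3)))  # False
```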
Generally, the accuracy of the decomposed model is lower than that of the original model, so you need to fine-tune the model to recover accuracy. Then, for better inference performance, you can use AMCT to convert the decomposed model into a quantizable model deployable on the Ascend AI Processor.
This step is optional.
Restrictions
For Conv2D layers with large shapes, the decomposition process might be time-consuming or terminate abnormally. To avoid this problem, review the following reference data before starting decomposition:
- Reference performance data of the decomposition tool:
- CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
- Memory: 512 GB
Time taken to decompose a single convolutional layer:
- About 7s for shape (512, 512, 5, 5)
- About 12s for shape (1024, 1024, 3, 3)
- About 52s for shape (1024, 1024, 5, 5)
- About 89s for shape (2048, 2048, 3, 3)
- About 374s for shape (2048, 2048, 5, 5)
- Memory usage warning:
Decomposing a convolution kernel with shape (2048, 2048, 5, 5) requires about 32 GB of memory.
Decomposition Method
There are two tensor decomposition modes: online tensor decomposition and offline tensor decomposition. You can select a mode based on the site requirements, as detailed below:
- Online tensor decomposition
Directly decompose the original network model and perform fine-tuning: Introduce the tensor decomposition API into the training code to decompose the model with pre-trained weights, and then fine-tune it.
In this process, the model structure is modified and the weights are updated directly during tensor decomposition, so the decomposed model can be used immediately. The advantage is ease of use: only one step is required. The disadvantage is that the decomposition computation runs each time the script is invoked, which takes some time.
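To make the "stack of two smaller convolutions" concrete, the sketch below decomposes a Conv2D kernel with a truncated SVD: a kernel of shape (C_out, C_in, kh, kw) becomes a (rank, C_in, kh, kw) kernel followed by a (C_out, rank, 1, 1) pointwise kernel. This is one common decomposition scheme shown with NumPy for illustration only; AMCT's internal algorithm and API may differ:

```python
import numpy as np

def decompose_conv_kernel(weight, rank):
    """Approximate a Conv2D kernel (C_out, C_in, kh, kw) as two factors:
    a (rank, C_in, kh, kw) kernel followed by a (C_out, rank, 1, 1) kernel.
    Illustrative SVD-based scheme, not the AMCT implementation."""
    c_out, c_in, kh, kw = weight.shape
    mat = weight.reshape(c_out, c_in * kh * kw)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    first = vt[:rank].reshape(rank, c_in, kh, kw)
    second = (u[:, :rank] * s[:rank]).reshape(c_out, rank, 1, 1)
    return first, second

rng = np.random.default_rng(0)
w = rng.random((8, 4, 3, 3))

# At full rank the two factors reconstruct the original kernel exactly;
# a smaller rank trades accuracy for less compute.
first, second = decompose_conv_kernel(w, rank=8)
recon = (second.reshape(8, 8) @ first.reshape(8, -1)).reshape(w.shape)
assert np.allclose(recon, w)
```

Applying the first factor then the second 1x1 convolution to an input reproduces (approximately, at reduced rank) the original convolution, which is why fine-tuning is needed afterwards to recover accuracy.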
- Offline tensor decomposition
Decompose the original network model first, and then fine-tune it: Call the tensor decomposition API to decompose the model with pre-trained weights, and save the decomposition information file and the decomposed weights. In subsequent runs, introduce the tensor decomposition API into the training script, read the saved decomposition information file to decompose the model structure, load the saved decomposed weights, and then fine-tune the model.
In this process, the decomposition information and decomposed weights are saved once and then quickly loaded during fine-tuning. The advantage is that decomposition is performed once and reused many times, and loading takes almost no time. The disadvantage is that two steps are required, and the decomposed weights must be saved and later loaded.
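The decompose-once, load-many workflow can be sketched as follows. The file layout, the use of np.savez, and the layer name are illustrative assumptions; AMCT defines its own decomposition-information format:

```python
import numpy as np
import os
import tempfile

# Hypothetical decomposed factors for one layer (stand-ins for real weights):
# a (rank, C_in, kh, kw) kernel and a (C_out, rank, 1, 1) pointwise kernel.
rng = np.random.default_rng(0)
first = rng.random((4, 8, 3, 3))
second = rng.random((16, 4, 1, 1))

path = os.path.join(tempfile.mkdtemp(), "decomposed_conv1.npz")

# One-time step: save the decomposition info (here just the rank)
# together with the decomposed weights.
np.savez(path, rank=4, first=first, second=second)

# Each fine-tuning run: reload instead of re-decomposing,
# which is why offline mode starts fine-tuning almost instantly.
loaded = np.load(path)
restored_first = loaded["first"]
restored_second = loaded["second"]
```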