Tensor Decomposition

Tensor decomposition converts a convolution into a stack of two smaller convolutions by decomposing its kernel, reducing inference overhead. If your model involves heavy convolution workloads and most of its convolution kernels have shapes larger than (64, 64, 3, 3), tensor decomposition is recommended. Otherwise, skip this step and proceed to quantization.
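As a rough illustration of the idea (not AMCT's actual algorithm), a single Conv2D kernel can be approximated by a low-rank factorization that splits it into two stacked convolutions: a smaller full-size convolution followed by a 1x1 convolution. The helper below is a hypothetical NumPy sketch; `decompose_kernel` and the rank choice are illustrative only.

```python
import numpy as np

def decompose_kernel(w, rank):
    """Approximate kernel w of shape (kh, kw, cin, cout) by two factors via truncated SVD."""
    kh, kw, cin, cout = w.shape
    mat = w.reshape(kh * kw * cin, cout)                      # flatten kernel to a 2-D matrix
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    w1 = (u[:, :rank] * s[:rank]).reshape(kh, kw, cin, rank)  # first, smaller conv kernel
    w2 = vt[:rank].reshape(1, 1, rank, cout)                  # second, 1x1 conv kernel
    return w1, w2

w = np.random.randn(3, 3, 64, 64)
w1, w2 = decompose_kernel(w, rank=32)
print(w.size, w1.size + w2.size)  # the two factor kernels hold fewer parameters
```

At full rank the two factors reproduce the original kernel exactly; truncating the rank trades a small approximation error for fewer parameters and multiply-accumulates.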

Currently, tensor decomposition is supported under the following conditions:

  • group = 1, dilation = (1,1), stride < 3
  • kernel_h > 2, kernel_w > 2

The preceding are the basic conditions for tensor decomposition; AMCT performs a final check on your model.
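The basic conditions above can be captured in a small helper. This is an illustrative sketch only; `is_decomposable` is not part of the AMCT API, and AMCT's own final check remains authoritative.

```python
def is_decomposable(kernel_h, kernel_w, group=1, dilation=(1, 1), stride=(1, 1)):
    """Mirror the documented basic conditions for decomposing a Conv2D layer."""
    return (group == 1
            and dilation == (1, 1)
            and max(stride) < 3
            and kernel_h > 2
            and kernel_w > 2)

print(is_decomposable(3, 3))                 # a plain 3x3, stride-1 conv qualifies
print(is_decomposable(3, 3, stride=(3, 3)))  # stride too large
print(is_decomposable(1, 1))                 # kernel too small
```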

Only when the source TensorFlow model contains a Conv2D layer that meets the preceding conditions can that layer be decomposed into two smaller Conv2D layers. You can then use AMCT to convert the source TensorFlow model into a quantizable model deployable on the Ascend AI Processor for better inference performance.

This step is optional.

Restrictions

For Conv2D layers with large shapes, decomposition may be time-consuming or may terminate abnormally. To avoid this problem, review the following before starting decomposition:

  • Performance reference data of the decomposition tool:
    • CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
    • Memory: 512 GB

    Time taken to decompose a single convolutional layer:

    • About 25s for shape (512, 512, 5, 5)
    • About 16s for shape (1024, 1024, 3, 3)
    • About 78s for shape (1024, 1024, 5, 5)
    • About 63s for shape (2048, 2048, 3, 3)
    • About 430s for shape (2048, 2048, 5, 5)
  • Memory risk notice:

    It takes about 32 GB memory to decompose a convolution kernel with shape (2048, 2048, 5, 5).

API Call Sequence

Figure 1 shows the API call sequence. For the decomposition example, see Additional Samples.

Figure 1 API Call Sequence

The procedure is as follows:

  1. Call auto_decomposition to perform tensor decomposition on the source TensorFlow model, generating a new model file.
  2. Call decompose_graph to load the graph modification information (.pkl file) generated in step 1 to decompose the TensorFlow training graph, and to load the model weight file generated in step 1 to output the decomposed training graph.
  3. Fine-tune the decomposed model to output a model that can be quantized. Perform Post-Training Quantization or Quantization Aware Training later.

Figure 2 shows the resnet_v2_50 model before and after decomposition.

Figure 2 Model before and after decomposition

Example

  1. Call auto_decomposition to decompose the source TensorFlow model.
    from amct_tensorflow.tensor_decompose import auto_decomposition
    meta_path = "src_path/xxx.meta"      # Path of the meta file.
    ckpt_path = "src_path/xxx"           # Path of the ckpt file. For example, if the original model files are xxx.data-XXXXX-of-XXXXX and xxx.index in src_path, set the path to src_path/xxx.
    save_path = "decomposed_path/xxx"    # Path of the result. If it is set to decomposed_path/xxx, files such as xxx.data-XXXXX-of-XXXXX will be generated in the decomposed_path directory.
    auto_decomposition(meta_path, ckpt_path, save_path)  # Start tensor decomposition.
    
  2. Modify the existing training code, call decompose_graph to decompose the graph in the code, and fine-tune the decomposed model based on the weight file generated after decomposition.

    Select a session or estimator mode based on the actual training code.

    Lines marked with an asterisk (*) are the user's existing code; an ellipsis (...) indicates omitted existing code. The following code is only an example and may differ from your actual code; adjust it as required.

    • Session mode
      from amct_tensorflow.tensor_decompose import decompose_graph
      save_path = "decom_path/xxx"                        # Path of the decomposed model, that is, save_path in step 1.
      # ...
      net_output = build_net(net_input, ...)              # (*) Build the network graph.
      decompose_graph(save_path)                          # Decompose the graph. This must be done after the network graph is built and before the optimizer is applied.
      variables_to_restore = tf.global_variables()        # Set all variables currently in the graph as the variables to be restored.
      restorer = tf.train.Saver(variables_to_restore)     # Build a Saver for the variables to be restored.
      loss = build_loss(net_output, ...)                  # (*) Build the loss.
      optimizer = build_optimizer(...)                    # (*) Build the optimizer.
      train_op = optimizer.minimize(loss, ...)            # (*) Apply the optimizer to minimize the loss.
      # ...
      variables_to_init = [v for v in tf.global_variables() if v not in variables_to_restore]  # Collect the variables not to be restored.
      init = tf.variables_initializer(variables_to_init)  # Prepare to initialize the variables not to be restored.
      with tf.Session() as sess:                          # (*) Training session.
          sess.run(init)                                  # Initialize the variables not to be restored.
          restorer.restore(sess, save_path)               # Restore the variables to be restored from the decomposed model weights.
          # ...
      
    • Estimator mode
      from amct_tensorflow.tensor_decompose import decompose_graph
      save_path = "decom_path/xxx"                  # Path of the decomposed model, that is, save_path in step 1.
      # ...
      def model_fn(features, labels, ...):          # (*) Model function of the Estimator.
          net_output = build_net(net_input, ...)    # (*) Build the network graph.
          decompose_graph(save_path)                # Decompose the graph. This must be done after the network graph is built and before the optimizer is applied.
          loss = build_loss(net_output, ...)        # (*) Build the loss.
          optimizer = build_optimizer(...)          # (*) Build the optimizer.
          train_op = optimizer.minimize(loss, ...)  # (*) Apply the optimizer to minimize the loss.
          # ...
      estimator = tf.estimator.Estimator(model_fn, warm_start_from=save_path, ...)  # (*) Construct the estimator; the warm_start_from parameter loads the decomposed model weights.
      # ...