Fusion Support

The following layer fusion types are currently supported. The operators involved must also meet the restrictions of each compression scenario, such as those described in Quantization and Compression Combination.

Conv+BN fusion:
- Before AMCT quantization, "Conv+BN" fusion is performed on the "Conv2D/Conv3D+BatchNorm" composite in the model, during which the BatchNorm layer is removed.
- Before quantization, AMCT changes the "Conv2D+BatchToSpace+BN" composite in the model to "Conv2D+BN+BatchToSpace" if the channels are consistent and then performs Conv+BN fusion, during which the BatchNorm layer is removed.
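The arithmetic behind Conv+BN fusion can be sketched in NumPy. This is an illustration only, not AMCT code: the helper names conv2d_valid and fold_bn_into_conv, the NHWC/HWIO layouts, and the epsilon value are assumptions made for the example.

```python
import numpy as np

def conv2d_valid(x, w, b):
    """Naive NHWC convolution, stride 1, no padding; w has layout HWIO."""
    n, h, wd, _ = x.shape
    kh, kw, _, cout = w.shape
    out = np.zeros((n, h - kh + 1, wd - kw + 1, cout))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + kh, j:j + kw, :]  # (N, kh, kw, Cin)
            out[:, i, j, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2]))
    return out + b

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-3):
    """Return (w', b') such that conv(x, w') + b' equals BN(conv(x, w) + b)."""
    scale = gamma / np.sqrt(var + eps)  # one factor per output channel
    return w * scale, (b - mean) * scale + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 5, 3))
w = rng.normal(size=(3, 3, 3, 4))
b = rng.normal(size=4)
gamma, beta, mean = rng.normal(size=(3, 4))
var = rng.uniform(0.5, 2.0, size=4)
eps = 1e-3

# Conv followed by an inference-mode BatchNorm.
reference = gamma * (conv2d_valid(x, w, b) - mean) / np.sqrt(var + eps) + beta

# The same result from a single convolution with folded parameters.
w_fused, b_fused = fold_bn_into_conv(w, b, gamma, beta, mean, var, eps)
fused = conv2d_valid(x, w_fused, b_fused)
```

Because the BN scale is constant per output channel at inference time, it can be multiplied into the last axis of the kernel, which is why the BatchNorm layer can be removed.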
Depthwise_Conv+BN fusion: Before AMCT quantization, "Depthwise_Conv+BN" fusion is performed on the "DepthwiseConv2dNative+BatchNorm" composite in the model. The BatchNorm layer is removed.
OP+(BiasAdd)+Mul fusion: Before AMCT quantization, "OP+(BiasAdd)+Mul" fusion is performed on the "Conv2D/Conv3D/MatMul/DepthwiseConv2dNative/Conv2DBackpropInput+Mul" and "Conv2D/MatMul/DepthwiseConv2d/Conv2DBackpropInput+BiasAdd+Mul" composites in the model, during which the Mul layer is removed.
In this scenario, the other input of Mul must be a Const with an empty shape, that is, a scalar.
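Why a scalar Mul can be absorbed is easy to check numerically. The sketch below uses MatMul+BiasAdd+Mul as the example; the function name fold_scalar_mul is hypothetical and the code only illustrates the identity, not AMCT's implementation.

```python
import numpy as np

def fold_scalar_mul(weight, bias, factor):
    """Fold `(x @ weight + bias) * factor` into a single MatMul+BiasAdd."""
    factor = np.asarray(factor)
    if factor.shape != ():
        raise ValueError("only a scalar (empty-shape) constant is handled here")
    new_bias = None if bias is None else bias * factor
    return weight * factor, new_bias

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))
weight = rng.normal(size=(8, 3))
bias = rng.normal(size=3)
factor = np.float64(0.25)  # stands in for the Const input of Mul, shape ()

before = (x @ weight + bias) * factor
w_fused, b_fused = fold_scalar_mul(weight, bias, factor)
after = x @ w_fused + b_fused
```

The identity (xW + b) * s = x(sW) + sb holds for any scalar s, so the Mul layer becomes redundant once the weights and bias are rescaled.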
Group_conv+BN fusion: If the "Split+Multi-Conv2D/Conv3D+ConcatV2 (or Concat, with concatenation performed along the C dimension)" composite is used in the model to indicate Group_conv, AMCT fuses the "Group_conv+BatchNorm" composite before quantization. The BatchNorm layer is removed.
BN fusion applies to the following operators: FusedBatchNorm, FusedBatchNormV2, and FusedBatchNormV3.
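As an illustration of Group_conv+BN fusion, the sketch below builds a grouped convolution from a split, per-branch convolutions, and concatenation along C, then folds the BN parameters into each branch. It uses 1x1 kernels in NumPy to stay short. The helper names (conv1x1, group_conv, fold_bn_into_group) are hypothetical, and this is not how AMCT implements the pass.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution in NHWC: x is (N, H, W, Cin), w is (Cin, Cout)."""
    return x @ w

def group_conv(x, weights):
    """Split along C, convolve each slice, concatenate along C."""
    slices = np.split(x, len(weights), axis=-1)
    return np.concatenate([conv1x1(s, w) for s, w in zip(slices, weights)], axis=-1)

def fold_bn_into_group(weights, gamma, beta, mean, var, eps=1e-3):
    """Fold per-channel BN parameters into each branch; returns (weights', bias)."""
    scale = gamma / np.sqrt(var + eps)
    # Slice the per-channel scale to match each branch's output channels.
    bounds = np.cumsum([w.shape[1] for w in weights])[:-1]
    scales = np.split(scale, bounds)
    fused = [w * s for w, s in zip(weights, scales)]
    return fused, beta - mean * scale

rng = np.random.default_rng(2)
x = rng.normal(size=(2, 4, 4, 6))
weights = [rng.normal(size=(3, 2)) for _ in range(2)]  # 2 groups, 3 -> 2 channels each
gamma, beta, mean = rng.normal(size=(3, 4))
var = rng.uniform(0.5, 2.0, size=4)
eps = 1e-3

reference = gamma * (group_conv(x, weights) - mean) / np.sqrt(var + eps) + beta
fused_weights, bias = fold_bn_into_group(weights, gamma, beta, mean, var, eps)
fused = group_conv(x, fused_weights) + bias
```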
Requant fusion: Relu6 does not support Requant fusion and must be replaced with Relu. Relu6 is identical to Relu except for an upper cutoff at 6, which resembles the clipping applied when floating-point data is quantized. Relu+AscendQuant can therefore replace Relu6+AscendQuant whenever the two are equivalent, and AMCT makes this replacement in the quantized deployable model where appropriate.
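The equivalence condition can be illustrated with a small NumPy experiment. The quantizer below, round(x * scale) + offset saturated to int8, is an assumed model of AscendQuant used only for this sketch, and relu6_replaceable is a hypothetical helper, not an AMCT API.

```python
import numpy as np

def quantize_int8(x, scale, offset):
    """Assumed quantizer: round(x * scale) + offset, saturated to int8."""
    return np.clip(np.round(x * scale) + offset, -128, 127).astype(np.int8)

def relu6_replaceable(scale, offset):
    """True if the int8 ceiling (127) is already reached at an input of 6."""
    return bool(np.round(6.0 * scale) + offset >= 127)

x = np.linspace(-2.0, 10.0, 1201)
relu = np.maximum(x, 0.0)
relu6 = np.minimum(relu, 6.0)

# Saturation happens below 6, so the cutoff at 6 is never observable.
scale, offset = 51.0, -128
same = np.array_equal(quantize_int8(relu, scale, offset),
                      quantize_int8(relu6, scale, offset))

# 6 lies inside the representable range, so the two graphs differ.
scale_wide, offset_wide = 20.0, -128
differs = not np.array_equal(quantize_int8(relu, scale_wide, offset_wide),
                             quantize_int8(relu6, scale_wide, offset_wide))
```

Under this assumed quantizer, the replacement is safe only when the quantization range saturates at or below an input value of 6, which is what "when equivalent" amounts to in the example.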
OP+Add fusion: If one input of an Add or AddV2 operator is a constant that cannot be associated with a placeholder node, and the other input comes from an operator of one of the following types, the Add/AddV2 operator is replaced with a BiasAdd operator for subsequent bias quantization:
- Conv2D/Conv3D: The Add operand is a scalar, a multidimensional array, or a one-dimensional array whose last-axis length matches the length of the cout axis.
- DepthwiseConv2dNative: The Add operand is a scalar, a multidimensional array, or a one-dimensional array whose last-axis length matches the length of the cout axis.
- Conv2DBackpropInput: The Add operand is a scalar, a multidimensional array, or a one-dimensional array whose last-axis length matches the length of the cout axis.
- MatMul: The length of the Add operand matches the length of the cout axis of MatMul, and all other axes of the operand have length 1.
- BatchMatMul: The Add operand is a constant, a one-dimensional array, or an array whose last-axis length matches the length of the cout axis and whose other axes all have length 1.
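The MatMul rule can be sketched as a shape check followed by a reshape to a 1-D bias. The helper const_to_bias is hypothetical, and the reading of the rule it encodes (last axis equals cout, every other axis equals 1) is an assumption made for this example.

```python
import numpy as np

def const_to_bias(const, cout):
    """Return a 1-D bias of length cout, or None if the constant does not qualify.

    Assumed reading of the MatMul rule: the last axis has length cout
    and every other axis has length 1.
    """
    const = np.asarray(const)
    if const.ndim == 0:
        return None  # the MatMul rule does not list scalars
    if const.shape[-1] != cout or any(d != 1 for d in const.shape[:-1]):
        return None
    return const.reshape(cout)

x = np.arange(6.0).reshape(2, 3)
w = np.ones((3, 4))
const = np.array([[0.5, 1.0, 1.5, 2.0]])  # shape (1, 4), cout = 4

bias = const_to_bias(const, cout=4)
# Adding the 1-D bias along the last axis reproduces the original Add.
same = np.allclose(x @ w + const, x @ w + bias)

rejected = const_to_bias(np.ones((2, 4)), cout=4)  # axis 0 is not 1
```

A constant that varies along any axis other than cout cannot be expressed as a per-channel bias, which is why such shapes are rejected in this sketch.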
MatMul+BN fusion:
- The output shape of the Reshape operator must be the same as that of the MatMul operator, as shown in Figure 1. The MatMul operator must meet any of the following conditions:
  - No biasadd, NHWC format, dynamic shape
  - No biasadd, NCHW format, dynamic shape
  - No biasadd, NHWC format, static shape
  - No biasadd, NCHW format, static shape
  - With biasadd, NHWC format, dynamic shape
  - With biasadd, NCHW format, dynamic shape
  - With biasadd, NHWC format, static shape
  - With biasadd, NCHW format, static shape
  Figure 1 MatMul+BN fusion diagram
- BN fusion applies to the following operators: FusedBatchNorm, FusedBatchNormV2, and FusedBatchNormV3.
Fusion of small BN operators into a large FusedBatchNormV3 operator: This fusion applies only to post-training quantization (PTQ) and to the Conv+BN or Conv+BiasAdd+BN composite. With TensorFlow 1.15.0, only small BN operators that take 4D inputs are fused. With TensorFlow 2.6.3, the BN operator inputs can be 4D or 5D (5D inputs correspond to Conv3D).
AMCT analyzes the composite of small BN operators generated by tf.keras.layers.BatchNormalization and replaces it with a large BN operator in the following cases:
- tf.keras.layers.BatchNormalization with fused=False, called on inputs with training=False: the network structures before and after fusion are as follows.
- tf.keras.layers.BatchNormalization with fused=False and center=False, called on inputs with training=False: the network structures before and after fusion are as follows.
- tf.keras.layers.BatchNormalization with fused=False and scale=False, called on inputs with training=False: the network structures before and after fusion are as follows.
- tf.keras.layers.BatchNormalization with fused=False, scale=False, and center=False, called on inputs with training=False: the network structures before and after fusion are as follows.
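The four cases above differ only in whether gamma (scale) and beta (center) are present. The NumPy sketch below mirrors the elementwise decomposition used by tf.nn.batch_normalization and compares it with the single fused formula. The function names are hypothetical, and the mapping to specific graph operators is an assumption, not a description of AMCT's pattern matcher.

```python
import numpy as np

def small_op_batch_norm(x, mean, var, gamma=None, beta=None, eps=1e-3):
    """Inference BN written as a chain of elementwise operations."""
    inv = 1.0 / np.sqrt(var + eps)   # reciprocal square root
    if gamma is not None:            # scale=True
        inv = inv * gamma
    shift = -mean * inv
    if beta is not None:             # center=True
        shift = beta + shift
    return x * inv + shift

def fused_batch_norm(x, mean, var, gamma=None, beta=None, eps=1e-3):
    """Single-formula BN; a missing gamma or beta defaults to ones or zeros."""
    gamma = np.ones_like(mean) if gamma is None else gamma
    beta = np.zeros_like(mean) if beta is None else beta
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(3)
x = rng.normal(size=(2, 4, 4, 3))
mean, gamma, beta = rng.normal(size=(3, 3))
var = rng.uniform(0.5, 2.0, size=3)

cases = {
    "scale=True, center=True": (gamma, beta),
    "scale=True, center=False": (gamma, None),
    "scale=False, center=True": (None, beta),
    "scale=False, center=False": (None, None),
}
matches = {
    name: bool(np.allclose(small_op_batch_norm(x, mean, var, g, b),
                           fused_batch_norm(x, mean, var, g, b)))
    for name, (g, b) in cases.items()
}
```

In every case the chain of small operators computes the same function as one fused BN with default gamma or beta filled in, which is what makes the replacement with a single large operator possible.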
