Fusion Support

Currently, the following layer fusion types are supported (the operators involved must also meet the restrictions of each compression technique, such as Quantization and Compression Combination):

  • Conv+BN fusion:
    • Before AMCT-based quantization, the "Conv2D/Conv3D+BatchNorm" composite in the model is fused, during which the BatchNorm layer is removed.
    • Before quantization, AMCT rewrites the "Conv2D+BatchToSpace+BN" composite in the model as "Conv2D+BN+BatchToSpace" if the channels are consistent, and then performs Conv+BN fusion, during which the BatchNorm layer is removed.
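
Conv+BN folding amounts to a per-output-channel rescaling of the convolution weights and bias. A minimal NumPy sketch of the arithmetic (all shapes and values are illustrative, not AMCT API calls):

```python
import numpy as np

# Hypothetical HWIO conv kernel with cout = 4 output channels.
rng = np.random.default_rng(0)
cout = 4
W = rng.normal(size=(3, 3, 2, cout))   # conv kernel
b = rng.normal(size=cout)              # conv bias
gamma, beta = rng.normal(size=cout), rng.normal(size=cout)
mean, var, eps = rng.normal(size=cout), rng.uniform(0.5, 2.0, size=cout), 1e-3

# Fold the BN parameters into the conv weights and bias.
scale = gamma / np.sqrt(var + eps)
W_folded = W * scale                   # broadcasts over the cout axis
b_folded = (b - mean) * scale + beta

# Check: applying BN to a conv output equals using the folded parameters.
x = rng.normal(size=(18,))             # one flattened 3x3x2 receptive field
conv_out = x @ W.reshape(-1, cout) + b
bn_out = (conv_out - mean) * scale + beta
folded_out = x @ W_folded.reshape(-1, cout) + b_folded
assert np.allclose(bn_out, folded_out)
```

After folding, the BatchNorm layer carries no information that is not already in the rescaled convolution, which is why it can be removed from the graph.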
  • Depthwise_Conv+BN fusion: Before AMCT quantization, the "DepthwiseConv2dNative+BatchNorm" composite in the model is fused, during which the BatchNorm layer is removed.
  • OP+(BiasAdd)+Mul fusion: Before AMCT quantization, the "Conv2D/MatMul/DepthwiseConv2dNative/Conv2DBackpropInput+Mul" and "Conv2D/MatMul/DepthwiseConv2dNative/Conv2DBackpropInput+BiasAdd+Mul" composites in the model are fused into "OP+(BiasAdd)+Mul", during which the Mul layer is removed.

    In this scenario, the other input of Mul must be of the Const type with an empty shape.
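
Because the other Mul input is a scalar constant (empty shape), it can be folded into the preceding operator's weights and bias. A minimal NumPy sketch (shapes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 3, 2, 4))  # conv kernel, cout = 4
b = rng.normal(size=4)             # BiasAdd bias
m = 0.5                            # the Mul constant: a scalar (empty-shape Const)

x = rng.normal(size=(18,))         # one flattened receptive field
# Unfused: Conv -> BiasAdd -> Mul
original = (x @ W.reshape(-1, 4) + b) * m
# Fused: the scalar is absorbed into the weights and bias; Mul disappears.
fused = x @ (W * m).reshape(-1, 4) + (b * m)
assert np.allclose(original, fused)
```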

  • Group_conv+BN fusion: If the "Split+multi-channel Conv2D/Conv3D+ConcatV2" structure (or Concat, concatenating on the C axis) is used in the model, the "Group_conv+BatchNorm" composite in the model is fused before quantization, during which the BatchNorm layer is removed.

    BN fusion applies to the following operators: FusedBatchNorm, FusedBatchNormV2, and FusedBatchNormV3.

  • Requant fusion: ReLU6 does not support Requant fusion and needs to be replaced with ReLU. The Relu6 operator truncates values greater than 6 on top of Relu, and the input floating-point number is also truncated during quantization. Therefore, "Relu6+AscendQuant" can be replaced with "Relu+AscendQuant" in equivalent scenarios. Accordingly, AMCT replaces "Relu6+AscendQuant" in the quantized deployable model with "Relu+AscendQuant" where appropriate.
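
The equivalence holds whenever the quantization clip range already caps activations at 6 or below, so the ReLU6 truncation is subsumed by the quantizer's own clipping. A minimal NumPy sketch with a simple uint8 fake-quantizer (the quantizer here is a generic illustration, not AMCT's AscendQuant implementation):

```python
import numpy as np

def fake_quant(x, scale, qmax=255):
    # Simple uint8 quantize/dequantize with zero-point 0 (illustrative only).
    return np.clip(np.round(x / scale), 0, qmax) * scale

x = np.linspace(-2.0, 10.0, 1000)
relu = np.maximum(x, 0.0)
relu6 = np.minimum(relu, 6.0)

scale = 6.0 / 255  # quantization range [0, 6] covers the ReLU6 cap
# Quantizing after ReLU gives the same result as quantizing after ReLU6:
# values above 6 are clipped to qmax by the quantizer either way.
assert np.allclose(fake_quant(relu, scale), fake_quant(relu6, scale))
```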
  • OP+Add fusion: If one input of the Add or AddV2 operator is a constant that cannot be associated with a placeholder node, and the other input comes from an operator of one of the following types, the Add/AddV2 operator is replaced with the BiasAdd operator for subsequent bias quantization:
    • Conv2D/Conv3D, DepthwiseConv2dNative, or Conv2DBackpropInput: Add is a scalar, a multidimensional array, or a one-dimensional array whose last axis length is aligned with that of the cout axis.
    • MatMul: The last axis length of Add is aligned with that of the cout axis of MatMul, and the length of every other axis of Add is 1.
    • BatchMatMul: Add is a scalar, a one-dimensional array, or an array whose last axis length is aligned with that of the cout axis and whose other axes have a length of 1.
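
An Add whose constant broadcasts only along the cout axis is arithmetically identical to a BiasAdd with a one-dimensional bias, which is why the replacement is safe. A minimal NumPy sketch for the Conv2D case (shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(size=(1, 5, 5, 4))  # NHWC conv output, cout = 4
c = rng.normal(size=(1, 1, 1, 4))  # Add constant: last axis matches cout,
                                   # all other axes have length 1

add_out = y + c                    # original Add/AddV2
bias_add_out = y + c.reshape(4)    # equivalent BiasAdd with a 1-D bias
assert np.allclose(add_out, bias_add_out)
```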
  • MatMul+BN fusion:
    • The output shape of the Reshape operator must be the same as that of the MatMul operator, as shown in Figure 1. The MatMul operator can be with or without BiasAdd, in NHWC or NCHW format, and with a static or dynamic shape; all eight combinations are supported.
      Figure 1 MatMul+BN fusion diagram
    • BN fusion applies to the following operators: FusedBatchNorm, FusedBatchNormV2, and FusedBatchNormV3.
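
The MatMul+BN folding is the same per-output-channel rescaling as in the Conv+BN case, applied to the MatMul weight and the optional BiasAdd bias. A minimal NumPy sketch (shapes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
cout = 6
W = rng.normal(size=(8, cout))     # MatMul weight, cout output channels
bias = rng.normal(size=cout)       # optional BiasAdd
x = rng.normal(size=(2, 8))
gamma, beta = rng.normal(size=cout), rng.normal(size=cout)
mean, var, eps = rng.normal(size=cout), rng.uniform(0.5, 2.0, size=cout), 1e-3

# Unfused: MatMul -> BiasAdd -> BN
bn_out = gamma * (x @ W + bias - mean) / np.sqrt(var + eps) + beta

# Fused: BN folded into the MatMul weight and bias
scale = gamma / np.sqrt(var + eps)
fused_out = x @ (W * scale) + (bias - mean) * scale + beta
assert np.allclose(bn_out, fused_out)
```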
  • Fusion of small BN operators into the large FusedBatchNormV3 operator: Only PTQ is supported, and only the Conv+BN or Conv+BiasAdd+BN structure can trigger this fusion. With TensorFlow 1.15.0, only 4-dimensional input to the small BN operators is supported. With TensorFlow 2.6.3, the input can be 4-dimensional or 5-dimensional; in the 5-dimensional case, the Conv operator in the preceding structure must be Conv3D.
    AMCT analyzes the composite of small BN operators generated by tf.keras.layers.BatchNormalization and replaces it with the larger fused BN operator in the following cases:
    • The tf.keras.layers.BatchNormalization layer is created with fused=False and called with training=False. The network structures before and after fusion are as follows:

    • The tf.keras.layers.BatchNormalization layer is created with fused=False and center=False and called with training=False. The network structures before and after fusion are as follows:

    • The tf.keras.layers.BatchNormalization layer is created with fused=False and scale=False and called with training=False. The network structures before and after fusion are as follows:

    • The tf.keras.layers.BatchNormalization layer is created with fused=False, scale=False, and center=False and called with training=False. The network structures before and after fusion are as follows:
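
In inference mode, the small-operator composite emitted by BatchNormalization(fused=False) (Rsqrt, Mul, Sub, Add applied element-wise) computes exactly the standard fused BN formula, which is what makes the replacement by a single large operator safe. A minimal NumPy sketch (shapes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
c = 4
x = rng.normal(size=(2, 5, 5, c))  # 4-dimensional NHWC input
gamma, beta = rng.normal(size=c), rng.normal(size=c)
mean, var, eps = rng.normal(size=c), rng.uniform(0.5, 2.0, size=c), 1e-3

# Small-operator composite: precompute a scale via Rsqrt/Mul, then apply
# one Mul and one Add per element (the training=False inference path).
scale = gamma * (1.0 / np.sqrt(var + eps))
small_ops_out = x * scale + (beta - mean * scale)

# Single fused formula, as a FusedBatchNormV3-style operator computes it.
fused_out = gamma * (x - mean) / np.sqrt(var + eps) + beta
assert np.allclose(small_ops_out, fused_out)
```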