Fusion Support

This tool currently supports the following forms of fusion. Each operator involved in a fusion must also meet the restrictions described in Quantization:

  • Conv+BN fusion: Before AMCT-based quantization, the "Conv2D/Conv3D+BatchNorm" composite in the model is fused into "Conv+BN". The BatchNorm layer is removed.
  • Depthwise_Conv+BN fusion: Before AMCT quantization, the "DepthwiseConv2dNative+BatchNorm" composite in the model is fused, during which the BatchNorm layer is removed.
  • OP+(BiasAdd)+Mul fusion: Before AMCT quantization, the "Conv2D/MatMul/DepthwiseConv2dNative/Conv2DBackpropInput+Mul" and "Conv2D/MatMul/DepthwiseConv2dNative/Conv2DBackpropInput+BiasAdd+Mul" composites in the model are fused into "OP+(BiasAdd)+Mul", during which the Mul layer is removed.

    In this scenario, the other input of Mul must be a Const with an empty shape, that is, a scalar.

  • Group_conv+BN fusion: If "Split+multi-channel Conv2D+ConcatV2 (or Concat, concatenating on the C axis)" is used in the model, the "Group_conv+BatchNorm" composite in the model is fused before quantization. After fusion, the BatchNorm layer is removed.

    BN fusion applies to the following operators: FusedBatchNorm, FusedBatchNormV2, and FusedBatchNormV3.

  • Fusion of small BN operators into FusedBatchNormV3: This fusion applies only to post-training quantization (PTQ). Only the "Conv+BN" or "Conv+BiasAdd+BN" composite triggers it, and the small BN operators must take 4D inputs.
    AMCT analyzes the composite of small BN operators generated by tf.keras.layers.BatchNormalization and replaces it with a single larger BN operator under any of the following conditions:
    • tf.keras.layers.BatchNormalization is called with fused=False, 4D inputs, and training=False. The network structures before and after fusion are as follows:

    • tf.keras.layers.BatchNormalization is called with fused=False, center=False, 4D inputs, and training=False. The network structures before and after fusion are as follows:

    • tf.keras.layers.BatchNormalization is called with fused=False, scale=False, 4D inputs, and training=False. The network structures before and after fusion are as follows:

    • tf.keras.layers.BatchNormalization is called with fused=False, scale=False, center=False, 4D inputs, and training=False. The network structures before and after fusion are as follows:
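
The arithmetic behind the fusions listed above can be illustrated with a minimal sketch. It is not AMCT code: it uses only the Python standard library, models the convolution as a 1x1 convolution on a single pixel (a plain matrix product), and all function names and toy values are hypothetical. It also assumes that the small BN operators compute the usual inference-mode decomposition into Rsqrt, Mul, Sub, and Add, with scale=False treated as gamma = 1 and center=False treated as beta = 0.

```python
import math

EPS = 1e-3  # default epsilon of tf.keras.layers.BatchNormalization

def conv1x1(x, w, b):
    """1x1 convolution on one pixel: x is [C_in], w is [C_in][C_out], b is [C_out]."""
    return [sum(x[i] * w[i][o] for i in range(len(x))) + b[o]
            for o in range(len(b))]

def batch_norm(y, gamma, beta, mean, var, eps=EPS):
    """Inference-mode batch normalization, applied per output channel."""
    return [gamma[o] * (y[o] - mean[o]) / math.sqrt(var[o] + eps) + beta[o]
            for o in range(len(y))]

def fold_bn(w, b, gamma, beta, mean, var, eps=EPS):
    """Fold BN into the conv weights and bias so the BN layer can be removed."""
    scale = [gamma[o] / math.sqrt(var[o] + eps) for o in range(len(b))]
    w_f = [[w[i][o] * scale[o] for o in range(len(b))] for i in range(len(w))]
    b_f = [(b[o] - mean[o]) * scale[o] + beta[o] for o in range(len(b))]
    return w_f, b_f

def fold_scalar_mul(w, b, k):
    """Fold a scalar Const Mul into the op: (x.w + b) * k == x.(w*k) + b*k."""
    return [[v * k for v in row] for row in w], [v * k for v in b]

def small_bn(y, mean, var, gamma=None, beta=None, eps=EPS):
    """Assumed small-operator BN: gamma=None mimics scale=False, beta=None mimics center=False."""
    out = []
    for o in range(len(y)):
        inv = 1.0 / math.sqrt(var[o] + eps)                     # Rsqrt
        if gamma is not None:
            inv *= gamma[o]                                     # Mul, only when scale=True
        shift = (beta[o] if beta is not None else 0.0) - mean[o] * inv  # Sub
        out.append(y[o] * inv + shift)                          # Mul + Add
    return out

# Hypothetical toy values: 3 input channels, 2 output channels.
x = [0.5, -1.0, 2.0]
w = [[0.2, -0.4], [0.1, 0.3], [-0.5, 0.7]]
b = [0.05, -0.1]
gamma, beta = [1.5, 0.8], [0.1, -0.2]
mean, var = [0.3, -0.6], [0.9, 1.7]

def max_err(a, c):
    return max(abs(p - q) for p, q in zip(a, c))

# Conv+BN: the folded convolution alone matches conv followed by BN.
y = conv1x1(x, w, b)
w_f, b_f = fold_bn(w, b, gamma, beta, mean, var)
bn_max_err = max_err(batch_norm(y, gamma, beta, mean, var), conv1x1(x, w_f, b_f))

# OP+(BiasAdd)+Mul: the scalar multiplier is absorbed into weights and bias.
k = 0.25
w_m, b_m = fold_scalar_mul(w, b, k)
mul_max_err = max_err([v * k for v in y], conv1x1(x, w_m, b_m))

# Small BN with scale=False, center=False equals a fused BN with gamma=1, beta=0.
small_max_err = max_err(small_bn(y, mean, var),
                        batch_norm(y, [1.0, 1.0], [0.0, 0.0], mean, var))

print(bn_max_err, mul_max_err, small_max_err)
```

All three printed errors should be at floating-point rounding level, which is why the BatchNorm or Mul layer can be dropped after its parameters are absorbed. The same per-output-channel scaling carries over to full Conv2D, Conv3D, and depthwise kernels; only the weight layout differs.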