Layers That Support Quantization and Their Restrictions

This section describes the quantizable layers of different frameworks and related restrictions.

  • If the input data type or weight data type of the network model is float16, or a mix of float32 and float16, the AMCT disables quantization of the following operators: AvgPool, Pooling, AvgPoolV2, MaxPool, MaxPoolV3, Add, Eltwise, and BatchMatMulV2 (when both inputs are tensors). A sketch for detecting such models follows this list.

  • Due to hardware restrictions, performing non-uniform quantization (NUQ) is not advised in this version, because no performance benefit can be obtained.
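As a quick way to check the first restriction, the following minimal sketch (plain onnx API calls, not AMCT code; the model path is hypothetical) scans an ONNX model for float16 weights:

```python
# Minimal sketch: detect float16 weights in an ONNX model, since mixed
# float32/float16 models have the operators listed above excluded from
# quantization. Uses the standard onnx package; "model.onnx" is a
# hypothetical path.
import onnx

model = onnx.load("model.onnx")
fp16_weights = [
    init.name
    for init in model.graph.initializer
    if init.data_type == onnx.TensorProto.FLOAT16
]
if fp16_weights:
    print("float16 weights found; the operators above will not be quantized:")
    for name in fp16_weights:
        print(" ", name)
```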
Table 1 Layers that support uniform quantization and their restrictions

| Framework | Supported Layer Type | Restriction | Ascend IR–defined Layer |
| --- | --- | --- | --- |
| Caffe | InnerProduct | transpose = false, axis = 1 | FullyConnection |
| Caffe | Convolution | filter = 4 x 4 | Conv2D |
| Caffe | Deconvolution | dilation = 1, filter = 4 x 4 | Deconvolution |
| Caffe | Pooling | If mode = 1 (full quantization, weight + tensor) and global_pooling = false, the N shift operation is not supported. If mode = 0, only tensor quantization is performed. | Pooling |
| Caffe | Eltwise | Only tensor quantization is performed, and operation = 1 (SUM) is required. | Eltwise |
| TensorFlow | MatMul | transpose_a = False, transpose_b = False, adjoint_a = False, and adjoint_b = False. The weights must not be dynamic inputs (such as placeholders). | MatMulV2 |
| TensorFlow | Conv2D | The weights must not be dynamic inputs (such as placeholders). | Conv2D |
| TensorFlow | DepthwiseConv2dNative | The weights must not be dynamic inputs (such as placeholders). | DepthwiseConv2D |
| TensorFlow | Conv2DBackpropInput | dilation = 1, and the weights must not be dynamic inputs (such as placeholders). | Conv2DBackpropInput |
| TensorFlow | BatchMatMulV2 | adj_x = False. When the second input is a constant, only two dimensions are supported. When both inputs are tensors, only INT8 symmetric quantization is supported. | BatchMatMulV2 |
| TensorFlow | AvgPool | The N shift operation is not supported. | AvgPool |
| TensorFlow | Conv3D | dilation_d = 1 | Conv3D |
| TensorFlow | MaxPool | Tensor quantization only. | MaxPool, MaxPoolV3 |
| TensorFlow | Add | Only tensor quantization is performed, and only single-input quantization is supported. | Add |
| ONNX | Conv | dilation_d = 1, filter = 5 x 5. The weights must not be dynamic inputs (such as placeholders). | Conv2D, Conv3D |
| ONNX | Gemm | transpose_a = false. The weights must not be dynamic inputs (such as placeholders). | MatMulV2 |
| ONNX | ConvTranspose | dilation = 1, filter = 4 x 4. The weights must not be dynamic inputs (such as placeholders). | Conv2DTranspose |
| ONNX | MatMul | When the second input is a constant, only two dimensions are supported. When both inputs are tensors, only INT8 symmetric quantization is supported. The weights must not be dynamic inputs (such as placeholders). | BatchMatMulV2 |
| ONNX | AveragePool | If global_pooling is set to false, the N shift operation is not supported. | AvgPoolV2 |
| ONNX | MaxPool | Tensor quantization only. | MaxPool, MaxPoolV3 |
| ONNX | Add | Only tensor quantization is performed, and only single-input quantization is supported. | Add |
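
To make the recurring "weights must not be dynamic inputs" restriction in Table 1 concrete, here is a minimal TensorFlow sketch (plain TensorFlow 1.x-style graph code, not AMCT code) contrasting a quantizable MatMul with one that would be skipped:

```python
# Minimal sketch of the "weights must not be dynamic inputs" restriction
# for TensorFlow MatMul (TF 1.x graph mode via the compat.v1 API).
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

x = tf.placeholder(tf.float32, [None, 64], name="x")

# Quantizable: the weight is a graph constant; transpose_a/transpose_b and
# adjoint_a/adjoint_b all default to False.
w_const = tf.constant(np.random.randn(64, 32).astype(np.float32))
y_quantizable = tf.matmul(x, w_const)

# Not quantizable: the weight itself arrives through a placeholder,
# that is, a dynamic input.
w_dynamic = tf.placeholder(tf.float32, [64, 32], name="w")
y_skipped = tf.matmul(x, w_dynamic)
```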

Table 2 Layers that support non-uniform quantization (NUQ) and their restrictions

| Framework | Supported Layer Type | Restriction | Ascend IR–defined Layer |
| --- | --- | --- | --- |
| Caffe | Convolution | dilation = 1, filter = 4 x 4 | Conv2D |
| Caffe | InnerProduct | transpose = false, axis = 1 | FullyConnection |
| TensorFlow | Conv2D | dilation = 1 | Conv2D |
| TensorFlow | MatMul | transpose_a = false | MatMulV2 |
| ONNX | Conv | - | Conv2D |
| ONNX | Gemm | transpose_a = false | MatMulV2 |
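
As an illustration of the Gemm row in Table 2, the following sketch uses the standard onnx helper API to build a Gemm node that meets both restrictions (transA = 0, i.e. transpose_a = false, and a constant weight); all tensor names and shapes are illustrative:

```python
# Minimal sketch: an ONNX Gemm satisfying the NUQ restrictions.
# The weight is stored as an initializer (a constant), so it has no
# dynamic input; transA = 0 corresponds to transpose_a = false.
import numpy as np
from onnx import TensorProto, helper, numpy_helper

w = numpy_helper.from_array(
    np.random.randn(64, 32).astype(np.float32), name="W"
)

gemm = helper.make_node("Gemm", inputs=["X", "W"], outputs=["Y"], transA=0)

graph = helper.make_graph(
    [gemm],
    "gemm_example",
    inputs=[helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 64])],
    outputs=[helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 32])],
    initializer=[w],
)
model = helper.make_model(graph)
```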

Table 3 Layers supported only in weight-only quantization scenarios and their restrictions

| Ascend IR–defined Layer | Weight-only Quantization: channel_wise=true in Weight ARQ | Weight-only Quantization: asymmetric in Weight ARQ | Weight and Activation Quantization: channel_wise=true in Weight ARQ | Weight and Activation Quantization: asymmetric=true in Weight ARQ | Restrictions |
| --- | --- | --- | --- | --- | --- |
| MatMulV2 | √ | true | × | × | The second input must not be a dynamic input (such as a placeholder). |
| BatchMatMulV2 | √ | true | × | × | The second input must not be a dynamic input (such as a placeholder). |
| FFN | √ | true and false | × | × | The expert_tokens input of the FFN operator must not be empty. The two weights of the FFN operator must be float16 constants. The antiquant_scale1, antiquant_scale2, antiquant_offset1, and antiquant_offset2 inputs of the FFN operator must be empty. The weights cannot be shared. |

Notes:

  • √: Supported. ×: Quantization is abnormal.
  • channel_wise=true in Weight ARQ: Each channel is quantized separately with its own quantization factors.
  • asymmetric in Weight ARQ:
    • true: Asymmetric weight quantization is used.
    • false: Symmetric weight quantization is used.
    • true and false: Both symmetric and asymmetric weight quantization are supported.
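
For reference, the following NumPy sketch illustrates the arithmetic behind the channel_wise and asymmetric options. It is generic INT8 quantization, not AMCT's actual factor computation: channel_wise=true derives one set of factors per output channel, and asymmetric adds a zero point so the INT8 range need not be centered on zero.

```python
# NumPy sketch of the two weight-ARQ options (generic INT8 arithmetic,
# not AMCT's actual factor computation).
import numpy as np

w = np.random.randn(4, 16).astype(np.float32)  # 4 output channels

# channel_wise=true: one scale per output channel (symmetric here, so the
# range [-max|w|, max|w|] maps onto [-127, 127] per channel).
scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
q_symmetric = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# asymmetric=true: per-channel scale plus a zero point, so the INT8 range
# [-128, 127] covers [w_min, w_max] instead of a zero-centered interval.
w_min = w.min(axis=1, keepdims=True)
w_max = w.max(axis=1, keepdims=True)
scale_a = (w_max - w_min) / 255.0
zero_point = -128 - np.round(w_min / scale_a)
q_asymmetric = np.clip(np.round(w / scale_a) + zero_point, -128, 127).astype(np.int8)
```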