Layers That Support Quantization and Restrictions
This section describes the quantizable layers of different frameworks and related restrictions.
- If the input data type or weight data type of the network model is float16, or a mix of float32 and float16, the AMCT disables quantization of the following operators: AvgPool, AvgPoolV2, Pooling, MaxPool, MaxPoolV3, Add, Eltwise, and BatchMatMulV2 (where both inputs are tensors).
- Due to hardware restrictions, NUQ is not recommended in this version; otherwise, no performance benefit is obtained.
| Framework | Supported Layer Type | Restriction | Ascend IR–defined Layer |
|---|---|---|---|
| Caffe | InnerProduct | transpose = false, axis = 1 | FullyConnection |
| Caffe | Convolution | 4 x 4 filter | Conv2D |
| Caffe | Deconvolution | 1-dilated 4 x 4 filter | Deconvolution |
| Caffe | Pooling | - | Pooling |
| Caffe | Eltwise | Only tensor quantization is performed, and operation = 1 is required. | Eltwise |
| TensorFlow | MatMul | - | MatMulV2 |
| TensorFlow | Conv2D | The weights do not have dynamic inputs (such as placeholders). | Conv2D |
| TensorFlow | DepthwiseConv2dNative | The weights do not have dynamic inputs (such as placeholders). | DepthwiseConv2D |
| TensorFlow | Conv2DBackpropInput | dilation = 1, and the weights do not have dynamic inputs (such as placeholders). | Conv2DBackpropInput |
| TensorFlow | BatchMatMulV2 | - | BatchMatMulV2 |
| TensorFlow | AvgPool | The N shift operation is not supported. | AvgPool |
| TensorFlow | Conv3D | dilation_d = 1 | Conv3D |
| TensorFlow | MaxPool | Tensor quantization only | MaxPool, MaxPoolV3 |
| TensorFlow | Add | Only tensor quantization is performed, and only single-input quantization is supported. | Add |
| ONNX | Conv | - | Conv2D, Conv3D |
| ONNX | Gemm | - | MatMulV2 |
| ONNX | ConvTranspose | - | Conv2DTranspose |
| ONNX | MatMul | - | BatchMatMulV2 |
| ONNX | AveragePool | If global_pooling is set to false, the N shift operation is not supported. | AvgPoolV2 |
| ONNX | MaxPool | Tensor quantization only | MaxPool, MaxPoolV3 |
| ONNX | Add | Only tensor quantization is performed, and only single-input quantization is supported. | Add |
| Framework | Supported Layer Type | Restriction | Ascend IR–defined Layer |
|---|---|---|---|
| Caffe | Convolution | 1-dilated 4 x 4 filter | Conv2D |
| Caffe | InnerProduct | transpose = false, axis = 1 | FullyConnection |
| TensorFlow | Conv2D | dilation = 1 | Conv2D |
| TensorFlow | MatMul | transpose_a = false | MatMulV2 |
| ONNX | Conv | - | Conv2D |
| ONNX | Gemm | transpose_a = false | MatMulV2 |
| Ascend IR–defined Layer | Weight-only Quantization, channel_wise=true in Weight ARQ | Weight-only Quantization, asymmetric in Weight ARQ | Weight and Activation Quantization, channel_wise=true in Weight ARQ | Weight and Activation Quantization, asymmetric=true in Weight ARQ | Restrictions |
|---|---|---|---|---|---|
| MatMulV2 | √ | true | × | × | The second input does not have dynamic inputs (such as placeholders). |
| BatchMatMulV2 | √ | true | × | × | The second input does not have dynamic inputs (such as placeholders). |
| FFN | √ | true and false | × | × | - |
Notes:
- √: Supported. ×: Not supported (quantization is abnormal).
- channel_wise=true in Weight ARQ: each channel is quantized separately, using its own quantization factors.
- asymmetric in Weight ARQ:
  - true: asymmetric weight quantization is used.
  - false: symmetric weight quantization is used.
  - true and false: both symmetric and asymmetric weight quantization are supported.
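To make the channel_wise and symmetric/asymmetric notes concrete, here is a NumPy sketch of per-channel int8 weight quantization. It is an illustration of the concepts only, not AMCT's actual ARQ implementation; the function name and the [-128, 127] int8 range are assumptions of the sketch.

```python
import numpy as np

def quantize_per_channel(w, asymmetric=False):
    """Quantize a [out_channels, ...] weight to int8 with one scale
    (and, if asymmetric, one offset) per output channel.

    channel_wise=true corresponds to computing these factors per
    channel rather than once for the whole tensor.
    """
    flat = w.reshape(w.shape[0], -1).astype(np.float64)
    if asymmetric:
        # Asymmetric: map [min, max] of each channel onto [-128, 127].
        lo = flat.min(axis=1, keepdims=True)
        hi = flat.max(axis=1, keepdims=True)
        scale = (hi - lo) / 255.0
        offset = np.round(-128.0 - lo / scale)
        q = np.clip(np.round(flat / scale) + offset, -128, 127)
    else:
        # Symmetric: zero maps to zero; range set by each channel's max |w|.
        scale = np.abs(flat).max(axis=1, keepdims=True) / 127.0
        offset = np.zeros_like(scale)
        q = np.clip(np.round(flat / scale), -128, 127)
    return q.astype(np.int8).reshape(w.shape), scale, offset

np.random.seed(0)  # reproducible demo weights
w = np.random.randn(4, 3, 3, 3).astype(np.float32)
q_sym, s_sym, _ = quantize_per_channel(w)                    # channel_wise, symmetric
q_asym, s_asym, off = quantize_per_channel(w, asymmetric=True)
```

The symmetric branch keeps the offset at zero, so a channel's dequantized values are just `q * scale`; the asymmetric branch spends the full int8 range on each channel's actual [min, max] at the cost of carrying an offset per channel.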