Layers That Support Quantization and Their Restrictions
This section describes the quantizable layers of different frameworks and related restrictions.
- If the input data type or weight data type of the network model is float16, or a mix of float32 and float16, quantization is disabled for the following operators: AvgPool, Pooling, AvgPoolV2, MaxPool, MaxPoolV3, Add, Eltwise, and BatchMatMulV2 (when both inputs are tensors).
- Due to hardware restrictions, non-uniform quantization (NUQ) is not recommended in this version because it yields no performance benefit.
- The following products do not support the Caffe framework:
  - Atlas A2 training products / Atlas A2 inference products
  - Atlas A3 training products / Atlas A3 inference products

| Framework | Supported Layer Type | Restriction | Ascend IR–defined Layer |
|---|---|---|---|
| Caffe | Convolution | 1-dilated 4 × 4 filter | Conv2D |
| Caffe | InnerProduct | transpose = false, axis = 1 | FullyConnection |
| TensorFlow | Conv2D | dilation = 1 | Conv2D |
| TensorFlow | MatMul | transpose_a = false | MatMulV2 |
| ONNX | Conv | - | Conv2D |
| ONNX | Gemm | transpose_a = false | MatMulV2 |

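
The framework-to-IR mapping above can be read as a lookup keyed by framework and layer type. The following is a minimal sketch of that reading, not part of any toolkit API: the names `QUANTIZABLE_LAYERS` and `ascend_ir_layer` are hypothetical, attributes are assumed to arrive as a plain dictionary, and the filter restriction on Caffe Convolution is not checked.

```python
# Hypothetical encoding of the table above; not a toolkit API.
# Key: (framework, layer type). Value: (Ascend IR layer, required attribute values).
QUANTIZABLE_LAYERS = {
    # Only the dilation part of the Caffe Convolution restriction is modeled here.
    ("Caffe", "Convolution"): ("Conv2D", {"dilation": 1}),
    ("Caffe", "InnerProduct"): ("FullyConnection", {"transpose": False, "axis": 1}),
    ("TensorFlow", "Conv2D"): ("Conv2D", {"dilation": 1}),
    ("TensorFlow", "MatMul"): ("MatMulV2", {"transpose_a": False}),
    ("ONNX", "Conv"): ("Conv2D", {}),
    ("ONNX", "Gemm"): ("MatMulV2", {"transpose_a": False}),
}


def ascend_ir_layer(framework, layer_type, attrs):
    """Return the Ascend IR layer name if the restrictions hold, else None.

    An attribute that is missing from `attrs` is treated as not satisfying
    the restriction (an assumption made for this sketch).
    """
    entry = QUANTIZABLE_LAYERS.get((framework, layer_type))
    if entry is None:
        return None  # layer type is not listed for this framework
    ir_layer, required = entry
    for name, expected in required.items():
        if attrs.get(name) != expected:
            return None  # restriction violated
    return ir_layer


if __name__ == "__main__":
    print(ascend_ir_layer("TensorFlow", "MatMul", {"transpose_a": False}))
    print(ascend_ir_layer("ONNX", "Gemm", {"transpose_a": True}))
```

A table-driven check keeps the restrictions in one place, so adding a framework or layer type only means adding an entry.
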
Weight-only quantization is supported only by the following product types:

| Ascend IR–defined Layer | Weight Quantization Only, channel_wise=true in Weight ARQ | Weight Quantization Only, asymmetric in Weight ARQ (true/false) | Weight and Activation Quantization, channel_wise=true in Weight ARQ | Weight and Activation Quantization, asymmetric=true in Weight ARQ | Restriction |
|---|---|---|---|---|---|
| MatMulV2 | √ | true | × | × | The second input must not be a dynamic input (such as a placeholder). |
| BatchMatMulV2 | √ | true | × | × | The second input must not be a dynamic input (such as a placeholder). |
| FFN | √ | true and false | × | × | |

Notes:
- √: Supported. ×: Not supported (quantization is abnormal).
- channel_wise=true in Weight ARQ: Each channel is quantized separately, with its own quantization factor.
- asymmetric in Weight ARQ:
  - true: Asymmetric weight quantization is used.
  - false: Symmetric weight quantization is used.
  - true and false: Both symmetric and asymmetric weight quantization are supported.
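
To make the `channel_wise` and `asymmetric` options in the notes above concrete, here is a minimal sketch of generic per-channel INT8 weight quantization. It is a textbook min/max scheme and is not claimed to be the ARQ algorithm itself. The function names, the 8-bit width, and the list-of-rows weight layout (one row per output channel) are assumptions made for illustration.

```python
def quantize_channel(weights, asymmetric, num_bits=8):
    """Quantize one channel; returns (quantized ints, scale, offset).

    Dequantization in this sketch is: w ~= (q - offset) * scale.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    if asymmetric:
        # Asymmetric: map [lo, hi] (extended to include 0) onto [qmin, qmax].
        lo, hi = min(min(weights), 0.0), max(max(weights), 0.0)
        scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
        offset = qmin - round(lo / scale)
    else:
        # Symmetric: range is centered on 0, so no offset is needed.
        scale = max(abs(w) for w in weights) / qmax or 1.0
        offset = 0
    q = [max(qmin, min(qmax, round(w / scale) + offset)) for w in weights]
    return q, scale, offset


def quantize_channel_wise(weight_rows, asymmetric):
    """Quantize each channel (row) with its own scale and offset."""
    return [quantize_channel(row, asymmetric) for row in weight_rows]


if __name__ == "__main__":
    weights = [[0.5, -1.0, 0.25], [10.0, 20.0, 5.0]]  # two output channels
    for q, scale, offset in quantize_channel_wise(weights, asymmetric=False):
        print("symmetric ", q, scale, offset)
    for q, scale, offset in quantize_channel_wise(weights, asymmetric=True):
        print("asymmetric", q, scale, offset)
```

In this sketch the two channels end up with different scales because their value ranges differ, which is the point of channel-wise quantization. The asymmetric variant additionally uses a nonzero offset so that a one-sided range such as the second channel can use the full integer range.
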