AscendQuant

Function Usage

Performs quantization by element, for example, quantizing the half/float data type to the int8_t data type. The calculation formula is as follows, where round indicates banker's rounding (round half to even) and PAR indicates the number of elements that the Vector Unit can process in one iteration:

dst_i = round(src_i × scale + offset), with the result converted to int8_t

  • Per_tensor quantization: the entire src_local tensor shares a single quantization parameter whose shape is (1,).

  • Per_channel quantization: src_local has shape (m, n), and each channel along the n dimension has its own quantization parameter, so the parameter shape is (n,).
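
The two modes can be illustrated with a host-side reference sketch (plain C++, not Ascend C; the function names and the saturation to the int8_t range are illustrative assumptions of this sketch, not part of the API):

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// One element: round(src * scale + offset) with banker's rounding,
// then convert to int8_t (saturation is an assumption of this sketch).
static int8_t QuantOne(float v, float scale, float offset) {
    float r = std::nearbyint(v * scale + offset);  // default mode rounds half to even
    if (r > 127.0f) r = 127.0f;
    if (r < -128.0f) r = -128.0f;
    return static_cast<int8_t>(r);
}

// Per_tensor: one (scale, offset) pair, shape (1,), shared by all elements.
static std::vector<int8_t> QuantPerTensor(const std::vector<float>& src,
                                          float scale, float offset) {
    std::vector<int8_t> dst(src.size());
    for (size_t i = 0; i < src.size(); ++i) dst[i] = QuantOne(src[i], scale, offset);
    return dst;
}

// Per_channel: src has shape (m, n); channel j uses scale[j] and offset[j], shape (n,).
static std::vector<int8_t> QuantPerChannel(const std::vector<float>& src, size_t n,
                                           const std::vector<float>& scale,
                                           const std::vector<float>& offset) {
    std::vector<int8_t> dst(src.size());
    for (size_t i = 0; i < src.size(); ++i)
        dst[i] = QuantOne(src[i], scale[i % n], offset[i % n]);
    return dst;
}
```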

Principles

Figure 1 AscendQuant algorithm block diagram, where both scale and offset are scalars
Figure 2 AscendQuant algorithm block diagram, where both scale and offset are tensors
Figure 3 AscendQuant algorithm block diagram, where scale is a tensor and offset is a scalar

The preceding figures show the block diagrams of the AscendQuant internal algorithm. The computation is divided into the following steps, all of which are performed on vectors:

  1. Precision conversion: If the input src, scale, or offset is of the float type, convert it to the half type.
  2. Broadcast: If the input scale or offset is a vector, broadcast it to the same shape as src.
  3. Scale calculation: If src and scale are both vectors, perform a Mul operation; if scale is a scalar, perform a Muls operation. The result is Tmp1.
  4. Offset calculation: If Tmp1 and offset are both vectors, perform an Add operation; if offset is a scalar, perform an Adds operation. The result is Tmp2.
  5. Precision conversion: Convert Tmp2 from half to int8_t to obtain the output.
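
Ignoring half-precision effects, the steps above can be traced on the host for the Figure 3 case (tensor scale, scalar offset); this sketch is illustrative only, and the comments map each loop to the vector operation named in the corresponding step:

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Sketch of the pipeline for an (m, n) src with per-channel scale and scalar offset.
static std::vector<int8_t> QuantSteps(const std::vector<float>& src, size_t n,
                                      const std::vector<float>& scale, float offset) {
    // Step 2 (broadcast): expand the (n,) scale to the same shape as src.
    std::vector<float> bScale(src.size());
    for (size_t i = 0; i < src.size(); ++i) bScale[i] = scale[i % n];

    // Step 3 (Mul): elementwise src * scale -> Tmp1.
    std::vector<float> tmp1(src.size());
    for (size_t i = 0; i < src.size(); ++i) tmp1[i] = src[i] * bScale[i];

    // Step 4 (Adds): add the scalar offset to every element -> Tmp2.
    std::vector<float> tmp2(src.size());
    for (size_t i = 0; i < src.size(); ++i) tmp2[i] = tmp1[i] + offset;

    // Step 5 (precision conversion): banker's rounding, then convert to int8_t.
    std::vector<int8_t> dst(src.size());
    for (size_t i = 0; i < src.size(); ++i)
        dst[i] = static_cast<int8_t>(std::nearbyint(tmp2[i]));
    return dst;
}
```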

Prototype

  • dstTensor of the int8_t type
    • Per_tensor quantization:
      • Pass the temporary space through the sharedTmpBuffer input parameter.
        • All or part of the source operand tensors are involved in computation.
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const float scale, const float offset, const uint32_t calCount)
          
        • All source operand tensors are involved in computation.
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const float scale, const float offset)
          
      • Allocate the temporary space through the API framework.
        • All or part of the source operand tensors are involved in computation.
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const float scale, const float offset, const uint32_t calCount)
          
        • All source operand tensors are involved in computation.
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const float scale, const float offset)
          
    • Per_channel quantization:
      • Pass the temporary space through the sharedTmpBuffer input parameter.
        • All or part of the source operand tensors are involved in computation.
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const LocalTensor<T>& scaleTensor, const T offset, const uint32_t scaleCount, const uint32_t calCount)
          
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const LocalTensor<T>& scaleTensor, const LocalTensor<T>& offsetTensor, const uint32_t scaleCount, const uint32_t offsetCount, const uint32_t calCount)
          
        • All source operand tensors are involved in computation.
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const LocalTensor<T>& scaleTensor, const T offset)
          
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const LocalTensor<T>& scaleTensor, const LocalTensor<T>& offsetTensor)
          
      • Allocate the temporary space through the API framework.
        • All or part of the source operand tensors are involved in computation.
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<T>& scaleTensor, const T offset, const uint32_t scaleCount, const uint32_t calCount)
          
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<T>& scaleTensor, const LocalTensor<T>& offsetTensor, const uint32_t scaleCount, const uint32_t offsetCount, const uint32_t calCount)
          
        • All source operand tensors are involved in computation.
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<T>& scaleTensor, const T offset)
          
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<T>& scaleTensor, const LocalTensor<T>& offsetTensor)
          

Due to the complex mathematical computation involved in the internal implementation of this API, additional temporary space is required to store intermediate variables generated during computation. The temporary space can be allocated through the API framework or passed by developers through the sharedTmpBuffer input parameter.

  • When the API framework is used for temporary space allocation, developers do not need to allocate the space, but must reserve the required size for the space.
  • When the temporary space is passed through the sharedTmpBuffer input parameter, that tensor serves as the temporary space, and the API framework does not allocate it. This lets developers manage and reuse sharedTmpBuffer across API calls, avoiding repeated allocation and deallocation and improving flexibility and buffer utilization.

If the API framework is used, developers must reserve the temporary space. If sharedTmpBuffer is used, developers must allocate space for sharedTmpBuffer. To obtain the size of the temporary space (BufferSize) to be reserved or allocated, use the GetAscendQuantMaxMinTmpSize API.

Parameters

Table 1 Parameters in the template

  • T: Data type of the operand.
  • isReuseSource: Whether the source operand can be modified. This parameter is reserved; pass the default value false.

Table 2 Per_tensor API parameters

  • dstTensor (output): Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For the Atlas Training Series Product, the supported data type is int8_t.
  • srcTensor (input): Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For the Atlas Training Series Product, the supported data types are half and float.
  • sharedTmpBuffer (input): Temporary buffer. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For details about how to obtain the temporary space size (BufferSize), see GetAscendQuantMaxMinTmpSize. For the Atlas Training Series Product, the supported data type is uint8_t.
  • scale (input): Quantization parameter. The type is Scalar, and the supported data type is float.
  • offset (input): Quantization parameter. The type is Scalar, and the supported data type is float.
  • calCount (input): Number of data elements actually computed. The value range is [0, srcTensor.GetSize()].

Table 3 Per_channel API parameters

  • dstTensor (output): Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
  • srcTensor (input): Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
  • sharedTmpBuffer (input): Temporary buffer. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For details about how to obtain the temporary space size (BufferSize), see GetAscendQuantMaxMinTmpSize.
  • scaleTensor (input): Quantization parameter. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
  • offsetTensor (input): Quantization parameter. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
  • scaleCount (input): Number of quantization parameter elements actually used. The value range is [0, scaleTensor.GetSize()], and the value must be an integer multiple of 32.
  • offsetCount (input): Number of quantization parameter elements actually used. The value range is [0, offsetTensor.GetSize()], the value must be an integer multiple of 32, and it must equal scaleCount.
  • calCount (input): Number of data elements actually computed. The value range is [0, srcTensor.GetSize()], and the value must be an integer multiple of scaleCount.
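
The relationships among scaleCount, offsetCount, and calCount for the per_channel interfaces can be expressed as a small check (ValidPerChannelCounts is a hypothetical helper, not part of the API):

```cpp
#include <cstdint>

// Hypothetical validation mirroring the per_channel parameter constraints:
// scaleCount is a multiple of 32, offsetCount equals scaleCount,
// and calCount is a multiple of scaleCount.
static bool ValidPerChannelCounts(uint32_t scaleCount, uint32_t offsetCount,
                                  uint32_t calCount) {
    if (scaleCount % 32 != 0) return false;
    if (offsetCount != scaleCount) return false;
    return scaleCount == 0 || calCount % scaleCount == 0;
}
```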

Returns

None

Availability

The Atlas Training Series Product supports only per_tensor quantization and does not support per_channel quantization.

Constraints

  • The source operand and destination operand can share the same address (address overlapping is supported).
  • For details about the alignment requirements of the operand address offset, see General Restrictions.
  • The length of the data involved in computation of the input and output operands must be 32-byte aligned.
  • When scale is of the float type, its value must still be within the value range of the half type.
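
The 32-byte alignment constraint depends on the element size; the helper below is an illustrative sketch (half is modeled as a 2-byte type, and kBlockBytes and LengthAligned are assumed names):

```cpp
#include <cstdint>

// The byte length of the data involved in computation must be a
// multiple of 32 bytes; half is 2 bytes, int8_t is 1 byte.
constexpr uint32_t kBlockBytes = 32;

static bool LengthAligned(uint32_t elementCount, uint32_t elementSize) {
    return (elementCount * elementSize) % kBlockBytes == 0;
}
```

For example, a half-type input needs an element count that is a multiple of 16, while an int8_t output needs a multiple of 32.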

Example

For a complete operator example, see quant operator sample.

//The input shape is 1024 elements.
uint32_t dataSize = 1024;
//The input type is float or half; scale is 2.0 and offset is 0.9. The temporary space is reserved through the API framework.
AscendC::AscendQuant<srcType>(dstLocal, srcLocal, 2.0f, 0.9f, dataSize);

Result example:

Input data (srcLocal):
[-3.22      2.09     -2.025    -2.895    -1.349    -3.336     1.376
  2.453     3.861     1.085    -2.273     0.3923    0.3645   -2.127
 -3.09     -0.002726 -2.783     0.2615   -0.904     1.507    -1.017
  3.568     2.219     0.8643    0.922     1.144    -1.853     2.002
 -1.705     1.675    -3.482     1.519     0.4172    0.4307   -1.228
 -2.62      0.3354   -3.586     2.604     1.688    -3.646    -3.389
 -3.918     3.955     0.7954   -2.562    -1.085     2.91     -0.398
  3.771    -2.914     1.726     3.367     3.482     3.49      1.382
  3.512     0.1938   -0.4087   -3.75      2.873    -2.54      1.826
  3.738     3.188     2.676     0.724    -1.108    -2.682    -0.4783
  2.082    -0.462    -2.955    -2.543     3.98     -1.85      3.018
 -2.688     3.596    -0.799     1.222     1.686    -0.7925    3.295
 -3.568    -0.03836  -2.002    -1.212     1.927    -1.11      1.046
  3.793    -0.6226   -3.494    -3.371    -2.354    -1.7      -0.948
  2.682    -3.344     2.566     2.533    -1.335     1.405     3.867
  3.674     1.359     3.145    -1.22      1.054    -2.492    -1.214
  3.879     2.014     2.664    -2.863    -3.88      2.857     1.695
  2.852     2.893     2.367    -0.1832   -3.254    -1.49      1.13
  0.672    -1.863    -3.547     3.281    -1.573    -1.349    -3.547
 -3.766    -2.99     -3.203    -2.703    -2.793    -1.501     0.4785
 -1.216    -1.205     0.9097   -3.438     0.781    -1.505    -1.982
  0.2037    0.4595    0.759     0.844    -3.396     0.4778   -0.899
 -2.342    -0.961    -2.531    -0.10913  -3.516    -3.66      1.337
 -3.44      0.7495    1.958     2.775     0.0968   -3.       -2.13
 -1.818     2.664     2.066    -1.923     2.97     -2.047    -3.598
  0.1661   -0.179     3.186    -1.247     2.777    -3.344    -3.148
  2.275     2.916    -1.081    -3.213     2.87     -3.12     -3.066
 -0.6      -3.78     -3.012    -3.86     -0.707    -0.2203   -3.338
 -2.273     2.062    -2.422    -0.443    -1.333    -2.2      -1.478
 -2.816     1.134     0.2115   -2.459     3.842    -2.768     2.822
  1.3125   -2.143     1.971    -3.543    -0.07794  -0.1265    0.763
 -3.26      3.514     3.629     0.1902    1.277    -0.1652   -0.006435
 -1.25      2.258    -2.887     3.66      2.729    -3.27     -0.5615
 -3.176    -1.2295    1.556    -0.6626   -2.777     1.946    -0.338
 -2.977    -0.8135   -2.37      0.7764    3.525    -0.6196    2.436
  2.38     -1.708     0.814     0.4688   -1.255     1.04     -1.077
  3.176     1.859     0.9194    2.703     1.436     1.762     2.2
  1.794    -1.234    -2.148    -2.393     2.846     1.854     0.3428
 -2.379     0.2429   -1.561     2.582     0.6836    1.811    -2.53
 -3.951    -2.096    -2.639     2.02      2.799    -0.8936   -1.295
 -3.914    -1.82      2.541    -2.773     1.733     3.955    -3.092
  0.04095   0.82     -1.071     3.93     -3.158    -2.5      -0.5415
 -1.98     -0.1626    3.092    -1.3125    3.387    -2.496     2.355
 -3.033    -3.814    -3.191     2.686     1.377     1.381    -3.047
  2.127    -0.4927   -1.718     2.371    -0.1648    1.885    -0.6826
 -3.121    -2.379    -3.959    -2.164     2.262    -2.973     3.092
  2.111    -0.03732   2.836    -2.725     3.436     1.017     2.877
 -2.926     2.547     0.8574    2.643     2.646    -0.889     3.363
 -0.3147   -0.09546   0.0551   -3.947    -1.434    -0.6104   -3.41
 -2.176    -1.866     3.975    -3.031    -1.25      3.918     3.697
  3.21     -2.436    -3.281    -3.225     0.7856    2.043     1.415
 -2.252    -1.648     0.03824  -3.432     0.3271    1.458    -0.02289
 -0.643     1.441    -0.1847    1.062     3.545     0.367     1.796
 -1.687     2.06      0.2373    3.748    -2.752     2.73     -2.693
 -3.54     -2.275    -3.033    -1.622    -3.936     1.295     2.586
 -2.926    -2.314     2.527    -1.619    -0.04037  -3.225     1.771
  3.064    -1.173    -2.324     3.332    -0.8257    1.075    -3.287
  1.075    -2.262     1.419    -0.344    -0.4988    1.113     3.068
 -1.104     2.531     2.645     0.6333    0.3677   -3.186    -0.3726
  2.549    -0.3347    2.227    -3.963    -2.564     3.656     1.069
 -3.684    -1.388    -0.2568   -0.726     0.4883    1.946    -1.579
 -0.8438   -2.014     2.332     0.306    -3.305    -3.588    -1.038
  3.299     0.832     0.8594   -1.163     1.2705    2.018    -3.352
  2.537     2.111    -3.61      0.645    -2.459    -2.469     1.002
 -3.914     1.079    -0.9214   -2.111    -3.88     -0.5254   -1.908
 -1.19      3.559    -3.285    -2.266     3.672     0.001524 -1.964
 -1.742     1.895     3.887     1.737     0.909     0.5044    2.55
  0.8936    2.139    -3.658     1.828    -3.688    -3.26      1.436
 -1.321    -3.19      2.764    -3.305    -2.52     -2.441    -0.32
 -2.402     2.252    -1.527     0.719     0.2328    0.1766   -2.088
  3.729     0.844    -1.174    -0.7427    0.8296   -0.1885   -0.0379
  2.92      2.502     3.846     1.657    -3.58     -3.352    -3.904
 -2.43      1.159    -1.707     2.21      2.367    -0.5864   -1.647
  1.952   ]
Output data (dstLocal):
[-6  5 -3 -5 -2 -6  4  6  9  3 -4  2  2 -3 -5  1 -5  1 -1  4 -1  8  5  3
  3  3 -3  5 -3  4 -6  4  2  2 -2 -4  2 -6  6  4 -6 -6 -7  9  2 -4 -1  7
  0  8 -5  4  8  8  8  4  8  1  0 -7  7 -4  5  8  7  6  2 -1 -4  0  5  0
 -5 -4  9 -3  7 -4  8 -1  3  4 -1  7 -6  1 -3 -2  5 -1  3  8  0 -6 -6 -4
 -2 -1  6 -6  6  6 -2  4  9  8  4  7 -2  3 -4 -2  9  5  6 -5 -7  7  4  7
  7  6  1 -6 -2  3  2 -3 -6  7 -2 -2 -6 -7 -5 -6 -5 -5 -2  2 -2 -2  3 -6
  2 -2 -3  1  2  2  3 -6  2 -1 -4 -1 -4  1 -6 -6  4 -6  2  5  6  1 -5 -3
 -3  6  5 -3  7 -3 -6  1  1  7 -2  6 -6 -5  5  7 -1 -6  7 -5 -5  0 -7 -5
 -7 -1  0 -6 -4  5 -4  0 -2 -3 -2 -5  3  1 -4  9 -5  7  4 -3  5 -6  1  1
  2 -6  8  8  1  3  1  1 -2  5 -5  8  6 -6  0 -5 -2  4  0 -5  5  0 -5 -1
 -4  2  8  0  6  6 -3  3  2 -2  3 -1  7  5  3  6  4  4  5  4 -2 -3 -4  7
  5  2 -4  1 -2  6  2  5 -4 -7 -3 -4  5  6 -1 -2 -7 -3  6 -5  4  9 -5  1
  3 -1  9 -5 -4  0 -3  1  7 -2  8 -4  6 -5 -7 -5  6  4  4 -5  5  0 -3  6
  1  5  0 -5 -4 -7 -3  5 -5  7  5  1  7 -5  8  3  7 -5  6  3  6  6 -1  8
  0  1  1 -7 -2  0 -6 -3 -3  9 -5 -2  9  8  7 -4 -6 -6  2  5  4 -4 -2  1
 -6  2  4  1  0  4  1  3  8  2  4 -2  5  1  8 -5  6 -4 -6 -4 -5 -2 -7  3
  6 -5 -4  6 -2  1 -6  4  7 -1 -4  8 -1  3 -6  3 -4  4  0  0  3  7 -1  6
  6  2  2 -5  0  6  0  5 -7 -4  8  3 -6 -2  0 -1  2  5 -2 -1 -3  6  2 -6
 -6 -1  7  3  3 -1  3  5 -6  6  5 -6  2 -4 -4  3 -7  3 -1 -3 -7  0 -3 -1
  8 -6 -4  8  1 -3 -3  5  9  4  3  2  6  3  5 -6  5 -6 -6  4 -2 -5  6 -6
 -4 -4  0 -4  5 -2  2  1  1 -3  8  3 -1 -1  3  1  1  7  6  9  4 -6 -6 -7
 -4  3 -3  5  6  0 -2  5]
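
Ignoring half-precision effects, the first few output values above can be reproduced on the host with dst = round(src × 2.0 + 0.9); QuantExample is an illustrative name:

```cpp
#include <cmath>
#include <cstdint>

// round half to even (the default rounding mode), with the example's
// scale = 2.0 and offset = 0.9.
static int8_t QuantExample(float v) {
    return static_cast<int8_t>(std::nearbyint(v * 2.0f + 0.9f));
}
```

For instance, the first input -3.22 maps to round(-6.44 + 0.9) = round(-5.54) = -6, matching the first output element.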