AscendQuant
Function Usage
Performs element-wise quantization, for example, quantizing half or float data to the int8_t data type. The calculation formulas are as follows, where PAR indicates the number of elements that the Vector Unit can process in one iteration, and round indicates banker's rounding (round half to even).
- Per_tensor quantization: src_local corresponds to a quantization parameter whose shape is (1,).

- Per_channel quantization: The shape of src_local is (m, n), and each channel dimension corresponds to a quantization parameter whose shape is (n,).
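Reading the steps under Principles together with the sample data in the Example section, the two modes can be written as below. This is a sketch inferred from this page rather than a quoted formula. The index names i and j and the half-open index ranges are assumptions. In the per_channel overloads that take a scalar offset, offset_j is the same value for every j.

```latex
% Per_tensor quantization: one scale and one offset for the whole tensor
\mathrm{dst}_i = \mathrm{round}\left(\mathrm{src}_i \cdot \mathrm{scale} + \mathrm{offset}\right),
\quad 0 \le i < \mathrm{PAR}

% Per_channel quantization: src has shape (m, n), scale and offset have shape (n,)
\mathrm{dst}_{i,j} = \mathrm{round}\left(\mathrm{src}_{i,j} \cdot \mathrm{scale}_j + \mathrm{offset}_j\right),
\quad 0 \le i < m,\; 0 \le j < n
```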

Principles
The preceding figure shows the block diagram of the AscendQuant internal algorithm. The computation process is divided into the following steps, all of which are performed on vectors:
- Precision conversion: If the input src, scale, or offset is of the float type, convert it to the half type.
- Broadcast: If the input scale or offset is a vector, broadcast it to the same dimension as src.
- Scale calculation: Multiply src by scale to obtain Tmp1. Use Mul if scale is a vector, or Muls if scale is a scalar.
- Offset calculation: Add offset to Tmp1 to obtain Tmp2. Use Add if offset is a vector, or Adds if offset is a scalar.
- Precision conversion: Convert Tmp2 from half to int8_t to obtain the output.
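The steps above can be mirrored on the host as a small reference model, which may help when checking device results. This is a minimal sketch under stated assumptions, not the device implementation: `QuantPerChannelRef` is a hypothetical name, float stands in for half (so half-precision rounding of intermediates is not modeled), and the clamp to the int8_t range is added only to keep the host cast well defined, since this page does not state how out-of-range values are handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference for per_channel quantization.
// src is row-major with shape (m, n); scale and offset have shape (n,).
std::vector<int8_t> QuantPerChannelRef(const std::vector<float>& src,
                                       const std::vector<float>& scale,
                                       const std::vector<float>& offset)
{
    const std::size_t n = scale.size();
    assert(n > 0 && offset.size() == n && src.size() % n == 0);

    // Broadcast: repeat the (n,) parameters along the m dimension.
    std::vector<float> scaleB(src.size());
    std::vector<float> offsetB(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        scaleB[i] = scale[i % n];
        offsetB[i] = offset[i % n];
    }

    std::vector<int8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        const float tmp1 = src[i] * scaleB[i];  // Scale calculation (Mul)
        const float tmp2 = tmp1 + offsetB[i];   // Offset calculation (Add)
        // Precision conversion. std::nearbyint rounds half to even under the
        // default floating-point rounding mode, which is assumed here.
        float r = std::nearbyint(tmp2);
        // Assumption: clamp so the cast below is well defined on the host.
        r = std::min(127.0f, std::max(-128.0f, r));
        dst[i] = static_cast<int8_t>(r);
    }
    return dst;
}
```

For instance, with src = {1, 2, 3, 4} viewed as shape (2, 2), scale = {2, 0.5}, and offset = {0, 1}, the model yields {2, 2, 6, 3}. A tie such as 1.25 * 2 = 2.5 rounds to 2, the nearest even integer.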
Prototype
- dstTensor of the int8_t type
  - Per_tensor quantization:
    - Pass the temporary space through the sharedTmpBuffer input parameter.
      - All or part of the source operand tensors are involved in computation.

        template <typename T, bool isReuseSource = false> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const float scale, const float offset, const uint32_t calCount)

      - All source operand tensors are involved in computation.

        template <typename T, bool isReuseSource = false> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const float scale, const float offset)

    - Allocate the temporary space through the API framework.
      - All or part of the source operand tensors are involved in computation.

        template <typename T, bool isReuseSource = false> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const float scale, const float offset, const uint32_t calCount)

      - All source operand tensors are involved in computation.

        template <typename T, bool isReuseSource = false> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const float scale, const float offset)

  - Per_channel quantization:
    - Pass the temporary space through the sharedTmpBuffer input parameter.
      - All or part of the source operand tensors are involved in computation.

        template <typename T, bool isReuseSource = false> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const LocalTensor<T>& scaleTensor, const T offset, const uint32_t scaleCount, const uint32_t calCount)

        template <typename T, bool isReuseSource = false> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const LocalTensor<T>& scaleTensor, const LocalTensor<T>& offsetTensor, const uint32_t scaleCount, const uint32_t offsetCount, const uint32_t calCount)

      - All source operand tensors are involved in computation.

        template <typename T, bool isReuseSource = false> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const LocalTensor<T>& scaleTensor, const T offset)

        template <typename T, bool isReuseSource = false> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const LocalTensor<T>& scaleTensor, const LocalTensor<T>& offsetTensor)

    - Allocate the temporary space through the API framework.
      - All or part of the source operand tensors are involved in computation.

        template <typename T, bool isReuseSource = false> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<T>& scaleTensor, const T offset, const uint32_t scaleCount, const uint32_t calCount)

        template <typename T, bool isReuseSource = false> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<T>& scaleTensor, const LocalTensor<T>& offsetTensor, const uint32_t scaleCount, const uint32_t offsetCount, const uint32_t calCount)

      - All source operand tensors are involved in computation.

        template <typename T, bool isReuseSource = false> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<T>& scaleTensor, const T offset)

        template <typename T, bool isReuseSource = false> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<T>& scaleTensor, const LocalTensor<T>& offsetTensor)

Due to the complex mathematical computation involved in the internal implementation of this API, additional temporary space is required to store intermediate variables generated during computation. The temporary space can be allocated through the API framework or passed by developers through the sharedTmpBuffer input parameter.
- When the API framework is used for temporary space allocation, developers do not need to allocate the space, but must reserve the required size for the space.
- When the sharedTmpBuffer input parameter is used for passing the temporary space, the tensor serves as the temporary space. In this case, the API framework is not required for temporary space allocation. This enables developers to manage the sharedTmpBuffer space and reuse the buffer after calling the API, so that the buffer is not repeatedly allocated and deallocated, improving the flexibility and buffer utilization.
If the API framework is used, developers must reserve the temporary space. If sharedTmpBuffer is used, developers must allocate the space for sharedTmpBuffer themselves. In both cases, obtain the required size of the temporary space (BufferSize) with the GetAscendQuantMaxMinTmpSize API, described in GetAscendQuantMaxMinTmpSize.
Parameters
Template parameters:

| Parameter | Description |
|---|---|
| T | Data type of the operand. |
| isReuseSource | Whether the source operand can be modified. This parameter is reserved. Pass the default value false. |

Per_tensor API parameters:

| Parameter | Input/Output | Description |
|---|---|---|
| dstTensor | Output | Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For the |
| srcTensor | Input | Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For the |
| sharedTmpBuffer | Input | Temporary buffer. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For details about how to obtain the temporary space size (BufferSize), see GetAscendQuantMaxMinTmpSize. For the |
| scale | Input | Quantization parameter. The type is Scalar, and the supported data type is float. |
| offset | Input | Quantization parameter. The type is Scalar, and the supported data type is float. |
| calCount | Input | Number of actually computed data elements. The value range is [0, srcTensor.GetSize()]. |

Per_channel API parameters:

| Parameter | Input/Output | Description |
|---|---|---|
| dstTensor | Output | Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| srcTensor | Input | Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| sharedTmpBuffer | Input | Temporary buffer. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For details about how to obtain the temporary space size (BufferSize), see GetAscendQuantMaxMinTmpSize. |
| scaleTensor | Input | Quantization parameter. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| offsetTensor | Input | Quantization parameter. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| scaleCount | Input | Number of actual quantization parameter elements. The value range is [0, scaleTensor.GetSize()], and the value must be an integer multiple of 32. |
| offsetCount | Input | Number of actual quantization parameter elements. The value range is [0, offsetTensor.GetSize()], and the value must be an integer multiple of 32. It must be the same as scaleCount. |
| calCount | Input | Number of actually computed data elements. The value range is [0, srcTensor.GetSize()], and the value must be an integer multiple of scaleCount. |
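The count constraints for the per_channel overloads can be checked on the host before launching a kernel. The helper below is a hedged sketch: `PerChannelCountsValid` and its parameter names are hypothetical, the size arguments stand for the values that `GetSize()` would return for each tensor, and the handling of `scaleCount == 0` is an assumption, since a multiple of zero is not defined by this page. It needs C++14 or later because of the multi-statement constexpr function.

```cpp
#include <cstdint>

// Hypothetical helper: returns true when the counts satisfy the documented
// ranges and multiples for the per_channel overloads.
constexpr bool PerChannelCountsValid(uint32_t srcSize, uint32_t scaleSize,
                                     uint32_t offsetSize, uint32_t scaleCount,
                                     uint32_t offsetCount, uint32_t calCount)
{
    // scaleCount: within [0, scaleTensor.GetSize()] and a multiple of 32.
    if (scaleCount > scaleSize || scaleCount % 32 != 0) return false;
    // offsetCount: within [0, offsetTensor.GetSize()] and a multiple of 32.
    if (offsetCount > offsetSize || offsetCount % 32 != 0) return false;
    // offsetCount must be the same as scaleCount.
    if (offsetCount != scaleCount) return false;
    // calCount: within [0, srcTensor.GetSize()].
    if (calCount > srcSize) return false;
    // Assumption: with scaleCount == 0, only calCount == 0 is accepted.
    if (scaleCount == 0) return calCount == 0;
    // calCount must be an integer multiple of scaleCount.
    return calCount % scaleCount == 0;
}
```

With 1024 source elements and 64-element parameter tensors, the combination scaleCount = 64, offsetCount = 64, calCount = 1024 passes, while scaleCount = 48 fails the multiple-of-32 rule and calCount = 1000 fails the multiple-of-scaleCount rule.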
Returns
None
Availability
The
Constraints
- The addresses of the source operand and destination operand can overlap.
- For details about the alignment requirements of the operand address offset, see General Restrictions.
- The length of the data involved in computation of the input and output operands must be 32-byte aligned.
- Even when scale is of the float type, its value must stay within the range representable by the half type, because float inputs are converted to half before computation (see Principles).
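The length and range constraints above can be expressed as simple host-side checks. This is a sketch under assumptions: the helper names are hypothetical, the 32-byte rule is read here as applying to the byte length of both the source data and the int8_t destination data, and 65504 is the largest finite value of the IEEE 754 half-precision format.

```cpp
#include <cstddef>
#include <cstdint>

// Largest finite value representable in IEEE 754 half precision.
constexpr float kHalfMax = 65504.0f;

// Hypothetical helper: true if both the source data (srcElemSize bytes per
// element) and the int8_t destination data span a multiple of 32 bytes.
constexpr bool LengthsAligned32B(uint32_t calCount, std::size_t srcElemSize)
{
    return (calCount * srcElemSize) % 32 == 0 &&
           (calCount * sizeof(int8_t)) % 32 == 0;
}

// Hypothetical helper: true if a float scale lies within the finite half range.
constexpr bool ScaleFitsHalf(float scale)
{
    return scale >= -kHalfMax && scale <= kHalfMax;
}
```

Under this reading, 1024 float elements satisfy the length rule, while 16 half elements do not: the source spans 32 bytes, but the int8_t destination spans only 16.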
Example
For a complete operator example, see quant operator sample.
// The input shape is 1024.
uint32_t dataSize = 1024;
// The input type is float or half, with scale being 2.0 and offset being 0.9. Temporary space is reserved.
AscendC::AscendQuant<srcType>(dstLocal, srcLocal, 2.0f, 0.9f, dataSize);
Result example:
Input data (srcLocal): [-3.22 2.09 -2.025 -2.895 -1.349 -3.336 1.376 2.453 3.861 1.085 -2.273 0.3923 0.3645 -2.127 -3.09 -0.002726 -2.783 0.2615 -0.904 1.507 -1.017 3.568 2.219 0.8643 0.922 1.144 -1.853 2.002 -1.705 1.675 -3.482 1.519 0.4172 0.4307 -1.228 -2.62 0.3354 -3.586 2.604 1.688 -3.646 -3.389 -3.918 3.955 0.7954 -2.562 -1.085 2.91 -0.398 3.771 -2.914 1.726 3.367 3.482 3.49 1.382 3.512 0.1938 -0.4087 -3.75 2.873 -2.54 1.826 3.738 3.188 2.676 0.724 -1.108 -2.682 -0.4783 2.082 -0.462 -2.955 -2.543 3.98 -1.85 3.018 -2.688 3.596 -0.799 1.222 1.686 -0.7925 3.295 -3.568 -0.03836 -2.002 -1.212 1.927 -1.11 1.046 3.793 -0.6226 -3.494 -3.371 -2.354 -1.7 -0.948 2.682 -3.344 2.566 2.533 -1.335 1.405 3.867 3.674 1.359 3.145 -1.22 1.054 -2.492 -1.214 3.879 2.014 2.664 -2.863 -3.88 2.857 1.695 2.852 2.893 2.367 -0.1832 -3.254 -1.49 1.13 0.672 -1.863 -3.547 3.281 -1.573 -1.349 -3.547 -3.766 -2.99 -3.203 -2.703 -2.793 -1.501 0.4785 -1.216 -1.205 0.9097 -3.438 0.781 -1.505 -1.982 0.2037 0.4595 0.759 0.844 -3.396 0.4778 -0.899 -2.342 -0.961 -2.531 -0.10913 -3.516 -3.66 1.337 -3.44 0.7495 1.958 2.775 0.0968 -3. 
-2.13 -1.818 2.664 2.066 -1.923 2.97 -2.047 -3.598 0.1661 -0.179 3.186 -1.247 2.777 -3.344 -3.148 2.275 2.916 -1.081 -3.213 2.87 -3.12 -3.066 -0.6 -3.78 -3.012 -3.86 -0.707 -0.2203 -3.338 -2.273 2.062 -2.422 -0.443 -1.333 -2.2 -1.478 -2.816 1.134 0.2115 -2.459 3.842 -2.768 2.822 1.3125 -2.143 1.971 -3.543 -0.07794 -0.1265 0.763 -3.26 3.514 3.629 0.1902 1.277 -0.1652 -0.006435 -1.25 2.258 -2.887 3.66 2.729 -3.27 -0.5615 -3.176 -1.2295 1.556 -0.6626 -2.777 1.946 -0.338 -2.977 -0.8135 -2.37 0.7764 3.525 -0.6196 2.436 2.38 -1.708 0.814 0.4688 -1.255 1.04 -1.077 3.176 1.859 0.9194 2.703 1.436 1.762 2.2 1.794 -1.234 -2.148 -2.393 2.846 1.854 0.3428 -2.379 0.2429 -1.561 2.582 0.6836 1.811 -2.53 -3.951 -2.096 -2.639 2.02 2.799 -0.8936 -1.295 -3.914 -1.82 2.541 -2.773 1.733 3.955 -3.092 0.04095 0.82 -1.071 3.93 -3.158 -2.5 -0.5415 -1.98 -0.1626 3.092 -1.3125 3.387 -2.496 2.355 -3.033 -3.814 -3.191 2.686 1.377 1.381 -3.047 2.127 -0.4927 -1.718 2.371 -0.1648 1.885 -0.6826 -3.121 -2.379 -3.959 -2.164 2.262 -2.973 3.092 2.111 -0.03732 2.836 -2.725 3.436 1.017 2.877 -2.926 2.547 0.8574 2.643 2.646 -0.889 3.363 -0.3147 -0.09546 0.0551 -3.947 -1.434 -0.6104 -3.41 -2.176 -1.866 3.975 -3.031 -1.25 3.918 3.697 3.21 -2.436 -3.281 -3.225 0.7856 2.043 1.415 -2.252 -1.648 0.03824 -3.432 0.3271 1.458 -0.02289 -0.643 1.441 -0.1847 1.062 3.545 0.367 1.796 -1.687 2.06 0.2373 3.748 -2.752 2.73 -2.693 -3.54 -2.275 -3.033 -1.622 -3.936 1.295 2.586 -2.926 -2.314 2.527 -1.619 -0.04037 -3.225 1.771 3.064 -1.173 -2.324 3.332 -0.8257 1.075 -3.287 1.075 -2.262 1.419 -0.344 -0.4988 1.113 3.068 -1.104 2.531 2.645 0.6333 0.3677 -3.186 -0.3726 2.549 -0.3347 2.227 -3.963 -2.564 3.656 1.069 -3.684 -1.388 -0.2568 -0.726 0.4883 1.946 -1.579 -0.8438 -2.014 2.332 0.306 -3.305 -3.588 -1.038 3.299 0.832 0.8594 -1.163 1.2705 2.018 -3.352 2.537 2.111 -3.61 0.645 -2.459 -2.469 1.002 -3.914 1.079 -0.9214 -2.111 -3.88 -0.5254 -1.908 -1.19 3.559 -3.285 -2.266 3.672 0.001524 -1.964 -1.742 1.895 3.887 1.737 0.909 0.5044 
2.55 0.8936 2.139 -3.658 1.828 -3.688 -3.26 1.436 -1.321 -3.19 2.764 -3.305 -2.52 -2.441 -0.32 -2.402 2.252 -1.527 0.719 0.2328 0.1766 -2.088 3.729 0.844 -1.174 -0.7427 0.8296 -0.1885 -0.0379 2.92 2.502 3.846 1.657 -3.58 -3.352 -3.904 -2.43 1.159 -1.707 2.21 2.367 -0.5864 -1.647 1.952 ]
Output data (dstLocal): [-6 5 -3 -5 -2 -6 4 6 9 3 -4 2 2 -3 -5 1 -5 1 -1 4 -1 8 5 3 3 3 -3 5 -3 4 -6 4 2 2 -2 -4 2 -6 6 4 -6 -6 -7 9 2 -4 -1 7 0 8 -5 4 8 8 8 4 8 1 0 -7 7 -4 5 8 7 6 2 -1 -4 0 5 0 -5 -4 9 -3 7 -4 8 -1 3 4 -1 7 -6 1 -3 -2 5 -1 3 8 0 -6 -6 -4 -2 -1 6 -6 6 6 -2 4 9 8 4 7 -2 3 -4 -2 9 5 6 -5 -7 7 4 7 7 6 1 -6 -2 3 2 -3 -6 7 -2 -2 -6 -7 -5 -6 -5 -5 -2 2 -2 -2 3 -6 2 -2 -3 1 2 2 3 -6 2 -1 -4 -1 -4 1 -6 -6 4 -6 2 5 6 1 -5 -3 -3 6 5 -3 7 -3 -6 1 1 7 -2 6 -6 -5 5 7 -1 -6 7 -5 -5 0 -7 -5 -7 -1 0 -6 -4 5 -4 0 -2 -3 -2 -5 3 1 -4 9 -5 7 4 -3 5 -6 1 1 2 -6 8 8 1 3 1 1 -2 5 -5 8 6 -6 0 -5 -2 4 0 -5 5 0 -5 -1 -4 2 8 0 6 6 -3 3 2 -2 3 -1 7 5 3 6 4 4 5 4 -2 -3 -4 7 5 2 -4 1 -2 6 2 5 -4 -7 -3 -4 5 6 -1 -2 -7 -3 6 -5 4 9 -5 1 3 -1 9 -5 -4 0 -3 1 7 -2 8 -4 6 -5 -7 -5 6 4 4 -5 5 0 -3 6 1 5 0 -5 -4 -7 -3 5 -5 7 5 1 7 -5 8 3 7 -5 6 3 6 6 -1 8 0 1 1 -7 -2 0 -6 -3 -3 9 -5 -2 9 8 7 -4 -6 -6 2 5 4 -4 -2 1 -6 2 4 1 0 4 1 3 8 2 4 -2 5 1 8 -5 6 -4 -6 -4 -5 -2 -7 3 6 -5 -4 6 -2 1 -6 4 7 -1 -4 8 -1 3 -6 3 -4 4 0 0 3 7 -1 6 6 2 2 -5 0 6 0 5 -7 -4 8 3 -6 -2 0 -1 2 5 -2 -1 -3 6 2 -6 -6 -1 7 3 3 -1 3 5 -6 6 5 -6 2 -4 -4 3 -7 3 -1 -3 -7 0 -3 -1 8 -6 -4 8 1 -3 -3 5 9 4 3 2 6 3 5 -6 5 -6 -6 4 -2 -5 6 -6 -4 -4 0 -4 5 -2 2 1 1 -3 8 3 -1 -1 3 1 1 7 6 9 4 -6 -6 -7 -4 3 -3 5 6 0 -2 5]
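The sample result can be cross-checked on the host with a short per_tensor model. This is a sketch, not the device code: `QuantPerTensorRef` is a hypothetical name, float replaces half for intermediates, std::nearbyint is assumed to run under the default round-half-to-even mode, and the clamp to the int8_t range is an assumption added so the host cast is well defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of per_tensor quantization:
// dst[i] = round(src[i] * scale + offset) for the first calCount elements.
std::vector<int8_t> QuantPerTensorRef(const std::vector<float>& src,
                                      float scale, float offset,
                                      std::size_t calCount)
{
    assert(calCount <= src.size());
    std::vector<int8_t> dst(calCount);
    for (std::size_t i = 0; i < calCount; ++i) {
        // Rounds half to even under the default rounding mode (assumed).
        float r = std::nearbyint(src[i] * scale + offset);
        // Assumption: clamp so the cast below is well defined on the host.
        r = std::min(127.0f, std::max(-128.0f, r));
        dst[i] = static_cast<int8_t>(r);
    }
    return dst;
}
```

Feeding the first eight input values above (-3.22, 2.09, -2.025, -2.895, -1.349, -3.336, 1.376, 2.453) with scale 2.0 and offset 0.9 reproduces the first eight listed outputs: -6, 5, -3, -5, -2, -6, 4, 6.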