AscendQuant
Applicability
| Product | Supported |
|---|---|
| | √ |
| | √ |
| | x |
| | √ |
| | x |
| | √ |
Function
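Performs element-wise quantization, for example from the half/float data type to the int8_t data type. The computation follows the formulas below, where round denotes rounding to the nearest integer with ties rounded to the nearest even number.
- PER_TENSOR quantization: the whole srcTensor shares one set of quantization parameters (scale, offset), whose shape is [1]:
  $$\mathrm{dst}_i = \mathrm{round}(\mathrm{src}_i \cdot \mathrm{scale} + \mathrm{offset})$$

- PER_CHANNEL quantization: the shape of srcTensor is [m, n], and each of the n channels (columns) has its own quantization parameter, whose shape is [n]:
  $$\mathrm{dst}_{i,j} = \mathrm{round}(\mathrm{src}_{i,j} \cdot \mathrm{scale}_j + \mathrm{offset}_j)$$
  The offset can also be a single scalar shared by all channels.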
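To make the two formulas concrete, the following sketch models them on the host in standard C++. It is not the Ascend C implementation: the helper names QuantPerTensorRef, QuantPerChannelRef and SaturateToInt8 are hypothetical, the arithmetic runs in float instead of half, and clamping to [-128, 127] is an assumption because this section does not state what happens on overflow. std::nearbyint rounds ties to even under the default rounding mode, which matches the round described above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an already-rounded value to int8_t. Clamping to [-128, 127] is an
// assumption of this sketch, not documented AscendQuant behavior.
static int8_t SaturateToInt8(float r) {
    return static_cast<int8_t>(std::fmin(std::fmax(r, -128.0f), 127.0f));
}

// PER_TENSOR: one (scale, offset) pair for every element.
// std::nearbyint rounds ties to even under the default FE_TONEAREST mode.
std::vector<int8_t> QuantPerTensorRef(const std::vector<float>& src, float scale, float offset) {
    std::vector<int8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = SaturateToInt8(std::nearbyint(src[i] * scale + offset));
    }
    return dst;
}

// PER_CHANNEL: src is an [m, n] row-major matrix; scale and offset have shape [n],
// so element (i, j) uses scale[j] and offset[j].
std::vector<int8_t> QuantPerChannelRef(const std::vector<float>& src, std::size_t n,
                                       const std::vector<float>& scale,
                                       const std::vector<float>& offset) {
    assert(n > 0 && scale.size() == n && offset.size() == n && src.size() % n == 0);
    std::vector<int8_t> dst(src.size());
    for (std::size_t k = 0; k < src.size(); ++k) {
        std::size_t j = k % n;  // channel (column) index of element k
        dst[k] = SaturateToInt8(std::nearbyint(src[k] * scale[j] + offset[j]));
    }
    return dst;
}
```

For instance, with scale = 2.0 and offset = 0.9 (the values used in the Example section), the input -3.22 gives -3.22 × 2.0 + 0.9 = -5.54, which rounds to -6; an exact tie such as 2.5 rounds to 2.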

Principles
The preceding figure shows the block diagram of the AscendQuant internal algorithm. The computation process is divided into the following steps, all of which are performed on vectors:
- Precision conversion: If the input src, scale, or offset is of the float type, convert it to the half type.
- Broadcast: If the input scale or offset is a vector, broadcast it to the same shape as src.
- Scale calculation: If scale is a vector, multiply src by scale element-wise with Mul; if scale is a scalar, use Muls. The result is Tmp1.
- Offset calculation: If offset is a vector, add it to Tmp1 element-wise with Add; if offset is a scalar, use Adds. The result is Tmp2.
- Precision conversion: Convert Tmp2 from half to int8_t to obtain the output.
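The five steps can be traced element by element with the following host-side C++ sketch. RoundToHalf and AscendQuantModel are hypothetical names; RoundToHalf approximates the float-to-half conversion (ties to even, normal range only) so that Tmp1 and Tmp2 are kept at half precision, and saturation in the final int8_t conversion is an assumption. A scale or offset vector of size 1 plays the role of a scalar (the Muls/Adds path), and a vector of size n is broadcast by indexing it with k % n.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Round a float to the nearest IEEE half value, ties to even. Sketch only:
// half subnormals, overflow to infinity and NaN are not modelled.
float RoundToHalf(float x) {
    if (x == 0.0f || !std::isfinite(x)) return x;
    int e = 0;
    float m = std::frexp(x, &e);            // x = m * 2^e with 0.5 <= |m| < 1
    float q = std::nearbyint(m * 2048.0f);  // keep 11 significant bits
    return std::ldexp(q, e - 11);
}

// Steps of the Principles section. scale and offset hold either one value
// (scalar: Muls/Adds) or n values (vector: broadcast, then Mul/Add).
std::vector<int8_t> AscendQuantModel(const std::vector<float>& src,
                                     const std::vector<float>& scale,
                                     const std::vector<float>& offset) {
    assert(!scale.empty() && !offset.empty());
    std::vector<int8_t> dst(src.size());
    for (std::size_t k = 0; k < src.size(); ++k) {
        // 1. Precision conversion: float inputs become half.
        float s = RoundToHalf(src[k]);
        // 2. Broadcast: a length-n parameter repeats along src (index k % n);
        //    a scalar (size 1) always uses index 0.
        float sc = RoundToHalf(scale[k % scale.size()]);
        float of = RoundToHalf(offset[k % offset.size()]);
        // 3. Mul/Muls: Tmp1 = src * scale, kept in half.
        float tmp1 = RoundToHalf(s * sc);
        // 4. Add/Adds: Tmp2 = Tmp1 + offset, kept in half.
        float tmp2 = RoundToHalf(tmp1 + of);
        // 5. half -> int8_t, rounding ties to even (saturation is an assumption).
        float r = std::fmin(std::fmax(std::nearbyint(tmp2), -128.0f), 127.0f);
        dst[k] = static_cast<int8_t>(r);
    }
    return dst;
}
```

Because every intermediate is rounded to half, this model can differ from a pure float computation when a result lies very close to a .5 boundary.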
Prototype
- dstTensor of the int8_t type
  - PER_TENSOR quantization
    - Temporary space passed in through the sharedTmpBuffer input parameter
      - All or part of the source operand is involved in computation (calCount specified):
        template <typename T, bool isReuseSource = false, const AscendQuantConfig& config = ASCEND_QUANT_DEFAULT_CFG> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const float scale, const float offset, const uint32_t calCount)
      - All of the source operand is involved in computation:
        template <typename T, bool isReuseSource = false, const AscendQuantConfig& config = ASCEND_QUANT_DEFAULT_CFG> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const float scale, const float offset)
    - Temporary space allocated by the API framework
      - All or part of the source operand is involved in computation (calCount specified):
        template <typename T, bool isReuseSource = false, const AscendQuantConfig& config = ASCEND_QUANT_DEFAULT_CFG> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const float scale, const float offset, const uint32_t calCount)
      - All of the source operand is involved in computation:
        template <typename T, bool isReuseSource = false, const AscendQuantConfig& config = ASCEND_QUANT_DEFAULT_CFG> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const float scale, const float offset)
  - PER_CHANNEL quantization
    - Temporary space passed in through the sharedTmpBuffer input parameter
      - All or part of the source operand is involved in computation (calCount specified):
        template <typename T, bool isReuseSource = false, const AscendQuantConfig& config = ASCEND_QUANT_DEFAULT_CFG> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const LocalTensor<T>& scaleTensor, const T offset, const uint32_t scaleCount, const uint32_t calCount)
        template <typename T, bool isReuseSource = false, const AscendQuantConfig& config = ASCEND_QUANT_DEFAULT_CFG> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const LocalTensor<T>& scaleTensor, const LocalTensor<T>& offsetTensor, const uint32_t scaleCount, const uint32_t offsetCount, const uint32_t calCount)
      - All of the source operand is involved in computation:
        template <typename T, bool isReuseSource = false, const AscendQuantConfig& config = ASCEND_QUANT_DEFAULT_CFG> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const LocalTensor<T>& scaleTensor, const T offset)
        template <typename T, bool isReuseSource = false, const AscendQuantConfig& config = ASCEND_QUANT_DEFAULT_CFG> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const LocalTensor<T>& scaleTensor, const LocalTensor<T>& offsetTensor)
    - Temporary space allocated by the API framework
      - All or part of the source operand is involved in computation (calCount specified):
        template <typename T, bool isReuseSource = false, const AscendQuantConfig& config = ASCEND_QUANT_DEFAULT_CFG> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<T>& scaleTensor, const T offset, const uint32_t scaleCount, const uint32_t calCount)
        template <typename T, bool isReuseSource = false, const AscendQuantConfig& config = ASCEND_QUANT_DEFAULT_CFG> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<T>& scaleTensor, const LocalTensor<T>& offsetTensor, const uint32_t scaleCount, const uint32_t offsetCount, const uint32_t calCount)
      - All of the source operand is involved in computation:
        template <typename T, bool isReuseSource = false, const AscendQuantConfig& config = ASCEND_QUANT_DEFAULT_CFG> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<T>& scaleTensor, const T offset)
        template <typename T, bool isReuseSource = false, const AscendQuantConfig& config = ASCEND_QUANT_DEFAULT_CFG> __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<T>& scaleTensor, const LocalTensor<T>& offsetTensor)
Because the internal implementation of this API involves complex mathematical computation, extra temporary space is required to store the intermediate variables generated during computation. The temporary space can be allocated by the API framework or passed in by developers through the sharedTmpBuffer input parameter.
- When the API framework allocates the temporary space, you do not need to allocate it yourself, but you must reserve space of the required size.
- When the temporary space is passed in through sharedTmpBuffer, that tensor serves as the temporary space and the API framework does not allocate any. You manage the sharedTmpBuffer space yourself and can reuse the buffer after the API call, which avoids repeated allocation and deallocation and improves flexibility and buffer utilization.
If the API framework is used, reserve the temporary space; if sharedTmpBuffer is used, allocate space for sharedTmpBuffer. In both cases, obtain the required temporary space size (BufferSize) with the GetAscendQuantMaxMinTmpSize API.
Parameters
Template parameters:

| Parameter | Description |
|---|---|
| T | Data type of the source operand, for example half or float. |
| isReuseSource | Whether the source operand can be modified. This parameter is reserved; pass the default value false. |
| config | (Optional) Structure template parameter of the AscendQuantConfig type; the default is ASCEND_QUANT_DEFAULT_CFG. When its field values meet the conditions defined for the structure, they are used as constant parameters during compilation to reduce scalar computation. For a configuration that enables constant parameters, see static_config in the Example section. |
Interface parameters for PER_TENSOR quantization:

| Parameter | Input/Output | Description |
|---|---|---|
| dstTensor | Output | Destination operand of the int8_t type. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| srcTensor | Input | Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| sharedTmpBuffer | Input | Temporary buffer. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For details about how to obtain the temporary space size (BufferSize), see GetAscendQuantMaxMinTmpSize. |
| scale | Input | Quantization scale factor. The type is scalar, and the supported data type is float. |
| offset | Input | Quantization offset. The type is scalar, and the supported data type is float. |
| calCount | Input | Number of elements involved in the computation. |
Interface parameters for PER_CHANNEL quantization:

| Parameter | Input/Output | Description |
|---|---|---|
| dstTensor | Output | Destination operand of the int8_t type. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| srcTensor | Input | Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| sharedTmpBuffer | Input | Temporary buffer. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For details about how to obtain the temporary space size (BufferSize), see GetAscendQuantMaxMinTmpSize. |
| scaleTensor | Input | Quantization scale factors, one per channel. The type is LocalTensor with the same data type T as srcTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| offset | Input | Quantization offset shared by all channels, used by the overloads without offsetTensor. The type is scalar, and the data type is T. |
| offsetTensor | Input | Quantization offsets, one per channel. The type is LocalTensor with the same data type T as srcTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| scaleCount | Input | Number of scale elements involved in the quantization. Value range: [0, min(scaleTensor.GetSize(), dstTensor.GetSize())]. The value must be an integer multiple of 32. |
| offsetCount | Input | Number of offset elements involved in the quantization. Value range: [0, min(offsetTensor.GetSize(), dstTensor.GetSize())]. The value must equal scaleCount and be an integer multiple of 32. |
| calCount | Input | Number of elements involved in the computation. The value must be an integer multiple of scaleCount. |
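The count constraints in this table can be checked before calling a PER_CHANNEL overload. The following sketch uses a hypothetical helper, PerChannelCountsValid, whose scaleSize, offsetSize and dstSize arguments stand for the GetSize() values of scaleTensor, offsetTensor and dstTensor. For the overloads that take a scalar offset (and therefore no offsetCount), pass offsetCount equal to scaleCount and offsetSize equal to scaleSize so that the offset checks pass trivially.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: checks the PER_CHANNEL count rules from the table above.
// scaleSize, offsetSize and dstSize stand for scaleTensor.GetSize(),
// offsetTensor.GetSize() and dstTensor.GetSize().
bool PerChannelCountsValid(uint32_t scaleCount, uint32_t offsetCount, uint32_t calCount,
                           uint32_t scaleSize, uint32_t offsetSize, uint32_t dstSize) {
    auto minOf = [](uint32_t a, uint32_t b) { return a < b ? a : b; };
    // scaleCount: in [0, min(scaleSize, dstSize)] and a multiple of 32.
    if (scaleCount > minOf(scaleSize, dstSize) || scaleCount % 32 != 0) return false;
    // offsetCount: in [0, min(offsetSize, dstSize)] and equal to scaleCount
    // (equality also carries over the multiple-of-32 rule).
    if (offsetCount > minOf(offsetSize, dstSize) || offsetCount != scaleCount) return false;
    // calCount: an integer multiple of scaleCount (only 0 is a multiple of 0).
    return scaleCount == 0 ? calCount == 0 : calCount % scaleCount == 0;
}
```

For example, scaleCount = 64 with calCount = 256 passes the checks and can be read as a [4, 64] block with 64 channels, whereas calCount = 200 is rejected because it is not a multiple of 64.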
Returns
None
Restrictions
- The source operand and destination operand can be reused.
- For details about the operand address alignment requirements, see General Address Alignment Restrictions.
- The length of the data involved in computation of the input and output operands must be 32-byte aligned.
- Although scale is passed as a float, its value must lie within the value range of the half type, because scale is converted to half before computation.
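The alignment and range rules above reduce to simple arithmetic. In this sketch (the helper names Is32ByteAligned, CalCountAligned and FitsHalfRange are hypothetical), calCount elements of the int8_t destination occupy calCount bytes, so the 32-byte rule implies that calCount must be a multiple of 32, which also keeps a half (2-byte) or float (4-byte) source aligned. FitsHalfRange is a conservative check against 65504, the largest finite half value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: true if count elements of elemSize bytes fill whole 32-byte blocks.
bool Is32ByteAligned(std::size_t count, std::size_t elemSize) {
    return (count * elemSize) % 32 == 0;
}

// Hypothetical helper: calCount must keep both the source (T = half or float)
// and the int8_t destination 32-byte aligned.
bool CalCountAligned(std::size_t calCount, std::size_t srcElemSize) {
    assert(srcElemSize == 2 || srcElemSize == 4);  // sizeof(half) or sizeof(float)
    return Is32ByteAligned(calCount, srcElemSize) &&
           Is32ByteAligned(calCount, 1);  // int8_t destination: 1 byte per element
}

// Hypothetical helper: conservative check that a float scale stays within the
// finite half range; 65504 is the largest finite half value.
bool FitsHalfRange(float v) {
    return std::isfinite(v) && std::fabs(v) <= 65504.0f;
}
```

With a half source, calCount = 48 keeps the source aligned (96 bytes) but not the destination (48 bytes), so it is rejected.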
Atlas training products support only PER_TENSOR quantization. PER_CHANNEL quantization is not supported.
Example
For a complete operator example, see Quant operator sample.
```cpp
// The input shape is 1024.
uint32_t dataSize = 1024;
// The input type is float or half, with scale being 2.0 and offset being 0.9. Temporary space is reserved.
AscendC::AscendQuant<srcType>(dstLocal, srcLocal, 2.0f, 0.9f, dataSize);
// Example of using a template parameter to enable constant parameters
// static constexpr AscendC::AscendQuantConfig static_config = {1024, 0, 0, 0};
// Use the static_config template parameter of the AscendQuantConfig type to enable constant parameters.
// AscendC::AscendQuant<srcType, false, static_config>(dstLocal, srcLocal, 2.0f, 0.9f, dataSize);
```
Result example:
Input data (srcLocal): [-3.22 2.09 -2.025 -2.895 -1.349 -3.336 1.376 2.453 3.861 1.085 -2.273 0.3923 0.3645 -2.127 -3.09 -0.002726 -2.783 0.2615 -0.904 1.507 -1.017 3.568 2.219 0.8643 0.922 1.144 -1.853 2.002 -1.705 1.675 -3.482 1.519 0.4172 0.4307 -1.228 -2.62 0.3354 -3.586 2.604 1.688 -3.646 -3.389 -3.918 3.955 0.7954 -2.562 -1.085 2.91 -0.398 3.771 -2.914 1.726 3.367 3.482 3.49 1.382 3.512 0.1938 -0.4087 -3.75 2.873 -2.54 1.826 3.738 3.188 2.676 0.724 -1.108 -2.682 -0.4783 2.082 -0.462 -2.955 -2.543 3.98 -1.85 3.018 -2.688 3.596 -0.799 1.222 1.686 -0.7925 3.295 -3.568 -0.03836 -2.002 -1.212 1.927 -1.11 1.046 3.793 -0.6226 -3.494 -3.371 -2.354 -1.7 -0.948 2.682 -3.344 2.566 2.533 -1.335 1.405 3.867 3.674 1.359 3.145 -1.22 1.054 -2.492 -1.214 3.879 2.014 2.664 -2.863 -3.88 2.857 1.695 2.852 2.893 2.367 -0.1832 -3.254 -1.49 1.13 0.672 -1.863 -3.547 3.281 -1.573 -1.349 -3.547 -3.766 -2.99 -3.203 -2.703 -2.793 -1.501 0.4785 -1.216 -1.205 0.9097 -3.438 0.781 -1.505 -1.982 0.2037 0.4595 0.759 0.844 -3.396 0.4778 -0.899 -2.342 -0.961 -2.531 -0.10913 -3.516 -3.66 1.337 -3.44 0.7495 1.958 2.775 0.0968 -3. 
-2.13 -1.818 2.664 2.066 -1.923 2.97 -2.047 -3.598 0.1661 -0.179 3.186 -1.247 2.777 -3.344 -3.148 2.275 2.916 -1.081 -3.213 2.87 -3.12 -3.066 -0.6 -3.78 -3.012 -3.86 -0.707 -0.2203 -3.338 -2.273 2.062 -2.422 -0.443 -1.333 -2.2 -1.478 -2.816 1.134 0.2115 -2.459 3.842 -2.768 2.822 1.3125 -2.143 1.971 -3.543 -0.07794 -0.1265 0.763 -3.26 3.514 3.629 0.1902 1.277 -0.1652 -0.006435 -1.25 2.258 -2.887 3.66 2.729 -3.27 -0.5615 -3.176 -1.2295 1.556 -0.6626 -2.777 1.946 -0.338 -2.977 -0.8135 -2.37 0.7764 3.525 -0.6196 2.436 2.38 -1.708 0.814 0.4688 -1.255 1.04 -1.077 3.176 1.859 0.9194 2.703 1.436 1.762 2.2 1.794 -1.234 -2.148 -2.393 2.846 1.854 0.3428 -2.379 0.2429 -1.561 2.582 0.6836 1.811 -2.53 -3.951 -2.096 -2.639 2.02 2.799 -0.8936 -1.295 -3.914 -1.82 2.541 -2.773 1.733 3.955 -3.092 0.04095 0.82 -1.071 3.93 -3.158 -2.5 -0.5415 -1.98 -0.1626 3.092 -1.3125 3.387 -2.496 2.355 -3.033 -3.814 -3.191 2.686 1.377 1.381 -3.047 2.127 -0.4927 -1.718 2.371 -0.1648 1.885 -0.6826 -3.121 -2.379 -3.959 -2.164 2.262 -2.973 3.092 2.111 -0.03732 2.836 -2.725 3.436 1.017 2.877 -2.926 2.547 0.8574 2.643 2.646 -0.889 3.363 -0.3147 -0.09546 0.0551 -3.947 -1.434 -0.6104 -3.41 -2.176 -1.866 3.975 -3.031 -1.25 3.918 3.697 3.21 -2.436 -3.281 -3.225 0.7856 2.043 1.415 -2.252 -1.648 0.03824 -3.432 0.3271 1.458 -0.02289 -0.643 1.441 -0.1847 1.062 3.545 0.367 1.796 -1.687 2.06 0.2373 3.748 -2.752 2.73 -2.693 -3.54 -2.275 -3.033 -1.622 -3.936 1.295 2.586 -2.926 -2.314 2.527 -1.619 -0.04037 -3.225 1.771 3.064 -1.173 -2.324 3.332 -0.8257 1.075 -3.287 1.075 -2.262 1.419 -0.344 -0.4988 1.113 3.068 -1.104 2.531 2.645 0.6333 0.3677 -3.186 -0.3726 2.549 -0.3347 2.227 -3.963 -2.564 3.656 1.069 -3.684 -1.388 -0.2568 -0.726 0.4883 1.946 -1.579 -0.8438 -2.014 2.332 0.306 -3.305 -3.588 -1.038 3.299 0.832 0.8594 -1.163 1.2705 2.018 -3.352 2.537 2.111 -3.61 0.645 -2.459 -2.469 1.002 -3.914 1.079 -0.9214 -2.111 -3.88 -0.5254 -1.908 -1.19 3.559 -3.285 -2.266 3.672 0.001524 -1.964 -1.742 1.895 3.887 1.737 0.909 0.5044 
2.55 0.8936 2.139 -3.658 1.828 -3.688 -3.26 1.436 -1.321 -3.19 2.764 -3.305 -2.52 -2.441 -0.32 -2.402 2.252 -1.527 0.719 0.2328 0.1766 -2.088 3.729 0.844 -1.174 -0.7427 0.8296 -0.1885 -0.0379 2.92 2.502 3.846 1.657 -3.58 -3.352 -3.904 -2.43 1.159 -1.707 2.21 2.367 -0.5864 -1.647 1.952 ] Output data (dstLocal): [-6 5 -3 -5 -2 -6 4 6 9 3 -4 2 2 -3 -5 1 -5 1 -1 4 -1 8 5 3 3 3 -3 5 -3 4 -6 4 2 2 -2 -4 2 -6 6 4 -6 -6 -7 9 2 -4 -1 7 0 8 -5 4 8 8 8 4 8 1 0 -7 7 -4 5 8 7 6 2 -1 -4 0 5 0 -5 -4 9 -3 7 -4 8 -1 3 4 -1 7 -6 1 -3 -2 5 -1 3 8 0 -6 -6 -4 -2 -1 6 -6 6 6 -2 4 9 8 4 7 -2 3 -4 -2 9 5 6 -5 -7 7 4 7 7 6 1 -6 -2 3 2 -3 -6 7 -2 -2 -6 -7 -5 -6 -5 -5 -2 2 -2 -2 3 -6 2 -2 -3 1 2 2 3 -6 2 -1 -4 -1 -4 1 -6 -6 4 -6 2 5 6 1 -5 -3 -3 6 5 -3 7 -3 -6 1 1 7 -2 6 -6 -5 5 7 -1 -6 7 -5 -5 0 -7 -5 -7 -1 0 -6 -4 5 -4 0 -2 -3 -2 -5 3 1 -4 9 -5 7 4 -3 5 -6 1 1 2 -6 8 8 1 3 1 1 -2 5 -5 8 6 -6 0 -5 -2 4 0 -5 5 0 -5 -1 -4 2 8 0 6 6 -3 3 2 -2 3 -1 7 5 3 6 4 4 5 4 -2 -3 -4 7 5 2 -4 1 -2 6 2 5 -4 -7 -3 -4 5 6 -1 -2 -7 -3 6 -5 4 9 -5 1 3 -1 9 -5 -4 0 -3 1 7 -2 8 -4 6 -5 -7 -5 6 4 4 -5 5 0 -3 6 1 5 0 -5 -4 -7 -3 5 -5 7 5 1 7 -5 8 3 7 -5 6 3 6 6 -1 8 0 1 1 -7 -2 0 -6 -3 -3 9 -5 -2 9 8 7 -4 -6 -6 2 5 4 -4 -2 1 -6 2 4 1 0 4 1 3 8 2 4 -2 5 1 8 -5 6 -4 -6 -4 -5 -2 -7 3 6 -5 -4 6 -2 1 -6 4 7 -1 -4 8 -1 3 -6 3 -4 4 0 0 3 7 -1 6 6 2 2 -5 0 6 0 5 -7 -4 8 3 -6 -2 0 -1 2 5 -2 -1 -3 6 2 -6 -6 -1 7 3 3 -1 3 5 -6 6 5 -6 2 -4 -4 3 -7 3 -1 -3 -7 0 -3 -1 8 -6 -4 8 1 -3 -3 5 9 4 3 2 6 3 5 -6 5 -6 -6 4 -2 -5 6 -6 -4 -4 0 -4 5 -2 2 1 1 -3 8 3 -1 -1 3 1 1 7 6 9 4 -6 -6 -7 -4 3 -3 5 6 0 -2 5] |
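As a sanity check on the result example, the first eight srcLocal values can be pushed through round(src × 2.0 + 0.9) on the host. The sketch below (hypothetical names kSrcHead, kDstHead and QuantHead) reproduces the first eight dstLocal values. It computes in float rather than half, which makes no difference for these inputs because none of the results lands near a rounding tie.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// First eight values of srcLocal and dstLocal from the result example above.
const std::vector<float> kSrcHead = {-3.22f, 2.09f, -2.025f, -2.895f, -1.349f, -3.336f, 1.376f, 2.453f};
const std::vector<int8_t> kDstHead = {-6, 5, -3, -5, -2, -6, 4, 6};

// Applies round(src * scale + offset); std::nearbyint rounds ties to even
// under the default FE_TONEAREST mode. Results here stay within int8_t range.
std::vector<int8_t> QuantHead(const std::vector<float>& src, float scale, float offset) {
    std::vector<int8_t> out;
    for (float v : src) {
        out.push_back(static_cast<int8_t>(std::nearbyint(v * scale + offset)));
    }
    return out;
}
```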