AscendDequant
Function Usage
Performs element-wise dequantization, for example converting int32_t data to half or float. The API accepts inputs of at most two dimensions; inputs with more dimensions are not supported.
- Suppose the input srcTensor has shape (m, n). The n elements of each row must occupy a multiple of 32 bytes (32-byte alignment), and calCount is the number of elements to be dequantized in each row.
- The dequantization coefficient deqScale can be a scalar or a vector. If deqScale is a vector, calCount must be less than or equal to the number of elements in deqScale, and only the first calCount coefficients take effect.
- The output dstTensor has shape (m, n_dst). If n * sizeof(dstT) is not a multiple of 32 bytes, each row is padded up to the next multiple of 32 bytes; n_dst is the number of columns after padding.
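The alignment and padding rules above can be expressed as two small host-side C++ helpers. This is a sketch, not part of the AscendC API: PaddedCols and SrcRowAligned are hypothetical names, and std::uint16_t stands in for the 2-byte half and bfloat16_t types, since only sizeof(dstT) matters here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rows are aligned/padded to a multiple of this many bytes.
constexpr std::size_t kAlignBytes = 32;

// Hypothetical helper: n_dst for a row of n elements of type DstT, i.e. n
// rounded up so that n_dst * sizeof(DstT) is a multiple of 32 bytes.
template <typename DstT>
std::size_t PaddedCols(std::size_t n) {
    static_assert(kAlignBytes % sizeof(DstT) == 0, "sizeof(DstT) must divide 32");
    const std::size_t rowBytes = n * sizeof(DstT);
    const std::size_t paddedBytes =
        (rowBytes + kAlignBytes - 1) / kAlignBytes * kAlignBytes;
    return paddedBytes / sizeof(DstT);
}

// Hypothetical check of the srcTensor requirement: n int32_t values per row
// must occupy a multiple of 32 bytes (that is, n must be a multiple of 8).
bool SrcRowAligned(std::size_t n) {
    return (n * sizeof(std::int32_t)) % kAlignBytes == 0;
}
```

For n = 8, PaddedCols<std::uint16_t>(8) yields 16 (a 2-byte output type such as bfloat16_t), while PaddedCols<float>(8) yields 8 because 8 floats already fill 32 bytes.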
The following two examples explain the parameter configuration and compute logic. (In these examples, DequantParams is a {m, n, calCount} structure that stores shape information.)
- As shown in the following figure, srcTensor is of type int32_t with m = 4, n = 8, and calCount = 4: four elements in each row of srcTensor are dequantized, only the first four elements of deqScale take effect, and its remaining 12 elements are not involved in dequantization. dstTensor is of type bfloat16_t with m = 4 and n_dst = 16 (16 × sizeof(bfloat16_t) % 32 = 0). srcTensor is laid out as m rows of n elements. For each of the first calCount elements in a row, the ith element of srcTensor is multiplied by the ith element of deqScale, and the product is written to the ith element of the corresponding row of dstTensor. Elements calCount + 1 through n_dst of each dstTensor row hold undefined values.

- As shown in the following figure, srcTensor is of type int32_t with m = 4, n = 8, and calCount = 4, so four elements in each row of srcTensor are dequantized. dstTensor is of type float with m = 4 and n_dst = 8 (8 × sizeof(float) % 32 = 0). The first four elements of each srcTensor row are multiplied by the scalar deqScale, and the products are written to the same positions in the corresponding dstTensor row.
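The row-wise rule in these two examples can be modeled on the host as follows. This is a hedged sketch, not the AscendDequant implementation: the function names are hypothetical, dstT is fixed to float for simplicity, and columns past calCount are simply left unwritten to stand in for the undefined values the API leaves there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// src: m x n row-major int32 matrix; dst: m x nDst row-major float matrix.
// Only the first calCount columns of each dst row are written; the remaining
// columns are left untouched (the API leaves them undefined).
void DequantRowsVectorScale(const std::vector<std::int32_t>& src,
                            const std::vector<float>& deqScale,
                            std::vector<float>& dst,
                            std::size_t m, std::size_t n, std::size_t nDst,
                            std::size_t calCount) {
    assert(calCount <= n && calCount <= nDst);
    assert(calCount <= deqScale.size());  // only deqScale[0, calCount) is used
    assert(src.size() >= m * n && dst.size() >= m * nDst);
    for (std::size_t row = 0; row < m; ++row) {
        for (std::size_t i = 0; i < calCount; ++i) {
            // ith source element of the row times the ith coefficient
            dst[row * nDst + i] = static_cast<float>(src[row * n + i]) * deqScale[i];
        }
    }
}

// Scalar-scale variant: every dequantized element uses the same coefficient.
void DequantRowsScalarScale(const std::vector<std::int32_t>& src, float deqScale,
                            std::vector<float>& dst,
                            std::size_t m, std::size_t n, std::size_t nDst,
                            std::size_t calCount) {
    assert(calCount <= n && calCount <= nDst);
    assert(src.size() >= m * n && dst.size() >= m * nDst);
    for (std::size_t row = 0; row < m; ++row) {
        for (std::size_t i = 0; i < calCount; ++i) {
            dst[row * nDst + i] = static_cast<float>(src[row * n + i]) * deqScale;
        }
    }
}
```

Note that src is indexed with stride n and dst with stride nDst, which is how the padded output layout shows up in the indexing.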

When the template parameter mode is set to DEQUANT_WITH_SINGLE_ROW and DequantParams {m, n, calCount} meets all three of the following conditions:
- m = 1
- calCount is a multiple of 32/sizeof(dstT)
- n % calCount = 0
then {1, n, calCount} is treated as {n/calCount, calCount, calCount} for dequantization.
The following figure shows the effect with DequantParams {1, 16, 8}. Because dstT is float, calCount must be a multiple of 8 (32/sizeof(float)). In DEQUANT_WITH_SINGLE_ROW mode, {1, 2 × 8, 8} is converted to {2, 8, 8} for computation.
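The DEQUANT_WITH_SINGLE_ROW reinterpretation can be written as a pure function on the shape triple. ShapeParams and EffectiveSingleRowParams are hypothetical names used only for this sketch (the real DequantParams definition is not reproduced in this section), and returning the shape unchanged when any condition fails is an assumption. Under the row-wise rule described earlier, each resulting row of calCount elements is presumably scaled by the first calCount coefficients of deqScale.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the {m, n, calCount} triple described above.
struct ShapeParams {
    std::uint32_t m;
    std::uint32_t n;
    std::uint32_t calCount;
};

// Returns the shape used for computation in DEQUANT_WITH_SINGLE_ROW mode.
// dstElemSize is sizeof(dstT).
ShapeParams EffectiveSingleRowParams(ShapeParams p, std::size_t dstElemSize) {
    assert(dstElemSize > 0 && 32 % dstElemSize == 0);
    const std::uint32_t elemsPer32B = static_cast<std::uint32_t>(32 / dstElemSize);
    const bool applies = p.m == 1 && p.calCount != 0 &&
                         p.calCount % elemsPer32B == 0 &&  // multiple of 32/sizeof(dstT)
                         p.n % p.calCount == 0;            // n splits evenly
    if (!applies) {
        return p;  // assumption: shape used as given
    }
    return ShapeParams{p.n / p.calCount, p.calCount, p.calCount};
}
```

With float output, EffectiveSingleRowParams({1, 16, 8}, sizeof(float)) returns {2, 8, 8}, matching the figure, whereas {1, 16, 4} is left as is because 4 is not a multiple of 8.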


Principles
The figure below shows the internal algorithm of the AscendDequant high-level API, using as an example an int32_t srcTensor of shape [m, n], a deqScale of type scaleT and shape [n], and a dstTensor of type dstT and shape [m, n].

The computation consists of the following steps, all performed with vector operations:
- Precision conversion: convert srcTensor and deqScale to FP32, producing srcFP32 and deqScaleFP32.
- Mul computation: srcFP32 has m rows of length n. Over m iterations, each row of srcFP32 is multiplied by deqScaleFP32; a mask restricts the multiplication to the first calCount elements (taken from DequantParams). In the figure, index ranges over [0, m) and selects a row of srcFP32. The result, mulRes, has shape [m, n].
- Precision conversion of the result: convert mulRes from FP32 to dstT, producing dstTensor with shape [m, n].
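The three steps can be sketched on the host for one supported combination: int32_t source, bfloat16_t deqScale, and float destination. Assumptions for this sketch: bfloat16 values are passed as raw std::uint16_t bit patterns, the function names are hypothetical, and the masked-off tail of mulRes is zero-filled here, whereas the API leaves it undefined. Widening bfloat16 to FP32 is exact because a bfloat16 value is the upper 16 bits of an IEEE-754 binary32 value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Exact bf16 -> fp32 widening: move the 16 bits into the high half.
float Bf16BitsToFloat(std::uint16_t bits) {
    const std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
    float out;
    std::memcpy(&out, &wide, sizeof(out));
    return out;
}

// Hypothetical sketch of the pipeline for srcT = int32_t, scaleT = bfloat16_t
// (raw bits), dstT = float. src is m x n row-major.
std::vector<float> DequantPipeline(const std::vector<std::int32_t>& src,
                                   const std::vector<std::uint16_t>& deqScaleBf16,
                                   std::size_t m, std::size_t n, std::size_t calCount) {
    assert(src.size() == m * n && deqScaleBf16.size() >= calCount && calCount <= n);
    // Step 1: precision conversion of both inputs to FP32.
    std::vector<float> srcFP32(src.begin(), src.end());
    std::vector<float> deqScaleFP32(deqScaleBf16.size());
    for (std::size_t i = 0; i < deqScaleBf16.size(); ++i) {
        deqScaleFP32[i] = Bf16BitsToFloat(deqScaleBf16[i]);
    }
    // Step 2: m iterations; each row is multiplied by deqScaleFP32, and the
    // "mask" limits the multiply to the first calCount elements.
    std::vector<float> mulRes(m * n, 0.0f);  // tail zero-filled in this sketch
    for (std::size_t index = 0; index < m; ++index) {
        for (std::size_t i = 0; i < calCount; ++i) {
            mulRes[index * n + i] = srcFP32[index * n + i] * deqScaleFP32[i];
        }
    }
    // Step 3: convert FP32 results to dstT; with dstT = float this is a copy.
    return mulRes;
}
```

For a half or bfloat16_t destination, step 3 would add a narrowing conversion whose rounding behavior this section does not specify.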
Prototype
- The deqScale parameter is a vector.
  - Pass the temporary space through the sharedTmpBuffer input parameter.
    template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW> __aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale, const LocalTensor<uint8_t>& sharedTmpBuffer, DequantParams params)
  - Allocate the temporary space through the API framework.
    template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW> __aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale, DequantParams params)
- The deqScale parameter is a scalar.
  - Pass the temporary space through the sharedTmpBuffer input parameter.
    template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW> __aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const scaleT deqScale, const LocalTensor<uint8_t>& sharedTmpBuffer, DequantParams params)
  - Allocate the temporary space through the API framework.
    template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW> __aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const scaleT deqScale, DequantParams params)
Because the internal implementation of this API involves complex mathematical computation, additional temporary space is required to store intermediate variables. The temporary space can be allocated by the API framework or passed in by the developer through the sharedTmpBuffer input parameter.
- When the API framework allocates the temporary space, developers do not need to allocate it themselves but must reserve enough space for it.
- When the temporary space is passed through sharedTmpBuffer, that tensor is used as the temporary space and the API framework does not allocate any. Developers manage the sharedTmpBuffer space themselves and can reuse the buffer after the API call, which avoids repeated allocation and deallocation and improves flexibility and buffer utilization.
In the first case, developers must reserve the temporary space; in the second, they must allocate sharedTmpBuffer. To obtain the required temporary space size (BufferSize), call GetAscendDequantMaxMinTmpSize (see GetAscendDequantMaxMinTmpSize).
The following APIs are deprecated. Do not use them in new development:
- template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW> __aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale, const LocalTensor<uint8_t>& sharedTmpBuffer, const uint32_t calCount)
- template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW> __aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale, const LocalTensor<uint8_t>& sharedTmpBuffer)
- template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW> __aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale, const uint32_t calCount)
- template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW> __aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale)
Parameters
| Parameter | Description |
|---|---|
| dstT | Data type of the destination operand. |
| scaleT | Data type of deqScale. |
| mode | Compute logic used when DequantParams is {1, n, calCount}; the value is of the DeQuantMode enumeration type. The following configurations are supported: |

| Parameter | Input/Output | Description |
|---|---|---|
| dstTensor | Output | Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| srcTensor | Input | Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The shape is [m, n], and the n elements of each row must occupy a multiple of 32 bytes. |
| deqScale | Input | Source operand. The type is a scalar or LocalTensor. When it is a LocalTensor, the supported TPosition is VECIN, VECCALC, or VECOUT. For the data type combinations supported by dstTensor, srcTensor, and deqScale, see Table 3 and Table 4. |
| sharedTmpBuffer | Input | Temporary buffer. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For how to obtain the required temporary space size (BufferSize), see GetAscendDequantMaxMinTmpSize. |
| params | Input | Shape information of srcTensor, of type DequantParams. The definition is as follows: |

| Parameter | Input/Output | Description |
|---|---|---|
| dstTensor | Output | Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| srcTensor | Input | Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| sharedTmpBuffer | Input | Temporary buffer. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For how to obtain the required temporary space size (BufferSize), see GetAscendQuantMaxMinTmpSize. |
| scaleTensor | Input | Quantization parameter scale. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| offsetTensor | Input | Quantization parameter offset. Reserved. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| para | Input | Dequantization API parameter. The definition is as follows: |

| dstTensor | srcTensor | deqScale |
|---|---|---|
| half | int32_t | uint64_t. Note: when deqScale is of type uint64_t, the lower 32 bits of each value are used for computation and are interpreted as a float; the upper 32 bits hold control parameters that this API does not use. |
| float | int32_t | float |
| float | int32_t | bfloat16_t |
| bfloat16_t | int32_t | bfloat16_t |
| bfloat16_t | int32_t | float |
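The uint64_t note can be illustrated with a small host-side decoder, under the reading that the lower 32 bits hold the IEEE-754 bit pattern of a float coefficient. DeqScaleFromU64 and PackDeqScale are hypothetical helpers; the layout of the control bits in the upper half is not documented in this section and is treated as opaque.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Keep the lower 32 bits and reinterpret them as a float; the upper 32 bits
// (control parameters, unused by AscendDequant) are discarded.
float DeqScaleFromU64(std::uint64_t packed) {
    const std::uint32_t low = static_cast<std::uint32_t>(packed & 0xFFFFFFFFull);
    float scale;
    std::memcpy(&scale, &low, sizeof(scale));
    return scale;
}

// Builds a packed value with arbitrary upper bits, for illustration only.
std::uint64_t PackDeqScale(float scale, std::uint32_t upperBits) {
    std::uint32_t low;
    std::memcpy(&low, &scale, sizeof(low));
    return (static_cast<std::uint64_t>(upperBits) << 32) | low;
}
```

std::memcpy is used for the bit reinterpretation because it is well defined in C++, unlike reading through a pointer cast.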
Returns
None
Availability
Constraints
- The source operand address must not overlap the destination operand address.
- For details about the alignment requirements of the operand address offset, see General Restrictions.
Example
    rowLen = m; // m = 4
    colLen = n; // n = 8
    // srcLocal has shape 4 x 8 and type int32_t; deqScaleLocal has 8 elements of type float.
    // The temporary space is allocated by the API framework, so the required size has been reserved.
    AscendC::AscendDequant(dstLocal, srcLocal, deqScaleLocal, {rowLen, colLen, deqScaleLocal.GetSize()});
Result example:
Input data (srcLocal) of the int32_t data type:
    [ -8 5 -5 -7 -3 -8 3 6
      9 2 -5 0 0 -5 -7 0
      -6 0 -2 3 -2 8 5 2
      2 2 -4 5 -4 4 -8 3 ]
deqScale of the float data type:
    [ 10.433567 10.765296 -30.694275 -65.47741 8.386527 -89.646194 65.11153 42.213394 ]
Output data (dstLocal) of the float data type:
    [ -83.46854 53.82648 153.47137 458.34186 -25.15958 717.16956 195.33458 253.28036
      93.9021 21.530592 153.47137 -0. 0. 448.23096 -455.7807 0.
      -62.601402 0. 61.38855 -196.43222 -16.773054 -717.16956 325.55762 84.42679
      20.867134 21.530592 122.7771 -327.38705 -33.54611 -358.58478 -520.8922 126.64018 ]
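The result example can be cross-checked on the host: with calCount = n = 8, each output element is src[row][col] * deqScale[col]. The helper names below are hypothetical, and the comparison uses a small relative tolerance because the printed outputs are rounded.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Data copied from the result example (m = 4, n = calCount = 8).
constexpr std::size_t kM = 4, kN = 8;

const std::int32_t kSrc[kM * kN] = {
    -8, 5, -5, -7, -3, -8, 3, 6,
    9, 2, -5, 0, 0, -5, -7, 0,
    -6, 0, -2, 3, -2, 8, 5, 2,
    2, 2, -4, 5, -4, 4, -8, 3};
const float kScale[kN] = {10.433567f, 10.765296f, -30.694275f, -65.47741f,
                          8.386527f, -89.646194f, 65.11153f, 42.213394f};

// Expected dstLocal[row][col] = srcLocal[row][col] * deqScale[col].
float Expected(std::size_t row, std::size_t col) {
    return static_cast<float>(kSrc[row * kN + col]) * kScale[col];
}

// True if a and b agree within a small relative tolerance.
bool Close(float a, float b) {
    return std::fabs(a - b) <= 1e-5f * std::fmax(1.0f, std::fabs(b));
}
```

Spot-checking a few entries against the printed dstLocal values, for example Expected(0, 0) against -83.46854, confirms that each column uses its own coefficient.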