AscendDequant

Function Usage

Performs element-wise dequantization, for example converting int32_t data to the half or float data type. The API accepts inputs with at most two dimensions; inputs with more dimensions are not supported.

  • Let the shape of the input srcTensor be (m, n). The number of bytes occupied by each row (n input elements) must be 32-byte aligned, and calCount elements are dequantized in each row.
  • The dequantization coefficient deqScale can be a scalar or a vector. If deqScale is a vector, calCount must be less than or equal to the number of elements in deqScale, and only the first calCount dequantization coefficients take effect.
  • The shape of the output dstTensor is (m, n_dst). If n * sizeof(dstT) is not 32-byte aligned, each row must be padded up to the next multiple of 32 bytes. n_dst is the number of columns after padding.
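The padding rule for n_dst can be sketched on the host as follows. PaddedCols is a hypothetical helper name, not part of Ascend C, and the sketch assumes that the element size divides 32.

```cpp
#include <cstdint>

// Hypothetical helper (not an Ascend C API): smallest column count >= n
// whose row size in bytes is a multiple of 32.
// Assumes elemSize divides 32 (true for 2-byte and 4-byte element types).
constexpr uint32_t PaddedCols(uint32_t n, uint32_t elemSize) {
    constexpr uint32_t kAlignBytes = 32;
    // Round the row size up to a multiple of 32 bytes, then convert back to elements.
    return (n * elemSize + kAlignBytes - 1) / kAlignBytes * kAlignBytes / elemSize;
}
```

With n = 8, a 2-byte destination type such as bfloat16_t gives 16 columns, while a 4-byte type such as float keeps 8 columns.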

The following two examples explain the parameter configuration and the compute logic. (The DequantParams type used below is a {m, n, calCount} structure that stores shape information.)

  • As shown in the following figure, the data type of srcTensor is int32_t, m equals 4, n equals 8, and calCount equals 4. Four elements are therefore dequantized in each row of srcTensor: the first four elements in deqScale take effect, and the last 12 elements are not involved in dequantization. The data type of dstTensor is bfloat16_t, m equals 4, and n_dst equals 16 (16 × sizeof(bfloat16_t) % 32 = 0). Every n elements in srcTensor form a row. For the first calCount elements in each row, the ith element of srcTensor is multiplied by the ith element of deqScale, and the product is written to the ith element of the corresponding row of dstTensor. The remaining elements in each row of dstTensor (positions calCount + 1 to n_dst) hold undefined values.

  • As shown in the following figure, the data type of srcTensor is int32_t, m equals 4, n equals 8, and calCount equals 4, indicating that the number of elements to be dequantized in each row of srcTensor is 4. The data type of dstTensor is float, m equals 4, and n_dst equals 8 (8 × sizeof(float) % 32 = 0). The first four elements in each row of srcTensor are multiplied by the scalar deqScale and the products are written to the corresponding positions in each row of dstTensor.
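The compute logic of the two examples can be modelled on the host with plain C++. This is only a reference sketch for checking expected values, not the device implementation: RefDequantParams and RefDequant are hypothetical names, the output is kept in float for simplicity, and columns beyond calCount are simply left untouched (the API leaves them undefined).

```cpp
#include <cstdint>
#include <vector>

// Host-side stand-in that mirrors the {m, n, calCount} layout of DequantParams.
struct RefDequantParams {
    uint32_t m;
    uint32_t n;
    uint32_t calCount;
};

// Vector deqScale: dst(i, j) = src(i, j) * deqScale[j] for j < calCount.
// nDst is the (padded) column count of dst.
inline void RefDequant(std::vector<float>& dst, uint32_t nDst,
                       const std::vector<int32_t>& src,
                       const std::vector<float>& deqScale,
                       const RefDequantParams& p) {
    for (uint32_t i = 0; i < p.m; ++i) {
        for (uint32_t j = 0; j < p.calCount; ++j) {
            dst[i * nDst + j] = static_cast<float>(src[i * p.n + j]) * deqScale[j];
        }
    }
}

// Scalar deqScale: the same coefficient is applied to every column.
inline void RefDequant(std::vector<float>& dst, uint32_t nDst,
                       const std::vector<int32_t>& src, float deqScale,
                       const RefDequantParams& p) {
    for (uint32_t i = 0; i < p.m; ++i) {
        for (uint32_t j = 0; j < p.calCount; ++j) {
            dst[i * nDst + j] = static_cast<float>(src[i * p.n + j]) * deqScale;
        }
    }
}
```

A model like this is convenient for generating golden data when testing an operator that calls AscendDequant.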

When the template parameter mode is set to DEQUANT_WITH_SINGLE_ROW, the API checks whether DequantParams {m, n, calCount} meets the following three requirements:

  1. m = 1
  2. calCount is a multiple of 32/sizeof(dstT).
  3. n % calCount = 0

In this case, {1, n, calCount} is considered as {n/calCount, calCount, calCount} for dequantization.

The following figure shows the effect, in which the passed DequantParams is {1, 16, 8}. Since dstT is of type float, calCount must be a multiple of 8. In DEQUANT_WITH_SINGLE_ROW mode, {1, 2 × 8, 8} is converted to {2, 8, 8} for calculation.
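The shape conversion can be sketched as a small host-side function. RefShape, RefMode, and EffectiveShape are hypothetical names used only for illustration; the conditions are the three requirements listed above.

```cpp
#include <cstdint>

// Host-side stand-ins for DequantParams and DeQuantMode (illustration only).
struct RefShape {
    uint32_t m;
    uint32_t n;
    uint32_t calCount;
};

enum class RefMode { SingleRow, MultiRow };

// Returns the shape that is effectively computed for the given mode.
// dstElemSize is sizeof(dstT) in bytes.
inline RefShape EffectiveShape(RefShape p, uint32_t dstElemSize, RefMode mode) {
    const uint32_t elemsPer32Bytes = 32 / dstElemSize;
    const bool reshape = mode == RefMode::SingleRow &&
                         p.m == 1 &&
                         p.calCount != 0 &&
                         p.calCount % elemsPer32Bytes == 0 &&
                         p.n % p.calCount == 0;
    if (reshape) {
        // {1, n, calCount} is treated as {n / calCount, calCount, calCount}.
        return {p.n / p.calCount, p.calCount, p.calCount};
    }
    return p;  // otherwise the shape is used as passed
}
```

For dstT = float, {1, 16, 8} maps to {2, 8, 8} in single-row mode and stays {1, 16, 8} in multi-row mode.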

Principles

The figure below illustrates the internal algorithm block diagram of AscendDequant high-level APIs, taking the input srcTensor with a data type of int32_t and a shape of [m, n], input deqScale with a data type of scaleT and a shape of [n], and output dstTensor with a data type of dstT and a shape of [m, n] as examples.

Figure 1 AscendDequant internal algorithm block diagram

The computation process is divided into the following steps, all of which are performed on vectors:

  1. Precision conversion: Convert srcTensor and deqScale into FP32 tensors to obtain srcFP32 and deqScaleFP32, respectively.
  2. Mul computation: srcFP32 has a total of m rows, each with a length of n. Each row of srcFP32 is multiplied by deqScaleFP32 over m iterations. The mask limits the Mul computation to the first DequantParams.calCount elements of each row. The value range of index in the figure is [0, m), corresponding to each row of srcFP32. The computation result is mulRes, whose shape is [m, n].
  3. Precision conversion of result data: Convert mulRes from FP32 to a dstT tensor. The result is dstTensor with a shape of [m, n].

Prototype

  • The deqScale parameter is a vector.
    • Pass the temporary space through the sharedTmpBuffer input parameter.
      template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
      __aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale, const LocalTensor<uint8_t>& sharedTmpBuffer, DequantParams params)
      
    • Allocate the temporary space through the API framework.
      template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
      __aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale, DequantParams params)
      
  • The deqScale parameter is a scalar.
    • Pass the temporary space through the sharedTmpBuffer input parameter.
      template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
      __aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const scaleT deqScale, const LocalTensor<uint8_t>& sharedTmpBuffer, DequantParams params)
      
    • Allocate the temporary space through the API framework.
      template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
      __aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const scaleT deqScale, DequantParams params)
      

Due to the complex mathematical computation involved in the internal implementation of this API, additional temporary space is required to store intermediate variables generated during computation. The temporary space can be allocated through the API framework or passed by developers through the sharedTmpBuffer input parameter.

  • When the API framework is used for temporary space allocation, developers do not need to allocate the space, but must reserve the required size for the space.
  • When the sharedTmpBuffer input parameter is used for passing the temporary space, the tensor serves as the temporary space. In this case, the API framework is not required for temporary space allocation. This enables developers to manage the sharedTmpBuffer space and reuse the buffer after calling the API, so that the buffer is not repeatedly allocated and deallocated, improving the flexibility and buffer utilization.

In both cases the developer is responsible for the space: reserve it when the API framework allocates it, or allocate it when sharedTmpBuffer is passed. To obtain the required size of the temporary space (BufferSize), call the GetAscendDequantMaxMinTmpSize API; for details, see GetAscendDequantMaxMinTmpSize.

The following APIs are not recommended. Do not use them for newly developed content:

template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale, const LocalTensor<uint8_t>& sharedTmpBuffer, const uint32_t calCount)

template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale, const LocalTensor<uint8_t>& sharedTmpBuffer)

template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale, const uint32_t calCount)

template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale)

Parameters

Table 1 Parameters in the template

Parameter

Description

dstT

Data type of the destination operand.

scaleT

Data type of deqScale.

mode

Compute logic used when DequantParams is {1, n, calCount}, with enum DeQuantMode passed. The following configurations are supported:
  • DEQUANT_WITH_SINGLE_ROW: When DequantParams {m, n, calCount} meets all of the following conditions (m = 1, calCount is a multiple of 32/sizeof(dstT), and n % calCount = 0), {1, n, calCount} is computed as {n/calCount, calCount, calCount}.
  • DEQUANT_WITH_MULTI_ROW: Even if all the preceding conditions are met, {1, n, calCount} is still calculated as {1, n, calCount}. This means that the first calCount elements out of the total n elements are dequantized.
Table 2 API parameters

Parameter

Input/Output

Description

dstTensor

Output

Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

  • The number of rows of dstTensor must be the same as that of srcTensor.
  • If n * sizeof(dstT) is not 32-byte aligned, each row must be padded up to the next multiple of 32 bytes. n_dst is the number of columns after padding. For example, if srcTensor has a data type of int32_t and a shape of (4, 8), and dstTensor has a data type of bfloat16_t, n_dst must be padded from 8 to 16, and the dstTensor shape is (4, 16). With integer division, the padded column count is n_dst = (8 * sizeof(bfloat16_t) + 32 - 1) / 32 * 32 / sizeof(bfloat16_t) = 16.

srcTensor

Input

Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The shape is [m, n], and the number of bytes occupied by n pieces of input data must be 32-byte aligned.

deqScale

Input

Source operand. The type is scalar or LocalTensor. When the type is LocalTensor, the supported TPosition is VECIN, VECCALC, or VECOUT.

For details about the data type combinations supported by dstTensor, srcTensor, and deqScale, see Table 3 and Table 4.

sharedTmpBuffer

Input

Temporary buffer. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

For details about how to obtain the temporary space size (BufferSize), see GetAscendDequantMaxMinTmpSize.

params

Input

Shape of srcTensor, DequantParams type. The definition is as follows:

struct DequantParams
{
    uint32_t m;             // number of rows of srcTensor
    uint32_t n;             // number of columns of srcTensor
    uint32_t calCount;      // For each row of srcTensor, the first calCount elements are valid data, and they are multiplied by the first calCount elements of deqScale or by the deqScale scalar.
};
  • DequantParams.n * sizeof(T) must be an integer multiple of 32 bytes. T is the data type of the elements in srcTensor.
  • Since the multiplication operation is performed on the first calCount elements among every n elements, DequantParams.n and calCount must meet the following relationship:

    1 ≤ DequantParams.calCount ≤ DequantParams.n

  • When deqScale is a vector, DequantParams.calCount is less than or equal to the number of elements in deqScale.
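The constraints on DequantParams can be checked on the host before launching a kernel. The function below is a hypothetical helper, not an Ascend C API; it reads the relationship above as 1 ≤ calCount ≤ n and uses scaleLen = 0 to mean that deqScale is a scalar.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical host-side check of the DequantParams constraints.
// scaleLen is the number of elements in a vector deqScale, or 0 for a scalar.
inline bool IsValidDequantParams(uint32_t n, uint32_t calCount, std::size_t scaleLen) {
    // srcTensor elements are int32_t, so each row occupies n * 4 bytes.
    const bool rowAligned = (n * sizeof(int32_t)) % 32 == 0;
    const bool countInRange = calCount >= 1 && calCount <= n;
    const bool scaleCovers = scaleLen == 0 || calCount <= scaleLen;
    return rowAligned && countInRange && scaleCovers;
}
```

For example, n = 8 and calCount = 4 with a 16-element deqScale passes, while n = 6 fails the 32-byte row alignment.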

Parameter

Input/Output

Description

dstTensor

Output

Destination operand.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

srcTensor

Input

Source operand.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

sharedTmpBuffer

Input

Temporary buffer.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

For details about how to obtain the temporary space size (BufferSize), see GetAscendQuantMaxMinTmpSize.

scaleTensor

Input

Quantization parameter scale.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

offsetTensor

Input

Quantization parameter offset. Reserved.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

para

Input

Dequantization API parameter. The definition is as follows:

struct AscendDeQuantParam {
    uint32_t m;
    uint32_t n;
    uint32_t calCount;
    uint32_t groupSize = 0;
};
  • m: number of elements in the m direction.
  • n: number of elements in the n direction. The data size corresponding to the value of n must be 32-byte aligned.

    Note that the original shape of deqscale is [m, (n + groupSize - 1)/groupSize] in the per_group scenario with kdim = 1. Before calling this API, pad the last dimension of deqscale so that its data size is 32-byte aligned.

  • calCount: number of data elements to compute. The value is within the range of [0, srcTensor.GetSize()] and must be an integer multiple of n.
  • groupSize: number of rows or columns of data that share one scale or offset (valid only in the per_group scenario). The value of groupSize must be an integer multiple of 32.
Table 3 Supported data type combinations (deqScale is LocalTensor)

dstTensor     srcTensor     deqScale
half          int32_t       uint64_t
float         int32_t       float
float         int32_t       bfloat16_t
bfloat16_t    int32_t       bfloat16_t
bfloat16_t    int32_t       float

Note: When the data type of deqScale is uint64_t, only the lower 32 bits of each value take part in the computation, and they are interpreted as a float. The upper 32 bits hold control parameters that this API does not use.
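For the uint64_t deqScale case, the sketch below shows one way to place a float coefficient in the lower 32 bits of a 64-bit value on the host. It assumes that the lower 32 bits carry the IEEE 754 bit pattern of a float, which is how the note on uint64_t is read here, and it leaves the upper 32 bits at zero because their layout is not described on this page. PackDeqScale and UnpackDeqScale are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: store the bit pattern of a float in the lower 32 bits.
// The upper 32 bits (control parameters, unused by this API) are left at zero.
inline uint64_t PackDeqScale(float scale) {
    uint32_t bits = 0;
    std::memcpy(&bits, &scale, sizeof(bits));  // bit-exact copy, no numeric conversion
    return static_cast<uint64_t>(bits);
}

// Hypothetical helper: recover the float from the lower 32 bits, ignoring the rest.
inline float UnpackDeqScale(uint64_t packed) {
    const uint32_t bits = static_cast<uint32_t>(packed & 0xFFFFFFFFull);
    float scale = 0.0f;
    std::memcpy(&scale, &bits, sizeof(scale));
    return scale;
}
```

std::memcpy is used instead of a pointer cast so that the reinterpretation is well defined in standard C++.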

Table 4 Supported data type combinations (deqScale is a scalar)

dstTensor     srcTensor     deqScale
bfloat16_t    int32_t       bfloat16_t
bfloat16_t    int32_t       float
float         int32_t       bfloat16_t
float         int32_t       float

Returns

None

Availability

Constraints

  • The source operand address must not overlap the destination operand address.
  • For details about the alignment requirements of the operand address offset, see General Restrictions.

Example

For a complete operator example, see dequant operator sample.
rowLen = m;                 // m = 4
colLen = n;                 // n = 8
// srcLocal has shape 4 x 8 and type int32_t; deqScaleLocal has 8 elements of type float.
// The temporary space is reserved in advance (no sharedTmpBuffer is passed).
AscendC::AscendDequant(dstLocal, srcLocal, deqScaleLocal, {rowLen, colLen, deqScaleLocal.GetSize()});

Result example:

Input data (srcLocal) of the int32_t data type:
[ -8  5 -5 -7 -3 -8  3  6
   9  2 -5  0  0 -5 -7  0
  -6  0 -2  3 -2  8  5  2
   2  2 -4  5 -4  4 -8  3 ]

deqScale of the float data type:
[ 10.433567  10.765296   -30.694275   -65.47741    8.386527    -89.646194   65.11153    42.213394]

Output data (dstLocal) of the float data type:
[-83.46854      53.82648    153.47137    458.34186    -25.15958   717.16956    195.33458   253.28036 
 93.9021        21.530592   153.47137    -0.          0.          448.23096    -455.7807   0.    
 -62.601402     0.          61.38855     -196.43222   -16.773054  -717.16956   325.55762   84.42679 
 20.867134      21.530592   122.7771     -327.38705   -33.54611   -358.58478   -520.8922   126.64018 ]