AscendDequant

Applicability

Product	Supported
Atlas A3 training products / Atlas A3 inference products	√
Atlas A2 training products / Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	x
Atlas inference product's AI Core	√
Atlas inference product's Vector Core	x
Atlas training products	x

Function

Dequantizes data element by element, for example from the int32_t data type to the half or float data type. The input data may have at most two dimensions.

Assuming that the shape of the input srcTensor is (m, n), the number of bytes occupied by each row of data (n pieces of input data) must be 32-byte aligned, and the number of elements to be dequantized in each row is calCount.
The dequantization coefficient deqScale can be a scalar or a vector. If deqScale is a vector, calCount must be less than or equal to the number of elements in deqScale, and only the first calCount dequantization coefficients take effect.
The shape of the output dstTensor is (m, n_dst). If n × sizeof(dstT) is not a multiple of 32 bytes, each row must be padded up to the next multiple of 32 bytes; n_dst is the number of columns after padding.
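The padding rule above can be sketched as a small host-side helper. This is only an illustration: PaddedDstColumns is a hypothetical name and is not part of the Ascend C API, and the element size is passed in bytes so that no device-only types are needed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an Ascend C API): number of dstTensor columns
// after each row is padded up to a multiple of 32 bytes.
constexpr std::uint32_t PaddedDstColumns(std::uint32_t n, std::size_t dstElemSize)
{
    constexpr std::uint32_t kAlignBytes = 32;
    const std::uint32_t elemSize = static_cast<std::uint32_t>(dstElemSize);
    const std::uint32_t rowBytes = n * elemSize;
    // Round the row size up to the next multiple of 32 bytes (integer division).
    const std::uint32_t paddedBytes = (rowBytes + kAlignBytes - 1) / kAlignBytes * kAlignBytes;
    return paddedBytes / elemSize;
}

// A 2-byte destination type (such as half or bfloat16_t) with n = 8 needs 16 columns.
static_assert(PaddedDstColumns(8, 2) == 16, "8 x 2 bytes is padded to 32 bytes");
// A 4-byte destination type (float) with n = 8 is already aligned.
static_assert(PaddedDstColumns(8, 4) == 8, "8 x 4 bytes is already 32-byte aligned");
```

Because n × sizeof(int32_t) is already required to be 32-byte aligned, padding only comes into play when dstT is narrower than the source type.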

The following two examples explain the parameter configuration and compute logic. (The DequantParams type used below is a {m, n, calCount} structure that stores shape information.)

As shown in the following figure, the data type of srcTensor is int32_t, m equals 4, n equals 8, and calCount equals 4. This means that four elements are dequantized in each row of srcTensor: the first four elements in deqScale take effect, and the last 12 elements are not involved in dequantization. The data type of dstTensor is bfloat16_t, m equals 4, and n_dst equals 16 (16 × sizeof(bfloat16_t) % 32 = 0). The elements of srcTensor are arranged in rows of n. For the first calCount elements in each row, the ith element of srcTensor is multiplied by the ith element of deqScale, and the product is written to the ith element of the corresponding row of dstTensor. Elements calCount + 1 through n_dst in each row of dstTensor hold undefined values.
As shown in the following figure, the data type of srcTensor is int32_t, m equals 4, n equals 8, and calCount equals 4, indicating that the number of elements to be dequantized in each row of srcTensor is 4. The data type of dstTensor is float, m equals 4, and n_dst equals 8 (8 × sizeof(float) % 32 = 0). The first four elements in each row of srcTensor are multiplied by the scalar deqScale and the products are written to the corresponding positions in each row of dstTensor.
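The compute logic of the two examples can be mirrored by a host-side reference model. The sketch below is plain C++ for checking expected results and is not the Ascend C implementation; DequantReference is a hypothetical name, float stands in for dstT, and positions that are not dequantized are set to 0.0f here only to keep the model deterministic (on the device they are undefined).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side reference model (assumption: float stands in for dstT).
// src is row-major (m x n); the result is row-major (m x nDst), nDst >= calCount.
std::vector<float> DequantReference(const std::vector<std::int32_t>& src,
                                    const std::vector<float>& deqScale,
                                    std::uint32_t m, std::uint32_t n,
                                    std::uint32_t calCount, std::uint32_t nDst)
{
    std::vector<float> dst(static_cast<std::size_t>(m) * nDst, 0.0f);
    for (std::uint32_t row = 0; row < m; ++row) {
        // Only the first calCount elements of each row are dequantized.
        for (std::uint32_t i = 0; i < calCount; ++i) {
            const float value = static_cast<float>(src[row * n + i]);
            dst[row * nDst + i] = value * deqScale[i];
        }
    }
    return dst;
}

// Scalar variant: every dequantized element uses the same coefficient.
std::vector<float> DequantReference(const std::vector<std::int32_t>& src,
                                    float deqScale,
                                    std::uint32_t m, std::uint32_t n,
                                    std::uint32_t calCount, std::uint32_t nDst)
{
    return DequantReference(src, std::vector<float>(calCount, deqScale), m, n, calCount, nDst);
}
```

Comparing kernel output against such a model should only cover the first calCount elements of each row, since the remaining positions carry no defined value.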

When the template parameter mode is set to DEQUANT_WITH_SINGLE_ROW and DequantParams {m, n, calCount} meets all three of the following conditions:

m = 1
calCount is a multiple of 32/sizeof(dstT)
n % calCount = 0

{1, n, calCount} is treated as {n/calCount, calCount, calCount} for dequantization.

The following figure shows the effect, in which the passed DequantParams is {1, 16, 8}. Since dstT is float, calCount must be a multiple of 8. In DEQUANT_WITH_SINGLE_ROW mode, {1, 2 × 8, 8} is converted to {2, 8, 8} for calculation.
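The shape conversion can be expressed as a small host-side sketch. Shape3 and EffectiveShape are hypothetical names used only for illustration; Shape3 mirrors the {m, n, calCount} layout of DequantParams. The sketch assumes that the shape is used unchanged whenever any of the three conditions is not met.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of DequantParams {m, n, calCount}.
struct Shape3 {
    std::uint32_t m;
    std::uint32_t n;
    std::uint32_t calCount;
};

// Returns the shape that is effectively computed.
// singleRowMode == true models DEQUANT_WITH_SINGLE_ROW,
// singleRowMode == false models DEQUANT_WITH_MULTI_ROW.
Shape3 EffectiveShape(Shape3 p, std::size_t dstElemSize, bool singleRowMode)
{
    const std::uint32_t elemsPer32Bytes = static_cast<std::uint32_t>(32 / dstElemSize);
    const bool reshape = singleRowMode
        && p.m == 1
        && p.calCount != 0
        && elemsPer32Bytes != 0
        && p.calCount % elemsPer32Bytes == 0
        && p.n % p.calCount == 0;
    if (!reshape) {
        return p;  // assumption: the shape is used as passed
    }
    return {p.n / p.calCount, p.calCount, p.calCount};
}
```

With dstT = float (4 bytes), EffectiveShape({1, 16, 8}, 4, true) yields {2, 8, 8}, which matches the conversion described above.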

Principles

The figure below illustrates the internal algorithm block diagram of AscendDequant high-level APIs, taking the input srcTensor with a data type of int32_t and a shape of [m, n], input deqScale with a data type of scaleT and a shape of [n], and output dstTensor with a data type of dstT and a shape of [m, n] as examples.

Figure 1 AscendDequant internal algorithm block diagram

The computation process is divided into the following steps, all of which are performed on vectors:

Precision conversion: Convert srcTensor and deqScale into FP32 tensors to obtain srcFP32 and deqScaleFP32, respectively.
Mul computation: srcFP32 has m rows, each of length n. Each row of srcFP32 is multiplied by deqScaleFP32 in a loop of m iterations. A mask limits the Mul computation to the first DequantParams.calCount elements. The value range of index in the figure is [0, m), corresponding to each row of srcFP32. The calculation result is mulRes, whose shape is [m, n].
Precision conversion of result data: Convert mulRes from FP32 to a dstT tensor. The result is dstTensor with a shape of [m, n].

Prototype

The deqScale parameter is a vector.

Pass the temporary space through the sharedTmpBuffer input parameter.

          
template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor,
    const LocalTensor<scaleT>& deqScale, const LocalTensor<uint8_t>& sharedTmpBuffer, DequantParams params)

Allocate the temporary space through the API framework.

          
template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor,
    const LocalTensor<scaleT>& deqScale, DequantParams params)

The deqScale parameter is a scalar.

Pass the temporary space through the sharedTmpBuffer input parameter.

          
template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor,
    const scaleT deqScale, const LocalTensor<uint8_t>& sharedTmpBuffer, DequantParams params)

Allocate the temporary space through the API framework.

          
template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor,
    const scaleT deqScale, DequantParams params)

Due to the complex mathematical computation involved in the internal implementation of this API, extra temporary space is required to store intermediate variables generated during computation. The temporary space can be allocated through the API framework or passed by developers through the sharedTmpBuffer input parameter.

When the API framework is used for temporary space allocation, you do not need to allocate the space, but must reserve the required size for the temporary space.

When the sharedTmpBuffer input parameter is used for passing the temporary space, the tensor serves as the temporary space. In this case, the API framework is not required for temporary space allocation. This enables developers to manage the sharedTmpBuffer space and reuse the buffer after calling the API, so that the buffer is not repeatedly allocated and deallocated, improving the flexibility and buffer utilization.

If the API framework is used, developers must reserve the temporary space. If sharedTmpBuffer is used, developers must allocate space for sharedTmpBuffer. To obtain the size of the temporary space (BufferSize) to be reserved, use the GetAscendDequantMaxMinTmpSize API (see GetAscendDequantMaxMinTmpSize).

The following prototypes are not recommended. Do not use them in newly developed code:

      
template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor,
    const LocalTensor<scaleT>& deqScale, const LocalTensor<uint8_t>& sharedTmpBuffer, const uint32_t calCount)

      
template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor,
    const LocalTensor<scaleT>& deqScale, const LocalTensor<uint8_t>& sharedTmpBuffer)

      
template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor,
    const LocalTensor<scaleT>& deqScale, const uint32_t calCount)

      
template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor,
    const LocalTensor<scaleT>& deqScale)

Parameters

**Table 1** Template parameters
Parameter	Description
dstT	Data type of the destination operand.
scaleT	Data type of deqScale.
mode	Compute logic used when DequantParams is {1, n, calCount}, with enum DeQuantMode passed. The following configurations are supported: DEQUANT_WITH_SINGLE_ROW: When DequantParams {m, n, calCount} meets the following conditions: m = 1, calCount being a multiple of 32/sizeof(dstT), and n % calCount = 0, {1, n, calCount} is computed as {n/calCount, calCount, calCount}. DEQUANT_WITH_MULTI_ROW: Even if all the preceding conditions are met, {1, n, calCount} is still calculated as {1, n, calCount}. This means that the first calCount elements out of the total n elements are dequantized.

**Table 2** API parameters
Parameter	Input/Output	Description

dstTensor	Output	Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

For the Atlas A3 training products / Atlas A3 inference products, the supported data types are half, bfloat16_t, and float.

For the Atlas A2 training products / Atlas A2 inference products, the supported data types are half, bfloat16_t, and float.

For the Atlas inference product's AI Core, the supported data types are half and float.

The number of rows of dstTensor must be the same as that of srcTensor.
If n × sizeof(dstT) is not a multiple of 32 bytes, each row must be padded up to the next multiple of 32 bytes; n_dst is the number of columns after padding. For example, if srcTensor has a data type of int32_t and a shape of (4, 8), and dstTensor has a data type of bfloat16_t, n_dst must be padded from 8 to 16, and the dstTensor shape is (4, 16). With integer division, the padding is computed as: n_dst = (8 × sizeof(bfloat16_t) + 32 - 1) / 32 × 32 / sizeof(bfloat16_t) = 16

srcTensor	Input	Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

For the Atlas A3 training products / Atlas A3 inference products, the supported data type is int32_t.

For the Atlas A2 training products / Atlas A2 inference products, the supported data type is int32_t.

For the Atlas inference product's AI Core, the supported data type is int32_t.

The shape is [m, n], and the number of bytes occupied by n pieces of input data must be 32-byte aligned.

deqScale	Input	Source operand, which is either a scalar or a LocalTensor. When it is a LocalTensor, the supported TPosition is VECIN, VECCALC, or VECOUT.

For the Atlas A3 training products / Atlas A3 inference products, when deqScale is a vector, the supported data types are uint64_t, float, and bfloat16_t. When deqScale is a scalar, the supported data types are bfloat16_t and float.

For the Atlas A2 training products / Atlas A2 inference products, when deqScale is a vector, the supported data types are uint64_t, float, and bfloat16_t. When deqScale is a scalar, the supported data types are bfloat16_t and float.

For the Atlas inference product's AI Core, when deqScale is a vector, the supported data types are uint64_t and float. When deqScale is a scalar, the supported data type is float.

For details about the data type combinations supported by dstTensor, srcTensor, and deqScale, see Table 3 and Table 4.

sharedTmpBuffer	Input	Temporary buffer. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

For details about how to obtain the temporary space size (BufferSize), see GetAscendDequantMaxMinTmpSize.

For the Atlas A3 training products / Atlas A3 inference products, the supported data type is uint8_t.

For the Atlas A2 training products / Atlas A2 inference products, the supported data type is uint8_t.

For the Atlas inference product's AI Core, the supported data type is uint8_t.

params	Input	Shape information of srcTensor, passed as a DequantParams structure. The definition is as follows:

           
struct DequantParams
{
    uint32_t m;             // number of rows of srcTensor
    uint32_t n;             // number of columns of srcTensor
    uint32_t calCount;      // number of valid elements at the start of each row of srcTensor; they are
                            // multiplied by the first calCount elements of deqScale or by the scalar deqScale
};

DequantParams.n x sizeof(T) must be an integer multiple of 32 bytes. T is the data type of the elements in srcTensor.
Since the multiplication operation is performed on the first calCount elements among every n elements, DequantParams.n and calCount must meet the following relationship:
1 ≤ DequantParams.calCount ≤ DequantParams.n
When deqScale is a vector, DequantParams.calCount is less than or equal to the number of elements in deqScale.
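The constraints on DequantParams listed above can be checked on the host before launching a kernel. The following is a minimal sketch under stated assumptions: IsValidDequantShape is a hypothetical helper, not an Ascend C API, and it covers only the three rules stated in this table.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical host-side check of the DequantParams constraints.
// deqScaleElems is ignored when deqScale is a scalar (scaleIsVector == false).
bool IsValidDequantShape(std::uint32_t n, std::uint32_t calCount,
                         std::size_t deqScaleElems, bool scaleIsVector)
{
    // n x sizeof(int32_t) must be an integer multiple of 32 bytes.
    if ((static_cast<std::size_t>(n) * sizeof(std::int32_t)) % 32 != 0) {
        return false;
    }
    // 1 <= calCount <= n
    if (calCount < 1 || calCount > n) {
        return false;
    }
    // A vector deqScale must provide at least calCount coefficients.
    if (scaleIsVector && calCount > deqScaleElems) {
        return false;
    }
    return true;
}
```

For instance, n = 8 with calCount = 4 and an 8-element deqScale passes, while n = 6 fails because 6 × 4 = 24 bytes is not a multiple of 32.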

**Table 3** Supported data type combinations (**deqScale** is **LocalTensor**)
dstTensor	srcTensor	deqScale
half	int32_t	uint64_t Note: When the data type of deqScale is uint64_t, the lower 32 bits of each value hold the float data used for computation, and the upper 32 bits hold control parameters that this API does not use.
float	int32_t	float
float	int32_t	bfloat16_t
bfloat16_t	int32_t	bfloat16_t
bfloat16_t	int32_t	float
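For the uint64_t deqScale case in Table 3, the following host-side sketch shows one way to pack and unpack a coefficient. It assumes that the lower 32 bits hold the IEEE 754 bit pattern of a float and that the upper 32 bits can be left as zero; PackScale and ScaleFromUint64 are hypothetical helper names, not Ascend C APIs.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: the lower 32 bits of a uint64_t deqScale element carry a float bit pattern.
std::uint64_t PackScale(float scale)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &scale, sizeof(bits));  // bit-exact copy, no numeric conversion
    return static_cast<std::uint64_t>(bits);   // upper 32 bits left as zero
}

// Recovers the float coefficient, ignoring the upper 32 bits.
float ScaleFromUint64(std::uint64_t packed)
{
    const std::uint32_t lowBits = static_cast<std::uint32_t>(packed & 0xFFFFFFFFu);
    float scale = 0.0f;
    std::memcpy(&scale, &lowBits, sizeof(scale));
    return scale;
}
```

Using std::memcpy rather than a pointer cast keeps the bit reinterpretation well defined in standard C++.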

**Table 4** Supported data type combinations (**deqScale** is a scalar)
dstTensor	srcTensor	deqScale
bfloat16_t	int32_t	bfloat16_t
bfloat16_t	int32_t	float
float	int32_t	bfloat16_t
float	int32_t	float

Returns

None

Restrictions

The source operand address must not overlap the destination operand address.
For details about the operand address alignment requirements, see General Address Alignment Restrictions.

Example

For a complete operator example, see Dequant operator sample.

       
rowLen = m;                 // m = 4
colLen = n;                 // n = 8
// srcLocal has shape 4 x 8 and type int32_t. deqScaleLocal has 8 elements of type float. Temporary space is reserved.
AscendC::AscendDequant(dstLocal, srcLocal, deqScaleLocal, {rowLen, colLen, deqScaleLocal.GetSize()});

Result example:

      
Input data (srcLocal) of the int32_t data type:
[ -8  5 -5 -7 -3 -8  3  6
   9  2 -5  0  0 -5 -7  0
  -6  0 -2  3 -2  8  5  2
   2  2 -4  5 -4  4 -8  3 ]

deqScale of the float data type:
[ 10.433567  10.765296   -30.694275   -65.47741    8.386527    -89.646194   65.11153    42.213394]

Output data (dstLocal) of the float data type:
[-83.46854      53.82648    153.47137    458.34186    -25.15958   717.16956    195.33458   253.28036 
 93.9021        21.530592   153.47137    -0.          0.          448.23096    -455.7807   0.    
 -62.601402     0.          61.38855     -196.43222   -16.773054  -717.16956   325.55762   84.42679 
 20.867134      21.530592   122.7771     -327.38705   -33.54611   -358.58478   -520.8922   126.64018 ]
