AscendDequant

Function Usage

Performs element-wise dequantization, for example, converting int32_t data to half or float. This API accepts inputs with at most two dimensions; higher-dimensional inputs are not supported.

Suppose that the input srcTensor has the shape (m, n). The number of bytes occupied by each row (n input elements) must be 32-byte aligned, and calCount elements in each row are dequantized.
The dequantization coefficient deqScale can be a scalar or a vector. If deqScale is a vector, calCount must be less than or equal to the number of elements in deqScale, and only the first calCount dequantization coefficients take effect.
The shape of the output dstTensor is (m, n_dst). If n * sizeof(dstT) is not a multiple of 32 bytes, each row must be padded up to the next multiple of 32 bytes. n_dst indicates the number of columns after the padding.
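The padding rule can be sketched on the host side as follows. This is a minimal illustration, not part of the AscendDequant API: the helper name PaddedDstColumns and its parameters are assumptions introduced here, and dstElemSize is assumed to be the size in bytes of one dstT element (2 for half or bfloat16_t, 4 for float).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not provided by the API): returns n_dst, the number of
// dstTensor columns after each row is padded up to a multiple of 32 bytes.
inline std::uint32_t PaddedDstColumns(std::uint32_t n, std::size_t dstElemSize)
{
    const std::size_t kAlignBytes = 32;
    const std::size_t rowBytes = n * dstElemSize;
    // Round the row size up to the next multiple of 32 bytes (integer division).
    const std::size_t paddedBytes = (rowBytes + kAlignBytes - 1) / kAlignBytes * kAlignBytes;
    return static_cast<std::uint32_t>(paddedBytes / dstElemSize);
}
```

With n = 8, a 2-byte dstT yields 16 columns, while a 4-byte dstT needs no padding and yields 8 columns.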

The following two examples explain the parameter configuration and compute logic. (The DequantParams type used below is a {m, n, calCount} structure that stores shape information.)

As shown in the following figure, the data type of srcTensor is int32_t, m equals 4, n equals 8, and calCount equals 4. This means that four elements are dequantized in each row of srcTensor, the first four elements in deqScale take effect, and the last 12 elements are not involved in dequantization. The data type of dstTensor is bfloat16_t, m equals 4, and n_dst equals 16 (16 × sizeof(bfloat16_t) % 32 = 0). The compute logic is as follows: every n elements in srcTensor form a row. For the first calCount elements in each row, the ith element of srcTensor is multiplied by the ith element of deqScale, and the product is written to the ith element in the corresponding row of dstTensor. Elements calCount + 1 to n_dst in each row of dstTensor hold undefined values.
As shown in the following figure, the data type of srcTensor is int32_t, m equals 4, n equals 8, and calCount equals 4, indicating that the number of elements to be dequantized in each row of srcTensor is 4. The data type of dstTensor is float, m equals 4, and n_dst equals 8 (8 × sizeof(float) % 32 = 0). The first four elements in each row of srcTensor are multiplied by the scalar deqScale and the products are written to the corresponding positions in each row of dstTensor.
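The compute logic of the two examples can be modeled on the host side as follows. This is a sketch under stated assumptions, not the device implementation: DequantShape and DequantReference are hypothetical names introduced here, the output is kept as float (no bfloat16_t rounding is modeled), and positions beyond calCount are zero-filled only so that the model is deterministic, whereas the API leaves them undefined.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the {m, n, calCount} shape structure.
struct DequantShape {
    std::uint32_t m;        // rows of src
    std::uint32_t n;        // columns of src
    std::uint32_t calCount; // elements dequantized per row
};

// Hypothetical reference model for a vector deqScale.
// Requires deqScale.size() >= shape.calCount and nDst >= shape.calCount.
inline std::vector<float> DequantReference(const std::vector<std::int32_t>& src,
                                           const std::vector<float>& deqScale,
                                           DequantShape shape, std::uint32_t nDst)
{
    // Zero-fill is a choice of this model; the API leaves the tail of each row undefined.
    std::vector<float> dst(static_cast<std::size_t>(shape.m) * nDst, 0.0f);
    for (std::uint32_t row = 0; row < shape.m; ++row) {
        for (std::uint32_t col = 0; col < shape.calCount; ++col) {
            const std::size_t srcIdx = static_cast<std::size_t>(row) * shape.n + col;
            const std::size_t dstIdx = static_cast<std::size_t>(row) * nDst + col;
            dst[dstIdx] = static_cast<float>(src[srcIdx]) * deqScale[col];
        }
    }
    return dst;
}
```

The scalar case corresponds to a deqScale vector whose first calCount entries all hold the same value.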

Set the template parameter mode to DEQUANT_WITH_SINGLE_ROW:

This mode takes effect when DequantParams {m, n, calCount} meets all three of the following requirements:

m = 1
calCount is a multiple of 32/sizeof(dstT).
n % calCount = 0

In this case, {1, n, calCount} is treated as {n/calCount, calCount, calCount} for dequantization.

The following figure shows the effect, in which the passed DequantParams is {1, 16, 8}. Since dstT is of type float, calCount must be a multiple of 8. In DEQUANT_WITH_SINGLE_ROW mode, {1, 2 × 8, 8} is converted to {2, 8, 8} for calculation.
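The reshaping rule can be expressed as a small host-side check. This is a sketch only: DequantShape and EffectiveShape are hypothetical names, singleRowMode stands in for the mode template parameter, and dstElemSize is assumed to be a nonzero element size in bytes (2 or 4).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the {m, n, calCount} shape structure.
struct DequantShape {
    std::uint32_t m;
    std::uint32_t n;
    std::uint32_t calCount;
};

// Hypothetical helper: returns the shape that the computation would use.
// The reshape applies only in single-row mode and only if all three
// documented requirements hold; otherwise the shape is returned unchanged.
inline DequantShape EffectiveShape(DequantShape p, std::size_t dstElemSize, bool singleRowMode)
{
    const std::size_t elemsPer32Bytes = 32 / dstElemSize;
    const bool reshapes = singleRowMode &&
                          p.m == 1 &&
                          p.calCount != 0 &&
                          p.calCount % elemsPer32Bytes == 0 &&
                          p.n % p.calCount == 0;
    if (!reshapes) {
        return p;
    }
    return {p.n / p.calCount, p.calCount, p.calCount};
}
```

For the figure's input {1, 16, 8} with a float dstT, the helper returns {2, 8, 8}; with singleRowMode set to false it returns {1, 16, 8} unchanged.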

Principles

The figure below illustrates the internal algorithm block diagram of AscendDequant high-level APIs, taking the input srcTensor with a data type of int32_t and a shape of [m, n], input deqScale with a data type of scaleT and a shape of [n], and output dstTensor with a data type of dstT and a shape of [m, n] as examples.

Figure 1 AscendDequant internal algorithm block diagram

The computation process is divided into the following steps, all of which are performed on vectors:

Precision conversion: Convert srcTensor and deqScale into FP32 tensors to obtain srcFP32 and deqScaleFP32, respectively.
Mul computation: srcFP32 has a total of m rows, each with a length of n. Each row of srcFP32 is multiplied by deqScaleFP32 through m iterations. The mask limits the Mul computation to the first DequantParams.calCount elements of each row. The value range of index in the figure is [0, m), corresponding to each row of srcFP32. The computation result is mulRes, whose shape is [m, n].
Precision conversion of result data: Convert mulRes from FP32 to a dstT tensor. The result is dstTensor with a shape of [m, n].

Prototype

The deqScale parameter is a vector.

Pass the temporary space through the sharedTmpBuffer input parameter.

template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale, const LocalTensor<uint8_t>& sharedTmpBuffer, DequantParams params)

Allocate the temporary space through the API framework.

template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale, DequantParams params)

The deqScale parameter is a scalar.

Pass the temporary space through the sharedTmpBuffer input parameter.

template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const scaleT deqScale, const LocalTensor<uint8_t>& sharedTmpBuffer, DequantParams params)

Allocate the temporary space through the API framework.

template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const scaleT deqScale, DequantParams params)

Due to the complex mathematical computation involved in the internal implementation of this API, additional temporary space is required to store intermediate variables generated during computation. The temporary space can be allocated through the API framework or passed by developers through the sharedTmpBuffer input parameter.

When the API framework is used for temporary space allocation, developers do not need to allocate the space, but must reserve the required size for the space.

When the sharedTmpBuffer input parameter is used for passing the temporary space, the tensor serves as the temporary space. In this case, the API framework is not required for temporary space allocation. This enables developers to manage the sharedTmpBuffer space and reuse the buffer after calling the API, so that the buffer is not repeatedly allocated and deallocated, improving the flexibility and buffer utilization.

In summary, if the API framework is used, developers must reserve the temporary space; if sharedTmpBuffer is used, developers must allocate space for sharedTmpBuffer. In both cases, use the GetAscendDequantMaxMinTmpSize API to obtain the size of the temporary space (BufferSize).

The following APIs are not recommended. Do not use them in new development:

template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale, const LocalTensor<uint8_t>& sharedTmpBuffer, const uint32_t calCount)

template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale, const LocalTensor<uint8_t>& sharedTmpBuffer)

template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale, const uint32_t calCount)

template <typename dstT, typename scaleT, DeQuantMode mode = DeQuantMode::DEQUANT_WITH_SINGLE_ROW>
__aicore__ inline void AscendDequant(const LocalTensor<dstT>& dstTensor, const LocalTensor<int32_t>& srcTensor, const LocalTensor<scaleT>& deqScale)

Parameters

Table 1 Parameters in the template
Parameter	Description
dstT	Data type of the destination operand.
scaleT	Data type of deqScale.
mode	Compute logic used when DequantParams is {1, n, calCount}, passed as a DeQuantMode enum value. The following values are supported: DEQUANT_WITH_SINGLE_ROW: If DequantParams {m, n, calCount} meets all of the conditions m = 1, calCount is a multiple of 32/sizeof(dstT), and n % calCount = 0, then {1, n, calCount} is computed as {n/calCount, calCount, calCount}. DEQUANT_WITH_MULTI_ROW: Even if all the preceding conditions are met, {1, n, calCount} is still computed as {1, n, calCount}; that is, the first calCount of the n elements are dequantized.

Table 2 API parameters

Parameter

Input/Output

Description

dstTensor

Output

Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The number of rows of dstTensor must be the same as that of srcTensor.
If n * sizeof(dstT) is not a multiple of 32 bytes, each row must be padded up to the next multiple of 32 bytes. n_dst indicates the number of columns after the padding. For example, if srcTensor has a data type of int32_t and a shape of (4, 8), and dstTensor has a data type of bfloat16_t, n_dst must be padded from 8 to 16, and the dstTensor shape must be (4, 16). The padded column count is computed with integer division: n_dst = (8 × sizeof(bfloat16_t) + 32 − 1) / 32 × 32 / sizeof(bfloat16_t) = 16.

srcTensor

Input

Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The shape is [m, n], and the number of bytes occupied by n pieces of input data must be 32-byte aligned.

deqScale

Input

Source operand. The type is scalar or LocalTensor. When the type is LocalTensor, the supported TPosition is VECIN, VECCALC, or VECOUT.

For details about the data type combinations supported by dstTensor, srcTensor, and deqScale, see Table 3 and Table 4.

sharedTmpBuffer

Input

Temporary buffer. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

For details about how to obtain the temporary space size (BufferSize), see GetAscendDequantMaxMinTmpSize.

params

Input

Shape information of srcTensor, of type DequantParams. The definition is as follows:

struct DequantParams
{
    uint32_t m;             // number of rows of srcTensor
    uint32_t n;             // number of columns of srcTensor
    uint32_t calCount;      // For each row of srcTensor, the first calCount elements are valid data, and they are multiplied by the first calCount elements of deqScale or by the deqScale scalar.
};

DequantParams.n * sizeof(T) must be an integer multiple of 32 bytes. T is the data type of the elements in srcTensor.
Since the multiplication operation is performed on the first calCount elements among every n elements, DequantParams.n and calCount must meet the following relationship:
1 ≤ DequantParams.calCount ≤ DequantParams.n
When deqScale is a vector, DequantParams.calCount is less than or equal to the number of elements in deqScale.
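The constraints on DequantParams listed above can be checked on the host before launching the kernel. The following is a minimal sketch, not part of the API: IsValidDequantParams is a hypothetical helper, and passing deqScaleLen = 0 is a convention of this sketch meaning that deqScale is a scalar.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical host-side check of the documented DequantParams constraints.
// deqScaleLen is the number of elements in a vector deqScale, or 0 if
// deqScale is a scalar (a convention of this sketch only).
inline bool IsValidDequantParams(std::uint32_t n, std::uint32_t calCount, std::size_t deqScaleLen)
{
    // srcTensor elements are int32_t, so each row occupies n * 4 bytes.
    const bool rowAligned = (n * sizeof(std::int32_t)) % 32 == 0;
    const bool countInRange = calCount >= 1 && calCount <= n;
    const bool scaleLongEnough = deqScaleLen == 0 || calCount <= deqScaleLen;
    return rowAligned && countInRange && scaleLongEnough;
}
```

For example, n = 8 with calCount = 4 and a 16-element deqScale passes the check, whereas n = 6 fails because a 24-byte row is not 32-byte aligned.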

Parameter

Input/Output

Description

dstTensor

Output

Destination operand.