AscendAntiQuant

Function Description

Performs element-wise fake quantization (anti-quantization), for example, converting int8_t data to the half data type. The calculation formulas are as follows:

  • Per_channel scenario
    • Input transpose disabled

      groupSize = src.shape[0] / offset.shape[0]

      dst[i][j] = scale[i / groupSize][j] * (src[i][j] + offset[i / groupSize][j])

    • Input transpose enabled

      groupSize = src.shape[1] / offset.shape[1]

      dst[i][j] = scale[i][j / groupSize] * (src[i][j] + offset[i][j / groupSize])

  • Per_tensor scenario

    dst[i][j] = scale * (src[i][j] + offset)
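
The per_channel formula above (transpose disabled) can be illustrated with a host-side reference sketch. This is not the device API; it is a plain C++ approximation for checking expected values, with half arithmetic approximated by float and tensors flattened to row-major vectors (the function name and layout are illustrative assumptions):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Per_channel, transpose disabled: offset/scale have offsetRows rows and
// are indexed by row group, groupSize = src.shape[0] / offset.shape[0].
std::vector<float> AntiQuantPerChannelRef(const std::vector<int8_t>& src,
                                          const std::vector<float>& offset,
                                          const std::vector<float>& scale,
                                          int rows, int cols, int offsetRows) {
    int groupSize = rows / offsetRows;
    std::vector<float> dst(rows * cols);
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            // dst[i][j] = scale[i / groupSize][j] * (src[i][j] + offset[i / groupSize][j])
            dst[i * cols + j] =
                scale[(i / groupSize) * cols + j] *
                (static_cast<float>(src[i * cols + j]) + offset[(i / groupSize) * cols + j]);
        }
    }
    return dst;
}
```

With src filled with 1, offset with 2, and scale with 3, every output element is 3 * (1 + 2) = 9, matching the result example at the end of this page.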

Principles

Figure 1 AscendAntiQuant algorithm block diagram

The preceding figure shows the algorithm block diagram of AscendAntiQuant in typical scenarios. The computation process is divided into the following steps, all of which are performed on the Vector Core:

  1. Precision conversion: Convert the input src to the half type.
  2. Offset calculation: Perform Add calculation when offset is a vector, and perform Adds calculation when offset is a scalar.
  3. Scale calculation: Perform Mul calculation when scale is a vector, and perform Muls calculation when scale is a scalar.

Prototype

  • Pass the temporary space through the sharedTmpBuffer input parameter.
    • Per_channel scenario
      template <typename InputDataType, typename OutputDataType, bool isTranspose>
      __aicore__ inline void AscendAntiQuant(const LocalTensor<OutputDataType> &dst, const LocalTensor<InputDataType> &src, const LocalTensor<OutputDataType> &offset, const LocalTensor<OutputDataType> &scale, const LocalTensor<uint8_t> &sharedTmpBuffer, const uint32_t K, const AntiQuantShapeInfo& shapeInfo = {})
      
    • Per_channel scenario (without offset)
      template <typename InputDataType, typename OutputDataType, bool isTranspose>
      __aicore__ inline void AscendAntiQuant(const LocalTensor<OutputDataType> &dst, const LocalTensor<InputDataType> &src, const LocalTensor<OutputDataType> &scale, const LocalTensor<uint8_t> &sharedTmpBuffer, const uint32_t K, const AntiQuantShapeInfo& shapeInfo = {})
      
    • Per_tensor scenario
      template <typename InputDataType, typename OutputDataType, bool isTranspose>
      __aicore__ inline void AscendAntiQuant(const LocalTensor<OutputDataType> &dst, const LocalTensor<InputDataType> &src, const OutputDataType offset, const OutputDataType scale, const LocalTensor<uint8_t> &sharedTmpBuffer, const uint32_t K, const AntiQuantShapeInfo& shapeInfo = {})
      
    • Per_tensor scenario (without offset)
      template <typename InputDataType, typename OutputDataType, bool isTranspose>
      __aicore__ inline void AscendAntiQuant(const LocalTensor<OutputDataType> &dst, const LocalTensor<InputDataType> &src, const OutputDataType scale, const LocalTensor<uint8_t> &sharedTmpBuffer, const uint32_t K, const AntiQuantShapeInfo& shapeInfo = {})
      
  • Allocate the temporary space through the API framework.
    • Per_channel scenario
      template <typename InputDataType, typename OutputDataType, bool isTranspose>
      __aicore__ inline void AscendAntiQuant(const LocalTensor<OutputDataType> &dst, const LocalTensor<InputDataType> &src, const LocalTensor<OutputDataType> &offset, const LocalTensor<OutputDataType> &scale, const uint32_t K, const AntiQuantShapeInfo& shapeInfo = {})
      
    • Per_tensor scenario
      template <typename InputDataType, typename OutputDataType, bool isTranspose>
      __aicore__ inline void AscendAntiQuant(const LocalTensor<OutputDataType> &dst, const LocalTensor<InputDataType> &src, const OutputDataType offset, const OutputDataType scale, const uint32_t K, const AntiQuantShapeInfo& shapeInfo = {})
      

Due to the complex mathematical computation involved in the internal implementation of this API, additional temporary space is required to store intermediate variables generated during computation. The temporary space can be allocated through the API framework or passed by developers through the sharedTmpBuffer input parameter.

  • When the API framework allocates the temporary space, developers do not need to allocate it themselves, but must reserve enough space for the framework to use.
  • When the temporary space is passed through the sharedTmpBuffer input parameter, that tensor serves as the temporary space and the API framework performs no allocation. Developers can then manage the sharedTmpBuffer space themselves and reuse the buffer across API calls, so that it is not repeatedly allocated and deallocated, improving flexibility and buffer utilization.

If the API framework is used, developers must reserve the temporary space. If sharedTmpBuffer is used, developers must allocate space for sharedTmpBuffer. To obtain the size of the temporary space (BufferSize) to be reserved, use the API provided in GetAscendAntiQuantMaxMinTmpSize.

Parameters

Table 1 Parameters in the template

Parameter

Description

InputDataType

Input data type.

OutputDataType

Output data type.

isTranspose

Whether to enable input data transpose.

Table 2 API parameters

Parameter

Input/Output

Description

dst

Output

Destination operand.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

src

Input

Source operand.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

offset

Input

Offset when the input data is dequantized.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

scale

Input

Scaling factor when the input data is dequantized.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

sharedTmpBuffer

Input

Temporary buffer.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

For details about how to obtain the temporary space size (BufferSize), see GetAscendAntiQuantMaxMinTmpSize.

K

Input

If isTranspose is set to true, the shape of src is [N, K]; if it is set to false, the shape is [K, N]. This parameter passes that K value.

shapeInfo

Input

Shape information of offset and scale. This parameter is configured only in the per_channel scenario.

This parameter is optional. In the per_channel scenario, if this parameter is not specified or the data in the struct is set to 0, the shape information of offset and scale is obtained from ShapeInfo.

The AntiQuantShapeInfo type is defined as follows:

struct AntiQuantShapeInfo {
    uint32_t offsetHeight{0};  // Offset height
    uint32_t offsetWidth{0};  // Offset width
    uint32_t scaleHeight{0};  // Scale height
    uint32_t scaleWidth{0};  // Scale width
};
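
For a non-transposed per_channel call where offset and scale both have shape [1, N] (as in the example below), the struct is filled as {1, N, 1, N}. A minimal sketch, with the struct definition copied from above and a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Definition copied from the documentation above.
struct AntiQuantShapeInfo {
    uint32_t offsetHeight{0};  // Offset height
    uint32_t offsetWidth{0};   // Offset width
    uint32_t scaleHeight{0};   // Scale height
    uint32_t scaleWidth{0};    // Scale width
};

// Builds the shape info for offset/scale of shape [1, n] (illustrative helper).
AntiQuantShapeInfo MakeRowVectorShapeInfo(uint32_t n) {
    return AntiQuantShapeInfo{1, n, 1, n};
}
```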

Returns

None

Availability

Constraints

  • The source operand address must not overlap the destination operand address.
  • For details about the alignment requirements of the operand address offset, see General Restrictions.
  • The length of the data involved in computation of the input and output operands must be 32-byte aligned.
  • When input transpose is enabled, K must be 32-byte aligned.
  • Before calling the API, ensure that the size of the input data is correct, and the size and shape of offset and scale are correct.
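
The 32-byte alignment constraints above can be checked on the host before launching the kernel. A small sketch (the helper name is an illustrative assumption):

```cpp
#include <cassert>
#include <cstdint>

// Returns true when elementCount elements of elementSize bytes occupy a
// multiple of 32 bytes, matching the 32-byte alignment constraint.
constexpr bool IsBlockAligned(uint32_t elementCount, uint32_t elementSize) {
    return (elementCount * elementSize) % 32u == 0u;
}
```

For example, 64 int8_t elements (64 bytes) are aligned, while 3 half elements (6 bytes) are not.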

Example

#include "kernel_operator.h"

template <typename InputType, typename OutType>
class AntiQuantTest
{
public:
    __aicore__ inline AntiQuantTest() {}
    __aicore__ inline void Init(GM_ADDR dstGm, GM_ADDR srcGm, GM_ADDR offsetGm, GM_ADDR scaleGm,
                                uint32_t elementCountOfInput, uint32_t elementCountOfOffset, uint32_t K)
    {
        this->elementCountOfInput = elementCountOfInput;
        this->elementCountOfOffset = elementCountOfOffset;
        k = K;

        dstGlobal.SetGlobalBuffer((__gm__ OutType *)dstGm);
        srcGlobal.SetGlobalBuffer((__gm__ InputType *)srcGm);
        offsetGlobal.SetGlobalBuffer((__gm__ OutType *)offsetGm);
        scaleGlobal.SetGlobalBuffer((__gm__ OutType *)scaleGm);

        pipe.InitBuffer(queInSrc, 1, elementCountOfInput * sizeof(InputType));
        pipe.InitBuffer(queInOffset, 1, elementCountOfOffset * sizeof(OutType));
        pipe.InitBuffer(queInScale, 1, elementCountOfOffset * sizeof(OutType));
        pipe.InitBuffer(queOut, 1, elementCountOfInput * sizeof(OutType));
        pipe.InitBuffer(queTmp, 1, 67584); // temporary space size; obtain the required size via GetAscendAntiQuantMaxMinTmpSize
    }
    __aicore__ inline void Process()
    {
        CopyIn();
        Compute();
        CopyOut();
    }

private:
    __aicore__ inline void CopyIn()
    {
        AscendC::LocalTensor<InputType> srcLocal = queInSrc.AllocTensor<InputType>();
        AscendC::DataCopy(srcLocal, srcGlobal, elementCountOfInput);
        queInSrc.EnQue(srcLocal);

        AscendC::LocalTensor<OutType> offsetLocal = queInOffset.AllocTensor<OutType>();
        AscendC::DataCopy(offsetLocal, offsetGlobal, elementCountOfOffset);
        queInOffset.EnQue(offsetLocal);

        AscendC::LocalTensor<OutType> scaleLocal = queInScale.AllocTensor<OutType>();
        AscendC::DataCopy(scaleLocal, scaleGlobal, elementCountOfOffset);
        queInScale.EnQue(scaleLocal);
    }
    __aicore__ inline void Compute()
    {
        AscendC::LocalTensor<InputType> srcLocal = queInSrc.DeQue<InputType>();
        AscendC::LocalTensor<OutType> offsetLocal = queInOffset.DeQue<OutType>();
        AscendC::LocalTensor<OutType> scaleLocal = queInScale.DeQue<OutType>();
        AscendC::LocalTensor<OutType> dstLocal = queOut.AllocTensor<OutType>();
        AscendC::LocalTensor<uint8_t> sharedTmpBuffer = queTmp.AllocTensor<uint8_t>();

        AscendC::AntiQuantShapeInfo shapeInfo = {1, elementCountOfOffset, 1, elementCountOfOffset};
        AscendC::AscendAntiQuant<InputType, OutType, false>(dstLocal, srcLocal, offsetLocal, scaleLocal, sharedTmpBuffer, k, shapeInfo);
        queInSrc.FreeTensor(srcLocal);
        queInOffset.FreeTensor(offsetLocal);
        queInScale.FreeTensor(scaleLocal);
        queTmp.FreeTensor(sharedTmpBuffer);
        queOut.EnQue(dstLocal);
    }
    __aicore__ inline void CopyOut()
    {
        AscendC::LocalTensor<OutType> dstLocal = queOut.DeQue<OutType>();
        AscendC::DataCopy(dstGlobal, dstLocal, elementCountOfInput);
        queOut.FreeTensor(dstLocal);
    }

private:
    AscendC::TPipe pipe;
    AscendC::TQue<AscendC::QuePosition::VECIN, 1> queInSrc;
    AscendC::TQue<AscendC::QuePosition::VECIN, 1> queInOffset;
    AscendC::TQue<AscendC::QuePosition::VECIN, 1> queInScale;
    AscendC::TQue<AscendC::QuePosition::VECOUT, 1> queTmp;
    AscendC::TQue<AscendC::QuePosition::VECOUT, 1> queOut;
    AscendC::GlobalTensor<OutType> dstGlobal;
    AscendC::GlobalTensor<InputType> srcGlobal;
    AscendC::GlobalTensor<OutType> offsetGlobal;
    AscendC::GlobalTensor<OutType> scaleGlobal;
    uint32_t elementCountOfInput;
    uint32_t elementCountOfOffset;
    uint32_t k;
}; // class AntiQuantTest

extern "C" __global__ __aicore__ void kernel_anti_quant(GM_ADDR dst, GM_ADDR src, GM_ADDR offset, GM_ADDR scale,
                                                        uint32_t elementCountOfInput, uint32_t elementCountOfOffset, uint32_t K)
{
    AntiQuantTest<int8_t, half> op;
    op.Init(dst, src, offset, scale, elementCountOfInput, elementCountOfOffset, K);
    op.Process();
}
Result example:
Input data src (shape: [2, 64], non-transpose scenario):
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
offset (shape: [1, 64]):
[2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.
 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.
 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
scale (shape: [1, 64]):
[3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3.
 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3.
 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
Output data dstLocal (shape: [2, 64]):
[9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9.
 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9.
 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9.
 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9.
 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9. 9.
 9. 9. 9. 9. 9. 9. 9. 9.]