SetAntiQuantVector

Applicability

Product | Supported
Atlas A3 training products/Atlas A3 inference products | x
Atlas A2 training products/Atlas A2 inference products | x
Atlas 200I/500 A2 inference products | x
Atlas inference product's AI Core | √
Atlas inference product's Vector Core | x
Atlas training products | x

Function

Matmul supports the combination of a half-type matrix A and an int8-type matrix B. In this scenario, a pseudo-quantization API must be called so that matrix B is converted to the half type while its data is moved from the GM to L1. The API in this section supplies the pseudo-quantization parameters as vectors of shape [1, N], where N is the N dimension of the M/N/K shape used in the Matmul computation. Each column of matrix B is pseudo-quantized using the coefficient at the corresponding position in the provided vector.

Call this API before calling Iterate or IterateAll.
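The per-column conversion described above can be sketched on the host as a plain C++ reference. This is an illustration under stated assumptions, not the device implementation: AntiQuantReference is a hypothetical helper, float stands in for half, matrix B is assumed to be stored row-major with shape [K, N], and the order of operations (add the offset, then multiply by the scale) is an assumption, since this section only states that the offset is used for addition and the scale for multiplication.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference; not part of the Matmul API.
// b:      int8 matrix B, row-major, shape [k, n]
// offset: per-column addition parameters, shape [1, n]
// scale:  per-column multiplication parameters, shape [1, n]
// ASSUMPTION: result = (b + offset[col]) * scale[col]; float replaces half.
std::vector<float> AntiQuantReference(const std::vector<int8_t>& b,
                                      std::size_t k, std::size_t n,
                                      const std::vector<float>& offset,
                                      const std::vector<float>& scale)
{
    std::vector<float> out(k * n);
    for (std::size_t row = 0; row < k; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            // Every element in column `col` uses the same offset/scale pair.
            const float value = static_cast<float>(b[row * n + col]);
            out[row * n + col] = (value + offset[col]) * scale[col];
        }
    }
    return out;
}
```

A reference like this may be useful for checking kernel output on the host: both parameter vectors hold exactly N entries, one per column of matrix B.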

Prototype

__aicore__ inline void SetAntiQuantVector(const LocalTensor<SrcT> &offsetTensor, const LocalTensor<SrcT> &scaleTensor)

Parameters

Parameter | Input/Output | Description
offsetTensor | Input | Parameter vector for pseudo-quantization, used for addition. The data type is determined by SrcT, which corresponds to the type defined in A_TYPE.
scaleTensor | Input | Parameter vector for pseudo-quantization, used for multiplication. The data type is determined by SrcT, which corresponds to the type defined in A_TYPE.

Returns

None

Restrictions

None