SetQuantVector

Function Description

Matmul computation supports int8 inputs with half or int8 outputs. In this scenario, a dequantization API must be called. Once it has been called, the final result is dequantized to the half type while data is moved from L0C to GM. The API in this section supplies the dequantization parameters as a vector of shape [1, N], where N is the N dimension (of M/N/K) of the Matmul computation. Each column of the output matrix is dequantized using the coefficient in the corresponding column of the vector.
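The effect of a [1, N] parameter vector can be illustrated with a minimal host-side reference sketch, assuming one coefficient per output column. This is not Ascend C kernel code: the function name DequantPerColumn is hypothetical, the coefficients are modeled as plain float values, and the uint64_t encoding that SetQuantVector actually expects is not modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side reference only (hypothetical helper, not part of the Matmul API).
// acc:   row-major M x N int32 accumulators from an int8 x int8 Matmul.
// scale: N per-column coefficients, i.e. the [1, N] vector, modeled as
//        floats (assumption; the real API takes uint64_t elements).
std::vector<float> DequantPerColumn(const std::vector<int32_t>& acc,
                                    const std::vector<float>& scale,
                                    std::size_t m, std::size_t n)
{
    std::vector<float> out(m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // Every element in column j uses the same coefficient scale[j].
            out[i * n + j] = static_cast<float>(acc[i * n + j]) * scale[j];
        }
    }
    return out;
}
```

In this sketch the coefficient index depends only on the column j, which is why the vector length has to match the N dimension of the computation.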

Call this API before calling Iterate or IterateAll.

Prototype

__aicore__ inline void SetQuantVector(const GlobalTensor<uint64_t>& quantTensor)

Parameters

Parameter	Input/Output	Description
quantTensor	Input	Dequantization parameter vector of shape [1, N], stored in global memory (GM).

Returns

None

Availability

Precautions

None

Example

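GlobalTensor<uint64_t> gmQuant;  // dequantization parameter vector in GM, shape [1, N]
...
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);
mm.SetQuantVector(gmQuant);      // must be set before Iterate or IterateAll
mm.SetTensorA(gm_a);
mm.SetTensorB(gm_b);
mm.SetBias(gm_bias);
mm.IterateAll(gm_c);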
