SetQuantVector
Function Description
Matmul supports int8 inputs with half or int8 outputs. In this scenario, a dequantization API must be called so that the result is dequantized (for example, to the half type) when data is moved from L0C to GM. This API supplies the dequantization parameters as a vector of shape [1, N], where N is the N dimension of the Matmul computation (from M/N/K). Each column of the output matrix is dequantized using the coefficient in the corresponding column of the provided vector.
Call this API before calling Iterate or IterateAll.
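Because the vector has shape [1, N], it holds one coefficient per output column. The following is a minimal host-side sketch of that per-column scaling in plain C++, intended only to illustrate the arithmetic. It is not the Ascend C implementation: the function name `DequantizePerColumn` is hypothetical, and it assumes plain float coefficients rather than the uint64_t-encoded parameters that `SetQuantVector` takes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference, not an Ascend C API.
// acc:    row-major M x N int32 accumulators (result of an int8 x int8 Matmul)
// scales: N per-column dequantization coefficients, i.e. a [1, N] vector
//         (assumed to be plain floats for illustration)
std::vector<float> DequantizePerColumn(const std::vector<int32_t>& acc,
                                       const std::vector<float>& scales,
                                       std::size_t m, std::size_t n) {
    assert(acc.size() == m * n);
    assert(scales.size() == n);
    std::vector<float> out(m * n);
    for (std::size_t row = 0; row < m; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            // Every element in column `col` is scaled by the same scales[col].
            out[row * n + col] =
                static_cast<float>(acc[row * n + col]) * scales[col];
        }
    }
    return out;
}
```

For a 2 x 3 accumulator matrix and three coefficients, the first coefficient scales both elements of column 0, the second both elements of column 1, and so on.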
Prototype
__aicore__ inline void SetQuantVector(const GlobalTensor<uint64_t>& quantTensor)
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| quantTensor | Input | Parameter vector for dequantization |
Returns
None
Availability
Precautions
None
Example
GlobalTensor<uint64_t> gmQuant;  // dequantization parameter vector, shape [1, N]
...
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);
mm.SetQuantVector(gmQuant);      // must be set before Iterate or IterateAll
mm.SetTensorA(gm_a);
mm.SetTensorB(gm_b);
mm.SetBias(gm_bias);
mm.IterateAll(gm_c);