SetQuantScalar
Applicability
| Product | Supported |
|---|---|
| | √ |
| | √ |
| | √ |
| | √ |
| | x |
| | x |
Function
Quantizes or dequantizes all values in the output matrix using the same coefficient. That is, a single quantization parameter applies to the entire C matrix, so the parameter has shape [1]. For details about quantization and dequantization, see Quantization Scenarios.
Matmul dequantization scenario: During Matmul computation, the left and right matrices are of the int8_t or int4b_t type and the output is of the half type, or the left matrix, the right matrix, and the output are all of the int8_t type. In this scenario, dequantization is performed when the data of matrix C is moved from CO1 to global memory, converting the final result to the half or int8_t type.
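To make "one coefficient for the whole C matrix" concrete, here is a minimal host-side reference model. It is a sketch under stated assumptions, not an Ascend C API: it assumes the int8_t inputs accumulate into int32_t values in CO1, uses float in place of half (standard C++ has no half type), and the function name `DequantPerTensor` is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical reference model (not part of Ascend C): per-tensor dequantization.
// Assumption: accumulators in CO1 are int32_t; float stands in for half.
std::vector<float> DequantPerTensor(const std::vector<int32_t>& acc, float scale) {
    std::vector<float> out;
    out.reserve(acc.size());
    for (int32_t v : acc) {
        // Every element of C is multiplied by the same scalar (shape [1]).
        out.push_back(static_cast<float>(v) * scale);
    }
    return out;
}
```

For example, `DequantPerTensor({100, -40, 0, 7}, 0.5f)` yields `{50, -20, 0, 3.5}`: no element gets its own coefficient, which is what distinguishes this API from per-channel (vector) quantization.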
Matmul quantization scenario: During Matmul computation, the left and right matrices are of the half or bfloat16_t type and the output is of the int8_t type. In this scenario, quantization is performed when the data of matrix C is moved from CO1 to global memory, converting the final result to the int8_t type.
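The quantization direction can be sketched the same way. This is a hypothetical host-side model, not the hardware's exact behavior: it assumes float accumulators, round-half-away-from-zero (`std::round`), and saturation to the int8_t range; the actual rounding and saturation rules of the CO1-to-GM path are not stated on this page. `QuantPerTensor` is an illustrative name. Requires C++17 for `std::clamp`.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical reference model (not part of Ascend C): per-tensor quantization
// of float accumulators to int8_t. Rounding/saturation behavior is an assumption.
std::vector<int8_t> QuantPerTensor(const std::vector<float>& acc, float scale) {
    std::vector<int8_t> out;
    out.reserve(acc.size());
    for (float v : acc) {
        float scaled = std::round(v * scale);          // same scalar for every element
        scaled = std::clamp(scaled, -128.0f, 127.0f);  // saturate to the int8_t range
        out.push_back(static_cast<int8_t>(scaled));
    }
    return out;
}
```

With a scale of 0.5, the inputs `{10, -3, 1000, -1000}` map to `{5, -2, 127, -128}`; the last two show why saturation matters when a single coefficient must cover the whole matrix.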
Prototype
```cpp
__aicore__ inline void SetQuantScalar(const uint64_t quantScalar)
```
Parameters
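| Parameter | Input/Output | Description |
|---|---|---|
| quantScalar | Input | Quantization or dequantization coefficient. A floating-point coefficient is passed by converting its bit pattern to uint64_t, as shown in the example. |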
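Because `quantScalar` is a `uint64_t` while the coefficient is usually a float, the bit pattern has to be packed. The sketch below is a hypothetical standard C++ helper (`PackQuantScalar` is not an Ascend C API) that uses `std::memcpy` instead of the `reinterpret_cast` in the example, which sidesteps strict-aliasing concerns in host code. For non-negative scales it yields the same value as the example; for negative scales the example's `int32_t` cast sign-extends into the upper 32 bits while this helper zero-extends, and this page does not state which form the hardware expects.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: place the IEEE-754 bits of a float scale in the low
// 32 bits of a uint64_t (upper 32 bits are zero).
uint64_t PackQuantScalar(float scale) {
    uint32_t bits = 0;
    std::memcpy(&bits, &scale, sizeof(bits));  // copy the raw float bit pattern
    return static_cast<uint64_t>(bits);         // zero-extend to 64 bits
}
```

For example, `PackQuantScalar(0.1f)` returns `0x3DCCCCCD`, the single-precision encoding of 0.1f.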
Returns
None
Restrictions
The quantization mode set by this API must be consistent with the one configured through SetDequantType.
This API must be called before Iterate or IterateAll.
Example
```cpp
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);
float tmp = 0.1f;  // Each output value is multiplied by 0.1 when C is written to GM
// Reinterpret the float's bit pattern as an integer and widen it to uint64_t
// to obtain the quantization or dequantization coefficient
uint64_t ans = static_cast<uint64_t>(*reinterpret_cast<int32_t*>(&tmp));
mm.SetQuantScalar(ans);  // Must be called before Iterate or IterateAll
mm.SetTensorA(gm_a);
mm.SetTensorB(gm_b);
mm.SetBias(gm_bias);
mm.IterateAll(gm_c);
```