SetQuantScalar

Applicability

Product

Supported

Atlas A3 training products / Atlas A3 inference products

Atlas A2 training products / Atlas A2 inference products

Atlas 200I/500 A2 inference products

AI Core of Atlas inference products

Vector Core of Atlas inference products

x

Atlas training products

x

Function

Quantizes or dequantizes all values in the output matrix using the same coefficient. That is, the entire C matrix corresponds to one quantization parameter, and the shape of the quantization parameter is [1]. For details about quantization and dequantization, see Quantization Scenarios.

Matmul dequantization scenario: During Matmul computation, the left and right input matrices are of the int8_t or int4b_t type and the output is of the half type. Alternatively, the left and right input matrices and the output are all of the int8_t type. In this scenario, dequantization is performed while the data of matrix C is moved from CO1 to the global memory, so that the final result is written as half or int8_t.
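The dequantization scenario can be pictured with a host-side reference in plain C++. This is a minimal sketch, not Ascend C kernel code: the function name MatmulDequantRef is hypothetical, float stands in for the half output type, and the int32_t accumulator is an assumption about the intermediate precision. It only illustrates that every element of C is multiplied by the same coefficient.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side reference only (hypothetical helper, not part of the Matmul API).
// a is M x K and b is K x N, both row-major int8_t.
// Products are accumulated in int32_t (assumed), then every element of C is
// multiplied by the same scale. float stands in for the half output type.
std::vector<float> MatmulDequantRef(const std::vector<int8_t>& a,
                                    const std::vector<int8_t>& b,
                                    std::size_t m, std::size_t k, std::size_t n,
                                    float scale) {
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            int32_t acc = 0;
            for (std::size_t p = 0; p < k; ++p) {
                acc += static_cast<int32_t>(a[i * k + p]) *
                       static_cast<int32_t>(b[p * n + j]);
            }
            // One coefficient for the whole matrix: shape [1].
            c[i * n + j] = static_cast<float>(acc) * scale;
        }
    }
    return c;
}
```

With a 2 x 2 left matrix {1, 2, 3, 4}, a 2 x 2 right matrix {5, 6, 7, 8}, and a coefficient of 0.5, the integer products {19, 22, 43, 50} become {9.5, 11, 21.5, 25}.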

Matmul quantization scenario: During Matmul computation, the left and right input matrices are of the half or bfloat16_t type and the output is of the int8_t type. In this scenario, quantization is performed while the data of matrix C is moved from CO1 to the global memory, so that the final result is written as int8_t.
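The quantization scenario can be sketched the same way on the host. The helper names below are hypothetical, and the rounding mode (round to nearest) and saturation to the int8_t range are assumptions made for illustration; this page does not state how the hardware rounds or clamps.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: scale one accumulated value, round it, and saturate
// it to the int8_t range. Rounding and saturation behavior are assumptions.
int8_t QuantizeToInt8Ref(float acc, float scale) {
    float scaled = std::nearbyint(acc * scale);
    scaled = std::min(127.0f, std::max(-128.0f, scaled));
    return static_cast<int8_t>(scaled);
}

// Apply the same coefficient to every element of C (per-tensor quantization).
std::vector<int8_t> QuantizeMatrixRef(const std::vector<float>& c, float scale) {
    std::vector<int8_t> out;
    out.reserve(c.size());
    for (float v : c) {
        out.push_back(QuantizeToInt8Ref(v, scale));
    }
    return out;
}
```

Because a single coefficient covers the whole matrix, it has to be chosen so that the largest expected magnitude in C still fits the int8_t range after scaling; larger values saturate in this sketch.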

Prototype

__aicore__ inline void SetQuantScalar(const uint64_t quantScalar)

Parameters

Parameter

Input/Output

Description

quantScalar

Input

Quantization or dequantization coefficient. As in the example, the bit pattern of a floating-point value is passed, widened to uint64_t.

Returns

None

Restrictions

The setting must be consistent with the one configured through SetDequantType.

This API must be called before Iterate or IterateAll.

Example

REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);
float tmp = 0.1f;  // Coefficient: results are multiplied by 0.1 when written to GM
uint64_t ans = static_cast<uint64_t>(*reinterpret_cast<int32_t*>(&tmp)); // Pass the bit pattern of the float, widened to uint64_t
mm.SetQuantScalar(ans);
mm.SetTensorA(gm_a);
mm.SetTensorB(gm_b);
mm.SetBias(gm_bias);
mm.IterateAll(gm_c);
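The conversion in the example reinterprets the bits of a float rather than converting its numeric value. A minimal host-side sketch of the same idea, using std::memcpy instead of a pointer cast, is shown below. PackQuantScalar is a hypothetical helper and not part of the Matmul API, and whether std::memcpy is available in kernel code is not covered here. For non-negative coefficients it yields the same value as the example; for negative coefficients the example sign-extends through int32_t while this sketch zero-extends, and this page does not state which form is expected.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of the Matmul API): copy the 32-bit pattern
// of a float into the low 32 bits of a uint64_t, leaving the high bits zero.
inline uint64_t PackQuantScalar(float scale) {
    static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");
    uint32_t bits = 0;
    std::memcpy(&bits, &scale, sizeof(bits));  // bit copy, no numeric conversion
    return static_cast<uint64_t>(bits);
}
```

For instance, PackQuantScalar(1.0f) returns 0x3F800000 on platforms that use IEEE 754 single precision, which is the value that would be passed to SetQuantScalar for a coefficient of 1.0.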