SetQuantScalar

Applicability

Product	Supported
Atlas A3 training products / Atlas A3 inference products	√
Atlas A2 training products / Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	√
AI Core of Atlas inference products	√
Vector Core of Atlas inference products	x
Atlas training products	x

Function

Quantizes or dequantizes all values in the output matrix using the same coefficient. That is, the entire C matrix corresponds to one quantization parameter, and the shape of the quantization parameter is [1]. For details about quantization and dequantization, see Quantization Scenarios.

Matmul dequantization scenario: The left and right input matrices are of the int8_t or int4b_t type and the output is of the half type, or the left and right input matrices and the output are all of the int8_t type. In this scenario, when matrix C is moved from CO1 to the global memory, the result is dequantized to the half or int8_t type.
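The scalar dequantization described above can be sketched on the host as a reference computation. This is only an illustrative sketch, not the on-chip implementation: the function name RefMatmulDequantScalar is hypothetical, float stands in for the half output type (standard C++ has no half type), and int32 accumulation is an assumption about the intermediate result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side reference only; hypothetical helper, not part of the Ascend C Matmul API.
// Computes C = scale * (A x B) for row-major int8 inputs with int32 accumulation.
// A is m x k, B is k x n, and the returned C is m x n.
std::vector<float> RefMatmulDequantScalar(const std::vector<int8_t>& a,
                                          const std::vector<int8_t>& b,
                                          std::size_t m, std::size_t k, std::size_t n,
                                          float scale)
{
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            int32_t acc = 0;  // integer accumulator (assumed int32)
            for (std::size_t p = 0; p < k; ++p) {
                acc += static_cast<int32_t>(a[i * k + p]) * static_cast<int32_t>(b[p * n + j]);
            }
            // One coefficient (shape [1]) is applied to every element of C.
            c[i * n + j] = static_cast<float>(acc) * scale;
        }
    }
    return c;
}
```

A reference of this kind can be useful for checking kernel output on small inputs, since every element of C is scaled by the same value.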

Matmul quantization scenario: The left and right input matrices are of the half or bfloat16_t type, and the output is of the int8_t type. In this scenario, when matrix C is moved from CO1 to the global memory, the result is quantized to the int8_t type.
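The scalar quantization step can be illustrated with a similar host-side sketch. The function name RefQuantScalar is hypothetical, and both the rounding mode (round half away from zero) and the saturation to the int8_t range are assumptions made for illustration; this topic does not state how the hardware rounds or saturates.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side reference only; hypothetical helper, not part of the Ascend C Matmul API.
// Scales every element of C by the same coefficient and converts it to int8_t.
std::vector<int8_t> RefQuantScalar(const std::vector<float>& c, float scale)
{
    std::vector<int8_t> out(c.size());
    for (std::size_t i = 0; i < c.size(); ++i) {
        // Assumed rounding: std::round rounds halfway cases away from zero.
        float scaled = std::round(c[i] * scale);
        // Assumed saturation to the representable int8_t range.
        scaled = std::min(127.0f, std::max(-128.0f, scaled));
        out[i] = static_cast<int8_t>(scaled);
    }
    return out;
}
```

For example, with a coefficient of 0.5, the value 10.0 maps to 5, while 1000.0 saturates to 127 under these assumptions.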

Prototype

      
__aicore__ inline void SetQuantScalar(const uint64_t quantScalar)

Parameters

Parameter	Input/Output	Description
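quantScalar	Input	Quantization or dequantization coefficient applied to every element of matrix C. As in the example in this topic, the value is passed as the bit pattern of a float widened to uint64_t.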

Returns

None

Restrictions

The use of this API must be consistent with the dequantization type configured through SetDequantType.

This API must be called before Iterate or IterateAll.

Example

      
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);
float tmp = 0.1;  // The result is multiplied by 0.1 when it is written to GM
// Reinterpret the float bit pattern and widen it to uint64_t to form the quantization or dequantization coefficient
uint64_t ans = static_cast<uint64_t>(*reinterpret_cast<int32_t*>(&tmp));
mm.SetQuantScalar(ans);  // Must be called before Iterate or IterateAll
mm.SetTensorA(gm_a);
mm.SetTensorB(gm_b);
mm.SetBias(gm_bias);
mm.IterateAll(gm_c);
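The float-to-uint64_t conversion in the example can also be written with std::memcpy, which avoids reading a float through an integer pointer. The helper below is a hypothetical sketch, not part of the Ascend C API, and it assumes that the coefficient occupies the low 32 bits of quantScalar with the upper 32 bits set to zero.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not part of the Ascend C API.
// Returns the IEEE 754 bit pattern of a float, zero-extended to 64 bits.
inline uint64_t PackQuantScalar(float scale)
{
    static_assert(sizeof(float) == sizeof(uint32_t), "float is expected to be 32 bits");
    uint32_t bits = 0;
    std::memcpy(&bits, &scale, sizeof(bits));  // copy the raw bit pattern
    return static_cast<uint64_t>(bits);        // zero-extend (assumed layout)
}
```

For a positive coefficient such as 0.1, this yields the same value as the cast through int32_t used in the example. For a negative coefficient the two differ: the int32_t cast sign-extends into the upper 32 bits, while this helper leaves them zero. This topic does not state which form the hardware expects, so treat the zero-extension as an assumption.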
