SetDequantType

Function

Sets the quantization or dequantization mode.

Matmul dequantization scenario: the left and right input matrices are of type int8_t or int4b_t and the output is of type half, or the inputs and the output are all of type int8_t. In this scenario, dequantization is performed when the result matrix C is moved from CO1 to global memory, converting the final result to half or int8_t accordingly.

Matmul quantization scenario: the left and right input matrices are of type half or bfloat16_t and the output is of type int8_t. In this scenario, quantization is performed when the result matrix C is moved from CO1 to global memory, converting the final result to int8_t.
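To make the two scenarios concrete, the sketch below classifies an (input type, output type) pair exactly as the two paragraphs above describe. The ElemType and Scenario enums and the ClassifyScenario function are hypothetical names for illustration only; they are not part of the Matmul Tiling API, and the real type checks are done by the library.

```cpp
#include <cassert>

// Hypothetical element-type tags for this sketch only; these are NOT the
// matmul_tiling::DataType values used by the real tiling API.
enum class ElemType { Int4, Int8, Half, BFloat16, Int32, Float };

enum class Scenario { Dequantization, Quantization, None };

// Classifies an (A/B input type, C output type) pair according to the two
// scenarios described in the text. Any other pair is reported as None.
Scenario ClassifyScenario(ElemType in, ElemType out) {
    const bool intIn = (in == ElemType::Int8 || in == ElemType::Int4);
    const bool floatIn = (in == ElemType::Half || in == ElemType::BFloat16);
    if (intIn && out == ElemType::Half) {
        return Scenario::Dequantization;  // int8_t/int4b_t in, half out
    }
    if (in == ElemType::Int8 && out == ElemType::Int8) {
        return Scenario::Dequantization;  // int8_t in, int8_t out
    }
    if (floatIn && out == ElemType::Int8) {
        return Scenario::Quantization;    // half/bfloat16_t in, int8_t out
    }
    return Scenario::None;
}
```

For example, int8_t inputs with an int32 output fall outside both scenarios, so no quantization or dequantization happens on the way out of CO1.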

Two quantization or dequantization modes are supported:

Same-coefficient quantization or dequantization: every value in the output matrix is quantized or dequantized with one shared coefficient.
Vector quantization or dequantization: a vector of coefficients is provided, one per output column, and column j of the output matrix is quantized or dequantized with the j-th coefficient of the vector.
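The difference between the two modes can be shown with a simplified host-side reference model of dequantization, in which an int32 accumulator matrix (row-major, M x N) is scaled to float. This is an assumption-laden sketch, not the hardware behavior: the kernel-side APIs take their coefficients in a hardware-specific encoding and perform rounding and conversion to half or int8_t, none of which is modelled here. DequantScalar and DequantVector are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same coefficient: one scale is applied to every element of the output.
std::vector<float> DequantScalar(const std::vector<int32_t>& acc, float scale) {
    std::vector<float> out(acc.size());
    for (std::size_t i = 0; i < acc.size(); ++i) {
        out[i] = static_cast<float>(acc[i]) * scale;
    }
    return out;
}

// Vector: element (row, col) of the row-major M x N output uses scales[col].
// scales must hold n coefficients, one per column.
std::vector<float> DequantVector(const std::vector<int32_t>& acc,
                                 const std::vector<float>& scales,
                                 std::size_t n) {
    std::vector<float> out(acc.size());
    for (std::size_t i = 0; i < acc.size(); ++i) {
        const std::size_t col = i % n;  // column index in row-major layout
        out[i] = static_cast<float>(acc[i]) * scales[col];
    }
    return out;
}
```

For a 2 x 2 accumulator {1, 2, 3, 4}, DequantScalar with 0.5 gives {0.5, 1, 1.5, 2}, while DequantVector with column scales {1, 10} gives {1, 20, 3, 40}: each column keeps its own coefficient.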

Prototype

int32_t SetDequantType(DequantType dequantType)

Parameters

Table 1 Parameters

Parameter | Input/Output | Description
dequantType | Input | Sets the quantization or dequantization mode. DequantType is defined as follows:

enum class DequantType {
    SCALAR = 0,
    TENSOR = 1,
};

The values and meanings of the parameter are as follows:

SCALAR: same-coefficient quantization or dequantization (one coefficient for the whole output matrix)
TENSOR: vector quantization or dequantization (one coefficient per output column)

Returns

-1: setting failed; 0: setting succeeded.

Restrictions

The two modes of this API correspond to kernel-side Matmul APIs: SCALAR (same coefficient) corresponds to SetQuantScalar, and TENSOR (vector) corresponds to SetQuantVector. The mode set by this API must match the kernel API that is actually called; for example, if DequantType::TENSOR is set here, the kernel must use SetQuantVector.
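The pairing rule above can be expressed as a small consistency check. The sketch uses a local stand-in for the DequantType enumeration from Table 1 so that it compiles on its own; RequiredKernelApi and IsConsistent are hypothetical helpers, not library functions, and real tiling code should use the DequantType provided by the Matmul Tiling headers.

```cpp
#include <cassert>
#include <string>

// Local stand-in mirroring the DequantType definition in Table 1.
enum class DequantType { SCALAR = 0, TENSOR = 1 };

// Returns the name of the kernel-side Matmul API that must be paired with
// the mode set on the host through SetDequantType.
std::string RequiredKernelApi(DequantType type) {
    switch (type) {
        case DequantType::SCALAR: return "SetQuantScalar";
        case DequantType::TENSOR: return "SetQuantVector";
    }
    return "";  // unreachable for valid enumerators
}

// True if the kernel API the operator calls matches the tiling-side mode.
bool IsConsistent(DequantType tilingMode, const std::string& kernelApi) {
    return RequiredKernelApi(tilingMode) == kernelApi;
}
```

A mismatch such as SCALAR on the host with SetQuantVector in the kernel is exactly the configuration the restriction forbids.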

Example

// Inside a tiling function; M, N and K are assumed to be defined earlier.
auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
matmul_tiling::MatmulApiTiling tiling(ascendcPlatform);
// Dequantization scenario: int8_t inputs, half output.
tiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_INT8);
tiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_INT8);
tiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
tiling.SetBiasType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_INT32);
tiling.SetShape(M, N, K);
tiling.SetOrgShape(M, N, K);
tiling.EnableBias(true);
// Same coefficient for the whole output; the kernel must call SetQuantScalar.
int32_t typeRet = tiling.SetDequantType(DequantType::SCALAR);
// Vector mode instead; the kernel must call SetQuantVector:
// int32_t typeRet = tiling.SetDequantType(DequantType::TENSOR);
tiling.SetBufferSpace(-1, -1, -1);
optiling::TCubeTiling tilingData;
int ret = tiling.GetTiling(tilingData);
// Both calls return -1 on failure.
if (typeRet == -1 || ret == -1) {
    return ge::GRAPH_FAILED;
}
