Quantization Algorithm Principles

Common quantization algorithms include binary quantization, linear quantization, and logarithmic quantization. Linear quantization is further classified into symmetric quantization and asymmetric quantization, depending on whether an offset is used. AMCT works with Ascend SoCs in linear quantization mode. Taking INT8 quantization as an example, the symmetric and asymmetric quantization modes and the normalized format that unifies them are described below.

For the data and weights of a quantization layer, a quantization factor scale (the scaling factor of the floating-point values) and an offset must be provided. As defined below, scale is a float32 number, and offset is an int8 number in the range [-128, 127].

The following describes how the preceding expressions are derived.

Symmetric quantization algorithm:

The relationship between the original high-precision data $d_{float}$ and the quantized int8 data $d_{int8}$ can be expressed as

$$d_{float} \approx scale \times d_{int8}$$

where scale is a float32 number. To represent both positive and negative values, the signed int8 data type is used for $d_{int8}$. The original data is converted into int8 format as

$$d_{int8} = round\left(\frac{d_{float}}{scale}\right)$$

where round is a rounding function. The value to be determined by the quantization algorithm is the constant scale.

The quantization of weights and activations can therefore be summarized as a search for the scale. Because $d_{int8}$ is a signed number, to keep the ranges represented by the positive and negative values symmetric, an absolute value operation is first performed on all data. This changes the range of the to-be-quantized data to $[0, \max(|d_{float}|)]$, and then scale is determined. The range of positive int8 values is [0, 127]. Therefore, scale can be computed as follows:

$$scale = \frac{\max(|d_{float}|)}{127}$$

The resulting int8 values lie in the range $[-128, 127]$. Data beyond this range is saturated to the boundary value, and then the quantization operation shown in the formula above is performed.
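The symmetric scheme described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not AMCT's implementation; the function names are assumptions.

```python
import numpy as np

def symmetric_quantize(d_float):
    """Symmetric INT8 quantization sketch: scale from the max absolute value."""
    # scale = max(|d_float|) / 127, so the largest magnitude maps to +/-127
    scale = float(np.max(np.abs(d_float))) / 127.0
    # round, then saturate values beyond the signed int8 range [-128, 127]
    d_int8 = np.clip(np.round(d_float / scale), -128, 127).astype(np.int8)
    return d_int8, scale

def symmetric_dequantize(d_int8, scale):
    # recover the approximate floats: d_float ~= scale * d_int8
    return scale * d_int8.astype(np.float32)
```

For example, quantizing `[-1.0, 0.25, 1.0]` yields `scale = 1/127` and codes `[-127, 32, 127]`; dequantizing reproduces the extreme values exactly, while intermediate values incur a rounding error of at most `scale / 2`.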

Asymmetric quantization algorithm:

Compared with the symmetric quantization algorithm, the asymmetric algorithm uses a different data conversion, and two constants, scale and offset, must be determined.

The uint8 data is obtained from the original high-precision data as shown in the following formula:

$$d_{uint8} = round\left(\frac{d_{float}}{scale}\right) + offset$$

scale is a float32 number, $d_{uint8}$ is an unsigned int8 number, and offset is an int8 number. The data range of $d_{uint8}$ is [0, 255]. If the value range of the to-be-quantized data is $[d_{min}, d_{max}]$, scale and offset are computed as follows:

$$scale = \frac{d_{max} - d_{min}}{255}, \quad offset = -round\left(\frac{d_{min}}{scale}\right)$$
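The asymmetric scheme can likewise be sketched in NumPy. This is an illustrative sketch under the formulas above, not AMCT's implementation; the function name is an assumption.

```python
import numpy as np

def asymmetric_quantize(d_float):
    """Asymmetric UINT8 quantization sketch: scale and offset from [d_min, d_max]."""
    d_min = float(np.min(d_float))
    d_max = float(np.max(d_float))
    # scale spreads the full [d_min, d_max] range over 255 uint8 steps
    scale = (d_max - d_min) / 255.0
    # offset shifts d_min to the bottom of the uint8 range, so 0 maps to d_min
    offset = -round(d_min / scale)
    # round, add the offset, and saturate to [0, 255]
    d_uint8 = np.clip(np.round(d_float / scale) + offset, 0, 255).astype(np.uint8)
    return d_uint8, scale, offset
```

For example, quantizing `[-0.5, 0.0, 1.5]` yields `scale = 2/255`, `offset = 64`, and codes `[0, 64, 255]`: unlike the symmetric scheme, the uint8 range is used in full even though the data is not centered on zero.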

Normalized Quantization Data Format

AMCT uses a normalized quantization data format.

By a simple transformation of the asymmetric quantization formula, the quantized data can be expressed in the same signed int type used by the symmetric quantization algorithm.

The following shows the conversion process, using int8 quantization as an example. The input original floating-point data is $d_{float}$, the original quantized fixed-point number is $d_{uint8}$, the quantization scale is scale, and the quantization offset is offset (the algorithm requires the quantization range to cross zero to prevent accuracy loss). The calculation principle of quantization is as follows:

$$d_{int8} = d_{uint8} - 128 = round\left(\frac{d_{float}}{scale}\right) + offset - 128 = round\left(\frac{d_{float}}{scale}\right) + offset'$$

where $offset' = offset - 128$. The above conversion turns the data into int8 format. After scale and the converted offset' are determined, the int8 data converted from the original floating-point data is as follows:

$$d_{int8} = round\left(\frac{d_{float}}{scale}\right) + offset'$$
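The normalization step above amounts to shifting both the uint8 code and the offset by 128. A minimal sketch, assuming a `d_uint8` array and `offset` produced by the asymmetric formulas earlier (the function name is an assumption):

```python
import numpy as np

def to_normalized_int8(d_uint8, offset):
    """Convert asymmetric uint8 codes to the normalized signed int8 format."""
    # subtract 128 from the code; widen to int16 first to avoid uint8 wraparound
    d_int8 = (d_uint8.astype(np.int16) - 128).astype(np.int8)
    # the offset shifts by the same amount: offset' = offset - 128
    offset_prime = offset - 128
    return d_int8, offset_prime
```

Because both terms shift by 128, the dequantization $d_{float} \approx scale \times (d_{int8} - offset')$ gives the same values as the original uint8 form, so symmetric and asymmetric results share one int8 data format.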