Accuracy Loss Issues of Inference Quantization

Prerequisites

You need to be familiar with the functions and basic operations of the msModelSlim quantization and compression tool.

Issues

When deploying a large model on the Ascend platform, you need to balance inference performance against accuracy. Quantization is an effective way to improve inference efficiency and can significantly accelerate model inference. However, it also introduces accuracy risks: activation values and outliers are difficult to quantize, and errors caused by outliers accumulate from layer to layer. For details, see Table 1.

**Table 1** Issues in quantization scenarios

| Scenario | Quantization Difficulty |
| --- | --- |
| Activation value quantization | Activation values are generated dynamically at inference time, with a wide distribution range and many outliers. Direct quantization may lead to loss of key features. |
| Outlier quantization | Outliers are difficult to define, and either way of handling them costs accuracy. If outliers are included in the quantization range, non-outliers are compressed into a narrow part of that range, resulting in accuracy loss. If outliers are excluded from the quantization range, they are clipped to the minimum or maximum value of the integer range after quantization, which loses information and also causes accuracy loss. |
| Outlier error accumulation | Outliers are clustered in specific channels (for example, the output channels of the Transformer attention layer), causing local quantization errors to propagate layer by layer and impacting global accuracy. |
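
The outlier trade-off in Table 1 can be illustrated with a minimal sketch. It assumes plain symmetric INT8 quantization with a single scale for the whole tensor; the activation values and the helper names `quantize_dequantize` and `mean_abs_error` are hypothetical and are not part of msModelSlim.

```python
def quantize_dequantize(values, max_abs, bits=8):
    """Symmetric uniform quantization to signed integers, then back to float."""
    qmax = 2 ** (bits - 1) - 1            # 127 for INT8
    scale = max_abs / qmax                # width of one quantization step
    result = []
    for v in values:
        q = round(v / scale)
        q = max(-qmax, min(qmax, q))      # clip to the integer range
        result.append(q * scale)
    return result


def mean_abs_error(original, restored):
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


# Hypothetical activations: 100 small values in [-0.9, 0.9] plus one outlier.
normal = [(-1) ** i * (i % 10) * 0.1 for i in range(100)]
outlier = 60.0
activations = normal + [outlier]

# Option A: the quantization range covers the outlier.
deq_a = quantize_dequantize(activations, max_abs=60.0)
# Option B: the range covers only the normal values, so the outlier is clipped.
deq_b = quantize_dequantize(activations, max_abs=0.9)

err_normal_a = mean_abs_error(normal, deq_a[:100])
err_normal_b = mean_abs_error(normal, deq_b[:100])
err_outlier_a = abs(outlier - deq_a[100])
err_outlier_b = abs(outlier - deq_b[100])

print("normal values, outlier included in range:", err_normal_a)
print("normal values, outlier clipped:          ", err_normal_b)
print("outlier, included in range:              ", err_outlier_a)
print("outlier, clipped:                        ", err_outlier_b)
```

With option A the step size is so large that most of the small values collapse onto two or three integer levels. With option B the small values are represented well, but the outlier is clipped to the edge of the range and its magnitude is lost.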

To solve these issues, you can use the outlier suppression algorithm of the msModelSlim quantization and compression tool, combined with accuracy tuning, to quickly mitigate the accuracy loss caused by quantization. This helps maintain model accuracy and stability in real-world applications.
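
To give an intuition for what outlier suppression can look like, the following sketch applies a generic smoothing-style technique: each input channel of the activation is divided by a factor and the matching weight row is multiplied by the same factor, so the product is unchanged while the activation range shrinks. This is an assumption for illustration only. It does not use the msModelSlim API, and this section does not state which algorithm msModelSlim implements. The data, the `alpha` parameter, and all function names are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product for small nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def fake_quant(mat, bits=8):
    """Per-tensor symmetric quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for row in mat for v in row) / qmax
    return [[round(v / scale) * scale for v in row] for row in mat]


def smooth(x, w, alpha=0.5):
    """Move part of the activation range into the weights, channel by channel."""
    channels = len(w)
    factors = []
    for k in range(channels):
        x_max = max(abs(row[k]) for row in x)
        w_max = max(abs(v) for v in w[k])
        factors.append(x_max ** alpha / w_max ** (1 - alpha))
    x_s = [[row[k] / factors[k] for k in range(channels)] for row in x]
    w_s = [[v * factors[k] for v in w[k]] for k in range(channels)]
    return x_s, w_s


def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# Hypothetical activations (4 tokens x 3 channels); channel 2 holds the outliers.
x = [[0.5, -0.8, 40.0],
     [1.0, 0.3, -55.0],
     [-0.7, 0.9, 48.0],
     [0.2, -0.4, -60.0]]
# Hypothetical weights (3 input channels x 2 output channels).
w = [[0.31, -0.5],
     [0.6, 0.22],
     [0.01, -0.02]]

reference = matmul(x, w)

# Direct quantization: the outlier channel dictates the activation scale.
naive_err = max_abs_diff(reference, matmul(fake_quant(x), fake_quant(w)))

# Smooth first, then quantize both operands.
x_s, w_s = smooth(x, w)
equivalence_err = max_abs_diff(reference, matmul(x_s, w_s))
smoothed_err = max_abs_diff(reference, matmul(fake_quant(x_s), fake_quant(w_s)))

print("error without smoothing:", naive_err)
print("error with smoothing:   ", smoothed_err)
```

Before quantization the smoothed pair produces the same output as the original pair, up to floating-point rounding. After quantization the smoothed version has a smaller output error on this data, because the outlier channel no longer forces a coarse step size on the other channels. How much a real model benefits depends on its activation and weight distributions, which is why the suppression step is combined with accuracy tuning.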