Accuracy Loss Issues of Inference Quantization
Prerequisites
You need to be familiar with the functions and basic operations of the msModelSlim quantization and compression tool.
Issues
When deploying a large model on the Ascend platform, you must balance inference performance against accuracy. Quantization is an effective way to improve inference efficiency and can significantly accelerate model inference. However, quantization is complicated by activation values and outliers that are hard to quantize, and by the accumulation of outlier errors. For details, see Table 1.
| Scenario | Quantization Difficulty |
|---|---|
| Activation value quantization | Activation values are dynamically generated, with wide distribution range and many outliers. Direct quantization may lead to loss of key features. |
| Outlier quantization | Outliers are difficult to define, and the following issues exist during quantization: |
| Outlier error accumulation | Outliers are clustered in specific channels (for example, the output channels of the Transformer attention layer), causing local quantization errors to propagate layer by layer and impacting global accuracy. |
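
The effect of a single activation outlier can be sketched with NumPy. This is an illustration only, not msModelSlim code: the helper `fake_quant_int8`, the synthetic Gaussian activations, and the injected outlier value of 100 are all assumptions chosen for the example. It applies symmetric per-tensor INT8 quantization and compares the error on the ordinary values with and without the outlier.

```python
import numpy as np

def fake_quant_int8(x):
    """Symmetric per-tensor INT8 quantize-then-dequantize (hypothetical helper)."""
    scale = np.abs(x).max() / 127.0          # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=4096).astype(np.float32)  # synthetic activations

# Quantization error when the tensor has no extreme outlier.
err_clean = float(np.mean((acts - fake_quant_int8(acts)) ** 2))

# Inject one large outlier; it dictates the scale for every other value.
acts_outlier = acts.copy()
acts_outlier[0] = 100.0
deq = fake_quant_int8(acts_outlier)

# Measure the error only on the ordinary values (index 1 onward).
err_outlier = float(np.mean((acts_outlier[1:] - deq[1:]) ** 2))

print(f"MSE without outlier: {err_clean:.6f}")
print(f"MSE on ordinary values with outlier: {err_outlier:.6f}")
```

Because the outlier stretches the quantization scale, most ordinary values collapse onto a handful of integer levels, which is the loss of key features described in the table.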
To address these issues, use the outlier suppression algorithm of the msModelSlim quantization and compression tool together with accuracy tuning to quickly mitigate the accuracy loss caused by quantization. This helps maintain model accuracy and stability in real-world applications.
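
As a rough illustration of what outlier suppression can do, the following sketch applies a generic SmoothQuant-style per-channel smoothing in NumPy. It does not use the msModelSlim API, and it is not necessarily the algorithm that msModelSlim implements. The helper `fake_quant_int8`, the synthetic matrices, the outlier channel index, and the migration strength `alpha = 0.5` are all assumptions made for the example. The idea is to divide each activation channel by a factor `s` and multiply the matching weight row by the same factor, so the matrix product is unchanged while the activation range becomes easier to quantize.

```python
import numpy as np

def fake_quant_int8(x):
    """Symmetric per-tensor INT8 quantize-then-dequantize (hypothetical helper)."""
    scale = np.abs(x).max() / 127.0
    return np.clip(np.round(x / scale), -127, 127) * scale

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 64))             # activations: tokens x channels
X[:, 3] *= 50.0                            # outliers clustered in one channel
W = rng.normal(scale=0.05, size=(64, 64))  # weights: channels x outputs

# Per-channel smoothing factor; alpha balances activation and weight ranges.
alpha = 0.5
s = np.abs(X).max(axis=0) ** alpha / np.abs(W).max(axis=1) ** (1 - alpha)

X_s = X / s              # shrink outlier channels in the activations
W_s = W * s[:, None]     # compensate in the matching weight rows

Y = X @ W                # floating-point reference output

# Output error with and without smoothing, both using per-tensor INT8.
err_plain = float(np.mean((Y - fake_quant_int8(X) @ fake_quant_int8(W)) ** 2))
err_smooth = float(np.mean((Y - fake_quant_int8(X_s) @ fake_quant_int8(W_s)) ** 2))

print(f"MSE without smoothing: {err_plain:.6f}")
print(f"MSE with smoothing:    {err_smooth:.6f}")
```

In this sketch the smoothing moves part of the quantization difficulty from the activations into the weights, which are static and typically better behaved. The actual configuration and behavior of the msModelSlim tool should be taken from its own documentation.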