Model Compression and Quantization

Quantization can compress a model and reduce the computation workload.

  • Ascend supports quantization only for Cube operators (MatMul and Conv).
  • Quantization inserts data conversion operators into the model, which may degrade performance. If quantization is required, you are advised to tune the quantized model with a method such as AOE and then compare the performance before and after quantization. For details about AOE, see Solution for ONNX Offline Inference.
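
The following NumPy sketch illustrates where the inserted data conversion steps come from. It is not Ascend code: it assumes symmetric per-tensor INT8 quantization, and the helper names quantize_symmetric and quantized_matmul are hypothetical. A float MatMul is replaced by a quantize step, an integer MatMul, and a dequantize step, and the two conversion steps are the extra work that can offset the gain from the cheaper integer computation.

```python
import numpy as np


def quantize_symmetric(x):
    """Map a float tensor to INT8 using a single per-tensor scale (assumed scheme)."""
    qmax = 127  # largest magnitude used for symmetric INT8
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale


def quantized_matmul(a, w):
    """MatMul with explicit conversion steps around an integer kernel."""
    qa, scale_a = quantize_symmetric(a)  # conversion step: float -> int8
    qw, scale_w = quantize_symmetric(w)  # weights can usually be quantized offline
    acc = qa.astype(np.int32) @ qw.astype(np.int32)  # integer MatMul, int32 accumulation
    return acc.astype(np.float32) * (scale_a * scale_w)  # conversion step: int32 -> float


rng = np.random.default_rng(0)
a = rng.standard_normal((4, 64)).astype(np.float32)
w = rng.standard_normal((64, 16)).astype(np.float32)

ref = a @ w
out = quantized_matmul(a, w)
rel_err = np.linalg.norm(out - ref) / np.linalg.norm(ref)
print(f"relative error vs. float MatMul: {rel_err:.4f}")
```

Comparing out against ref in this way checks accuracy only. Latency still has to be measured on the target device, because the conversion cost depends on the hardware and on how the operators are fused.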

The quantization methods are as follows:

  • ATC-based quantization: specify the --compression_optimize_conf parameter during ATC conversion to obtain the quantized OM file directly. For details, see "Command-Line Options" in ATC Instructions.
  • AMCT_ONNX: quantizes ONNX models. It is the ONNX-side counterpart of ATC parameter quantization. You need to download and install AMCT (ONNX), which can be obtained from the CANN software download link. AMCT supports joint quantization, which may improve the performance of models with a ResNet structure.
  • msModelSlim: quantizes ONNX models. It is provided with the CANN package and does not need to be installed separately. It supports the quantization of ONNX models larger than 2 GB. For details, see msModelSlim.
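
For the ATC-based method, the conversion call might be assembled as in the sketch below. Only --compression_optimize_conf comes from this section. The other options (--model, --framework, --output, --soc_version) are assumed from typical ATC usage for ONNX models, and the file names, the SoC version, and the helper name build_atc_command are placeholders. The contents of the configuration file are not shown here; see ATC Instructions for its format.

```python
import shlex


def build_atc_command(model_path, output_name, soc_version, compression_conf):
    """Assemble an ATC command line that enables quantization during conversion."""
    return [
        "atc",
        f"--model={model_path}",
        "--framework=5",  # assumed: 5 selects ONNX as the source framework
        f"--output={output_name}",
        f"--soc_version={soc_version}",
        f"--compression_optimize_conf={compression_conf}",
    ]


# Placeholder values; replace them with your own model, target chip, and config file.
cmd = build_atc_command("model.onnx", "model_quant", "Ascend310P3", "compression.cfg")
print(shlex.join(cmd))
```

Building the command as a list keeps each option a separate argument, so the same list can be passed to subprocess.run on a machine where the CANN toolkit is installed.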