convert_qat_model

Applicability

Product

Supported

Atlas A3 training series products/Atlas A3 inference series products

Atlas A2 training products/Atlas A2 inference products

Atlas 200I/500 A2 inference product

Atlas inference series products

Atlas training products

Description

Converts an ONNX QAT model to the CANN format.

Restrictions

  • If the QuantizeLinear operator is not a model output, only QAT models whose FakeQuant layers are QuantizeLinear and DequantizeLinear pairs can be adapted, and per-channel quantization is supported only for weights. The QuantizeLinear and DequantizeLinear layers in each pair must have the same quantization factors.
  • When the QuantizeLinear operator is the only model output and is not an intermediate-layer output, it does not need to be paired with a DequantizeLinear operator during model adaptation and is replaced with the AscendQuant operator.

    The offset value in the original ONNX model is stored as INT32, so after operator replacement it may exceed the INT8 range. During actual computation, however, both ONNX Runtime and AMCT validate the offset, so the adaptation process and result are not affected.
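The two restrictions above (matching quantization factors within a pair, and offsets that may fall outside the INT8 range) can be illustrated with a small pre-check. This is a minimal sketch and not part of AMCT: QuantNode, mismatched_pairs, and offsets_outside_int8 are hypothetical names, the node values are made up, and a real check would read the nodes from the ONNX graph instead of building them by hand.

```python
from dataclasses import dataclass

INT8_MIN, INT8_MAX = -128, 127


@dataclass(frozen=True)
class QuantNode:
    """Simplified stand-in for a QuantizeLinear or DequantizeLinear node."""
    op_type: str
    input_name: str
    output_name: str
    scale: tuple    # one value per tensor, or one per channel for weights
    offset: tuple   # zero points, same length as scale


def mismatched_pairs(nodes):
    """Return (quantize, dequantize) pairs whose quantization factors differ."""
    # A DequantizeLinear node is paired with the QuantizeLinear node
    # whose output it consumes.
    dequant_by_input = {
        n.input_name: n for n in nodes if n.op_type == "DequantizeLinear"
    }
    bad = []
    for q in nodes:
        if q.op_type != "QuantizeLinear":
            continue
        d = dequant_by_input.get(q.output_name)
        if d is not None and (q.scale, q.offset) != (d.scale, d.offset):
            bad.append((q, d))
    return bad


def offsets_outside_int8(nodes):
    """Return (output name, offset) for every offset outside the INT8 range."""
    return [
        (n.output_name, value)
        for n in nodes
        for value in n.offset
        if not INT8_MIN <= value <= INT8_MAX
    ]


# Illustrative values only; they are not taken from a real model.
nodes = [
    QuantNode("QuantizeLinear", "x", "x_q", (0.02,), (0,)),
    QuantNode("DequantizeLinear", "x_q", "x_dq", (0.02,), (0,)),
    QuantNode("QuantizeLinear", "y", "y_q", (0.5,), (200,)),
    QuantNode("DequantizeLinear", "y_q", "y_dq", (0.25,), (200,)),
]
print(mismatched_pairs(nodes))
print(offsets_outside_int8(nodes))
```

An out-of-range offset reported here is informational only, since the text above states that such offsets do not affect the adaptation result.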

Prototype

convert_qat_model(model_file, save_path, record_file=None)

Parameters

Parameter     Input/Output   Description
model_file    Input          Path of the .onnx model file to be adapted. A string.
save_path     Input          Model save path. Must include the prefix of the model name, for example, ./quantized_model/*model. A string.
record_file   Input          Path of the quantization factor record file (.txt) computed by the user. A string. Default: None
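The parameter constraints above can be checked before the call. The following is a minimal sketch under stated assumptions: check_convert_args is a hypothetical helper, not an AMCT function, and creating the output directory in advance is a precaution chosen here, since this page does not say whether convert_qat_model creates it.

```python
import os


def check_convert_args(model_file, save_path, record_file=None):
    """Validate the arguments and create the output directory if needed."""
    if not model_file.endswith(".onnx"):
        raise ValueError(f"model_file must be an .onnx file: {model_file}")
    if not os.path.isfile(model_file):
        raise FileNotFoundError(model_file)

    # save_path must end with a model name prefix, not with a directory.
    out_dir, prefix = os.path.split(save_path)
    if not prefix:
        raise ValueError("save_path must end with a model name prefix")
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)

    # Only the extension is checked; the file itself is not required to exist.
    if record_file is not None and not record_file.endswith(".txt"):
        raise ValueError(f"record_file must be a .txt file: {record_file}")
    return model_file, save_path, record_file
```

The returned tuple keeps the parameter order of the prototype, so it could be unpacked directly into the call, for example amct.convert_qat_model(*check_convert_args(model_file, save_path)).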

Returns

None

Example

import amct_onnx as amct

# Path of the ONNX QAT model to be adapted.
model_file = "./pre_model/mobilenet_v2_qat.onnx"
# Output path; the last component is the model name prefix.
save_path = "./results/model"
amct.convert_qat_model(model_file, save_path)

Output files:

  • A fake-quantized model file for testing on the CPU/GPU, and a deployable model file that can be converted by ATC.
  • (Optional) A quantization factor record file (.txt) that records the quantization factors of each quantizable layer.
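After the call returns, the generated files can be located through the save_path prefix. This is a minimal sketch: list_output_files is a hypothetical helper, and it assumes the output files are written next to the prefix with names that start with it, as the save_path description suggests. The exact file names are not stated on this page, so none are assumed here.

```python
import glob
import os


def list_output_files(save_path):
    """Return the files whose paths start with the save_path prefix, sorted."""
    # glob.escape keeps special characters in the prefix from acting as wildcards.
    pattern = glob.escape(save_path) + "*"
    return sorted(path for path in glob.glob(pattern) if os.path.isfile(path))
```

For the example above, list_output_files("./results/model") would return every file in ./results whose name begins with model.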