convert_qat_model

Product	Supported
Atlas A3 training series products/Atlas A3 inference series products	√
Atlas A2 training products/Atlas A2 inference products	√
Atlas 200I/500 A2 inference product	√
Atlas inference series products	√
Atlas training products	√

Converts an ONNX QAT model to the CANN format.

If the QuantizeLinear operator is not a model output, only QAT models whose FakeQuant layers are built from paired QuantizeLinear and DequantizeLinear operators can be adapted, and per-channel quantization is supported only for weights. The two operators in each QuantizeLinear/DequantizeLinear pair must use the same quantization factors.
If the QuantizeLinear operator is the model's only output and is not an intermediate-layer output, it does not need to be paired with a DequantizeLinear operator. During model adaptation it is replaced with the AscendQuant operator.
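The pairing rule above can be illustrated with a minimal pure-Python sketch of a FakeQuant layer. It follows the ONNX operator definitions of QuantizeLinear and DequantizeLinear for a single INT8 value and assumes that the quantization factors are a scale and an offset (zero point). The function names and the numeric values are illustrative only and are not part of the AMCT API.

```python
def quantize_linear(x, scale, zero_point, qmin=-128, qmax=127):
    """ONNX-style QuantizeLinear for one value: round, shift, then saturate to INT8."""
    q = round(x / scale) + zero_point  # Python's round() rounds halves to even
    return max(qmin, min(qmax, q))


def dequantize_linear(q, scale, zero_point):
    """ONNX-style DequantizeLinear for one value."""
    return (q - zero_point) * scale


def fake_quant(x, scale, zero_point):
    """A QuantizeLinear -> DequantizeLinear pair that shares one scale and one offset."""
    q = quantize_linear(x, scale, zero_point)
    return dequantize_linear(q, scale, zero_point)


scale, zero_point = 0.5, 3  # illustrative quantization factors
for value in (0.3, 1.26, -70.0):
    print(value, "->", fake_quant(value, scale, zero_point))
```

Because both halves of the pair use the same factors, the output stays on the same quantization grid as the INT8 value in between. If the two operators used different factors, the pair would no longer describe a single quantization step.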

The offset value in the original ONNX model is stored as INT32, so after operator replacement it may fall outside the INT8 range. Both ONNX Runtime and AMCT validate the offset during actual computation, so the adaptation process and result are not affected.
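For readers who want to inspect offsets themselves, the following is a small sketch of an INT8 range check. It is not something AMCT requires, because the validation described above already happens at computation time. The layer names and offset values are hypothetical.

```python
INT8_MIN, INT8_MAX = -128, 127


def offset_fits_int8(offset):
    """Return True if an offset stored as INT32 is representable as INT8."""
    return INT8_MIN <= offset <= INT8_MAX


def find_out_of_range_offsets(offsets):
    """Return the {layer_name: offset} entries whose offset is outside the INT8 range."""
    return {name: off for name, off in offsets.items() if not offset_fits_int8(off)}


# Hypothetical offsets, as they might be read from a model's zero-point tensors.
offsets = {"conv1": -128, "conv2": 130, "fc": 0}
print(find_out_of_range_offsets(offsets))
```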

convert_qat_model(model_file, save_path, record_file=None)

Parameter	Input/Output	Description
model_file	Input	Path of the .onnx model file to be adapted. A string.
save_path	Input	Model save path. Must include the prefix of the model name, for example, ./quantized_model/model. A string.
record_file	Input	Path of the quantization factor record file (.txt) computed by the user. A string. Default: None

Return value: None

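import amct_onnx as amct

# Path of the ONNX QAT model to be adapted
model_file = "./pre_model/mobilenet_v2_qat.onnx"
# Save path, ending with the model name prefix ("model")
save_path = "./results/model"
amct.convert_qat_model(model_file, save_path)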
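The call above can also be wrapped in a small helper that checks the arguments first. The sketch below is one possible approach, not part of AMCT: `prepare_save_path` and `convert` are hypothetical helper names, and creating the output directory in advance is a defensive assumption, since this page does not state whether AMCT creates it.

```python
import os


def prepare_save_path(save_path):
    """Check that save_path ends with a model name prefix and create its directory."""
    directory, prefix = os.path.split(save_path)
    if not prefix:
        raise ValueError(
            "save_path must end with a model name prefix, e.g. ./results/model"
        )
    if directory:
        os.makedirs(directory, exist_ok=True)
    return save_path


def convert(model_file, save_path, record_file=None):
    """Validate the arguments, then call amct.convert_qat_model."""
    if not model_file.endswith(".onnx"):
        raise ValueError("model_file must point to an .onnx file")
    # Imported lazily so the path helper can be used without AMCT installed.
    import amct_onnx as amct

    amct.convert_qat_model(
        model_file, prepare_save_path(save_path), record_file=record_file
    )
```

A call such as convert("./pre_model/mobilenet_v2_qat.onnx", "./results/model", record_file="./results/record.txt") would then pass all three parameters; the record file path here is only an example.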

Output files:

A fake-quantized model file for testing on the CPU/GPU and a deployable model convertible by ATC.
(Optional) A quantization factor record file (.txt), which records the quantization factors of each quantizable layer.
