convert_qat_model

Function Usage

Converts an ONNX QAT (quantization-aware training) model into a model adapted to the Ascend format.

Constraints

  • When the QuantizeLinear operator is not a model output, the source QAT model must contain FakeQuant layers (QuantizeLinear and DequantizeLinear operators). Channel-wise quantization takes effect on weights only. Each QuantizeLinear/DequantizeLinear pair must use the same quantization factors (see the sketch after this list).
  • When the QuantizeLinear operator is a model output (not a middle layer) and is the only output, it does not need to be paired with a DequantizeLinear operator during model adaptation and is replaced with the AscendQuant operator.

    The offset value in the original ONNX model is stored as int32, so during operator replacement it may exceed the int8 range. However, both ONNX Runtime and AMCT validate the offset during actual computation, so the adaptation process and result are not affected.
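As a quick sanity check for the pairing constraint above, the following minimal sketch (it uses the standard onnx and numpy Python packages and is not part of AMCT; the model path is illustrative) verifies that each QuantizeLinear operator feeding a DequantizeLinear operator shares the same scale and zero_point initializers:

import numpy as np
import onnx
from onnx import numpy_helper

model = onnx.load("./pre_model/mobilenet_v2_qat.onnx")
# Map initializer names to arrays so scale/zero_point values can be compared.
inits = {t.name: numpy_helper.to_array(t) for t in model.graph.initializer}
# Index QuantizeLinear nodes by their output tensor name.
quant_by_output = {n.output[0]: n for n in model.graph.node if n.op_type == "QuantizeLinear"}

for dq in model.graph.node:
    if dq.op_type != "DequantizeLinear":
        continue
    q = quant_by_output.get(dq.input[0])
    if q is None:
        continue  # this DequantizeLinear is not fed directly by a QuantizeLinear
    # Input 1 is the scale, input 2 (optional) is the zero point.
    for idx in (1, 2):
        if idx < len(q.input) and idx < len(dq.input):
            q_val = inits.get(q.input[idx])
            dq_val = inits.get(dq.input[idx])
            if q_val is not None and dq_val is not None and not np.array_equal(q_val, dq_val):
                print("Mismatched quantization factors between", q.name, "and", dq.name)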

Prototype

convert_qat_model(model_file, save_path, record_file=None)

Parameters

| Parameter   | Input/Return | Meaning                                                                                     | Restriction              |
|-------------|--------------|---------------------------------------------------------------------------------------------|--------------------------|
| model_file  | Input        | Path of the .onnx model file to convert.                                                    | A string.                |
| save_path   | Input        | Model save path. Must include the model name prefix, for example, ./quantized_model/model.  | A string.                |
| record_file | Input        | Path of the quantization factor record file (.txt) computed by the user.                    | A string. Default: None. |
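If a user-computed quantization factor record file is available, it can be passed through record_file. A minimal sketch (the record file path below is illustrative, not an AMCT default):

import amct_onnx as amct

model_file = "./pre_model/mobilenet_v2_qat.onnx"   # ONNX QAT model to convert
save_path = "./results/model"                      # save path with model name prefix
record_file = "./results/record.txt"               # user-computed quantization factor record file (illustrative path)
amct.convert_qat_model(model_file, save_path, record_file=record_file)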

Returns

None

Outputs

  • A fake-quantized model for accuracy testing on the CPU or GPU, and a deployable model that can be converted by ATC (a sketch of running the fake-quantized model with ONNX Runtime follows this list).
  • (Optional) A quantization factor record file (.txt), which records the quantization factors of each quantizable layer.
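The fake-quantized model can be run with ONNX Runtime on the CPU to verify accuracy. The sketch below assumes the onnxruntime package and a MobileNetV2-style 1x3x224x224 input; the output file name is an assumption derived from the save_path prefix, so check the files actually written to the save path:

import numpy as np
import onnxruntime as ort

# Assumed output file name; confirm against the files produced under save_path.
session = ort.InferenceSession("./results/model_fake_quant_model.onnx",
                               providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)  # replace with real preprocessed data
outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)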

Calling Example

import amct_onnx as amct

model_file = "./pre_model/mobilenet_v2_qat.onnx"   # ONNX QAT model to convert
save_path = "./results/model"                      # save path with model name prefix ("model")
amct.convert_qat_model(model_file, save_path)