convert_qat_model
Function Usage
Converts an ONNX QAT model to a model of the Ascend format.
Constraints
- When the QuantizeLinear operator is not the output, the source QAT model must contain FakeQuant layers (a QuantizeLinear operator paired with a DequantizeLinear operator). Channel-wise quantization takes effect on weights only. Each QuantizeLinear-DequantizeLinear pair must use the same quantization factors.
- When the QuantizeLinear operator is the model output (not a middle-layer output) and is the only output, it does not need to be paired with a DequantizeLinear operator during model adaptation and is replaced with the AscendQuant operator.
In the original ONNX model, the offset value is stored as int32, so it may exceed the int8 range during operator replacement. However, both ONNX Runtime and AMCT validate the offset during actual computation, so the adaptation process and result are not affected.
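The FakeQuant round trip described above can be sketched in pure Python. This is an illustrative model of QuantizeLinear followed by DequantizeLinear, not AMCT code; the scale and offset values are made up for the example:

```python
# Sketch of the FakeQuant round trip (QuantizeLinear + DequantizeLinear).
# Scale/offset values are illustrative, not taken from the toolkit.

def quantize(x, scale, offset):
    # QuantizeLinear: real value -> int8, clamped to [-128, 127]
    q = round(x / scale) + offset
    return max(-128, min(127, q))

def dequantize(q, scale, offset):
    # DequantizeLinear: int8 -> real; must reuse the SAME scale and offset
    # as the paired QuantizeLinear, per the constraint above
    return (q - offset) * scale

scale, offset = 0.1, 0
x = 0.57
fake_quantized = dequantize(quantize(x, scale, offset), scale, offset)
# fake_quantized is x snapped to the quantization grid (approx. 0.6)

# Channel-wise quantization (weights only): one scale per output channel
weights = [[0.57, -1.2], [0.03, 0.08]]   # two output channels (illustrative)
channel_scales = [0.1, 0.001]            # illustrative per-channel scales
fq_weights = [[dequantize(quantize(w, s, 0), s, 0) for w in row]
              for row, s in zip(weights, channel_scales)]
```

The per-channel variant shows why channel-wise quantization applies to weights: each output channel keeps its own scale, while activations use a single scale per tensor.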
Prototype
convert_qat_model(model_file, save_path, record_file=None)
Parameters
| Parameter | Input/Return | Meaning | Restriction |
|---|---|---|---|
| model_file | Input | Path of the .onnx model file to convert. | A string. |
| save_path | Input | Model save path. Must include the prefix of the model name, for example, ./quantized_model/*model. | A string. |
| record_file | Input | Path of the quantization factor record file (.txt) computed by the user. | A string. Default: None |
Returns
None
Outputs
- A fake-quantized model for testing on the CPU/GPU and a deployable model convertible by ATC.
- (Optional) A quantization factor record file (.txt), which records the quantization factors of each quantizable layer.
Calling Example
```python
import amct_onnx as amct

model_file = "./pre_model/mobilenet_v2_qat.onnx"
save_path = "./results/model"
amct.convert_qat_model(model_file, save_path)
```
Parent topic: Model Adaptation APIs