convert_qat_model
Applicability
| Product | Supported |
|---|---|
| | √ |
| | √ |
| | √ |
| | √ |
| | √ |
Description
Converts an ONNX QAT model to the CANN format.
Restrictions
- If the QuantizeLinear operator is not the model output, only QAT models whose FakeQuant layers consist of QuantizeLinear and DequantizeLinear pairs can be adapted, and per-channel quantization is supported only for weights. Each paired QuantizeLinear and DequantizeLinear layer must use the same quantization factors.
- If the QuantizeLinear operator is the only model output and is not a middle-layer output, it does not need to be paired with a DequantizeLinear operator during model adaptation. It is replaced with the AscendQuant operator.
The original ONNX model stores the offset value as INT32, so the offset may exceed the INT8 range during operator replacement. However, both ONNX Runtime and AMCT validate the offset during actual computation, so the adaptation process and result are not affected.
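The pairing rules above can be checked before conversion. The following is a minimal sketch and is not part of AMCT: `check_qdq_pairs` is a hypothetical helper that walks nodes exposing `name`, `op_type`, `input`, and `output` (the attribute names used by ONNX `NodeProto`) and reads quantization factors from a plain dictionary, which is an assumption made for illustration. It simplifies the output case by accepting any QuantizeLinear whose result has no consumer and is a graph output.

```python
from types import SimpleNamespace


def check_qdq_pairs(nodes, graph_outputs, factors):
    """Return (node_name, reason) tuples for QuantizeLinear nodes that break the pairing rules.

    nodes: objects with name, op_type, input, output (as on ONNX NodeProto).
    graph_outputs: names of the graph output tensors.
    factors: hypothetical dict mapping scale/offset tensor names to plain Python values.
    """
    # Index every tensor name by the nodes that read it.
    consumers = {}
    for node in nodes:
        for tensor in node.input:
            consumers.setdefault(tensor, []).append(node)

    problems = []
    for node in nodes:
        if node.op_type != "QuantizeLinear":
            continue
        result = node.output[0]
        users = consumers.get(result, [])
        if not users:
            # No consumer: acceptable only when the result is a graph output.
            if result not in graph_outputs:
                problems.append((node.name, "result is neither consumed nor a graph output"))
            continue
        for user in users:
            if user.op_type != "DequantizeLinear":
                problems.append((node.name, "consumer %s is not DequantizeLinear" % user.name))
                continue
            # Inputs 1 and 2 hold the scale and the (optional) offset.
            q_factors = [factors.get(t) for t in node.input[1:3]]
            dq_factors = [factors.get(t) for t in user.input[1:3]]
            if q_factors != dq_factors:
                problems.append((node.name, "factors differ from " + user.name))
    return problems


# Stand-in nodes for illustration; a real graph would come from onnx.load(...).graph.node.
quant = SimpleNamespace(name="q1", op_type="QuantizeLinear",
                        input=["x", "s_q", "z_q"], output=["x_q"])
dequant = SimpleNamespace(name="dq1", op_type="DequantizeLinear",
                          input=["x_q", "s_dq", "z_dq"], output=["x_dq"])
matching = {"s_q": 0.5, "z_q": 0, "s_dq": 0.5, "z_dq": 0}
mismatched = dict(matching, s_dq=0.25)

ok = check_qdq_pairs([quant, dequant], ["x_dq"], matching)
bad = check_qdq_pairs([quant, dequant], ["x_dq"], mismatched)
print(ok)
print(bad)
```

With a real model, the factor dictionary would have to be filled from the graph initializers first. The sketch compares factors by value rather than by tensor name, since a pair may store equal factors under different names.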
Prototype
    convert_qat_model(model_file, save_path, record_file=None)
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| model_file | Input | Path of the .onnx model file to be adapted. A string. |
| save_path | Input | Model save path. Must include the prefix of the model name, for example, ./quantized_model/*model. A string. |
| record_file | Input | Path of the quantization factor record file (.txt) computed by the user. A string. Default: None |
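Because save_path combines a directory with a model name prefix, a path that ends in a separator carries no prefix. The sketch below shows one way to catch that early. `split_save_path` is a hypothetical helper, not an AMCT function, and it uses only the standard library.

```python
import os


def split_save_path(save_path):
    """Split save_path into (directory, model name prefix); reject a path with no prefix."""
    directory, prefix = os.path.split(save_path)
    if not prefix:
        raise ValueError("save_path must end with a model name prefix: %r" % save_path)
    # An empty directory part means the current working directory.
    return directory or ".", prefix


directory, prefix = split_save_path("./results/model")
print(directory, prefix)
```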
Returns
None
Example
    import amct_onnx as amct
    model_file = "./pre_model/mobilenet_v2_qat.onnx"
    save_path = "./results/model"
    amct.convert_qat_model(model_file, save_path)
Output files:
- A fake-quantized model file for testing on the CPU/GPU and a deployable model convertible by ATC.
- (Optional) A quantization factor record file (.txt), which records the quantization factors of each quantizable layer.
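The call in the example can be wrapped with a few path checks. This is a sketch under stated assumptions: `convert_checked` and its `convert_fn` parameter are hypothetical and exist only so the wrapper can be exercised without AMCT installed. By default it calls `amct.convert_qat_model` with the arguments given in the prototype. Creating the output directory first is a precaution, since this page does not say whether AMCT creates it.

```python
import os


def convert_checked(model_file, save_path, record_file=None, convert_fn=None):
    """Validate the input paths, run the conversion, and list the files written."""
    if not model_file.endswith(".onnx") or not os.path.isfile(model_file):
        raise FileNotFoundError("ONNX model not found: %s" % model_file)
    if record_file is not None and not os.path.isfile(record_file):
        raise FileNotFoundError("Record file not found: %s" % record_file)

    # Make sure the directory part of save_path exists (assumption: not done by AMCT).
    out_dir = os.path.dirname(save_path) or "."
    os.makedirs(out_dir, exist_ok=True)

    if convert_fn is None:
        # Imported lazily so the checks above work without AMCT installed.
        import amct_onnx as amct
        convert_fn = amct.convert_qat_model

    convert_fn(model_file, save_path, record_file=record_file)

    # Report every file in the output directory that starts with the model name prefix.
    prefix = os.path.basename(save_path)
    return sorted(f for f in os.listdir(out_dir) if f.startswith(prefix))
```

A call such as `convert_checked("./pre_model/mobilenet_v2_qat.onnx", "./results/model")` would then return the names of the generated files, whatever suffixes AMCT appends to the prefix.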