save_model

Function Usage

Inserts quantization operators such as AscendQuant and AscendDequant into the modified model based on the quantization factor record file record_file, and saves the result both as a fake-quant model that can be used for accuracy simulation in the ONNX Runtime environment and as a deployable model that can be used for inference.

Constraints

  • This API can be called only after batch_num forward passes have been completed; otherwise, the quantization factors may be incorrect and the quantization result unsatisfactory. The required call order is sketched after this list.
  • This API accepts only the ONNX model file returned by quantize_model.
  • This API requires a quantization factor record file as input, which is generated in the quantize_model phase and has its factor values filled in during the model inference phase.
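A minimal ordering sketch under stated assumptions: quantize_model is taken to have already produced ./tmp/modified_model.onnx and ./scale_offset_record_file.txt (its exact call is documented in its own section), and batch_num and input_batch are placeholder variables.

import onnxruntime
import amct_onnx as amct

# Step 1: run batch_num forward passes so the record file is filled with
# quantization factors. AMCT's SessionOptions (amct.AMCT_SO) are required
# because the modified model contains AMCT custom operators.
for _ in range(batch_num):
    onnxruntime.InferenceSession("./tmp/modified_model.onnx",
                                 amct.AMCT_SO).run(None, {'input': input_batch})

# Step 2: only after all batch_num passes may save_model be called.
amct.save_model(modified_onnx_file="./tmp/modified_model.onnx",
                record_file="./scale_offset_record_file.txt",
                save_path="./results/model")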

Prototype

save_model(modified_onnx_file, record_file, save_path)

Parameters

  • modified_onnx_file (input): Name of the modified ONNX model file output by quantize_model. A string.
  • record_file (input): Path of the quantization factor record file, including the file name. A string.
  • save_path (input): Model save path. Must include the model name prefix, for example, ./quantized_model/model. A string.
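To make the save_path requirement concrete, the sketch below shows the prefix semantics; the listed output names are assumptions for illustration, since the actual names simply contain the fake_quant and deploy keywords (see Outputs).

import glob
import os

save_path = "./results/model"  # directory "./results" plus model name prefix "model"
os.makedirs(os.path.dirname(save_path), exist_ok=True)  # make sure the directory exists

# ... call amct.save_model(..., save_path=save_path) as in the Example below ...

# The output files share the prefix; their names contain the "fake_quant"
# and "deploy" keywords, e.g. (assumed naming) ./results/model_fake_quant.onnx
# and ./results/model_deploy.onnx.
print(glob.glob(save_path + "*"))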

Returns

None

Outputs

  • A fake-quant model for accuracy simulation on ONNX Runtime, with a file name containing the fake_quant keyword.
  • A deployable model, with a file name containing the deploy keyword. The model can be deployed on the Ascend AI Processor after being converted by the ATC tool.
  • (Optional) *.external files, including *deploy.external and *fakequant.external, which store tensor data:
    • If the original model is a non-separated network (one without external data), these files are generated only when the size of the saved accuracy simulation model or deployable model reaches or exceeds 2 GB, and they are placed in the same directory as the compressed *.onnx model file. Each tensor's data is stored in a separate *.external file named after the tensor, for example, conv1.weight_deploy.external and conv1.weight_fakequant.external.
    • If the original model is a model-data-separated network (one with external data), the tensor data is always saved separately, regardless of whether the size of the compressed model file is greater than or equal to 2 GB; that is, the *.external file is always generated. (Constant nodes cannot be dumped to external_data.)

    When the ATC tool loads the compressed *.onnx deployment model file for model conversion, the tensor data in the *.external files in the same directory is read automatically.

When quantization is performed again, the preceding files output by the API will be overwritten.
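After saving, the fake-quant model can be run in ONNX Runtime for accuracy simulation. A minimal sketch, assuming the file is named ./results/model_fake_quant.onnx (the actual name simply contains the fake_quant keyword) and assuming its AscendQuant/AscendDequant custom operators again require the SessionOptions provided by AMCT:

import onnxruntime
import amct_onnx as amct

# Assumed name: the save_path prefix plus a "fake_quant" keyword suffix.
session = onnxruntime.InferenceSession("./results/model_fake_quant.onnx",
                                       amct.AMCT_SO)  # assumption: AMCT_SO needed for the custom ops
outputs = session.run(None, {'input': input_batch})   # input_batch: placeholder calibration data
# Compare outputs against those of the original model to gauge quantization accuracy loss.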

Example

import onnxruntime
import amct_onnx as amct

# Perform network inference and complete quantization during the inference.
# The calibration-ready model generated by the quantize_model API contains
# newly added AMCT custom operators, so the ONNX Runtime InferenceSession
# created for inference on the calibration dataset must use the
# SessionOptions provided by AMCT (amct.AMCT_SO).
# onnx_model, batch_num, and input_batch are defined elsewhere.
for i in range(batch_num):
    onnxruntime.InferenceSession(onnx_model, amct.AMCT_SO).run(
        None, {'input': input_batch})

# Insert the quantization operators and save the quantized model as ONNX files.
amct.save_model(modified_onnx_file="./tmp/modified_model.onnx",
                record_file="./scale_offset_record_file.txt",
                save_path="./results/model")