save_model

Applicability

Product	Supported
Atlas A3 training series products/Atlas A3 inference series products	√
Atlas A2 training products/Atlas A2 inference products	√
Atlas 200I/500 A2 inference product	√
Atlas inference series products	√
Atlas training products	√

Description

Based on the quantization factor record file record_file, inserts operators such as AscendQuant and AscendDequant into the modified model and generates two models: a fake-quantized model for accuracy simulation in the ONNX Runtime environment, and a deployable model for inference on the Ascend AI Processor.

Restrictions

This API can be called only after batch_num forward passes are completed. Otherwise, the quantization factors may be incorrect and the quantization result unsatisfactory.
This API accepts only the ONNX model file returned by the quantize_model API.
This API requires a quantization factor record file, which is generated in the quantize_model phase and has its factor values filled in during the model inference phase.
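
The restrictions above can be checked before the call with a small helper. The following is a minimal sketch, not part of AMCT: the function name check_save_model_inputs is hypothetical, and treating an empty record file as a sign that calibration has not run is an assumption, since the exact content of the record file before inference is not described here.

```python
import os


def check_save_model_inputs(modified_onnx_file, record_file):
    """Return a list of problems found; an empty list means the inputs look usable.

    Hypothetical helper: it only inspects the file system and does not call AMCT.
    """
    problems = []
    if not os.path.isfile(modified_onnx_file):
        problems.append("modified ONNX model not found: " + modified_onnx_file)
    if not os.path.isfile(record_file):
        problems.append("record file not found: " + record_file)
    elif os.path.getsize(record_file) == 0:
        # Assumption: an empty record file means the factors were never filled in.
        problems.append("record file is empty: " + record_file)
    return problems
```

If the returned list is not empty, it is probably safer to rerun quantize_model and the calibration passes than to call save_model.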

Prototype

save_model(modified_onnx_file, record_file, save_path)

Parameters

Parameter	Input/Output	Description
modified_onnx_file	Input	File name of the modified ONNX model output by the quantize_model API. A string.
record_file	Input	Path (including the file name) of the quantization factor record file. A string.
save_path	Input	Model save path. Must include the prefix of the model name, for example, ./quantized_model/model. A string.
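
Because save_path combines a directory and a model name prefix, it can help to build it from the two parts. The following is a minimal sketch under stated assumptions: prepare_save_path is a hypothetical helper, and creating the directory in advance is a precaution rather than a documented requirement of save_model.

```python
import os


def prepare_save_path(save_dir, model_prefix):
    """Create save_dir if needed and return save_dir joined with the model name prefix."""
    if not model_prefix or os.sep in model_prefix:
        raise ValueError("model_prefix must be a plain, non-empty name")
    os.makedirs(save_dir, exist_ok=True)
    return os.path.join(save_dir, model_prefix)


# Example: the result can be passed as save_path.
save_path = prepare_save_path("./quantized_model", "model")
```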

Returns

None

Example

import onnxruntime
import amct_onnx as amct

# Run inference on the calibration data. Quantization factors are computed during these forward passes.
# The calibration model generated by quantize_model contains AMCT custom operators, so the
# ONNX Runtime InferenceSession must be created with the SessionOptions provided by AMCT (amct.AMCT_SO).
# onnx_model, batch_num, and input_batch are defined by the caller.
session = onnxruntime.InferenceSession(onnx_model, amct.AMCT_SO)
for i in range(batch_num):
    session.run(None, {'input': input_batch})

# Call save_model to insert the quantization operators and save the quantized ONNX models.
amct.save_model(modified_onnx_file="./tmp/modified_model.onnx",
                record_file="./scale_offset_record_file.txt",
                save_path="./results/model")

Output files:

A fake-quantized model file for accuracy simulation on ONNX Runtime with the file name containing the fake_quant keyword.
A deployable model file with the file name containing the deploy keyword. The model can be deployed on the Ascend AI Processor after being converted by ATC.
(Optional) *.external files, including *deploy.external and *fakequant.external:
- If the original model is a non-separate network (it does not use external data), these files are generated only when the saved fake-quantized model file and deployable model file are 2 GB or larger. The *.external files are generated in the same directory as the compressed *.onnx model file and store the tensor data. The data of each tensor is saved in a separate .external file named after the tensor, for example, conv1.weight_deploy.external and conv1.weight_fakequant.external.
- If the original model is a separate network (it uses external data), tensor data is saved separately regardless of the size of the compressed model file. That is, *.external files are always generated (constant nodes cannot be dumped as external_data).
When ATC loads the compressed *.onnx deployable model file for model conversion, it automatically reads the tensor data from the *.external files in the same directory.
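
To see which of the files described above were actually written, the output directory can be scanned by keyword. This is a minimal sketch that relies only on the naming rules stated above (the fake_quant and deploy keywords and the .external suffix). The helper name list_save_model_outputs is hypothetical, and the exact file names produced by AMCT are not assumed.

```python
import os


def list_save_model_outputs(save_path):
    """Group files in the save_path directory by the keyword found in their names."""
    directory = os.path.dirname(save_path) or "."
    prefix = os.path.basename(save_path)
    groups = {"fake_quant": [], "deploy": [], "external": []}
    for name in sorted(os.listdir(directory)):
        if name.endswith(".external"):
            # External tensor data files are named after tensors, not the model prefix.
            groups["external"].append(name)
        elif name.startswith(prefix) and name.endswith(".onnx"):
            if "fake_quant" in name:
                groups["fake_quant"].append(name)
            elif "deploy" in name:
                groups["deploy"].append(name)
    return groups
```

Calling list_save_model_outputs("./results/model") after save_model returns the matching file names in ./results. An empty "external" list is expected when the original model has no external data and the saved models are smaller than 2 GB.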

When quantization is performed again, the preceding files output by the API will be overwritten.
