save_model
Applicability
| Product | Supported |
|---|---|
| | √ |
| | √ |
| | √ |
| | √ |
| | √ |
Description
Inserts operators such as AscendQuant and AscendDequant into the modified model, based on the quantization factor record file record_file, and generates two models: a fake-quantized model for accuracy simulation in the ONNX Runtime environment, and a deployable model for inference on the Ascend AI Processor.
Prototype
    save_model(modfied_onnx_file, record_file, save_path)
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| modfied_onnx_file | Input | Name of the resultant ONNX model file. A string. |
| record_file | Input | Path (including the file name) of the quantization factor record file. A string. |
| save_path | Input | Model save path. Must include the prefix of the model name, for example, ./quantized_model/*model. A string. |
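Because save_path combines a directory with a model name prefix, it can help to split the two and make sure the directory exists before saving. The following is a minimal sketch using only the Python standard library. `prepare_save_path` is a hypothetical helper and not part of AMCT; whether save_model creates missing directories on its own is not stated here, so creating the directory up front is an assumption made for safety.

```python
import os


def prepare_save_path(save_path):
    """Split save_path into (directory, prefix) and create the directory.

    Hypothetical helper for illustration only; not part of AMCT.
    """
    directory, prefix = os.path.split(save_path)
    if not prefix:
        # A path ending in a separator carries no model name prefix.
        raise ValueError("save_path must end with a model name prefix, "
                         "for example ./results/model")
    directory = directory or "."
    # Create the output directory up front (no effect if it already exists).
    os.makedirs(directory, exist_ok=True)
    return directory, prefix
```

For example, `prepare_save_path("./results/model")` returns the directory `./results` and the prefix `model`, and the original string can then be passed unchanged as save_path.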
Returns
None
Restrictions
- Call this API only after batch_num forward passes have been completed. Otherwise, the quantization factors may be incorrect and the quantization results unsatisfactory.
- This API accepts only the ONNX model file returned by the quantize_model API.
- This API requires a quantization factor record file. The file is generated in the quantize_model phase, and its factor values are filled in during the model inference phase.
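Some of these restrictions can be partly checked before the call. The sketch below, which uses only the standard library, confirms that both input files exist and that the record file is not empty. `check_save_model_inputs` is a hypothetical helper, not part of AMCT. Treating an empty record file as a sign that calibration did not run is an assumption, and a non-empty file does not prove that batch_num forward passes were completed.

```python
import os


def check_save_model_inputs(modfied_onnx_file, record_file):
    """Return a list of problems; an empty list means the basic checks passed.

    Hypothetical helper for illustration only; not part of AMCT. It cannot
    verify that the quantization factors themselves are correct.
    """
    problems = []
    if not os.path.isfile(modfied_onnx_file):
        problems.append("modified ONNX file not found: " + modfied_onnx_file)
    if not os.path.isfile(record_file):
        problems.append("record file not found: " + record_file)
    elif os.path.getsize(record_file) == 0:
        # Assumption: an empty record file means no factors were written.
        problems.append("record file is empty: " + record_file)
    return problems
```

A caller could print the returned messages and skip save_model when the list is not empty.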
Example
    import amct_pytorch as amct

    # Run batch_num forward passes on the calibration model; quantization
    # is completed during this inference.
    for _ in range(batch_num):
        output = calibration_model(input_batch)

    # Insert quantization operators and save the quantized model as ONNX files.
    amct.save_model(modfied_onnx_file="./tmp/modfied_model.onnx",
                    record_file="./tmp/scale_offset_record.txt",
                    save_path="./results/model")
Files written to disk:
- A fake-quantized ONNX model file for accuracy simulation on ONNX Runtime with the file name containing the fake_quant keyword.
- A deployable ONNX model file with the file name containing the deploy keyword. The model can be deployed on the Ascend AI Processor after being converted by ATC.
- (Optional) *.external files, including *deploy.external and *fakequant.external:
These files are generated only when the size of the saved fake-quantized model and deployable model file is greater than or equal to 2 GB. The *.external files are generated in the same directory as the compressed *.onnx model file and store the tensor data. Each tensor's data is saved in a separate .external file named after the tensor, for example, conv1.weight_deploy.external and conv1.weight_fakequant.external.
When ATC loads the compressed *.onnx deployable model file for model conversion, it automatically reads the tensor data from the *.external files in the same directory.
When quantization is performed again, the preceding files output by the API will be overwritten.
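To confirm what a run produced, the output directory can be scanned for the naming keywords listed above. This is a minimal sketch using only the standard library. `classify_outputs` is a hypothetical helper, not part of AMCT, and it relies only on the fake_quant and deploy keywords and the .external extension, since the exact file name pattern is not given here.

```python
import os


def classify_outputs(directory):
    """Group the files in directory by the documented naming keywords.

    Hypothetical helper for illustration only; not part of AMCT.
    """
    groups = {"fake_quant": [], "deploy": [], "external": []}
    for name in sorted(os.listdir(directory)):
        if name.endswith(".external"):
            # Tensor data files, present only for large models.
            groups["external"].append(name)
        elif name.endswith(".onnx") and "fake_quant" in name:
            groups["fake_quant"].append(name)
        elif name.endswith(".onnx") and "deploy" in name:
            groups["deploy"].append(name)
    return groups
```

Calling `classify_outputs("./results")` after the example above would be expected to show one file under fake_quant and one under deploy. Files that match none of the keywords are ignored.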