save_model

Applicability

Product	Supported
Atlas A3 training series products/Atlas A3 inference series products	x
Atlas A2 training products/Atlas A2 inference products	x
Atlas 200I/500 A2 inference product	√
Atlas inference series products	√
Atlas training products	√

Description

Inserts operators such as AscendQuant and AscendDequant into the modified graph, then outputs two models: a fake-quantized model for accuracy simulation in the Caffe environment, and a deployable model for inference on the Ascend AI Processor.

Prototype

save_model(graph, save_type, save_path)

Parameters

Parameter	Input/Output	Description
graph	Input	Graph structure modified by the quantize_model API. An AMCT-defined Graph.
save_type	Input	Type of the model to save. A string, one of: Fakequant (a model for accuracy simulation), Deploy (a deployable model for inference on the Ascend AI Processor), or Both (both models).
save_path	Input	Model save path. Must include the prefix of the model name, for example, ./quantized_model/model. A string.
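
The parameter constraints above can be checked before save_model is called. The following is a minimal sketch, not part of amct_caffe: check_save_args is a hypothetical helper, and it assumes that save_type must match the spelling in the table exactly and that a save_path ending in a path separator lacks the required model name prefix.

```python
import os

# Values listed in the parameter table; exact spelling is assumed to matter.
VALID_SAVE_TYPES = ("Fakequant", "Deploy", "Both")


def check_save_args(save_type, save_path):
    """Hypothetical pre-check for save_model arguments (not an AMCT API).

    Returns (directory, prefix) split from save_path.
    """
    if save_type not in VALID_SAVE_TYPES:
        raise ValueError(
            f"save_type must be one of {VALID_SAVE_TYPES}, got {save_type!r}"
        )
    if not isinstance(save_path, str):
        raise TypeError("save_path must be a string")
    directory, prefix = os.path.split(save_path)
    if not prefix:
        # A path such as "./quantized_model/" has no model name prefix.
        raise ValueError(
            "save_path must end with a model name prefix, "
            "for example ./quantized_model/model"
        )
    return directory, prefix
```

For example, check_save_args("Both", "./quantized_model/model") splits the path into the directory ./quantized_model and the prefix model, while a save_path of "./quantized_model/" raises ValueError.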

Returns

None

Restrictions

Call this API only after batch_num forward passes have completed. Calling it earlier may produce incorrect quantization factors and, as a result, unsatisfactory quantization results.

Due to data type conversion, the quantization factors (scale and offset) in the generated deployable model might be different from the computation result. However, the accuracy is not affected.
scale_offset_record_file must contain the quantization factors of all quantization layers. Otherwise, an error is reported. That is, modified_model_file and modified_weights_file in quantize_model must complete batch_num forward passes in the Caffe environment.
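
One way to enforce the ordering restriction in user code is to count forward passes and refuse to save until batch_num of them have completed. The sketch below is illustrative only: CalibrationGuard and its methods are hypothetical names, not part of amct_caffe, and the caller is assumed to invoke record_pass() once per completed forward pass.

```python
class CalibrationGuard:
    """Hypothetical helper that tracks forward passes before saving."""

    def __init__(self, batch_num):
        if batch_num < 1:
            raise ValueError("batch_num must be at least 1")
        self.batch_num = batch_num
        self.completed = 0

    def record_pass(self):
        """Call once after each completed forward pass."""
        self.completed += 1

    def ready(self):
        """True once at least batch_num forward passes have completed."""
        return self.completed >= self.batch_num

    def require_ready(self):
        """Raise if save_model would be called too early."""
        if not self.ready():
            raise RuntimeError(
                f"only {self.completed} of {self.batch_num} forward passes "
                "completed; do not call save_model yet"
            )
```

Calling require_ready() immediately before save_model turns a silent accuracy problem into an explicit error in the calling script.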

Example

from amct_caffe import save_model

# In the Caffe environment, perform batch_num forward passes on the modified model for quantization.
# run_caffe_model is not imported from amct_caffe here; it is assumed to be user-supplied inference code.
run_caffe_model(modified_model_file, modified_weights_file, batch_num)

# Call this API to save the quantized model as .prototxt model files and .caffemodel weight files.
# With save_type="Both", the following files can be found in the ./quantized_model folder:
#   model_fake_quant_model.prototxt, model_fake_quant_weights.caffemodel,
#   model_deploy_model.prototxt, model_deploy_weights.caffemodel, model_quant.json
save_model(graph=graph,
           save_type="Both",
           save_path="./quantized_model/model")

Output files:

A fake-quantized model file for accuracy simulation in the Caffe environment and its weight file, with names containing the fake_quant keyword.
A deployable model file and its weight file, with names containing the deploy keyword. The model can be deployed on the Ascend AI Processor after being converted by ATC.
A quantization information file that records the locations of the quantization layers inserted by AMCT and operator fusion information, used for accuracy analysis of the quantized model.

When quantization is performed again, the preceding files output by the API will be overwritten.
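
Because existing outputs are overwritten, it can help to compute the expected file names up front, either to verify that a run produced them or to back them up before quantizing again. The sketch below derives the names from the pattern shown in the example for save_type="Both". Both functions are hypothetical helpers, not AMCT APIs. Which files are written for "Fakequant" or "Deploy" alone, and whether the JSON file is written in every case, are assumptions inferred from the file name keywords.

```python
import os


def expected_output_files(save_path, save_type="Both"):
    """Hypothetical helper: list the files save_model is expected to write.

    Names follow the <prefix>_fake_quant_* / <prefix>_deploy_* pattern from
    the example; behavior for save types other than "Both" is an assumption.
    """
    files = []
    if save_type in ("Fakequant", "Both"):
        files.append(save_path + "_fake_quant_model.prototxt")
        files.append(save_path + "_fake_quant_weights.caffemodel")
    if save_type in ("Deploy", "Both"):
        files.append(save_path + "_deploy_model.prototxt")
        files.append(save_path + "_deploy_weights.caffemodel")
    # Quantization information file (assumed to be written in every case).
    files.append(save_path + "_quant.json")
    return files


def missing_output_files(save_path, save_type="Both"):
    """Return the expected files that do not exist on disk."""
    return [
        path
        for path in expected_output_files(save_path, save_type)
        if not os.path.isfile(path)
    ]
```

After a successful save_model call with save_path="./quantized_model/model", missing_output_files("./quantized_model/model") would be expected to return an empty list under these assumptions.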
