save_quant_retrain_model
Function Usage
Inserts operators such as AscendQuant and AscendDequant into the retrained model to generate a model for accuracy simulation and a model for deployment.
Prototype
save_quant_retrain_model(config_file, model, record_file, save_path, input_data, input_names=None, output_names=None, dynamic_axes=None)
Parameters
| Parameter | Input/Return | Description | Restrictions |
|---|---|---|---|
| config_file | Input | User-defined QAT configuration file, which specifies the configuration of each layer to be quantized. | A string. |
| model | Input | Model generated after QAT. | A torch.nn.Module. |
| record_file | Input | Path of the quantization factor record file, including the file name. | A string. |
| save_path | Input | Model save path, which must include the prefix of the model name, for example, ./quantized_model/*model. | A string. |
| input_data | Input | Input data of the model, equivalent to a tuple(torch.tensor). | A tuple of torch.tensor objects. |
| input_names | Input | Input names of the resulting ONNX model. | A list of strings. Default: None. |
| output_names | Input | Output names of the resulting ONNX model. | A list of strings. Default: None. |
| dynamic_axes | Input | Dynamic axes of the model inputs and outputs. For example, if the inputs are in NCHW format with N, H, and W dynamic, and the outputs are in NL format with N dynamic, pass {"inputs": [0, 2, 3], "outputs": [0]}, where 0, 2, and 3 are the indexes of N, H, and W, respectively. | A dict<string, dict<int, string>> or a dict<string, list(int)>. The int values must be non-negative. Default: None. |
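For reference, the following sketch (not part of the API; the names "inputs" and "outputs" are illustrative and must match the names passed to input_names and output_names) shows the two accepted forms of dynamic_axes:

```python
# Form 1: dict<string, list(int)> - only the indexes of the dynamic axes are given.
dynamic_axes_by_index = {"inputs": [0, 2, 3], "outputs": [0]}

# Form 2: dict<string, dict<int, string>> - each dynamic axis index is mapped
# to a readable name that appears in the exported ONNX model.
dynamic_axes_by_name = {
    "inputs": {0: "batch_size", 2: "height", 3: "width"},
    "outputs": {0: "batch_size"},
}
```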
Returns
None
Outputs
- A fake-quantized ONNX model for accuracy simulation on ONNX Runtime, with a file name containing the fake_quant keyword.
- A deployable ONNX model, with a file name containing the deploy keyword. The model can be deployed on the Ascend AI Processor after being converted by the ATC tool.
- (Optional) *.external files, including *deploy.external and *fakequant.external:
These files are generated only when the size of the saved accuracy simulation model or deployable model file exceeds 2 GB, and they are created in the same directory as the compressed *.onnx model file. They are used to store tensor data: each tensor is stored in a separate *.external file named after the tensor, for example, conv1.weight_deploy.external and conv1.weight_fakequant.external.
When the ATC tool loads the compressed *.onnx deployment model file for model conversion, the tensor data in the *.external files in the same directory is read automatically.
When QAT is performed again, this API will overwrite the existing files in the output directory.
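As a quick check, the fake-quantized output model can be loaded directly with ONNX Runtime for accuracy simulation. The sketch below is illustrative only: it assumes onnxruntime is installed, a single NCHW input of shape 1x3x224x224, and a hypothetical file name; adapt both to your actual save_path and input shape.

```python
import numpy as np
import onnxruntime as ort

# Hypothetical file name; the actual name depends on the save_path prefix
# passed to save_quant_retrain_model and contains the fake_quant keyword.
fake_quant_model = "./quantized_model/model_fake_quant_model.onnx"

# Run the fake-quantized model on ONNX Runtime to simulate quantized accuracy.
session = ort.InferenceSession(fake_quant_model)
input_name = session.get_inputs()[0].name
dummy_input = np.random.randn(1, 3, 224, 224).astype(np.float32)  # assumed input shape
outputs = session.run(None, {input_name: dummy_input})
print([output.shape for output in outputs])
```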
Examples
```python
import torch
import amct_pytorch as amct

# Build a graph of the network to be quantized.
model = build_model()
model.load_state_dict(torch.load(state_dict_path))
input_data = tuple([torch.randn(input_shape)])

# Generate the retrainable quantized model (see the create_quant_retrain_model API).
quant_retrain_model = amct.create_quant_retrain_model(
    config_json_file, model, record_file, input_data)

# Train the retrained model to calculate the quantization factors.
train_model(quant_retrain_model, input_batch)

# Run inference on the retrained model to export the quantization factors.
infer_model(quant_retrain_model, input_batch)

# Insert the quantization operators and save the quantization-aware trained model as ONNX files.
amct.save_quant_retrain_model(
    config_json_file,
    quant_retrain_model,
    record_file,
    save_path,
    input_data,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}})
```