save_prune_retrain_model

Function Usage

Generates the final ONNX simulation model and deployment model based on the retrained sparse model.

Constraints

For sparse models, both files generated by this API are ONNX files exported through PyTorch. Their content is identical; the file names contain the deploy and fake_quant keywords, respectively.

Prototype

save_prune_retrain_model(model, save_path, input_data, input_names=None, output_names=None, dynamic_axes=None)

Parameters

model (Input)

    Meaning: PyTorch model that has been sparsified.

    Restriction: A torch.nn.Module.

save_path (Input)

    Meaning: Path for storing the compressed model. Must include the model name prefix, for example, ./quantized_model/*model.

    Restriction: A string.

input_data (Input)

    Meaning: Input data of the model. A single torch.Tensor is treated as equivalent to a one-element tuple.

    Restriction: A tuple of torch.Tensor.

input_names (Input)

    Meaning: Names of the model inputs, displayed in the saved sparse ONNX model. Default: None.

    Restriction: A list of strings.

output_names (Input)

    Meaning: Names of the model outputs, displayed in the saved sparse ONNX model. Default: None.

    Restriction: A list of strings.

dynamic_axes (Input)

    Meaning: Dynamic axes of the model inputs and outputs. For example, if an input has format NCHW, where N, H, and W are variable, and an output has format NL, where N is variable, pass {"inputs": [0, 2, 3], "outputs": [0]}, where 0, 2, and 3 are the axis indexes of N, H, and W, respectively. Default: None.

    Restriction: A dict<string, dict<int, string>> or dict<string, list(int)>.
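As a minimal sketch, the two accepted dynamic_axes forms can be written as plain Python dicts. The key names and axis labels below are illustrative; the keys must match the names passed in input_names and output_names.

```python
# Form 1: dict<string, list(int)> -- only the dynamic axis indexes.
# For an NCHW input with N, H, W dynamic and an NL output with N dynamic:
dynamic_axes_list_form = {"inputs": [0, 2, 3], "outputs": [0]}

# Form 2: dict<string, dict<int, string>> -- axis index mapped to a
# human-readable axis name (names here are illustrative).
dynamic_axes_dict_form = {
    "inputs": {0: "batch", 2: "height", 3: "width"},
    "outputs": {0: "batch"},
}
```

The dict form is preferable when the exported ONNX model should carry meaningful axis names; the list form is a shorthand when only the positions matter.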

Returns

None

Outputs

  • A fake-quantized ONNX model for accuracy simulation on ONNX Runtime, with a file name containing the fake_quant keyword.
  • A deployable ONNX model with a file name containing the deploy keyword. This model can be deployed on an Ascend AI Processor after being converted by the ATC tool.
  • (Optional) External data files. The sparsity feature does not distinguish between the *deploy.external and *fakequant.external files.

    These files are generated only when the accuracy simulation model or deployment model to be saved exceeds 2 GB, and they are created in the same directory as the compressed *.onnx file. They store the tensor data: each tensor's data is saved in a separate file whose name matches the tensor name, for example, conv_1.weight.

    When the ATC tool loads the compressed *.onnx deployment model file for model conversion, it automatically reads the tensor data from the external data files in the same directory.

If this API is called again after another round of retraining, the preceding files it output will be overwritten.
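The relationship between save_path and the output file names can be sketched as below. The exact suffix pattern is an assumption for illustration; the documentation only guarantees that the generated names contain the fake_quant and deploy keywords.

```python
import os

# save_path must end with the model name prefix (see the Parameters table).
save_path = os.path.join("outputs", "custom_name")

# Hypothetical output names built from the prefix; the real suffixes may
# differ, but both names are guaranteed to contain the keywords below.
fake_quant_onnx = save_path + "_fake_quant_model.onnx"  # accuracy simulation
deploy_onnx = save_path + "_deploy_model.onnx"          # for ATC conversion
```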

Examples

import os
import torch
import amct_pytorch as amct

# Build the network to be compressed.
model = build_model()

# Create the selective prune (sparse) model to be retrained.

# Retrain the sparse model and run inference to verify it.
train(pruned_retrain_model)
infer(pruned_retrain_model)

input_data = tuple([torch.randn(input_shape)])
save_path = os.path.join(OUTPUTS_DIR, 'custom_name')

# Save the compressed model and export it as ONNX files.
amct.save_prune_retrain_model(
    pruned_retrain_model,
    save_path,
    input_data,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}})