auto_nuq
Function Usage
Performs automatic non-uniform quantization (NUQ) on a model based on the input configuration file. The function searches for an NUQ configuration that meets the accuracy requirement, then outputs a fake-quantized model for accuracy simulation in the Caffe environment and a deployable model for online inference on Ascend AI Processor.
Prototype
auto_nuq(model_file, weights_file, nuq_evaluator, config_file, scale_offset_record_file, save_dir)
Parameters
| Parameter | Input/Return | Description | Restriction |
|---|---|---|---|
| model_file | Input | Definition file of the Caffe model (.prototxt). | A string |
| weights_file | Input | Weight file of the Caffe model (.caffemodel). | A string |
| nuq_evaluator | Input | Python instance for automatic non-uniform quantization evaluation. | Data type: Python instance |
| config_file | Input | Quantization configuration file generated by the user. | A string |
| scale_offset_record_file | Input | File for storing quantization factors. If the file exists, it will be overwritten. | A string |
| save_dir | Input | Model save path. Must include the prefix of the model name, for example, ./quantized_model/*model. | A string |
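Because save_dir combines a directory with a model name prefix, it can help to validate the value before starting a long quantization run. The following is a minimal sketch using only the Python standard library. The helper names split_save_dir and prepare_save_dir are hypothetical and not part of AMCT, and whether auto_nuq creates a missing directory by itself is not stated here, so the sketch creates it up front.

```python
import os


def split_save_dir(save_dir):
    """Split a value such as './results/Resnet50' into (directory, name prefix)."""
    directory, prefix = os.path.split(save_dir)
    # A bare prefix such as 'Resnet50' refers to the current directory.
    return directory or ".", prefix


def prepare_save_dir(save_dir):
    """Check that save_dir ends with a name prefix and create its directory.

    Hypothetical helper for illustration; not an AMCT API.
    """
    directory, prefix = split_save_dir(save_dir)
    if not prefix:
        raise ValueError(
            "save_dir must end with a model name prefix, "
            "for example './results/Resnet50'"
        )
    os.makedirs(directory, exist_ok=True)
    return directory, prefix
```

With this helper, a value such as './results/' is rejected early because it names a directory only and carries no model name prefix.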
Returns
None
Outputs
- A fake-quantized model for accuracy simulation in the Caffe environment and its weight file, with names containing the fake_quant keyword.
- A deployable model and its weight file, with names containing the deploy keyword. The model can be deployed on Ascend AI Processor after being converted by the ATC tool.
- A quantization factor record file (record_file), which records the weight quantization factors (scale_w and offset_w) of each layer to be quantized.
- A non-uniform quantization information record file, which records the layers on which non-uniform quantization is performed.
- A quantization information file that records the locations of the quantization layers inserted by AMCT and operator fusion information, used for accuracy analysis of the quantized model.
When quantization is performed again, the preceding files output by the API will be overwritten.
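Since the output files are distinguished by the fake_quant and deploy keywords in their names, they can be located after a run by matching on the save_dir prefix. The sketch below assumes only that naming rule. The function find_outputs is hypothetical, and the exact file names and extensions produced by AMCT are not assumed.

```python
import glob
import os


def find_outputs(save_dir):
    """Group files that start with the save_dir prefix by output keyword.

    Hypothetical helper for illustration; not an AMCT API.
    """
    groups = {"fake_quant": [], "deploy": [], "other": []}
    # glob.escape keeps special characters in the path from acting as wildcards.
    for path in sorted(glob.glob(glob.escape(save_dir) + "*")):
        name = os.path.basename(path)
        if "fake_quant" in name:
            groups["fake_quant"].append(path)
        elif "deploy" in name:
            groups["deploy"].append(path)
        else:
            groups["other"].append(path)
    return groups
```

Because a later run overwrites these files, the returned paths can also be used to copy the results elsewhere before quantizing again.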
Examples
    import amct_caffe as amct
    from amct_caffe.auto_nuq import AutoNuqEvaluatorBase

    class AutoNuqEvaluator(AutoNuqEvaluatorBase):
        def __init__(self, evaluate_batch_num):
            self.evaluate_batch_num = evaluate_batch_num

        def eval_model(self, model_file, weights_file, batch_num):
            # do_benchmark_test and args must be defined elsewhere in the user's script.
            return do_benchmark_test(args, model_file, weights_file, batch_num)

        def is_satisfied(self, original_metric, new_metric):
            # The top-1 accuracy loss must be less than 1%.
            return (original_metric - new_metric) * 100 < 1

    # model_file, weights_file, config_json_file, and scale_offset_record_file
    # are paths prepared by the user before this call.
    evaluator = AutoNuqEvaluator(1000)
    amct.auto_nuq(
        model_file,
        weights_file,
        evaluator,
        config_json_file,
        scale_offset_record_file,
        './results/Resnet50')
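The accuracy check in the example above can be tried in isolation, without AMCT installed. The sketch below assumes that the metric is top-1 accuracy expressed as a fraction between 0 and 1, which is what the multiplication by 100 in the example suggests. The max_drop_percent parameter is a hypothetical addition for illustration.

```python
def is_satisfied(original_metric, new_metric, max_drop_percent=1.0):
    """Return True if the accuracy drop, in percentage points, is below the limit.

    Assumes both metrics are fractions in [0, 1], for example 0.76 for 76%.
    """
    drop_percent = (original_metric - new_metric) * 100
    return drop_percent < max_drop_percent


# A drop from 76.0% to 75.5% is 0.5 percentage points, within the 1% limit.
print(is_satisfied(0.760, 0.755))
# A drop from 76.0% to 74.5% is 1.5 percentage points, above the limit.
print(is_satisfied(0.760, 0.745))
```

Note that a quantized model that scores higher than the original produces a negative drop and therefore also passes the check.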