auto_nuq
Applicability
| Product | Supported |
|---|---|
| x | |
| x | |
| x | |
| x | |
| x | |
Note: For products marked with x, no error is reported when the API is called, but the performance benefits cannot be obtained.
Description
Performs automatic non-uniform quantization (NUQ) on a model based on the input configuration file, searching for an NUQ configuration that meets the accuracy requirement. Outputs a fake-quantized model for accuracy simulation in the Caffe environment and a deployable model for online inference on the Ascend AI Processor.
Prototype
    auto_nuq(model_file, weights_file, nuq_evaluator, config_file, scale_offset_record_file, save_dir)
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| model_file | Input | Definition file (.prototxt) of the Caffe model. A string. |
| weights_file | Input | Weight file (.caffemodel) of the trained Caffe model. A string. |
| nuq_evaluator | Input | Python instance for auto NUQ evaluation. A Python instance. |
| config_file | Input | Quantization configuration file generated by the user. A string. |
| scale_offset_record_file | Input | File for storing quantization factors. The existing file (if any) in the path will be overwritten. A string. |
| save_dir | Input | Model save path. Must include the prefix of the model name, for example, ./quantized_model/*model. A string. |
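The save_dir convention (an output directory followed by a model-name prefix) can be illustrated with a small helper. This is only a sketch: `split_save_dir` and `prepare_save_dir` are hypothetical names, not part of AMCT, and it assumes the caller wants to create the output directory in advance. Whether auto_nuq creates the directory itself is not stated here.

```python
import os
import tempfile


def split_save_dir(save_dir):
    """Split a save_dir value into (output directory, model-name prefix)."""
    directory, prefix = os.path.split(save_dir)
    if not prefix:
        # A trailing separator means the model-name prefix is missing.
        raise ValueError("save_dir must end with a model-name prefix: %r" % save_dir)
    return directory or ".", prefix


def prepare_save_dir(save_dir):
    """Create the output directory if needed and return (directory, prefix)."""
    directory, prefix = split_save_dir(save_dir)
    os.makedirs(directory, exist_ok=True)
    return directory, prefix


if __name__ == "__main__":
    # Demonstration in a temporary location; the path is a placeholder.
    root = tempfile.mkdtemp()
    print(prepare_save_dir(os.path.join(root, "results", "Resnet50")))
```

A value such as `./results/` would be rejected by this helper because it names a directory but no prefix.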
Returns
None
Example
    import amct_caffe as amct
    from amct_caffe.auto_nuq import AutoNuqEvaluatorBase

    class AutoNuqEvaluator(AutoNuqEvaluatorBase):
        def __init__(self, evaluate_batch_num):
            self.evaluate_batch_num = evaluate_batch_num

        def eval_model(self, model_file, weights_file, batch_num):
            # do_benchmark_test and args are not defined in this snippet;
            # they must be provided by the user's own test script.
            return do_benchmark_test(args, model_file, weights_file, batch_num)

        def is_satisfied(self, original_metric, new_metric):
            # The top-1 accuracy loss must be less than 1%.
            if (original_metric - new_metric) * 100 < 1:
                return True
            return False

    evaluator = AutoNuqEvaluator(1000)
    amct.auto_nuq(
        model_file,
        weights_file,
        evaluator,
        config_json_file,
        scale_offset_record_file,
        './results/Resnet50')
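The acceptance check in is_satisfied can be tried on its own, without Caffe or AMCT installed. The sketch below assumes the metric is a top-1 accuracy expressed as a fraction between 0 and 1, which is what multiplying by 100 in the example suggests. The function name `is_loss_within_limit` and the `max_loss_percent` parameter are hypothetical and introduced only for illustration.

```python
def is_loss_within_limit(original_metric, new_metric, max_loss_percent=1.0):
    """Return True if the accuracy drop, in percentage points, is below the limit.

    Both metrics are assumed to be fractions in [0, 1] (for example 0.76).
    """
    loss_percent = (original_metric - new_metric) * 100
    return loss_percent < max_loss_percent


if __name__ == "__main__":
    print(is_loss_within_limit(0.76, 0.755))  # drop of about 0.5 points
    print(is_loss_within_limit(0.76, 0.74))   # drop of about 2 points
```

Because the comparison uses floating-point arithmetic, a drop that sits exactly on the limit may fall on either side of it, so the limit is best chosen with some margin.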
Output files:
- A fake-quantized model file for accuracy simulation in the Caffe environment and its weight file, with names containing the fake_quant keyword.
- A deployable model file and its weight file, with names containing the deploy keyword. The model can be deployed on the Ascend AI Processor after being converted by ATC.
- A quantization factor record file (scale_offset_record_file), which records the weight quantization factors (scale_w and offset_w) of each layer to be quantized.
- An NUQ information file that records the layers that are non-uniformly quantized.
- A quantization information file that records the locations of the quantization layers inserted by AMCT and operator fusion information, used for accuracy analysis of the quantized model.
When quantization is performed again, the preceding files output by the API will be overwritten.
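Since the output files are distinguished by the fake_quant and deploy keywords in their names, a short script can sort the contents of the output directory into those groups after a run. This is a minimal sketch: `group_output_files` is a hypothetical helper, and the file names in the demonstration are placeholders, not the exact names AMCT produces.

```python
import os
import tempfile


def group_output_files(directory):
    """Group file names in a directory by the keyword they contain."""
    groups = {"fake_quant": [], "deploy": [], "other": []}
    for name in sorted(os.listdir(directory)):
        if "fake_quant" in name:
            groups["fake_quant"].append(name)
        elif "deploy" in name:
            groups["deploy"].append(name)
        else:
            groups["other"].append(name)
    return groups


if __name__ == "__main__":
    # Placeholder file names, created only to demonstrate the grouping.
    demo_dir = tempfile.mkdtemp()
    for name in ("m_fake_quant.prototxt", "m_deploy.prototxt", "record.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    print(group_output_files(demo_dir))
```

Because a later quantization run overwrites these files, results that need to be kept should be copied elsewhere before running again.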