accuracy_based_auto_calibration
Function Usage
Calibrates the input model based on the input configuration file, searches for a quantization configuration that meets the accuracy requirements, and outputs a fake-quantized model for accuracy simulation in the Caffe environment and a model deployable on the Ascend AI Processor for inference.
Restrictions
None
Prototype
accuracy_based_auto_calibration(model_file, weights_file, model_evaluator, config_file, record_file, save_dir, strategy='BinarySearch', sensitivity='CosineSimilarity')
Parameters
| Parameter | Input/Return | Description | Restriction |
|---|---|---|---|
| model_file | Input | Definition file of the Caffe model (.prototxt). | Data type: string |
| weights_file | Input | Weight file of the Caffe model (.caffemodel). | Data type: string |
| model_evaluator | Input | Python instance for automatic quantization calibration and accuracy evaluation. | Data type: Python instance |
| config_file | Input | Quantization configuration file generated by the user. | Data type: string |
| record_file | Input | Quantization factor record file. If a file already exists in the path, it is overwritten by this API call. | Data type: string |
| save_dir | Input | Model save path. Must include the model name prefix, for example, ./quantized_model/*model. | Data type: string |
| strategy | Input | Policy for searching for the quantization configuration that meets the accuracy requirements. The binary search (dichotomy) policy is used by default. | Data type: string or Python instance. Default value: BinarySearch |
| sensitivity | Input | Metric used to evaluate how quantization-sensitive each layer to be quantized is. The cosine similarity metric is used by default. | Data type: string or Python instance. Default value: CosineSimilarity |
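The default sensitivity metric compares each layer's output before and after quantization. As a rough illustration only (not AMCT's internal implementation; the function name here is hypothetical), a cosine-similarity sensitivity check could look like this:

```python
import numpy as np

def cosine_similarity_sensitivity(original_output, quantized_output):
    """Hypothetical sketch: cosine similarity between the flattened
    activations of one layer before and after quantization. A value
    close to 1.0 means the layer tolerates quantization well; lower
    values mark quantization-sensitive layers, which are the first
    candidates for rollback to float computation."""
    a = np.asarray(original_output, dtype=np.float64).ravel()
    b = np.asarray(quantized_output, dtype=np.float64).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Identical outputs yield a similarity of ~1.0; a perturbed
# (quantized) output yields a lower value.
s_same = cosine_similarity_sensitivity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
s_pert = cosine_similarity_sensitivity([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
```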
Returns
None
Outputs
- A fake-quantized model for accuracy simulation in the Caffe environment and its weight file, with names containing the fake_quant keyword.
- A deployable model and its weight file, with names containing the deploy keyword. The model can be deployed on the Ascend AI Processor after being converted by the ATC tool.
- A quantization factor record file (record_file), which records the weight quantization factors (scale_w and offset_w) of each layer to be quantized.
- A quantization information file, which records the locations of the quantization layers inserted by AMCT and the operator fusion information, used for accuracy analysis of the quantized model.
- A sensitivity file, which records how quantization-sensitive each layer is; based on this file, the layers to be left unquantized are determined.
- An automatic quantization rollback history file, which records information about the rolled-back layers.
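The default BinarySearch strategy determines how many of the most quantization-sensitive layers must be rolled back to float. The following is a conceptual sketch of that dichotomy idea only, not AMCT's actual implementation; the function names and the `accuracy_ok` callback are hypothetical:

```python
def binary_search_rollback(layers_by_sensitivity, accuracy_ok):
    """Hypothetical sketch of a binary-search (dichotomy) rollback.

    layers_by_sensitivity: layer names ordered from most to least
        quantization-sensitive (e.g. by cosine similarity, ascending).
    accuracy_ok: callback that rolls the given layers back to float,
        re-evaluates the model, and returns True if the accuracy
        requirement is met.

    Returns the smallest prefix of layers that must stay unquantized.
    """
    lo, hi = 0, len(layers_by_sensitivity)
    while lo < hi:
        mid = (lo + hi) // 2
        if accuracy_ok(layers_by_sensitivity[:mid]):
            hi = mid  # accuracy met: try rolling back fewer layers
        else:
            lo = mid + 1  # accuracy not met: roll back more layers
    return layers_by_sensitivity[:lo]

# Toy usage: pretend accuracy is met once two layers are rolled back.
layers = ['conv5', 'conv3', 'conv1', 'fc']
rolled_back = binary_search_rollback(layers, lambda rb: len(rb) >= 2)
```

Compared with trying rollback sets one by one, the binary search needs only O(log n) accuracy evaluations, which matters because each evaluation runs the full benchmark.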
Examples
```python
import os

import amct_caffe as amct
from amct_caffe.common.auto_calibration import AutoCalibrationEvaluatorBase
from amct_caffe.common.auto_calibration import BinarySearchStrategy
from amct_caffe.common.auto_calibration import CosineSimilaritySensitivity


class AutoCalibrationEvaluator(AutoCalibrationEvaluatorBase):
    def __init__(self):
        """
        evaluate_batch_num is the number of batches needed for
        evaluating the model. A larger evaluate_batch_num is
        recommended, because the evaluation metric of the input
        model is more precise with a larger eval dataset.
        """
        super().__init__()

    def calibration(self, model_file, weights_file):
        """
        Function: do the calibration with the model.
        Parameters:
            model_file: the prototxt model definition file of the Caffe model
            weights_file: the binary caffemodel file of the Caffe model
        """
        run_caffe_model(args, model_file, weights_file, CALIBRATION_BATCH_NUM)

    def evaluate(self, model_file, weights_file):
        """
        Function: evaluate the model with batch_num of data and return
        the eval metric of the input model, such as top-1 accuracy for
        a classification model or mAP for a detection model.
        Parameters:
            model_file: the prototxt model definition file of the Caffe model
            weights_file: the binary caffemodel file of the Caffe model
        """
        return do_benchmark_test(args, model_file, weights_file,
                                 args.iterations)

    def metric_eval(self, original_metric, new_metric):
        """
        Function: check whether the metric of the new fake-quantized
        model satisfies the accuracy requirement.
        Parameters:
            original_metric: the metric of the non-quantized model
            new_metric: the metric of the new quantized model
        """
        # the loss of top-1 accuracy needs to be less than 0.2%
        loss = original_metric - new_metric
        if loss * 100 < 0.2:
            return True, loss
        return False, loss


# step 1: create the quant config file
config_json_file = './config.json'
skip_layers = []
batch_num = CALIBRATION_BATCH_NUM
activation_offset = True
amct.create_quant_config(config_json_file, model_file, weights_file,
                         skip_layers, batch_num, activation_offset)
scale_offset_record_file = os.path.join(TMP, 'scale_offset_record.txt')
result_path = os.path.join(RESULT, 'MobileNetV2')
evaluator = AutoCalibrationEvaluator()

# step 2: start the accuracy_based_auto_calibration process
amct.accuracy_based_auto_calibration(
    args.model_file, args.weights_file, evaluator, config_json_file,
    scale_offset_record_file, result_path)
```