accuracy_based_auto_calibration
Applicability
| Product | Supported |
|---|---|
| | √ |
| | √ |
| | √ |
| | √ |
| | √ |
Description
Calibrates the input model according to the input configuration file, searches for a quantization configuration that meets the accuracy requirements, and outputs two models: a fake-quantized model for accuracy simulation in the ONNX Runtime environment, and a model deployable on the Ascend AI Processor for inference.
Prototype
```python
accuracy_based_auto_calibration(model, model_evaluator, config_file, record_file, save_dir, input_data, input_names, output_names, dynamic_axes, strategy='BinarySearch', sensitivity='CosineSimilarity')
```
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| model | Input | Original torch model. A torch.nn.Module. |
| model_evaluator | Input | Evaluator instance that performs calibration and accuracy evaluation during the automatic search (see the AutoCalibrationEvaluatorBase subclass in the example). A Python instance. |
| config_file | Input | Path of the quantization configuration file generated by the user. A string. |
| record_file | Input | Path of the quantization factor record file. Any existing file at this path is overwritten. A string. |
| save_dir | Input | Model save path. Must include the prefix of the model name, for example, ./quantized_model/*model. A string. |
| input_data | Input | Input data of the model. A single torch.tensor is treated as the equivalent tuple(torch.tensor). A tuple. |
| input_names | Input | Input names of the model, as displayed in modified_onnx_file. A list of strings. Default: None |
| output_names | Input | Output names of the model, as displayed in modified_onnx_file. A list of strings. Default: None |
| dynamic_axes | Input | Dynamic axes of the model inputs and outputs. For example, if an input has format NCHW, where N, H, and W are dynamic, and an output has format NL, where N is dynamic, then: `{"inputs": [0,2,3], "outputs": [0]}`. A `dict<string, dict<python:int, string>>` or `dict<string, list(int)>`. Default: None |
| strategy | Input | Policy used to search for a quantization configuration that meets the accuracy requirements. Binary search is used by default. A string or a Python instance. Default: BinarySearch |
| sensitivity | Input | Metric used to evaluate how sensitive each quantizable layer is to quantization. Cosine similarity is used by default. A string or a Python instance. Default: CosineSimilarity |
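To make the default sensitivity metric more concrete, the following sketch computes the cosine similarity between a layer's original output and a quantize-dequantize version of it. It is a plain-Python illustration, not AMCT's implementation: the helper names `cosine_similarity` and `fake_quantize`, the symmetric quantization scheme, and the sample values are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fake_quantize(values, num_bits=8):
    """Hypothetical symmetric quantize-dequantize, for illustration only."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    # Round to the integer grid, clamp, then map back to floats.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


# Made-up layer output; a real run would use activations from the model.
layer_output = [0.12, -1.5, 3.2, 0.003, -0.7]
sim_int8 = cosine_similarity(layer_output, fake_quantize(layer_output, 8))
sim_int4 = cosine_similarity(layer_output, fake_quantize(layer_output, 4))
```

Under this reading, a similarity close to 1 means quantization barely changes the layer's output, while a lower value marks the layer as more sensitive. With the sample values, the 4-bit result is less similar to the original than the 8-bit result.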
Returns
None
Example
```python
import os

import torch

import amct_pytorch as amct
from amct_pytorch.common.auto_calibration import AutoCalibrationEvaluatorBase


# You need to implement calibration(), evaluate() and metric_eval()
# in your AutoCalibrationEvaluator subclass.
class AutoCalibrationEvaluator(AutoCalibrationEvaluatorBase):
    """Subclass of AutoCalibrationEvaluatorBase."""

    def __init__(self, target_loss, batch_num):
        super().__init__()
        self.target_loss = target_loss
        self.batch_num = batch_num

    def calibration(self, model):
        """Implement the calibration function of AutoCalibrationEvaluatorBase.

        calibration() must complete the calibration inference procedure, so
        the number of inference batches must be >= the batch_num passed to
        create_quant_config.
        """
        model_forward(model=model, batch_size=32, iterations=self.batch_num)

    def evaluate(self, model):
        """Implement the evaluate function of AutoCalibrationEvaluatorBase.

        params: model: a torch.nn.Module
        return: the accuracy of the input model on the evaluation dataset, or
            another metric that describes the 'accuracy' of the model
        """
        top1, _ = model_forward(model=model, batch_size=32, iterations=5)

        if torch.cuda.is_available():
            torch.cuda.empty_cache()

        return top1

    def metric_eval(self, original_metric, new_metric):
        """Implement the metric_eval function of AutoCalibrationEvaluatorBase.

        params: original_metric: accuracy returned by evaluate() on the
                    non-quantized model
                new_metric: accuracy returned by evaluate() on the
                    fake-quantized model
        return:
            [0]: whether the accuracy loss between the non-quantized model and
                 the fake-quantized model satisfies the requirement
            [1]: the accuracy loss between the non-quantized model and the
                 fake-quantized model
        """
        loss = original_metric - new_metric
        if loss * 100 < self.target_loss:
            return True, loss
        return False, loss


...

# 1. Create the quantization configuration JSON file.
config_json_file = os.path.join(TMP, 'config.json')
skip_layers = []
batch_num = 2
amct.create_quant_config(
    config_json_file, model, input_data, skip_layers, batch_num
)

# 2. Construct the AutoCalibrationEvaluator instance.
evaluator = AutoCalibrationEvaluator(target_loss=0.5, batch_num=batch_num)

# 3. Use accuracy_based_auto_calibration to quantize the model.
record_file = os.path.join(TMP, 'scale_offset_record.txt')
result_path = os.path.join(PATH, 'result/mobilenet_v2')
amct.accuracy_based_auto_calibration(
    model=model,
    model_evaluator=evaluator,
    config_file=config_json_file,
    record_file=record_file,
    save_dir=result_path,
    input_data=input_data,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={
        'input': {0: 'batch_size'},
        'output': {0: 'batch_size'}
    },
    strategy='BinarySearch',
    sensitivity='CosineSimilarity'
)
```
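A note on units in metric_eval() from the example: the accuracy drop is multiplied by 100 before it is compared with target_loss, so target_loss=0.5 reads as 0.5 percentage points, assuming evaluate() returns accuracy as a fraction in [0, 1]. The sketch below repeats that comparison as a standalone function so it can be run without a model; the accuracy values are made up for illustration.

```python
def metric_eval(original_metric, new_metric, target_loss):
    """Same comparison as metric_eval() in the example, as a free function."""
    loss = original_metric - new_metric
    return loss * 100 < target_loss, loss


# Made-up top-1 accuracies, expressed as fractions.
ok, loss = metric_eval(0.7185, 0.7150, target_loss=0.5)      # drop of about 0.35 points
too_lossy, _ = metric_eval(0.7185, 0.7100, target_loss=0.5)  # drop of about 0.85 points
```

Here the first call accepts the quantized model and the second rejects it. If evaluate() instead returned a percentage such as 71.85, the multiplication by 100 would need to be dropped.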
Output files written to disk:
- A fake-quantized ONNX model file for accuracy simulation on ONNX Runtime with the file name containing the fake_quant keyword.
- A deployable ONNX model file with the file name containing the deploy keyword. The model can be deployed on the Ascend AI Processor after being converted by ATC.
- A quantization factor record file (record_file) where quantization factors are written.
- A sensitivity file that records how sensitive each layer is to quantization. The layers to be left unquantized are chosen based on this file.
- A history file that records which layers were automatically left unquantized.
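The sensitivity file and the history file suggest a search of the following shape: rank the layers by sensitivity, then look for the smallest set of most-sensitive layers to leave unquantized so that the accuracy requirement is met. The sketch below illustrates that idea with a binary search. It is an assumption about the general approach, not AMCT's actual BinarySearch implementation: `search_unquantized_layers`, `meets_accuracy`, and the layer names and scores are hypothetical, and the search presumes that leaving more layers unquantized never reduces accuracy.

```python
def search_unquantized_layers(sensitivity, meets_accuracy):
    """Return the smallest prefix of most-sensitive layers to leave unquantized.

    sensitivity: dict mapping layer name -> similarity between the original
        and fake-quantized outputs (lower means more sensitive).
    meets_accuracy: callable taking a list of unquantized layer names and
        returning True if the resulting model meets the accuracy target.
    """
    # Most sensitive layers (lowest similarity) come first.
    ranked = sorted(sensitivity, key=sensitivity.get)
    if not meets_accuracy(ranked):
        return None  # no prefix of the ranking satisfies the target

    # Find the smallest k such that skipping ranked[:k] meets the target.
    low, high = 0, len(ranked)
    while low < high:
        mid = (low + high) // 2
        if meets_accuracy(ranked[:mid]):
            high = mid
        else:
            low = mid + 1
    return ranked[:low]


# Hypothetical per-layer scores.
scores = {"conv1": 0.91, "conv2": 0.999, "fc": 0.97, "conv3": 0.995}


def meets_accuracy(unquantized):
    # Toy stand-in for "calibrate, then evaluate": the target is met
    # once both conv1 and fc are left unquantized.
    return {"conv1", "fc"} <= set(unquantized)


skipped = search_unquantized_layers(scores, meets_accuracy)
```

In a real run, each call to the predicate would correspond to one calibration and evaluation pass through the evaluator, which is why a search that needs only a logarithmic number of passes is attractive.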