accuracy_based_auto_calibration

Applicability

Product	Supported
Atlas A3 training series products/Atlas A3 inference series products	√
Atlas A2 training products/Atlas A2 inference products	√
Atlas 200I/500 A2 inference product	√
Atlas inference series products	√
Atlas training products	√

Description

Calibrates the input model according to the input configuration file and automatically searches for a quantization configuration that meets the accuracy requirement. Outputs two models: a fake-quantized model for accuracy simulation in the ONNX Runtime environment, and a model that can be deployed on the Ascend AI Processor for inference.

Prototype

accuracy_based_auto_calibration(model, model_evaluator, config_file, record_file, save_dir, input_data, input_names=None, output_names=None, dynamic_axes=None, strategy='BinarySearch', sensitivity='CosineSimilarity')

Parameters

Parameter	Input/Output	Description
model	Input	Original torch model. A torch.nn.Module.
model_evaluator	Input	Evaluator that runs calibration and accuracy evaluation during automatic quantization. It must implement calibration(), evaluate(), and metric_eval() (see Example). A Python instance.
config_file	Input	Quantization configuration file generated by the user. A string.
record_file	Input	Path of the quantization factor record file. The existing file (if any) in the path will be overwritten. A string.
save_dir	Input	Model save path. Must include the model name prefix, for example, ./quantized_model/model. A string.
input_data	Input	Input data of the model. A single torch.Tensor is treated as an equivalent tuple(torch.Tensor). A tuple.
input_names	Input	Input names of the model, which are displayed in modified_onnx_file. A list of strings. Default: None
output_names	Input	Output names of the model, which are displayed in modified_onnx_file. A list of strings. Default: None
dynamic_axes	Input	Dynamic axes of the model inputs and outputs. For example, if an input named "inputs" has format NCHW, where N, H, and W are dynamic, and an output named "outputs" has format NL, where N is dynamic, then: {"inputs": [0,2,3], "outputs": [0]}. A dict<string, dict<python:int, string>> or a dict<string, list(int)>. Default: None
strategy	Input	Strategy for searching for a quantization configuration that meets the accuracy requirement. Binary search is used by default. A string or a Python instance. Default: BinarySearch
sensitivity	Input	Metric used to evaluate how sensitive each layer to be quantized is to quantization. Cosine similarity is used by default. A string or a Python instance. Default: CosineSimilarity
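
The two accepted shapes of dynamic_axes match the argument of the same name in torch.onnx.export. The sketch below is illustrative and not part of AMCT: it builds both forms for the NCHW/NL case described in the table using plain dictionaries. The axis labels ("batch_size", "height", "width") and the helper dynamic_axis_indices are names chosen for this example only.

```python
# Illustrative only: plain dictionaries, no AMCT or torch import needed.
# Keys are assumed to match input_names / output_names, as in torch.onnx.export.
input_names = ["inputs"]
output_names = ["outputs"]

# Form 1: dict<string, list(int)>: list the dynamic axis indices.
dynamic_axes_list = {"inputs": [0, 2, 3], "outputs": [0]}

# Form 2: dict<string, dict<int, string>>: also give each dynamic axis a label.
dynamic_axes_named = {
    "inputs": {0: "batch_size", 2: "height", 3: "width"},
    "outputs": {0: "batch_size"},
}


def dynamic_axis_indices(axes):
    """Hypothetical helper: reduce either form to {name: sorted axis indices}."""
    return {name: sorted(spec) for name, spec in axes.items()}
```

Both forms mark the same axes as dynamic; the named form only adds readable labels for those axes.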

Returns

None

Example

import os

import torch
import amct_pytorch as amct
from amct_pytorch.common.auto_calibration import AutoCalibrationEvaluatorBase

# Implement calibration(), evaluate(), and metric_eval() in a subclass of AutoCalibrationEvaluatorBase.
# model_forward() is a user-defined inference helper that is not shown in this example.
class AutoCalibrationEvaluator(AutoCalibrationEvaluatorBase):
    """ Subclass of AutoCalibrationEvaluatorBase. """
    def __init__(self, target_loss, batch_num):
        super().__init__()
        self.target_loss = target_loss
        self.batch_num = batch_num

    def calibration(self, model):
        """ Implement the calibration function of AutoCalibrationEvaluatorBase.
            calibration() must run the calibration inference, so the number of
            inference batches must be >= the batch_num passed to create_quant_config.
        """
        model_forward(model=model, batch_size=32, iterations=self.batch_num)

    def evaluate(self, model):
        """ Implement the evaluate function of AutoCalibrationEvaluatorBase.
            params: model: a torch.nn.Module
            return: the accuracy of the input model on the evaluation dataset, or another
                    metric that describes the 'accuracy' of the model
        """
        top1, _ = model_forward(model=model, batch_size=32, iterations=5)
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        return top1

    def metric_eval(self, original_metric, new_metric):
        """ Implement the metric_eval function of AutoCalibrationEvaluatorBase.
            params: original_metric: the accuracy returned by evaluate() on the non-quantized model
                    new_metric: the accuracy returned by evaluate() on the fake-quantized model
            return:
                   [0]: whether the accuracy loss between the non-quantized model and the
                        fake-quantized model satisfies the requirement
                   [1]: the accuracy loss between the non-quantized model and the fake-quantized model
        """
        loss = original_metric - new_metric
        if loss * 100 < self.target_loss:
            return True, loss
        return False, loss
    ...
    # Step 1: Create the quantization configuration JSON file.
    config_json_file = os.path.join(TMP, 'config.json')
    skip_layers = []
    batch_num = 2
    amct.create_quant_config(
        config_json_file,
        model,
        input_data,
        skip_layers,
        batch_num
    )

    # Step 2: Construct an AutoCalibrationEvaluator instance.
    evaluator = AutoCalibrationEvaluator(target_loss=0.5, batch_num=batch_num)

    # Step 3: Quantize the model with accuracy_based_auto_calibration.
    record_file = os.path.join(TMP, 'scale_offset_record.txt')
    result_path = os.path.join(PATH, 'result/mobilenet_v2')
    amct.accuracy_based_auto_calibration(
        model=model,
        model_evaluator=evaluator,
        config_file=config_json_file,
        record_file=record_file,
        save_dir=result_path,
        input_data=input_data,
        input_names=['input'],
        output_names=['output'],
        dynamic_axes={
            'input': {0: 'batch_size'},
            'output': {0: 'batch_size'}
        },
        strategy='BinarySearch',
        sensitivity='CosineSimilarity'
    )
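
Because metric_eval() multiplies the difference by 100, target_loss=0.5 in this example corresponds to an allowed drop of 0.5 percentage points, assuming evaluate() returns accuracy as a fraction between 0 and 1. The standalone function below repeats that arithmetic outside AMCT as a quick check; the accuracy values are made up for illustration.

```python
def metric_eval(original_metric, new_metric, target_loss):
    """Same check as AutoCalibrationEvaluator.metric_eval, written as a free function."""
    loss = original_metric - new_metric
    return loss * 100 < target_loss, loss


# Hypothetical accuracies, expressed as fractions.
ok, loss = metric_eval(0.719, 0.716, target_loss=0.5)      # drop of about 0.3 points
too_lossy, _ = metric_eval(0.719, 0.709, target_loss=0.5)  # drop of about 1.0 point
```

If evaluate() already returns a percentage, the factor of 100 would need to be dropped so that target_loss keeps the same meaning.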

Output files:

A fake-quantized ONNX model file for accuracy simulation on ONNX Runtime. The file name contains the fake_quant keyword.
A deployable ONNX model file. The file name contains the deploy keyword. The model can be deployed on the Ascend AI Processor after being converted by ATC.
A quantization factor record file (record_file) to which the quantization factors are written.
A sensitivity file that records how sensitive each layer is to quantization. The layers to be left unquantized are determined based on this file.
An automatic rollback history file that records the layers left unquantized.
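
How a sensitivity ranking and a list of unquantized layers can relate is sketched below. This is not AMCT's implementation. It assumes that a layer is more sensitive when the cosine similarity between its original and fake-quantized outputs is lower, that leaving more layers unquantized never reduces accuracy, and it uses a hypothetical meets_accuracy callback in place of model_evaluator. All function names and data are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_sensitivity(layer_outputs):
    """layer_outputs maps layer name -> (original_output, fake_quant_output).
    Returns layer names ordered from most sensitive (lowest similarity) to least."""
    return sorted(layer_outputs, key=lambda name: cosine_similarity(*layer_outputs[name]))


def search_rollback(ranked_layers, meets_accuracy):
    """Binary-search the smallest k such that leaving the k most sensitive
    layers unquantized satisfies meets_accuracy (assumed monotonic in k)."""
    lo, hi = 0, len(ranked_layers)
    while lo < hi:
        mid = (lo + hi) // 2
        if meets_accuracy(ranked_layers[:mid]):
            hi = mid
        else:
            lo = mid + 1
    return ranked_layers[:lo]


# Toy data: conv2 is distorted the most by quantization.
outputs = {
    "conv1": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.1]),
    "conv2": ([1.0, 0.0, 0.0], [0.5, 0.5, 0.0]),
    "fc": ([1.0, 1.0], [1.0, 1.0]),
}
ranked = rank_by_sensitivity(outputs)
# Toy criterion: accuracy is acceptable once conv2 is left unquantized.
skipped = search_rollback(ranked, lambda rolled_back: "conv2" in rolled_back)
```

In this toy run the ranking plays the role of the sensitivity file and skipped plays the role of the rollback history; each call to meets_accuracy stands in for one calibrate-and-evaluate round, which is why a search that needs few rounds is attractive.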
