auto_nuq

Function Usage

Performs automatic non-uniform quantization (NUQ) on a model according to the input configuration file. The API searches for an NUQ configuration that meets the accuracy requirement and outputs two models: a fake-quantized model for accuracy simulation in the Caffe environment, and a deployable model for online inference on Ascend AI Processor.

Prototype

auto_nuq(model_file, weights_file, nuq_evaluator, config_file, scale_offset_record_file, save_dir)

Parameters

Parameter	Input/Return	Description	Restriction
model_file	Input	Definition file of the Caffe model (.prototxt).	A string
weights_file	Input	Weight file of the Caffe model (.caffemodel).	A string
nuq_evaluator	Input	Python instance used to evaluate the model during automatic non-uniform quantization, for example, an instance of a class derived from AutoNuqEvaluatorBase.	A Python instance
config_file	Input	Quantization configuration file generated by the user.	A string
scale_offset_record_file	Input	File for storing quantization factors. If the file exists, it will be overwritten.	A string
save_dir	Input	Model save path. Must include the prefix of the model name, for example, ./quantized_model/model.	A string

Returns

None

Outputs

A fake-quantized model for accuracy simulation in the Caffe environment and its weight file, with names containing the fake_quant keyword.
A deployable model and its weight file, with names containing the deploy keyword. The model can be deployed on Ascend AI Processor after being converted by the ATC tool.
A quantization factor record file (specified by scale_offset_record_file), which records the weight quantization factors (scale_w and offset_w) of each layer to be quantized.
A non-uniform quantization information record file, which records the layers on which non-uniform quantization is performed.
A quantization information file, which records the locations of the quantization layers inserted by AMCT and the operator fusion information. It is used for accuracy analysis of the quantized model.

If quantization is performed again, the existing output files listed above are overwritten.
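
Because the output names contain the fake_quant and deploy keywords, the generated files can be located by keyword after a run. The following is a minimal sketch, not part of AMCT: the helper name find_outputs is hypothetical, and it assumes that every output file name starts with the prefix passed as save_dir.

```python
import glob
import os


def find_outputs(save_dir):
    """Group files that start with the save_dir prefix by name keyword.

    Hypothetical helper, not an AMCT API. Assumes that the output file
    names start with the prefix given in save_dir.
    """
    groups = {"fake_quant": [], "deploy": [], "other": []}
    # glob.escape keeps special characters in the prefix literal.
    for path in sorted(glob.glob(glob.escape(save_dir) + "*")):
        name = os.path.basename(path)
        if "fake_quant" in name:
            groups["fake_quant"].append(path)
        elif "deploy" in name:
            groups["deploy"].append(path)
        else:
            groups["other"].append(path)
    return groups


if __name__ == "__main__":
    # Same save_dir value as in the example on this page.
    for keyword, paths in find_outputs("./results/Resnet50").items():
        print(keyword, paths)
```

Files in the "other" group would include any record or information files that share the prefix.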

Examples

import amct_caffe as amct
from amct_caffe.auto_nuq import AutoNuqEvaluatorBase


class AutoNuqEvaluator(AutoNuqEvaluatorBase):
    """Evaluator that auto_nuq uses to measure and judge model accuracy."""

    def __init__(self, evaluate_batch_num):
        self.evaluate_batch_num = evaluate_batch_num

    def eval_model(self, model_file, weights_file, batch_num):
        # do_benchmark_test and args are user-defined: the function
        # evaluates the model and returns its accuracy metric.
        return do_benchmark_test(args, model_file, weights_file, batch_num)

    def is_satisfied(self, original_metric, new_metric):
        # The top-1 accuracy loss must be less than 1%.
        return (original_metric - new_metric) * 100 < 1


# model_file, weights_file, config_json_file, and scale_offset_record_file
# are paths that the user defines before this call.
evaluator = AutoNuqEvaluator(1000)
amct.auto_nuq(
    model_file,
    weights_file,
    evaluator,
    config_json_file,
    scale_offset_record_file,
    './results/Resnet50')
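
The accuracy check in is_satisfied can be tried on its own, without Caffe or AMCT. The following standalone sketch mirrors that comparison under two assumptions: the metrics are top-1 accuracies expressed as fractions between 0 and 1, and the max_drop_percent parameter is a hypothetical addition for illustration that the example above does not have.

```python
def is_satisfied(original_metric, new_metric, max_drop_percent=1.0):
    """Return True if the accuracy drop stays below max_drop_percent.

    Assumes both metrics are fractions in [0, 1], so the difference is
    multiplied by 100 to obtain percentage points. max_drop_percent is a
    hypothetical parameter and not part of the AMCT interface.
    """
    drop_percent = (original_metric - new_metric) * 100
    return drop_percent < max_drop_percent


if __name__ == "__main__":
    print(is_satisfied(0.760, 0.755))  # drop of about 0.5 percentage points
    print(is_satisfied(0.760, 0.740))  # drop of about 2 percentage points
```

A quantized model that scores higher than the original yields a negative drop and therefore also satisfies the check.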
