auto_nuq

Applicability

Product	Supported
Atlas A3 training series products/Atlas A3 inference series products	x
Atlas A2 training products/Atlas A2 inference products	x
Atlas 200I/500 A2 inference product	x
Atlas inference series products	x
Atlas training products	x

Note: For products marked with x, calling the API does not report an error, but no performance benefit is obtained.

Description

Performs automatic non-uniform quantization (auto NUQ) on a model based on the input configuration file. The API searches for an NUQ configuration that meets the accuracy requirement, then outputs two models: a fake-quantized model for accuracy simulation in the Caffe environment, and a deployable model for online inference on the Ascend AI Processor.

Prototype

auto_nuq(model_file, weights_file, nuq_evaluator, config_file, scale_offset_record_file, save_dir)

Parameters

Parameter	Input/Output	Description
model_file	Input	Definition file (.prototxt) of the Caffe model. A string.
weights_file	Input	Weight file (.caffemodel) of the trained Caffe model. A string.
nuq_evaluator	Input	Evaluator used for auto NUQ evaluation, for example an instance of a class derived from AutoNuqEvaluatorBase. A Python instance.
config_file	Input	Quantization configuration file generated by the user. A string.
scale_offset_record_file	Input	File for storing quantization factors. The existing file (if any) in the path will be overwritten. A string.
save_dir	Input	Model save path. Must include the prefix of the model name, for example, ./quantized_model/model. A string.
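
As a rough illustration of the save_dir convention, the sketch below splits such a path into the output directory and the model-name prefix using only the Python standard library. The helper name split_save_dir is hypothetical and not part of AMCT, and how AMCT itself parses save_dir is not specified here.

```python
import os


def split_save_dir(save_dir):
    """Split save_dir into (output directory, model-name prefix).

    Hypothetical helper for illustration only; not part of the AMCT API.
    """
    directory, prefix = os.path.split(save_dir)
    if not prefix:
        # A path ending in a separator carries no model-name prefix.
        raise ValueError("save_dir must end with a model-name prefix")
    # A bare prefix such as "model" refers to the current directory.
    return directory or ".", prefix


print(split_save_dir("./quantized_model/model"))
```

A check like this can catch a save_dir that names only a directory before a long quantization run starts.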

Returns

None

Example

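import amct_caffe as amct
from amct_caffe.auto_nuq import AutoNuqEvaluatorBase

# do_benchmark_test and args are user-defined. The function is assumed to run
# inference on the given model and return its top-1 accuracy as a fraction.

class AutoNuqEvaluator(AutoNuqEvaluatorBase):
    def __init__(self, evaluate_batch_num):
        # Number of batches used for accuracy evaluation.
        self.evaluate_batch_num = evaluate_batch_num

    def eval_model(self, model_file, weights_file, batch_num):
        # Evaluate the model and return its accuracy metric.
        return do_benchmark_test(args, model_file, weights_file, batch_num)

    def is_satisfied(self, original_metric, new_metric):
        # The top-1 accuracy loss must be less than 1%.
        return (original_metric - new_metric) * 100 < 1

# model_file, weights_file, config_json_file, and scale_offset_record_file
# are paths prepared by the user.
evaluator = AutoNuqEvaluator(1000)
amct.auto_nuq(
    model_file,
    weights_file,
    evaluator,
    config_json_file,
    scale_offset_record_file,
    './results/Resnet50')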
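
The accuracy criterion in is_satisfied can be tried out without AMCT or Caffe installed. The sketch below is a minimal standalone class with the same two method names. ThresholdEvaluator, its max_loss_percent parameter, and the constant metric it returns are assumptions made for illustration, and the class deliberately does not inherit from AutoNuqEvaluatorBase.

```python
class ThresholdEvaluator:
    """Standalone stand-in for an auto NUQ evaluator (hypothetical)."""

    def __init__(self, max_loss_percent=1.0, fixed_metric=0.76):
        # Largest acceptable accuracy drop, in percentage points.
        self.max_loss_percent = max_loss_percent
        # Arbitrary placeholder metric; a real evaluator measures this.
        self.fixed_metric = fixed_metric

    def eval_model(self, model_file, weights_file, batch_num):
        # A real evaluator would run inference here; this returns a constant.
        return self.fixed_metric

    def is_satisfied(self, original_metric, new_metric):
        # Metrics are fractions in [0, 1]; convert the drop to percentage points.
        loss_percent = (original_metric - new_metric) * 100
        return loss_percent < self.max_loss_percent


evaluator = ThresholdEvaluator()
print(evaluator.is_satisfied(0.760, 0.755))  # drop of about 0.5 points
print(evaluator.is_satisfied(0.760, 0.740))  # drop of about 2 points
```

Making the threshold a constructor argument keeps the acceptance rule separate from the evaluation code, so the same evaluator can be reused with a stricter or looser accuracy budget.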

Output files:

A fake-quantized model file for accuracy simulation in the Caffe environment and its weight file, with names containing the fake_quant keyword.
A deployable model file and its weight file, with names containing the deploy keyword. The model can be deployed on the Ascend AI Processor after being converted by ATC.
A quantization factor record file (scale_offset_record_file), which records the weight quantization factors (scale_w and offset_w) of each layer to be quantized.
An NUQ information file that records the layers that are non-uniformly quantized.
A quantization information file that records the locations of the quantization layers inserted by AMCT and operator fusion information, used for accuracy analysis of the quantized model.
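
To check what a run produced, the saved files can be grouped by the fake_quant and deploy keywords mentioned above. The sketch below assumes only that the files are in one directory and that the keywords appear in the file names. The helper name group_outputs is hypothetical, and exact file names and extensions are not assumed.

```python
import os


def group_outputs(directory, keywords=("fake_quant", "deploy")):
    """Group the file names in directory by the keyword they contain.

    Hypothetical helper for illustration only; not part of the AMCT API.
    """
    groups = {keyword: [] for keyword in keywords}
    for name in sorted(os.listdir(directory)):
        for keyword in keywords:
            if keyword in name:
                groups[keyword].append(name)
    return groups


# Inspect the directory used in the example call, if it exists.
if os.path.isdir("./results"):
    print(group_outputs("./results"))
```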

If quantization is run again, the preceding output files are overwritten.
