auto_nuq

Function Usage

Performs automatic non-uniform quantization (NUQ) on a model according to the input configuration file. The API searches for an NUQ configuration that meets the accuracy requirement, then outputs a fake-quantized model for accuracy simulation in the Caffe environment and a deployable model for online inference on Ascend AI Processor.

Prototype

auto_nuq(model_file, weights_file, nuq_evaluator, config_file, scale_offset_record_file, save_dir)

Parameters

Option | Input/Return | Description | Restriction
------ | ------------ | ----------- | -----------
model_file | Input | Definition file of the Caffe model (.prototxt). | A string
weights_file | Input | Weight file of the Caffe model (.caffemodel). | A string
nuq_evaluator | Input | Python instance for automatic non-uniform quantization evaluation. | Data type: Python instance
config_file | Input | Quantization configuration file generated by the user. | A string
scale_offset_record_file | Input | File for storing quantization factors. If the file exists, it will be overwritten. | A string
save_dir | Input | Model save path. Must include the prefix of the model name, for example, ./quantized_model/*model. | A string
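The save_dir restriction above (a directory plus a model name prefix) can be checked before calling the API. The following is a minimal sketch using only the Python standard library; the helper name split_save_dir is hypothetical and is not part of AMCT, and whether auto_nuq creates a missing directory itself is not stated here.

```python
import os


def split_save_dir(save_dir):
    """Split a save path into (directory, model name prefix).

    Hypothetical helper, not an AMCT API. Raises ValueError when the
    path ends with a separator and therefore carries no prefix.
    """
    directory, prefix = os.path.split(save_dir)
    if not prefix:
        raise ValueError(
            "save_dir must end with a model name prefix, got: %r" % save_dir)
    return directory, prefix


if __name__ == "__main__":
    directory, prefix = split_save_dir("./results/Resnet50")
    print("directory:", directory)
    print("prefix:", prefix)
```

A path such as ./results/ would be rejected by this check because it names a directory only.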

Returns

None

Outputs

  • A fake-quantized model for accuracy simulation in the Caffe environment and its weight file, with names containing the fake_quant keyword.
  • A deployable model and its weight file, with names containing the deploy keyword. The model can be deployed on Ascend AI Processor after being converted by the ATC tool.
  • A quantization factor record file (record_file), which records the weight quantization factors (scale_w and offset_w) of each layer to be quantized.
  • A non-uniform quantization information record file, which records the layers on which non-uniform quantization is performed.
  • A quantization information file that records the locations of the quantization layers inserted by AMCT and operator fusion information, used for accuracy analysis of the quantized model.

When quantization is performed again, the preceding files output by the API will be overwritten.
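Because the output model files are identified by the fake_quant and deploy keywords in their names, a short script can sort the contents of the save directory after a run. This is a sketch only: the function name group_outputs is hypothetical, and it assumes that the output file names start with the prefix passed in save_dir, which should be verified against the actual output.

```python
import os


def group_outputs(directory, prefix):
    """Group files in `directory` that start with `prefix` by keyword.

    Hypothetical helper, not an AMCT API. Assumes output names begin
    with the model name prefix given in save_dir.
    """
    groups = {"fake_quant": [], "deploy": [], "other": []}
    for name in sorted(os.listdir(directory)):
        if not name.startswith(prefix):
            continue  # Skip files that belong to other models.
        if "fake_quant" in name:
            groups["fake_quant"].append(name)
        elif "deploy" in name:
            groups["deploy"].append(name)
        else:
            groups["other"].append(name)
    return groups


if __name__ == "__main__":
    if os.path.isdir("./results"):
        for kind, names in group_outputs("./results", "Resnet50").items():
            print(kind, names)
```

Files in the deploy group are the ones to pass to the ATC tool; files in the fake_quant group are for accuracy simulation in Caffe.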

Examples

import amct_caffe as amct
from amct_caffe.auto_nuq import AutoNuqEvaluatorBase

# model_file, weights_file, config_json_file, scale_offset_record_file, args,
# and do_benchmark_test are assumed to be defined by the user elsewhere.

class AutoNuqEvaluator(AutoNuqEvaluatorBase):
    def __init__(self, evaluate_batch_num):
        # Number of batches used for each accuracy evaluation.
        self.evaluate_batch_num = evaluate_batch_num

    def eval_model(self, model_file, weights_file, batch_num):
        # Return the accuracy metric (top-1 accuracy here) of the given model.
        return do_benchmark_test(args, model_file, weights_file, batch_num)

    def is_satisfied(self, original_metric, new_metric):
        # The top-1 accuracy loss must be less than 1%.
        return (original_metric - new_metric) * 100 < 1

evaluator = AutoNuqEvaluator(1000)
amct.auto_nuq(
    model_file,
    weights_file,
    evaluator,
    config_json_file,
    scale_offset_record_file,
    './results/Resnet50')  # save_dir: directory plus model name prefix
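The acceptance rule in is_satisfied can be tried in isolation, without AMCT or Caffe installed. The sketch below restates the same comparison as a standalone function; the max_drop_percent parameter is a hypothetical addition for illustration. It assumes the metrics are fractions in the range 0 to 1 (for example, 0.76 for 76% top-1 accuracy). If eval_model returns percentages instead, the multiplication by 100 must be dropped.

```python
def is_satisfied(original_metric, new_metric, max_drop_percent=1.0):
    """Return True if the accuracy drop is below the allowed threshold.

    Standalone restatement of the example's rule, not an AMCT API.
    Metrics are assumed to be fractions in [0, 1]; the drop is
    converted to percentage points before the comparison.
    """
    drop_percent = (original_metric - new_metric) * 100
    return drop_percent < max_drop_percent


if __name__ == "__main__":
    # Drop of about 0.5 percentage points: within the 1% budget.
    print(is_satisfied(0.76, 0.755))
    # Drop of about 2 percentage points: exceeds the 1% budget.
    print(is_satisfied(0.76, 0.74))
```

A quantized model that scores higher than the original produces a negative drop and is therefore always accepted by this rule.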