quantize_model

Applicability

Product	Supported
Atlas A3 training series products/Atlas A3 inference series products	x
Atlas A2 training products/Atlas A2 inference products	x
Atlas 200I/500 A2 inference product	√
Atlas inference series products	√
Atlas training products	√

Description

Quantizes a graph according to the quantization configuration file config_file: inserts weight and activation quantization layers into the graph, then saves the modified network to a new model definition file and a new weight file.

Prototype

quantize_model(graph, modified_model_file, modified_weights_file)

Parameters

Parameter	Input/Output	Description	Restriction
graph	Input	Graph structure parsed by the init API from the user model.	An AMCT-defined Graph.
modified_model_file	Input	Name of the resultant Caffe model definition file (.prototxt) for storing the inserted quantization layers.	A string.
modified_weights_file	Input	Name of the resultant Caffe model weight file (.caffemodel) for storing the inserted quantization layers.	A string.
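
The two file-name parameters can be checked before the call. The following is a minimal sketch only: check_output_paths is a hypothetical helper, not part of AMCT, and the extension check simply mirrors the file types named in the table. Whether AMCT itself enforces these extensions is not stated here.

```python
from pathlib import Path


# Hypothetical helper, not part of AMCT: checks the two output path
# arguments before they are handed to quantize_model.
def check_output_paths(modified_model_file, modified_weights_file):
    """Return both paths as Path objects, or raise on a bad argument."""
    expected = (
        ("modified_model_file", modified_model_file, ".prototxt"),
        ("modified_weights_file", modified_weights_file, ".caffemodel"),
    )
    checked = []
    for name, value, suffix in expected:
        # The parameter table restricts both arguments to strings.
        if not isinstance(value, str):
            raise TypeError(f"{name} must be a string, got {type(value).__name__}")
        path = Path(value)
        # Assumption: the extension named in the table is the intended one.
        if path.suffix != suffix:
            raise ValueError(f"{name} should end with {suffix}: {value!r}")
        checked.append(path)
    return tuple(checked)
```

Calling the helper with the paths that will be passed to quantize_model surfaces a swapped or mistyped argument early, before any model file is written.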

Returns

None

Example

from amct_caffe import quantize_model
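# graph is the AMCT Graph parsed by the init API from the user model.
# Insert the quantization layers and save the modified model.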
quantize_model(graph=graph,
               modified_model_file="./quantized_model/modified_model.prototxt",
               modified_weights_file="./quantized_model/modified_model.caffemodel")

Output files:

scale_offset_record_file: quantization factor record file, which records the weight quantization factors (scale_w and offset_w) of each layer to be quantized. See init.
modified_model_file: definition file of the modified model, with quantization layers inserted into the original model.
modified_weights_file: weight file of the modified model, with quantization layers inserted into the original model.

When quantization is performed again, the preceding files output by the API will be overwritten.
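
Because a repeated run overwrites these files, it can be useful to copy the previous outputs aside first. The following is a minimal sketch using only the Python standard library. backup_outputs and its tag parameter are hypothetical names introduced for illustration and are not part of AMCT.

```python
import shutil
import time
from pathlib import Path


# Hypothetical helper, not part of AMCT: copies existing output files aside
# so that a second quantization run does not destroy them.
def backup_outputs(paths, tag=None):
    """Copy each existing file to '<name>.<tag>.bak'; return the copies made."""
    # Default tag is a timestamp so that successive backups do not collide.
    tag = tag or time.strftime("%Y%m%d-%H%M%S")
    copies = []
    for p in map(Path, paths):
        # Files that do not exist yet (first run) are skipped silently.
        if p.is_file():
            dst = p.with_name(f"{p.name}.{tag}.bak")
            shutil.copy2(p, dst)
            copies.append(dst)
    return copies
```

A typical use would be to pass the modified_model_file and modified_weights_file paths, plus the path of the quantization factor record file, immediately before calling quantize_model again.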
