create_quant_config

Function Usage

Finds all quantizable layers in a graph, creates a quantization configuration file, and writes the quantization configuration of the quantizable layers to the configuration file.

Prototype

create_quant_config(config_file, model_file, skip_layers=None, batch_num=1, activation_offset=True, config_defination=None, updated_model=None)

Parameters

config_file
  Input. Path and name of the quantization configuration file. Type: string.
  Restriction: Any existing file at this path is overwritten when this API is called.

model_file
  Input. ONNX model file to be quantized. Type: string.
  Restriction: The model must be generated based on ONNX opset v11 and must be inference-capable on ONNX Runtime 1.5.2.

skip_layers
  Input. Names of layers to skip during quantization. Type: list of strings. Default: None.
  Restriction: If a simplified quantization configuration file is used as the input, this parameter must be set in the configuration file; the argument passed here does not take effect.

batch_num
  Input. Number of batches used for quantization, that is, the number of batches used to generate the quantization factors. Type: int. Valid values: an integer greater than or equal to 0. Default: 1.
  Restrictions:
  • Do not set batch_num too large. The product of batch_num and batch_size is the number of images used during quantization, and too many images consume excessive memory.
  • If a simplified quantization configuration file is used as the input, this parameter must be set in the configuration file; the argument passed here does not take effect.
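The first restriction above is simple arithmetic; the sketch below uses assumed values (batch_size is a property of your calibration data, not a parameter of this API):

```python
# Illustrative memory check: the number of calibration images equals
# batch_num * batch_size. The values below are assumptions for the example.
batch_num = 4      # batches used to generate quantization factors
batch_size = 32    # batch size of the calibration data set (assumed)
num_images = batch_num * batch_size
print(num_images)  # 128 calibration images are held during quantization
```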

activation_offset
  Input. Whether to quantize activations with offset. Type: bool. Default: True.
  Restriction: If a simplified quantization configuration file is used as the input, this parameter must be set in the configuration file; the argument passed here does not take effect.

config_defination
  Input. Simplified quantization configuration file quant.cfg, generated based on the calibration_config_onnx.proto file in /amct_onnx/proto/calibration_config_onnx.proto under the AMCT installation directory. Type: string. Default: None.
  For details about the parameters in calibration_config_onnx.proto and the generated simplified quantization configuration file quant.cfg, see Simplified PTQ Configuration File.
  Restriction: If set to None, the configuration file is generated from the remaining arguments (skip_layers, batch_num, and activation_offset). Otherwise, a configuration file in JSON format is generated from this argument.

updated_model
  Input. If set, node names in the model are updated: a node without a name receives a unique name in the {op_type}_{index} format, and a numeric suffix is appended to duplicate names to make them unique. The updated ONNX model is then saved to the configured file path. Type: string. Default: None.
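The renaming rule described for updated_model can be sketched as follows. This is an illustrative reimplementation, not AMCT's actual code; the node representation and function name are invented for the example:

```python
def assign_unique_names(nodes):
    """Give every node a unique name, mirroring the updated_model rule.

    nodes: list of dicts with 'name' (possibly empty) and 'op_type'.
    Unnamed nodes get '{op_type}_{index}'; duplicate names get a
    numeric suffix so that every resulting name is unique.
    """
    used = set()
    counts = {}
    names = []
    for index, node in enumerate(nodes):
        # Fall back to '{op_type}_{index}' when the node has no name.
        base = node["name"] or f"{node['op_type']}_{index}"
        name = base
        # Append a numeric suffix until the name is unique.
        while name in used:
            counts[base] = counts.get(base, 0) + 1
            name = f"{base}_{counts[base]}"
        used.add(name)
        names.append(name)
    return names

nodes = [
    {"name": "", "op_type": "Conv"},      # unnamed -> Conv_0
    {"name": "relu", "op_type": "Relu"},  # kept as-is
    {"name": "relu", "op_type": "Relu"},  # duplicate -> relu_1
]
print(assign_unique_names(nodes))  # ['Conv_0', 'relu', 'relu_1']
```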

Returns

None

Outputs

A quantization configuration file in JSON format. (When quantization is performed again, this API overwrites the existing configuration file in the output directory.) An example of the generated file:

{
    "version":1,
    "batch_num":2,
    "activation_offset":true,
    "joint_quant":false,
    "do_fusion":true,
    "skip_fusion_layers":[],
    "tensor_quantize":[
      {
         "layer_name": "maxpool_ld_default",
         "input_index":0,
         "activation_quant_params":{
             "num_bits":8,
             "max_percentile":0.999999,
             "min_percentile":0.999999,
             "search_range":[
                 0.7,
                1.3
             ],
             "search_step":0.01,
             "act_algo":"ifmr",
             "asymmetric":false
          }
       }
    ],
    "layer_name1":{
        "quant_enable":true,
        "dmq_balancer_param":0.5,
        "activation_quant_params":{
            "num_bits":8,
            "max_percentile":0.999999,
            "min_percentile":0.999999,
            "search_range":[
                0.7,
                1.3
            ],
            "search_step":0.01,
            "act_algo":"ifmr",
            "asymmetric":false
        },
        "weight_quant_params":{
            "num_bits":8,
            "wts_algo":"arq_quantize",
            "channel_wise":true
        }
    },
    "layer_name2":{
        "quant_enable":true,
        "dmq_balancer_param":0.5,
        "activation_quant_params":{
            "num_bits":8,
            "max_percentile":0.999999,
            "min_percentile":0.999999,
            "search_range":[
                0.7,
                1.3
            ],
            "search_step":0.01,
            "act_algo":"ifmr",
            "asymmetric":false
        },
        "weight_quant_params":{
            "num_bits":8,
            "wts_algo":"arq_quantize",
            "channel_wise":false
        }
    }
}
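Because the output is plain JSON, it can be inspected or adjusted with standard tooling. A minimal sketch using a trimmed copy of the file above (in practice you would read the path passed as config_file):

```python
import json

# A trimmed version of the configuration file shown above,
# embedded as a string so the example is self-contained.
config_text = """
{
    "version": 1,
    "batch_num": 2,
    "activation_offset": true,
    "layer_name1": {
        "quant_enable": true,
        "weight_quant_params": {"num_bits": 8, "channel_wise": true}
    }
}
"""
config = json.loads(config_text)

# Global calibration settings apply to the whole model.
print(config["batch_num"], config["activation_offset"])  # 2 True

# Per-layer entries control individual layers; setting "quant_enable"
# to false here (and re-saving the file) skips quantizing that layer.
print(config["layer_name1"]["weight_quant_params"]["num_bits"])  # 8
```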

Examples

import amct_onnx as amct

model_file = "resnet101.onnx"
# Create a quantization configuration file.
amct.create_quant_config(config_file="./configs/config.json",
                         model_file=model_file,
                         skip_layers=None,
                         batch_num=1,
                         activation_offset=True)