create_quant_config

Applicability

Product

Supported

Atlas A3 training series products/Atlas A3 inference series products

Atlas A2 training products/Atlas A2 inference products

Atlas 200I/500 A2 inference product

Atlas inference series products

Atlas training products

Description

Finds all quantizable layers in a graph, creates a quantization configuration file, and writes the quantization configuration of the quantizable layers to the configuration file.

Prototype

create_quant_config(config_file, graph, skip_layers=None, batch_num=1, activation_offset=True, config_defination=None)

Parameters

Parameter

Input/Output

Description

config_file

Input

Path (including the file name) of the quantization configuration file.

The existing file (if any) in the path will be overwritten upon this API call.

A string.

graph

Input

tf.Graph of the model for quantization.

A tf.Graph.

skip_layers

Input

Names of the layers in tf.Graph that are to be excluded from quantization.

Default: None

A list of strings, for example, ['op1','op2','op3']

Restrictions: If a simplified quantization configuration file is passed through config_defination, set this parameter in that file. The value passed as an argument does not take effect.

batch_num

Input

Number of batches taken to generate the quantization factors.

An int.

Value range: any integer larger than 0.

Default: 1

Restrictions:

  • batch_num must not be too large. The product of batch_num and batch_size equals the number of images used during quantization. Too many images consume too much memory.
  • If a simplified quantization configuration file is passed through config_defination, set this parameter in that file. The value passed as an argument does not take effect.

activation_offset

Input

Whether to quantize activations with offset.

Default: True

A bool

Restrictions: If a simplified quantization configuration file is passed through config_defination, set this parameter in that file. The value passed as an argument does not take effect.

config_defination

Input

Path of the simplified PTQ configuration file.

The simplified quantization configuration file quant.cfg is generated based on the calibration_config_tf.proto file. The *.proto file is stored in /amct_tensorflow/proto/ under the AMCT installation directory. For details about the parameters in the *.proto file and the generated simplified quantization configuration file quant.cfg, see Simplified PTQ Configuration File.

Default: None

A string.

Restrictions: If it is set to None, the configuration file is generated from the remaining arguments (skip_layers, batch_num, and activation_offset). Otherwise, the configuration file in JSON format is generated from the simplified configuration file specified by this argument.
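
The precedence rule stated in the restrictions above can be summarized in a small helper. This is a minimal sketch and not part of AMCT: ignored_arguments is a hypothetical name, and the function only reports which arguments would not take effect when config_defination is supplied, using the defaults from the prototype.

```python
# Hypothetical helper (not part of AMCT): reports which keyword arguments of
# create_quant_config would not take effect because a simplified
# configuration file is supplied through config_defination.

# Defaults copied from the prototype of create_quant_config.
DEFAULTS = {"skip_layers": None, "batch_num": 1, "activation_offset": True}


def ignored_arguments(config_defination=None, **kwargs):
    """Return the names of non-default arguments that the call would ignore."""
    unknown = set(kwargs) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected arguments: {sorted(unknown)}")
    if config_defination is None:
        # No simplified file: the configuration is built from the arguments.
        return []
    # Simplified file given: explicitly changed arguments do not take effect.
    return sorted(name for name, value in kwargs.items()
                  if value != DEFAULTS[name])


print(ignored_arguments(config_defination="./quant.cfg", batch_num=4))  # ['batch_num']
print(ignored_arguments(batch_num=4))  # []
```

A check of this kind may help catch calls where batch_num or skip_layers is changed in code while the simplified configuration file silently decides the actual values.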

Returns

None

Example

import tensorflow as tf
import amct_tensorflow as amct

# Build a graph of the network to be quantized.
# build_network() stands for user code that constructs the model in the default graph.
network = build_network()
# Create a quantization configuration file.
amct.create_quant_config(config_file="./configs/config.json",
                         graph=tf.get_default_graph(),
                         skip_layers=None,
                         batch_num=1,
                         activation_offset=True)
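
Before making the call, the skip_layers and batch_num arguments can be sanity-checked against the restrictions in the parameter descriptions. The following is a minimal sketch, not part of AMCT: both function names are hypothetical, and it assumes that layer names correspond to operation names in the graph, as the layer keys in the generated configuration file suggest.

```python
# Hypothetical pre-call checks (not part of AMCT).

def unknown_skip_layers(skip_layers, op_names):
    """Return the skip_layers entries that match no operation name."""
    known = set(op_names)
    return [name for name in (skip_layers or []) if name not in known]


def calibration_image_count(batch_num, batch_size):
    """Number of images used during quantization: batch_num * batch_size."""
    if not isinstance(batch_num, int) or batch_num <= 0:
        raise ValueError("batch_num must be an integer larger than 0")
    return batch_num * batch_size


# With a real graph, op_names could be collected as
# [op.name for op in graph.get_operations()].
op_names = ["MobilenetV2/Conv/Conv2D", "MobilenetV2/Logits/AvgPool"]
print(unknown_skip_layers(["MobilenetV2/Conv/Conv2D", "op1"], op_names))  # ['op1']
print(calibration_image_count(batch_num=2, batch_size=32))  # 64
```

The image count gives a quick way to judge whether a chosen batch_num is likely to exceed the available memory for a given batch size.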

The following is an example of the generated quantization configuration file in JSON format. (The quantization configuration file output by this API will be overwritten when quantization is performed again.) For details about the parameters, see Quantization Configuration File.

  • Uniform quantization configuration file (see IFMR Algorithm for activation quantization)
    {
        "version":1,
        "batch_num":1,
        "activation_offset":true,
        "joint_quant":false,
        "do_fusion":true,
        "skip_fusion_layers":[],
        "tensor_quantize":[
            {
                "layer_name": "MaxPool",
                "input_index": 0,
                "activation_quant_params":{
                    "num_bits":8,
                    "act_algo":"hfmg",
                    "num_of_bins":4096,
                    "asymmetric":false
                }
            }
        ],
        "MobilenetV2/Conv/Conv2D":{
            "quant_enable":true,
            "dmq_balancer_param":0.5,
            "activation_quant_params":{
                "num_bits":8,
                "max_percentile":0.999999,
                "min_percentile":0.999999,
                "search_range":[
                    0.7,
                    1.3
                ],
                "search_step":0.01,
                "act_algo":"ifmr",
                "asymmetric":false
            },
            "weight_quant_params":{
                "num_bits":8,
                "wts_algo":"arq_quantize",
                "channel_wise":true
            }
        },
        "MobilenetV2/Logits/AvgPool":{
            "quant_enable":true,
            "dmq_balancer_param":0.5,
            "activation_quant_params":{
                "num_bits":8,
                "max_percentile":0.999999,
                "min_percentile":0.999999,
                "search_range":[
                    0.7,
                    1.3
                ],
                "search_step":0.01,
                "act_algo":"ifmr",
                "asymmetric":false
            },
            "weight_quant_params":{
                "num_bits":8,
                "wts_algo":"arq_quantize",
                "channel_wise":false
            }
        }
    }
    
  • Uniform quantization configuration file (see HFMG Algorithm for activation quantization)
    {
        "version":1,
        "batch_num":2,
        "activation_offset":true,
        "joint_quant":false,
        "do_fusion":true,
        "skip_fusion_layers":[],
        "MobilenetV2/Conv_1/Conv2D":{
            "quant_enable":true,
            "dmq_balancer_param":0.5,
            "activation_quant_params":{
                "num_bits":8,
                "act_algo":"hfmg",
                "num_of_bins":4096,
                "asymmetric":false
            },
            "weight_quant_params":{
                "num_bits":8,
                "wts_algo":"arq_quantize",
                "channel_wise":true
            }
        }
    }
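
A generated file can be inspected with the standard json module, for example to list which layers are enabled for quantization and which activation algorithm each one uses. The following is a minimal sketch under stated assumptions: layer_algorithms and GLOBAL_KEYS are hypothetical names, the set of global keys is taken only from the examples in this section and may be incomplete, and SAMPLE is a trimmed stand-in for a real file.

```python
import json

# Keys that hold global settings rather than per-layer settings, as seen in
# the example files in this section (assumed, possibly incomplete).
GLOBAL_KEYS = {"version", "batch_num", "activation_offset", "joint_quant",
               "do_fusion", "skip_fusion_layers", "tensor_quantize"}


def layer_algorithms(config):
    """Map each quantization-enabled layer to its activation algorithm."""
    return {
        name: params["activation_quant_params"]["act_algo"]
        for name, params in config.items()
        if name not in GLOBAL_KEYS and params.get("quant_enable")
    }


# Trimmed stand-in for a generated file. A real file would be read with:
#     with open("./configs/config.json") as f:
#         config = json.load(f)
SAMPLE = """
{
    "version": 1,
    "batch_num": 2,
    "activation_offset": true,
    "MobilenetV2/Conv_1/Conv2D": {
        "quant_enable": true,
        "activation_quant_params": {"num_bits": 8, "act_algo": "hfmg"}
    }
}
"""

print(layer_algorithms(json.loads(SAMPLE)))
```

Because json.loads rejects malformed input such as a missing comma, loading the file this way also serves as a quick syntax check after manual edits.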