create_quant_config
Applicability
| Product | Supported |
|---|---|
| | √ |
| | √ |
| | √ |
| | √ |
| | √ |
Description
Finds all quantizable layers in a graph, creates a quantization configuration file, and writes the quantization configuration of the quantizable layers to the configuration file.
Prototype
```python
create_quant_config(config_file, model_file, skip_layers=None, batch_num=1, activation_offset=True, config_defination=None, updated_model=None)
```
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| config_file | Input | Path (including the file name) of the quantization configuration file. An existing file at this path is overwritten by this API call. A string. |
| model_file | Input | ONNX model file to be quantized. The model must be generated based on ONNX opset v11 and must be able to run inference on ONNX Runtime 1.5.2. A string. |
| skip_layers | Input | Layers to skip during quantization. A list of strings. Default: None. Restrictions: if a simplified quantization configuration file is used as the input, this parameter must be set in that configuration file; the value passed to this argument then does not take effect. |
| batch_num | Input | Number of batches used to generate the quantization factors. An int. Value range: any integer larger than 0. Default: 1. |
| activation_offset | Input | Whether to quantize activations with offset. A bool. Default: True. Restrictions: if a simplified quantization configuration file is used as the input, this parameter must be set in that configuration file; the value passed to this argument then does not take effect. |
| config_defination | Input | Simplified quantization configuration file quant.cfg, generated based on the calibration_config_onnx.proto file. The *.proto file is stored in /amct_onnx/proto/ under the AMCT installation directory. For details about the parameters in the *.proto file and the generated quant.cfg file, see Simplified PTQ Configuration File. A string. Default: None. Restrictions: if set to None, the configuration file is generated based on the remaining arguments (skip_layers, batch_num, and activation_offset). Otherwise, a configuration file in JSON format is generated based on this argument. |
| updated_model | Input | File path for saving the ONNX model with updated node names. If a node does not have a name, a unique name is generated in {op_type}_{index} format. For nodes with duplicate names, digit suffixes are added so that every node name is unique. The updated ONNX model is saved to the configured file path. A string. Default: None. |
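
The node-naming rule described for updated_model can be illustrated with a short sketch. This is not AMCT's implementation: the helper `make_unique_names` is hypothetical, and it assumes that `{index}` is the node's position in the graph and that duplicate names receive a `_N` suffix. The exact format AMCT uses may differ.

```python
def make_unique_names(nodes):
    """Return one unique name per node.

    `nodes` is a list of (name, op_type) pairs; an empty name means "unnamed".
    Assumptions (not confirmed by AMCT): the index is the node's position in
    the list, and duplicates get a "_N" suffix.
    """
    used = set()
    result = []
    for index, (name, op_type) in enumerate(nodes):
        # Unnamed nodes get a generated {op_type}_{index} name.
        candidate = name or f"{op_type}_{index}"
        # Duplicates get a numeric suffix until the name is unused.
        unique, suffix = candidate, 0
        while unique in used:
            suffix += 1
            unique = f"{candidate}_{suffix}"
        used.add(unique)
        result.append(unique)
    return result


nodes = [("conv1", "Conv"), ("", "Relu"), ("conv1", "Conv"), ("", "MaxPool")]
print(make_unique_names(nodes))
# → ['conv1', 'Relu_1', 'conv1_1', 'MaxPool_3']
```

Unique node names matter because the generated configuration file keys each layer's settings by its name.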
Returns
None
Example
```python
import amct_onnx as amct

model_file = "resnet101.onnx"
# Create a quantization configuration file.
amct.create_quant_config(config_file="./configs/config.json",
                         model_file=model_file,
                         skip_layers=None,
                         batch_num=1,
                         activation_offset=True)
```
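
Before making a call like the one in the example, the documented argument types and ranges can be checked up front. The following is a minimal sketch: `check_quant_config_args` is a hypothetical helper, not part of AMCT, and it only encodes the constraints listed in the parameter table (string paths, a list of strings for skip_layers, an int larger than 0 for batch_num, a bool for activation_offset).

```python
def check_quant_config_args(config_file, model_file, skip_layers=None,
                            batch_num=1, activation_offset=True):
    """Hypothetical helper: raise if an argument violates the documented type or range."""
    if not isinstance(config_file, str) or not isinstance(model_file, str):
        raise TypeError("config_file and model_file must be strings")
    if skip_layers is not None and not (
            isinstance(skip_layers, list)
            and all(isinstance(name, str) for name in skip_layers)):
        raise TypeError("skip_layers must be None or a list of strings")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(batch_num, bool) or not isinstance(batch_num, int):
        raise TypeError("batch_num must be an int")
    if batch_num <= 0:
        raise ValueError("batch_num must be larger than 0")
    if not isinstance(activation_offset, bool):
        raise TypeError("activation_offset must be a bool")
    # Return the checked arguments as keyword arguments for the real call.
    return {
        "config_file": config_file,
        "model_file": model_file,
        "skip_layers": skip_layers,
        "batch_num": batch_num,
        "activation_offset": activation_offset,
    }


kwargs = check_quant_config_args("./configs/config.json", "resnet101.onnx",
                                 skip_layers=None, batch_num=1)
# With AMCT installed, the checked arguments could then be forwarded:
#   import amct_onnx as amct
#   amct.create_quant_config(**kwargs)
```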
Output file: a quantization configuration file in JSON format. The following are examples. (The quantization configuration file output by this API is overwritten when quantization is performed again.) For details about the parameters, see Quantization Configuration File.
- Uniform quantization configuration file (see IFMR Algorithm for activation quantization)
```json
{
    "version": 1,
    "batch_num": 2,
    "activation_offset": true,
    "joint_quant": false,
    "do_fusion": true,
    "skip_fusion_layers": [],
    "tensor_quantize": [
        {
            "layer_name": "maxpool_ld_default",
            "input_index": 0,
            "activation_quant_params": {
                "num_bits": 8,
                "max_percentile": 0.999999,
                "min_percentile": 0.999999,
                "search_range": [
                    0.7,
                    1.3
                ],
                "search_step": 0.01,
                "act_algo": "ifmr",
                "asymmetric": false
            }
        }
    ],
    "layer_name1": {
        "quant_enable": true,
        "dmq_balancer_param": 0.5,
        "activation_quant_params": {
            "num_bits": 8,
            "max_percentile": 0.999999,
            "min_percentile": 0.999999,
            "search_range": [
                0.7,
                1.3
            ],
            "search_step": 0.01,
            "act_algo": "ifmr",
            "asymmetric": false
        },
        "weight_quant_params": {
            "num_bits": 8,
            "wts_algo": "arq_quantize",
            "channel_wise": true
        }
    },
    "layer_name2": {
        "quant_enable": true,
        "dmq_balancer_param": 0.5,
        "activation_quant_params": {
            "num_bits": 8,
            "max_percentile": 0.999999,
            "min_percentile": 0.999999,
            "search_range": [
                0.7,
                1.3
            ],
            "search_step": 0.01,
            "act_algo": "ifmr",
            "asymmetric": false
        },
        "weight_quant_params": {
            "num_bits": 8,
            "wts_algo": "arq_quantize",
            "channel_wise": false
        }
    }
}
```
- Uniform quantization configuration file (see HFMG Algorithm for activation quantization)
```json
{
    "version": 1,
    "batch_num": 2,
    "activation_offset": true,
    "do_fusion": true,
    "skip_fusion_layers": [],
    "tensor_quantize": [
        {
            "layer_name": "maxpool_ld_default",
            "input_index": 0,
            "activation_quant_params": {
                "num_bits": 8,
                "max_percentile": 0.999999,
                "min_percentile": 0.999999,
                "search_range": [
                    0.7,
                    1.3
                ],
                "search_step": 0.01,
                "act_algo": "hfmg",
                "asymmetric": false
            }
        }
    ],
    "layer_name1": {
        "quant_enable": true,
        "dmq_balancer_param": 0.5,
        "activation_quant_params": {
            "num_bits": 8,
            "act_algo": "hfmg",
            "num_of_bins": 4096,
            "asymmetric": false
        },
        "weight_quant_params": {
            "num_bits": 8,
            "wts_algo": "arq_quantize",
            "channel_wise": true
        }
    }
}
```
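
Because the output is plain JSON, it can be inspected with standard tooling. The sketch below parses a trimmed configuration in the same shape as the examples above and summarizes the per-layer settings. The helper `summarize_layers` is hypothetical, and the rule it uses to tell per-layer entries from global keys (a dict value that contains `quant_enable`) is an assumption drawn from the examples, not a documented contract.

```python
import json

# Trimmed sample in the same shape as the configuration examples above.
CONFIG_TEXT = """
{
    "version": 1,
    "batch_num": 2,
    "activation_offset": true,
    "do_fusion": true,
    "skip_fusion_layers": [],
    "layer_name1": {
        "quant_enable": true,
        "activation_quant_params": {"num_bits": 8, "act_algo": "ifmr"},
        "weight_quant_params": {"num_bits": 8, "wts_algo": "arq_quantize",
                                "channel_wise": true}
    },
    "layer_name2": {
        "quant_enable": true,
        "activation_quant_params": {"num_bits": 8, "act_algo": "ifmr"},
        "weight_quant_params": {"num_bits": 8, "wts_algo": "arq_quantize",
                                "channel_wise": false}
    }
}
"""


def summarize_layers(config):
    """Map each per-layer entry to (act_algo, channel_wise)."""
    summary = {}
    for key, value in config.items():
        # Assumption: per-layer entries are dict values carrying "quant_enable";
        # global keys such as "version" or "batch_num" are skipped.
        if isinstance(value, dict) and "quant_enable" in value:
            act = value.get("activation_quant_params", {})
            wts = value.get("weight_quant_params", {})
            summary[key] = (act.get("act_algo"), wts.get("channel_wise"))
    return summary


# json.loads raises json.JSONDecodeError on malformed input, e.g. a missing comma.
config = json.loads(CONFIG_TEXT)
summary = summarize_layers(config)
for name, (algo, channel_wise) in summary.items():
    print(name, algo, channel_wise)
```

To inspect a real file, the same function can be applied to the result of `json.load` on the path passed as config_file.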