Manual Tuning
If the QAT accuracy does not meet your requirements, you can manually adjust the parameters in the config.json file. This section describes the tuning principles and the configuration parameters.
Tuning Workflow
If the accuracy of the model quantized with the initial config.json file generated by the create_quant_retrain_config API call is not as expected, tune the configuration parameters as follows until the accuracy meets your requirement.
1. Run quantization with the initial config.json file generated by the create_quant_retrain_config API call. If the accuracy of the quantized model is satisfactory, stop tuning the configuration parameters. Otherwise, go to the next step.
2. If the INT8-quantized model shows an accuracy drop, skip quantizing the most sensitive layers by setting their retrain_enable fields to false. The input and output layers are generally the most quantization-sensitive, so skip them first. You can also tune clip_max and clip_min in the quantization configuration file, for example:

   ```json
   {
       "version": 1,
       "batch_num": 1,
       "layername1": {
           "retrain_enable": true,
           "retrain_data_config": {
               "algo": "ulq_quantize",
               "clip_max": 3.0,
               "clip_min": -3.0
           },
           "retrain_weight_config": {
               "algo": "arq_retrain",
               "channel_wise": true
           }
       },
       "layername2": {
           "retrain_enable": true,
           "retrain_data_config": {
               "algo": "ulq_quantize",
               "clip_max": 3.0,
               "clip_min": -3.0
           },
           "retrain_weight_config": {
               "algo": "arq_retrain",
               "channel_wise": true
           }
       }
   }
   ```

3. Run quantization with the new configuration. If the accuracy of the quantized model is satisfactory, stop tuning the configuration parameters. Otherwise, the model is not suitable for QAT, and the QAT configuration should be removed.
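Step 2 above can be scripted instead of editing config.json by hand. The following is a minimal sketch; `disable_layer_quant` is a hypothetical helper name of ours, not part of the AMCT API:

```python
import json

def disable_layer_quant(config_path, layer_names):
    # Set retrain_enable to false for the given layers in config.json,
    # so those layers are skipped during quantization.
    with open(config_path) as f:
        config = json.load(f)
    for name in layer_names:
        # Only per-layer entries are dicts; unknown names are ignored.
        if isinstance(config.get(name), dict):
            config[name]["retrain_enable"] = False
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config
```

After running it on the layers you suspect (typically the input and output layers), rerun quantization with the rewritten file.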
Quantization Configuration File
```json
{
    "version": 1,
    "batch_num": 1,
    "conv1": {
        "retrain_enable": true,
        "retrain_data_config": {
            "algo": "ulq_quantize",
            "dst_type": "INT8"
        },
        "retrain_weight_config": {
            "algo": "arq_retrain",
            "channel_wise": true,
            "dst_type": "INT8"
        }
    },
    "layer1.0.conv1": {
        "retrain_enable": true,
        "retrain_data_config": {
            "algo": "ulq_quantize",
            "dst_type": "INT8"
        },
        "retrain_weight_config": {
            "algo": "arq_retrain",
            "channel_wise": true,
            "dst_type": "INT8"
        }
    },
    "fc": {
        "retrain_enable": true,
        "retrain_data_config": {
            "algo": "ulq_quantize",
            "dst_type": "INT8"
        },
        "retrain_weight_config": {
            "algo": "arq_retrain",
            "channel_wise": false,
            "dst_type": "INT8"
        }
    }
    ...
}
```
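Because the configuration file is plain JSON, you can inspect it with standard tooling. A sketch that separates the global fields from the per-layer entries; the helper names are ours, not AMCT's:

```python
import json

GLOBAL_KEYS = {"version", "batch_num"}  # top-level fields that are not layer names

def layer_entries(config):
    # Yield (layer_name, layer_config) pairs from a parsed config.json dict.
    for key, value in config.items():
        if key not in GLOBAL_KEYS and isinstance(value, dict):
            yield key, value

def quantized_layers(config):
    # Names of the layers whose retrain_enable flag is true.
    return [name for name, cfg in layer_entries(config)
            if cfg.get("retrain_enable", False)]
```

This is handy for confirming which layers are still being quantized after several rounds of tuning.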
Configuration Parameters
The following tables describe the parameters available in the configuration file. Note that the parameters in Table 8 to Table 10 are available only when you manually tune the quantization configuration file.
| version | |
|---|---|
| Description | Version number of the quantization configuration file |
| Type | int |
| Value | 1 |
| Notes | Currently, only version 1 is available. |
| Recommended Value | 1 |
| Required/Optional | Optional |
| batch_num | |
|---|---|
| Description | Batch number in the inference phase of quantization aware training |
| Type | int |
| Value | Greater than 0 |
| Notes | Defaults to 1. You are advised to keep the calibration dataset size within 50 images. Calculate batch_num from batch_size as follows: batch_num x batch_size = calibration dataset size, where batch_size is the number of images per batch. |
| Recommended Value | 1 |
| Required/Optional | Optional |
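The batch_num formula above is simple arithmetic; a few lines make it concrete (`batch_num_for` is an illustrative name, not an AMCT function):

```python
def batch_num_for(dataset_size, batch_size):
    # batch_num x batch_size = calibration dataset size, so divide and
    # round up so that the whole calibration set is covered.
    if dataset_size <= 0 or batch_size <= 0:
        raise ValueError("dataset_size and batch_size must be positive")
    return -(-dataset_size // batch_size)  # ceiling division
```

For example, a 50-image calibration set with a batch size of 50 gives the recommended batch_num of 1.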
| retrain_enable | |
|---|---|
| Description | QAT enable switch for the layer |
| Type | bool |
| Value | true or false |
| Notes | - |
| Recommended Value | true |
| Required/Optional | Optional |
| retrain_data_config | |
|---|---|
| Description | Activation quantization configuration |
| Type | object |
| Value | - |
| Notes | Includes the algo, clip_max, clip_min, and dst_type parameters described below. |
| Recommended Value | - |
| Required/Optional | Optional |
| retrain_weight_config | |
|---|---|
| Description | Weight quantization configuration |
| Type | object |
| Value | - |
| Notes | Includes the algo, channel_wise, and dst_type parameters described below. |
| Recommended Value | - |
| Required/Optional | Optional |
| algo | |
|---|---|
| Description | Quantization algorithm |
| Type | string |
| Value | ulq_quantize or arq_retrain |
| Notes | - |
| Recommended Value | Set to ulq_quantize for activation quantization or arq_retrain for weight quantization. |
| Required/Optional | Optional |
| channel_wise | |
|---|---|
| Description | Whether to use different quantization factors for each channel |
| Type | bool |
| Value | true or false |
| Notes | - |
| Recommended Value | true |
| Required/Optional | Optional |
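To illustrate what channel_wise controls, here is a toy sketch of symmetric INT8 scale computation per tensor versus per channel. This is our illustration, not AMCT code:

```python
def per_tensor_scale(weights):
    # One INT8 scale for the whole weight tensor (channel_wise = false).
    max_abs = max(abs(w) for channel in weights for w in channel)
    return max_abs / 127.0

def per_channel_scales(weights):
    # One INT8 scale per output channel (channel_wise = true); a large
    # outlier in one channel no longer coarsens the step size of the others.
    return [max(abs(w) for w in channel) / 127.0 for channel in weights]
```

With per-channel scales, small-magnitude channels keep a finer quantization step, which is why true is the recommended value for convolution weights.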
| Fixed lower bound switch | |
|---|---|
| Description | Fixed lower bound switch for the activation quantization algorithm |
| Type | bool |
| Value | true or false |
| Notes | If this parameter is omitted, AMCT automatically sets the lower bound of the activation quantization algorithm according to the graph structure. If it is included, set it for each layer to be quantized: true if the upstream layer is ReLU, false otherwise. |
| Recommended Value | Do not include this parameter. |
| Required/Optional | Optional |
| clip_max | |
|---|---|
| Description | Upper bound for the activation quantization algorithm |
| Type | float |
| Value | clip_max > 0. Find the maximum activation value max for each layer from its activation distribution; the recommended value range is [0.3 x max, 1.7 x max]. |
| Notes | If this parameter is included, the upper bound of the clipping-based activation quantization algorithm is fixed. If it is omitted, the upper bound is learned using the IFMR algorithm. |
| Recommended Value | Do not include this parameter. |
| Required/Optional | Optional |
| clip_min | |
|---|---|
| Description | Lower bound for the activation quantization algorithm |
| Type | float |
| Value | clip_min < 0. Find the minimum activation value min for each layer from its activation distribution; the recommended value range is [0.3 x min, 1.7 x min]. |
| Notes | If this parameter is included, the lower bound of the clipping-based activation quantization algorithm is fixed. If it is omitted, the lower bound is learned using the IFMR algorithm. |
| Recommended Value | Do not include this parameter. |
| Required/Optional | Optional |
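The [0.3x, 1.7x] value ranges for clip_max and clip_min can be derived from sampled activation values. A sketch with an illustrative helper name (`clip_bound_ranges` is ours, not an AMCT API):

```python
def clip_bound_ranges(activations):
    # Recommended search ranges for clip_max / clip_min, following the
    # [0.3 x max, 1.7 x max] and [0.3 x min, 1.7 x min] guidance above.
    hi = max(activations)  # observed maximum activation (max > 0)
    lo = min(activations)  # observed minimum activation (min < 0)
    clip_max_range = (0.3 * hi, 1.7 * hi)
    clip_min_range = (1.7 * lo, 0.3 * lo)  # lo < 0, so 1.7 * lo is the smaller value
    return clip_max_range, clip_min_range
```

Sweeping clip_max and clip_min within these ranges, rerunning quantization, and keeping the best-accuracy pair is the manual counterpart of letting the IFMR algorithm learn the bounds.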
| dst_type | |
|---|---|
| Description | Quantization bit width. Currently, only INT8 quantization is supported. |
| Type | string |
| Value | INT8 (default) or INT4 |
| Notes | Sets the quantization bit width, INT8 or INT4. |
| Recommended Value | - |
| Required/Optional | Optional |