Manual Tuning

If the accuracy after quantization does not meet your requirements, you can manually adjust the parameters in the config.json file. This section describes the tuning principles and the configuration parameters.

Workflow

If the model quantized with the initial config.json file generated by the create_quant_config_ascend API does not reach the expected accuracy, tune the configuration parameters as follows until it does.

  1. Run quantization based on the initial config.json file generated by the create_quant_config_ascend API. If the accuracy of the quantized model is satisfactory, stop tuning the configuration parameters. Otherwise, go to the next step.
  2. Tweak the value of quant_enable to skip quantizing certain layers.

    quant_enable is the quantization switch of a specified layer. Setting it to false skips the layer during quantization; setting it to true quantizes the layer. Removing the layer's configuration from the file also skips the layer.

    Quantization can reduce model accuracy. Layers that are sensitive to quantization show a significant increase in error once quantized, so they should be left unquantized. Identify these layers as follows:

    1. The input layer, the output layer, and layers with very few parameters are likely to be quantization-sensitive.
    2. Use the Model Accuracy Analyzer to compare the outputs of the original model and the quantized model layer by layer (for example, treating a cosine similarity of at least 0.99 as acceptable). Disable quantization first for the layers that reduce accuracy the most.
  3. Run quantization based on the new configuration generated in step 2. If the accuracy of the quantized model is satisfactory, stop tuning the configuration parameters. Otherwise, go to step 4.
  4. Tweak the values of activation_quant_params and weight_quant_params to tune the quantization algorithms and parameters.

    For details, see IFMR Algorithm and ARQ Algorithm.

  5. Run quantization based on the new configuration generated in step 4. If the accuracy of the quantized model is satisfactory, stop tuning the configuration parameters. Otherwise, the model is not suitable for quantization, and the quantization configuration should be removed.
Figure 1 Configuration tuning workflow
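Step 2 of the workflow can be partly scripted. The following Python sketch is illustrative only: it assumes you have already exported the per-layer outputs of the original and quantized models as flat lists of floats (how you obtain them, for example with the Model Accuracy Analyzer, is outside this sketch), and it assumes config.json stores each layer's settings as a top-level object keyed by layer name. The helper names are hypothetical; the 0.99 threshold is the example value from step 2.

```python
import json
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Treat two all-zero outputs as identical, otherwise as unrelated.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def find_sensitive_layers(original_outputs, quantized_outputs, threshold=0.99):
    """Return layers whose similarity falls below threshold, worst first."""
    scores = {
        name: cosine_similarity(original_outputs[name], quantized_outputs[name])
        for name in original_outputs
    }
    flagged = [name for name, score in scores.items() if score < threshold]
    return sorted(flagged, key=lambda name: scores[name])


def disable_quantization(config, layer_names):
    """Set quant_enable to false for each named layer (hypothetical layout)."""
    for name in layer_names:
        if isinstance(config.get(name), dict):
            config[name]["quant_enable"] = False
    return config


if __name__ == "__main__":
    # Toy per-layer outputs; real data would come from your own comparison run.
    original = {"conv1": [1.0, 2.0, 3.0], "fc": [0.5, -0.5, 1.0]}
    quantized = {"conv1": [1.0, 2.0, 3.1], "fc": [0.5, 0.5, -1.0]}
    config = {"version": 1, "conv1": {"quant_enable": True}, "fc": {"quant_enable": True}}
    sensitive = find_sensitive_layers(original, quantized)
    print(sensitive)  # ['fc']
    print(json.dumps(disable_quantization(config, sensitive), indent=4))
```

Sorting the flagged layers worst first matches the advice to disable quantization for the most harmful layers with priority; you can then re-enable layers one at a time if too many were skipped.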

Quantization Configuration File

If the model quantized with the config_ascend.json quantization configuration file generated by the create_quant_config_ascend API does not reach the expected accuracy, tune the configuration parameters until it does. For a sample of the file content, see Example. The following tables describe the parameters in the configuration file.

Table 1 version

Description: Version number of the quantization configuration file
Type: Integer
Value: 1
Details: Currently, only version 1 is available.
Recommended Value: 1
Required/Optional: Optional

Table 2 activation_offset

Description: Selects symmetric or asymmetric quantization for activations. This is a global parameter. If both asymmetric and activation_offset exist in the configuration file, asymmetric takes precedence.
Type: Boolean
Value: true or false
Details:
  • true: asymmetric quantization
  • false: symmetric quantization
Recommended Value: true
Required/Optional: Optional
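To make the difference between the two modes concrete, the following Python sketch computes INT8 quantization parameters for a clipping range [clip_min, clip_max] using a common textbook formulation. It is an illustration only: the function names are hypothetical, and the exact formulas the tool uses (rounding, offset range) are assumptions here.

```python
def int8_quant_params(clip_min, clip_max, asymmetric):
    """Return (scale, offset) for INT8 quantization of [clip_min, clip_max].

    Symmetric: range centred on zero, offset fixed at 0, codes in [-127, 127].
    Asymmetric: range mapped onto [-128, 127] with an integer offset, so that
    a range such as [0, 6] can use all 256 codes.
    """
    # Make sure the range contains zero so that 0.0 is representable.
    clip_min = min(clip_min, 0.0)
    clip_max = max(clip_max, 0.0)
    if clip_max == clip_min:
        return 1.0, 0
    if asymmetric:
        scale = (clip_max - clip_min) / 255.0
        offset = int(round(-128 - clip_min / scale))
    else:
        scale = max(abs(clip_min), abs(clip_max)) / 127.0
        offset = 0
    return scale, offset


def quantize(x, scale, offset):
    """Map a real value to an INT8 code with rounding and saturation."""
    q = int(round(x / scale)) + offset
    return max(-128, min(127, q))
```

For a ReLU-style activation range of [0, 6], the symmetric mode maps 0.0 to code 0 and never uses the negative codes, while the asymmetric mode maps 0.0 to -128 and 6.0 to 127. This is one reason asymmetric quantization is often preferred for non-negative activations.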

Table 3 do_fusion

Description: Fusion switch
Type: Boolean
Value: true or false
Details:
  • true: fusion on
  • false: fusion off
For the fusible layers and fusion patterns, see Fusion Support.
Recommended Value: true
Required/Optional: Optional

Table 4 skip_fusion_layers

Description: Layers to be excluded from BN fusion
Type: String
Value: Must be names of fusible layers. For the fusible layers and fusion patterns, see Fusion Support.
Details: Specifies the layers for which fusion is skipped.
Recommended Value: -
Required/Optional: Optional
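BN fusion folds a BatchNorm layer's statistics into the weights and bias of the preceding convolution, so the fused layer can be quantized as a single convolution. The Python sketch below shows the standard folding arithmetic per output channel; the function name and the list-based tensor layout are hypothetical, and which layer patterns the tool actually fuses is listed in Fusion Support.

```python
import math


def fold_bn_into_conv(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters into conv weights and bias, per output channel.

    weights: one flattened kernel (list of floats) per output channel
    bias, gamma, beta, mean, var: one value per output channel
    BN computes gamma * (y - mean) / sqrt(var + eps) + beta on the conv output y,
    so both the weights and the bias are rescaled by gamma / sqrt(var + eps).
    """
    folded_w, folded_b = [], []
    for c, kernel in enumerate(weights):
        factor = gamma[c] / math.sqrt(var[c] + eps)
        folded_w.append([w * factor for w in kernel])
        folded_b.append((bias[c] - mean[c]) * factor + beta[c])
    return folded_w, folded_b


# Toy check with a 1x1 "convolution" (a dot product) and one output channel.
w, b = [[1.0, 2.0]], [0.5]
fw, fb = fold_bn_into_conv(w, b, gamma=[2.0], beta=[0.1], mean=[1.0], var=[4.0], eps=0.0)
x = [3.0, -1.0]
conv_out = sum(wi * xi for wi, xi in zip(w[0], x)) + b[0]
bn_out = 2.0 * (conv_out - 1.0) / math.sqrt(4.0) + 0.1
fused_out = sum(wi * xi for wi, xi in zip(fw[0], x)) + fb[0]
print(bn_out, fused_out)  # both 0.6 (up to float rounding)
```

Skipping fusion for a layer through skip_fusion_layers keeps the BatchNorm as a separate operation, so that layer's convolution is quantized with its unfolded weights.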

Table 5 layer_config

Description: Quantization configuration of a network layer
Type: Object
Value: -
Details: Contains the following parameters:
  • quant_enable
  • activation_quant_params
  • weight_quant_params
Recommended Value: -
Required/Optional: Optional

Table 6 quant_enable

Description: Quantization switch of the layer
Type: Boolean
Value: true or false
Details:
  • true: the layer is quantized
  • false: the layer is skipped during quantization
Recommended Value: true
Required/Optional: Optional

Table 7 activation_quant_params

Description: Activation quantization parameters
Type: Object
Value: -
Details: Contains the following parameters:
  • max_percentile
  • min_percentile
  • search_range
  • search_step
  • act_algo
  • asymmetric
Recommended Value: -
Required/Optional: Optional

Table 8 weight_quant_params

Description: Weight quantization parameters
Type: Object
Value: -
Details: Contains the following parameters in uniform quantization:
  • wts_algo
  • channel_wise
Recommended Value: -
Required/Optional: Optional
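Tables 5 to 8 nest inside one another. The Python sketch below builds one layer entry with the recommended and only supported values from the tables in this section and prints a configuration as JSON. The file layout (global switches plus one object per layer name) and the layer name conv1 are assumptions for illustration; compare with the file actually generated for your model.

```python
import json


def make_layer_config(quant_enable=True, channel_wise=True, asymmetric=None):
    """Build one layer_config entry using the values described in this section.

    asymmetric is optional: when omitted, the global activation_offset applies.
    """
    activation = {
        "max_percentile": 0.999999,
        "min_percentile": 0.999999,
        "search_range": [0.7, 1.3],
        "search_step": 0.01,
        "act_algo": "ifmr",
    }
    if asymmetric is not None:
        activation["asymmetric"] = asymmetric
    return {
        "quant_enable": quant_enable,
        "activation_quant_params": activation,
        "weight_quant_params": {
            "wts_algo": "arq_quantize",
            "channel_wise": channel_wise,
        },
    }


# Hypothetical layout: global switches plus one object per layer name.
config = {
    "version": 1,
    "activation_offset": True,
    "do_fusion": True,
    "conv1": make_layer_config(),
}
print(json.dumps(config, indent=4))
```

Passing quant_enable=False to the helper produces an entry that skips the layer, which is the adjustment made in step 2 of the tuning workflow.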

Table 9 act_algo

Description: Activation quantization algorithm
Type: String
Value: ifmr
Details: Currently, only the IFMR activation quantization algorithm is supported.
Recommended Value: -
Required/Optional: Optional

Table 10 asymmetric

Description: Selects symmetric or asymmetric quantization for the activations of an individual layer (a layer-wise setting). If both asymmetric and activation_offset exist in the configuration file, asymmetric takes precedence.
Type: Boolean
Value: true or false
Details:
  • true: asymmetric quantization
  • false: symmetric quantization
Recommended Value: true
Required/Optional: Optional
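The precedence rule between the global activation_offset and the per-layer asymmetric parameter can be written as a small Python function. This is a sketch of the rule as stated in Tables 2 and 10, assuming each layer's settings are stored under its name at the top level of the file; the function name and the fallback default are hypothetical.

```python
def effective_asymmetric(config, layer_name, default=True):
    """Return whether activations of layer_name use asymmetric quantization.

    Order of precedence:
      1. the layer's activation_quant_params.asymmetric, if present;
      2. otherwise the global activation_offset, if present;
      3. otherwise `default` (an assumption; the tool's built-in default may differ).
    """
    layer = config.get(layer_name, {})
    act = layer.get("activation_quant_params", {})
    if "asymmetric" in act:
        return act["asymmetric"]
    if "activation_offset" in config:
        return config["activation_offset"]
    return default


config = {
    "activation_offset": True,
    "conv1": {"activation_quant_params": {"asymmetric": False}},
    "fc": {"activation_quant_params": {}},
}
print(effective_asymmetric(config, "conv1"))  # False: the layer setting wins
print(effective_asymmetric(config, "fc"))  # True: falls back to activation_offset
```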

Table 11 max_percentile

Description: Upper bound used to search for the largest value
Type: Float
Value: (0.5, 1]
Details: For example, given 100 values sorted in descending order, an upper bound of 1.0 means that the value at index 0 (100 - 100 × 1.0 = 0) is taken as the largest value. The larger the value, the closer the upper bound for clipping-based quantization is to the maximum of the data to be quantized.
Recommended Value: 0.999999
Required/Optional: Optional

Table 12 min_percentile

Description: Lower bound used to search for the smallest value
Type: Float
Value: (0.5, 1]
Details: For example, given 100 values sorted in ascending order, a lower bound of 1.0 means that the value at index 0 (100 - 100 × 1.0 = 0) is taken as the smallest value. The larger the value, the closer the lower bound for clipping-based quantization is to the minimum of the data to be quantized.
Recommended Value: 0.999999
Required/Optional: Optional
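The index arithmetic in the max_percentile and min_percentile examples can be reproduced in Python. The sketch below assumes the index is rounded to the nearest integer and clamped to the valid range; the tool's actual rounding rule is not stated in this section, and the function name is hypothetical.

```python
def percentile_bounds(values, max_percentile, min_percentile):
    """Return (largest, smallest) after percentile-based clipping.

    With n values sorted in descending order, the largest is taken at index
    n - n * max_percentile; with the values sorted in ascending order, the
    smallest is taken at index n - n * min_percentile.
    """
    n = len(values)

    def index(p):
        # Round to the nearest index and keep it inside [0, n - 1] (assumption).
        return max(0, min(n - 1, int(round(n - n * p))))

    descending = sorted(values, reverse=True)
    ascending = sorted(values)
    return descending[index(max_percentile)], ascending[index(min_percentile)]


values = [float(v) for v in range(100)]  # 0.0 .. 99.0
print(percentile_bounds(values, 1.0, 1.0))  # (99.0, 0.0)
print(percentile_bounds(values, 0.9, 0.9))  # (89.0, 10.0)
```

Under this rounding assumption, the recommended 0.999999 clips nothing for small inputs and only takes effect for large tensors: with one million values, n - n × 0.999999 ≈ 1, so the single most extreme value on each side is discarded.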

Table 13 search_range

Description: Search range of the quantization factor: [search_range_start, search_range_end]
Type: List of two floats
Value: 0 < search_range_start < search_range_end
Details: Sets the search range of the quantization factor.
  • search_range_start: start of the search range
  • search_range_end: end of the search range
Recommended Value: [0.7, 1.3]
Required/Optional: Optional

Table 14 search_step

Description: Search step of the quantization factor
Type: Float
Value: (0, search_range_end - search_range_start]
Details: Sets the step by which the upper bound for clipping-based quantization is varied during the search. A smaller step gives a finer search but evaluates more candidates.
Recommended Value: 0.01
Required/Optional: Optional
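search_range and search_step together define a grid of candidate scaling factors for the clipping bound. The Python sketch below enumerates that grid and, for each candidate, scales the clipping bound, quantizes the data symmetrically to INT8, and keeps the factor with the lowest mean squared error. Using MSE as the objective, symmetric INT8, and max |v| as the base bound are simplifying assumptions; see IFMR Algorithm for the criterion the tool actually uses. The function names are hypothetical.

```python
import math


def candidate_factors(search_range, search_step):
    """Enumerate start, start + step, ... up to end (inclusive within tolerance)."""
    start, end = search_range
    if not 0 < start < end:
        raise ValueError("search_range must satisfy 0 < start < end")
    if not 0 < search_step <= end - start:
        raise ValueError("search_step must lie in (0, end - start]")
    # The small tolerance keeps the end point when floating-point division
    # lands just below a whole number.
    count = math.floor((end - start) / search_step + 1e-9) + 1
    return [start + i * search_step for i in range(count)]


def search_clip_factor(data, search_range=(0.7, 1.3), search_step=0.01):
    """Return (best_factor, best_mse) for symmetric INT8 clipping of data."""
    base = max(abs(v) for v in data)
    if base == 0.0:
        raise ValueError("data must contain at least one non-zero value")
    best = None
    for factor in candidate_factors(search_range, search_step):
        scale = base * factor / 127.0
        # Quantize with rounding and saturation, then measure reconstruction error.
        mse = sum(
            (max(-127, min(127, round(v / scale))) * scale - v) ** 2 for v in data
        ) / len(data)
        if best is None or mse < best[1]:
            best = (factor, mse)
    return best


factors = candidate_factors((0.7, 1.3), 0.01)
print(len(factors))  # 61
print(search_clip_factor([0.1, -0.2, 0.3, 5.0]))
```

With the recommended values, the grid has 61 candidates; halving search_step to 0.005 gives 121, roughly doubling the search time.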

Table 15 wts_algo

Description: Weight quantization algorithm
Type: String
Value: arq_quantize
Details: arq_quantize: basic weight quantization
Recommended Value: -
Required/Optional: Optional

Table 16 channel_wise

Description: Whether each channel uses its own quantization factors
Type: Boolean
Value: true or false
Details:
  • true: each channel is quantized separately with its own quantization factors.
  • false: all channels share the same quantization factors.
Recommended Value: true
Required/Optional: Optional
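The effect of channel_wise can be seen with a toy weight tensor whose channels differ greatly in magnitude. The Python sketch below uses a generic symmetric INT8 scheme (scale = max |w| / 127), which is an assumption for illustration rather than the ARQ algorithm itself; the function names are hypothetical.

```python
def int8_weight_scales(weights, channel_wise):
    """Symmetric INT8 scales for a weight tensor given as per-channel rows.

    channel_wise=True: one scale per channel, from that channel's max |w|.
    channel_wise=False: one scale shared by all channels, from the global max |w|.
    An all-zero range falls back to a scale of 1.0 to avoid division by zero.
    """
    if channel_wise:
        return [max(abs(w) for w in row) / 127.0 or 1.0 for row in weights]
    shared = max(abs(w) for row in weights for w in row) / 127.0 or 1.0
    return [shared] * len(weights)


def quantize_weights(weights, scales):
    """Round each weight to an INT8 code using its channel's scale."""
    return [
        [max(-127, min(127, round(w / s))) for w in row]
        for row, s in zip(weights, scales)
    ]


weights = [[2.0, -0.5], [0.02, -0.005]]  # second channel is 100x smaller
print(quantize_weights(weights, int8_weight_scales(weights, channel_wise=False)))
print(quantize_weights(weights, int8_weight_scales(weights, channel_wise=True)))
```

With a shared scale, the small channel's weights collapse to the codes [1, 0]; with per-channel scales they keep their resolution as [127, -32]. This loss of resolution in small channels is the usual motivation for per-channel weight quantization.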