--compression_optimize_conf

Applicability

Product	Supported
Atlas A3 training products/Atlas A3 inference products	√
Atlas A2 training products/Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	√
Atlas inference products	√
Atlas training products	√

Description

Sets the path (including the file name) of the compression configuration file. This option enables the compression optimization features specified in the configuration file to improve network performance.

Argument

Argument: Path of the configuration file, including the file name.

Format: The path (including the file name) can contain letters, digits, underscores (_), hyphens (-), periods (.), and Chinese characters.
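The character rule above can be checked before ATC is invoked. The following Python sketch assumes that the path separator / is also permitted (the rule lists only the characters allowed in names) and approximates "Chinese characters" with the CJK Unified Ideographs block U+4E00 to U+9FFF. The function name is_valid_conf_path is hypothetical.

```python
import re

# Hypothetical checker. Allowed: ASCII letters, digits, "_", "-", ".",
# plus (assumption) the "/" separator and CJK Unified Ideographs
# (U+4E00-U+9FFF) as an approximation of "Chinese characters".
_ALLOWED_PATH = re.compile(r"[A-Za-z0-9_\-./\u4e00-\u9fff]+")


def is_valid_conf_path(path: str) -> bool:
    """Return True if every character of `path` is in the allowed set."""
    return _ALLOWED_PATH.fullmatch(path) is not None


if __name__ == "__main__":
    print(is_valid_conf_path("/home/user/module/compression_optimize.cfg"))  # True
    print(is_valid_conf_path("/home/user/my conf.cfg"))  # False: contains a space
```

Check the path after shell expansion: a literal $HOME would be rejected, but the shell replaces it before ATC receives the argument.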

Restrictions:

Currently, only the following compression features are supported. Enable whichever ones meet your requirements.

enable_first_layer_quantization: true
calibration: 
{
    input_data_dir: ./data.bin,d2.bin
    input_shape: in:16,16;in1:16,16
    config_file: simple_config.cfg
    infer_soc: xxxxxx
    infer_aicore_num: 10
    infer_device_id: 0 
    infer_ip: x.x.x.x
    infer_port: 8000
    log: info
}

The features are described as follows:

enable_first_layer_quantization: Determines whether to optimize the input-layer convolution that follows AIPP (that is, AIPP is fused with the Quant operator that precedes the input-layer Conv2D of the quantized model). In the configuration file, the text before the colon (:) is the name of the compression optimization feature, and the value after the colon specifies whether the feature is enabled: true enables it, and false (default) disables it. Only Atlas inference products and Atlas 200I/500 A2 inference products support this feature.
When enable_first_layer_quantization is enabled, a performance gain is obtained only if the network contains an AIPP+Conv2D structure and --enable_small_channel is set to 1 in the atc command. The feature also reduces the accuracy of the quantized model to some extent, so enable it only as required.
calibration: Post-training quantization (PTQ) quantizes the weights of an already-trained model from floating point (float32 and float16 are currently supported) to low-bit integers (such as int8) and quantizes the activations by using a small calibration dataset, which accelerates model inference. PTQ is easy to use and requires only a small calibration dataset, so it suits scenarios where ease of use and resource savings take priority. To obtain the PTQ sample, click here.
The options are described as follows:
- input_data_dir (required): Path of the .bin file containing the calibration input data. If the model has multiple inputs, separate the .bin files with commas (,). The calibration dataset is used to calculate quantization parameters and must be representative; a subset of the test dataset is recommended. For details about how to generate the binary calibration data files, see Link.
- input_shape (required): Shape of the model's calibration input data, for example, input_name1:n1,c1,h1,w1;input_name2:n2,c2,h2,w2. Separate the input nodes with semicolons (;).
- config_file (optional): Simplified configuration file for post-training quantization. For details about the configuration example and option description of this file, see the simplified configuration file.
- infer_soc (required): Name of the SoC used for PTQ calibration and inference. To query the name, see Argument.
- infer_aicore_num (optional): Number of AI Cores used for PTQ calibration and inference. For details about how to query it, see --aicore_num.
- infer_device_id (optional): ID of the device powered by the Ascend AI Processor, which is used for post-training quantization calibration and inference. The default value is 0.
- infer_ip (conditionally required): IP address of the server where the NCS software package is located. This option is mandatory in the Ascend RC scenario of Atlas 200I/500 A2 inference products.
- infer_port (conditionally required): Port number of the server where the NCS software package is located. This option is mandatory in the Ascend RC scenario of Atlas 200I/500 A2 inference products.
- log (optional): Level of logs printed during post-training quantization. The default log level is info.
  - debug: Outputs debug, info, warning, error, and event logs.
  - info: Outputs info, warning, error, and event logs.
  - warning: Outputs warning, error, and event logs.
  - error: Outputs error and event logs.
  In addition, the AMCT_LOG_DUMP environment variable controls how logs are printed and flushed to disk during post-training quantization:
  - export AMCT_LOG_DUMP=1: Prints logs to the screen and flushes them to the amct_log_timestamp/amct_acl.log file in the current path. The quantization factor record file and graph file are not saved.
  - export AMCT_LOG_DUMP=2: Flushes logs to the amct_log_timestamp/amct_acl.log file in the current path and saves the quantization factor record file.
  - export AMCT_LOG_DUMP=3: Flushes logs to the amct_log_timestamp/amct_acl.log file in the current path and saves both the quantization factor record file and the graph file.
  To prevent the disk from filling up as log files, record files, and graph files are continuously flushed, delete these files promptly.
  
  If the ASCEND_WORK_PATH environment variable is configured, the preceding logs, quantization factor record files, and graph files are stored in the path specified by this environment variable. For example, if ASCEND_WORK_PATH is set to /home/test, the storage path is /home/test/amct_acl/amct_log_{pid}_timestamp. amct_acl is automatically created during model conversion, and {pid} indicates the process ID.
  
  The preceding log files, record files, and graph files are overwritten each time quantization is performed, so back them up as required. The size of the generated log file depends on the number of layers in the model being quantized, so ensure that the server where ATC is installed has sufficient disk space.
  
  Take the ResNet-101 model as an example. If the log level is set to INFO, the log file size is about 12 KB, and the size of the temporary file is about 260 MB. If the log level is set to DEBUG, the log file size is about 390 KB, and the size of the temporary file is about 430 MB.
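The input_data_dir and input_shape options must agree: each .bin file holds the data for one input, with as many elements as that input's shape implies. The following Python sketch parses an input_shape string and serializes one input to bytes. It assumes that a calibration .bin file is a raw float32 array in native byte order with no header (the layout NumPy's ndarray.tofile produces for float32 data) and that the files in input_data_dir are listed in the same order as the inputs in input_shape. The function names are hypothetical.

```python
from __future__ import annotations

import array
import math


def parse_input_shape(spec: str) -> dict[str, list[int]]:
    """Hypothetical helper: parse 'name1:d1,d2,...;name2:...' into {name: dims}."""
    shapes: dict[str, list[int]] = {}
    for node in spec.split(";"):
        # Split on the last ":" so that the dimension list never contains a colon.
        name, sep, dims = node.rpartition(":")
        if not sep or not name or not dims:
            raise ValueError(f"malformed input_shape entry: {node!r}")
        shapes[name] = [int(d) for d in dims.split(",")]
    return shapes


def calibration_bytes(shape: list[int], values: list[float]) -> bytes:
    """Hypothetical helper: serialize one input as raw float32 (assumed layout)."""
    count = math.prod(shape)
    data = array.array("f", values)  # "f" is a C float, 4 bytes on common platforms
    if data.itemsize != 4:
        raise RuntimeError("C float is not 32-bit on this platform")
    if len(data) != count:
        raise ValueError(f"expected {count} values for shape {shape}, got {len(data)}")
    return data.tobytes()
```

For the example configuration, parse_input_shape("in:16,16;in1:16,16") yields two 16 x 16 inputs, so data.bin and d2.bin would each hold 256 float32 values (1,024 bytes) taken from representative samples rather than placeholder data.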
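Because these files accumulate and are overwritten between runs, a small housekeeping script can locate and prune them. The Python sketch below follows the locations described above (the current path, or $ASCEND_WORK_PATH/amct_acl when that variable is set) and matches any amct_log_* directory, because the exact timestamp format is not specified here. The function names and the keep-newest-N policy are assumptions for illustration.

```python
from __future__ import annotations

import os
import shutil
from collections.abc import Mapping
from pathlib import Path


def amct_log_root(env: Mapping[str, str] | None = None, cwd: Path | None = None) -> Path:
    """Hypothetical helper: directory expected to hold the amct_log_* run folders."""
    env = os.environ if env is None else env
    work_path = env.get("ASCEND_WORK_PATH")
    if work_path:
        return Path(work_path) / "amct_acl"
    return Path.cwd() if cwd is None else cwd


def cleanup_amct_logs(root: Path, keep: int = 1) -> list[Path]:
    """Hypothetical helper: delete amct_log_* dirs under root, keeping the newest `keep`."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    if not root.is_dir():
        return []
    runs = sorted(
        (p for p in root.glob("amct_log_*") if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first, so the slice below drops the oldest runs
    )
    removed = runs[keep:]
    for run in removed:
        shutil.rmtree(run)
    return removed
```

For example, cleanup_amct_logs(amct_log_root(), keep=2) keeps the two most recent runs. Copy any record or graph files you still need before pruning.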

Default: None

Suggestions and Benefits

None

Example

Assume that the compression configuration file is named compression_optimize.cfg. An example configuration is as follows:

enable_first_layer_quantization: true
calibration: 
{
    input_data_dir: ./data.bin,d2.bin
    input_shape: in:16,16;in1:16,16
    config_file: simple_config.cfg
    infer_soc: xxxxxx
    infer_aicore_num: 10
    infer_device_id: 0 
    infer_ip: x.x.x.x
    infer_port: 8000
    log: info
}
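If the configuration file is generated by a script, the required calibration options can be checked at the same time. The following Python sketch writes the key: value layout used in the example above and rejects a calibration block that lacks input_data_dir, input_shape, or infer_soc, plus infer_ip and infer_port in the Ascend RC scenario. The function and parameter names are hypothetical.

```python
from __future__ import annotations

# Required options as listed in the option descriptions above.
REQUIRED_CALIBRATION_KEYS = ("input_data_dir", "input_shape", "infer_soc")
RC_REQUIRED_KEYS = ("infer_ip", "infer_port")  # Atlas 200I/500 A2 Ascend RC scenario


def render_compression_cfg(
    first_layer_quantization: bool = False,
    calibration: dict[str, object] | None = None,
    ascend_rc: bool = False,
) -> str:
    """Hypothetical helper: render the configuration file in the example's format."""
    lines: list[str] = []
    if first_layer_quantization:
        lines.append("enable_first_layer_quantization: true")
    if calibration is not None:
        required = REQUIRED_CALIBRATION_KEYS + (RC_REQUIRED_KEYS if ascend_rc else ())
        missing = [key for key in required if key not in calibration]
        if missing:
            raise ValueError("missing calibration options: " + ", ".join(missing))
        lines += ["calibration:", "{"]
        # Options are written in insertion order, indented like the example.
        lines += [f"    {key}: {value}" for key, value in calibration.items()]
        lines.append("}")
    if not lines:
        raise ValueError("enable at least one compression feature")
    return "\n".join(lines) + "\n"
```

Writing the returned string to $HOME/module/compression_optimize.cfg produces a file with the same structure as the example above.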

Upload the file to any directory (for example, $HOME/module) on the server where ATC is installed, and then set the option as follows:

--compression_optimize_conf=$HOME/module/compression_optimize.cfg
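When the conversion is scripted, this option is one more argument to atc. The Python sketch below builds the argument list. --model, --framework, --output, and --soc_version are standard ATC options; the function name and all argument values are illustrative assumptions.

```python
from __future__ import annotations

from pathlib import Path


def build_atc_command(
    model: str,
    framework: int,
    output: str,
    soc_version: str,
    conf_path: Path,
    enable_small_channel: bool = False,
) -> list[str]:
    """Hypothetical helper: assemble an atc argument list with the compression config."""
    argv = [
        "atc",
        f"--model={model}",
        f"--framework={framework}",
        f"--output={output}",
        f"--soc_version={soc_version}",
        # Use an already-expanded path: without a shell, "$HOME" is not expanded.
        f"--compression_optimize_conf={conf_path}",
    ]
    if enable_small_channel:
        # Per the text above, required for enable_first_layer_quantization to pay off.
        argv.append("--enable_small_channel=1")
    return argv
```

For example, build_atc_command("resnet101.onnx", 5, "resnet101_quant", soc, Path.home() / "module" / "compression_optimize.cfg") returns a list that can be passed to subprocess.run(argv, check=True). Here the value 5 is assumed to denote ONNX in ATC's framework numbering, and Path.home() performs the expansion that $HOME receives from the shell.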

If "build_main build graph[infer_graph_info] failed" is displayed during model conversion with the quantization function enabled, see What Should I Do If "build_main build graph[infer_graph_info] failed" Is Displayed During Model Conversion When the Quantization Function Is Enabled.

Dependencies and Restrictions

When the enable_first_layer_quantization feature is used, ensure that the model is the deployable model generated by AMCT quantization.
When the calibration feature in the configuration file is used, only installation scenarios with NPU-equipped devices are supported. To set up the product environment, see the CANN Software Installation Guide.
In the Ascend RC scenario of Atlas 200I/500 A2 inference products, you must also install the NCS software in the operating environment and configure the key certificate. For details, see "AOE Tool (Ascend RC) > Environment Setup" in the AOE Instructions.
When the calibration feature in the configuration file is used, the ATC tool calls the AMCT quantization APIs to perform the related operations, as shown in the following figure.
Figure 1 Post-training quantization principle
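To make the calibration step more concrete, the following Python sketch illustrates the general idea of post-training quantization with a generic symmetric max-abs scheme: a scale is derived from the calibration data, and floating-point values are then rounded and clipped to int8. This scheme is chosen only for illustration; the algorithm AMCT actually applies may differ, and the function names are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable


def calibrate_scale(batches: Iterable[Iterable[float]], num_bits: int = 8) -> float:
    """Generic max-abs calibration: map the largest |x| seen onto qmax."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    amax = max((abs(x) for batch in batches for x in batch), default=0.0)
    return amax / qmax if amax > 0 else 1.0  # fallback scale for all-zero data


def quantize(values: Iterable[float], scale: float, num_bits: int = 8) -> list[int]:
    """Round to the nearest integer step and clip to the signed integer range."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def dequantize(q: Iterable[int], scale: float) -> list[float]:
    """Map integers back to approximate floating-point values."""
    return [x * scale for x in q]
```

With calibration data whose largest magnitude is 1.27, the scale is 1.27 / 127 = 0.01, so 1.27 maps to 127 and an out-of-range value such as 5.0 is clipped to 127. This clipping is why the calibration dataset must be representative of real inputs.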
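The restrictions above can be collected into a pre-flight check that runs before conversion. The Python sketch below encodes them as simple boolean facts about the environment. How each fact is determined (for example, whether the network contains an AIPP+Conv2D structure) is left to the caller, and the class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Products that support enable_first_layer_quantization, per the description above.
FIRST_LAYER_PRODUCTS = frozenset(
    {"Atlas inference products", "Atlas 200I/500 A2 inference products"}
)


@dataclass
class Environment:  # hypothetical description of the conversion environment
    product: str
    model_is_amct_deployable: bool  # model is the deployable model produced by AMCT
    has_aipp_conv2d: bool           # network contains an AIPP+Conv2D structure
    enable_small_channel: int       # value passed to atc --enable_small_channel
    has_npu: bool                   # installation scenario has NPU-equipped devices
    ascend_rc: bool = False         # Atlas 200I/500 A2 Ascend RC scenario
    ncs_installed: bool = False     # NCS software and key certificate are set up


def check_prerequisites(env: Environment, first_layer: bool, calibration: bool) -> list[str]:
    """Hypothetical check: return problems; an empty list means no restriction is violated."""
    problems: list[str] = []
    if first_layer:
        if env.product not in FIRST_LAYER_PRODUCTS:
            problems.append(f"enable_first_layer_quantization is not supported on {env.product}")
        if not env.model_is_amct_deployable:
            problems.append("model must be the deployable model generated by AMCT quantization")
        if not (env.has_aipp_conv2d and env.enable_small_channel == 1):
            problems.append("no performance gain: needs AIPP+Conv2D and --enable_small_channel=1")
    if calibration:
        if not env.has_npu:
            problems.append("calibration requires an installation scenario with NPU-equipped devices")
        if env.ascend_rc and not env.ncs_installed:
            problems.append("Ascend RC scenario requires the NCS software and key certificate")
    return problems
```

Returning a list of messages rather than raising on the first problem lets a script report every unmet restriction in one pass.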

Parent topic: Model Tuning Options