--compression_optimize_conf

Description

Sets the directory (including the file name) of the compression configuration file. This option enables the compression optimization feature specified in the configuration file to improve network performance.

See Also

If this option is used to configure the calibration quantization feature, the high-precision feature cannot be used. For example, force_fp32 or must_keep_origin_dtype (source image fp32 input) cannot be configured by --precision_mode, origin cannot be configured by --precision_mode_v2, and high_precision cannot be configured by --op_precision_mode. When quantization parameters are set in high-precision mode, neither the performance benefits of quantization nor the precision benefits of the high-precision mode can be obtained.

Argument

Argument: Directory of the configuration file, including the file name.

Format: The directory (including the file name) can contain letters, digits, underscores (_), hyphens (-), periods (.), and Chinese characters.

Restrictions:

Currently, only the following compression modes are supported. Choose whichever meets your requirements.

calibration: 
{
    input_data_dir: ./data.bin,d2.bin
    input_shape: in:16,16;in1:16,16
    config_file: simple_config.cfg
    infer_soc: xxxxxx
    infer_aicore_num: 10
    infer_device_id: 0 
    log: info
}

Note that:

  • calibration: post-training quantization (PTQ). PTQ quantizes the weights of a trained model from float32 to int8 and also quantizes the activations by using a small calibration dataset, accelerating model inference. PTQ is easy to use and requires only a small calibration dataset, which makes it suitable for scenarios where ease of use and resource saving take priority. To obtain the PTQ sample, click here.
    The options are described as follows.
    • input_data_dir (required): Directory of the .bin file of the calibration data input. If the model has multiple inputs, separate the input .bin data files with commas (,). The calibration dataset is used to calculate quantization parameters and must be representative; subsets of the test dataset are recommended as the calibration dataset. For details about how to generate the binary file of the calibration data, see Link. A minimal generation sketch is also shown after this list.
    • input_shape (required): Shape information of the model input calibration data, for example, input_name1:n1,c1,h1,w1;input_name2:n2,c2,h2,w2. Use semicolons (;) to separate the nodes.
    • config_file (optional): Simplified configuration file for post-training quantization. For details about the configuration example and option description of this file, see the simplified configuration file.
    • infer_soc (required): Name of the SoC used for post-training quantization calibration and inference. For details about how to query the SoC name, see Argument.
    • infer_aicore_num (optional): Number of AI Cores used for post-training quantization calibration and inference. You are advised not to set this option; if you do set it, retain the default value. For details, see Argument.
    • infer_device_id (optional): ID of the device powered by the Ascend AI Processor, which is used for post-training quantization calibration and inference. The default value is 0.
    • log (optional): Level of logs printed during post-training quantization. The default log level is info.
      • debug: Outputs debug, info, warning, error, and event logs.
      • info: Outputs info, warning, error, and event logs.
      • warning: Outputs warning, error, and event logs.
      • error: Outputs error and event logs.
      In addition, the AMCT_LOG_DUMP environment variable controls how logs are printed and flushed, and which intermediate files are saved, during post-training quantization.
      • export AMCT_LOG_DUMP=1: Prints logs to the screen, flushes logs to the amct_log_timestamp/amct_acl.log file in the current path, and does not save the quantization factor record file or graph file.
      • export AMCT_LOG_DUMP=2: Flushes logs to the amct_log_timestamp/amct_acl.log file in the current path and saves the quantization factor record file.
      • export AMCT_LOG_DUMP=3: Flushes logs to the amct_log_timestamp/amct_acl.log file in the current path and saves the quantization factor record file and graph file.

      To prevent the disk from filling up due to continuously flushed log files, record files, and graph files, delete these files in a timely manner.

      If the ASCEND_WORK_PATH environment variable is configured, the preceding logs, quantization factor record files, and graph files are stored in the path specified by this environment variable. For example, if ASCEND_WORK_PATH is set to /home/test, the storage path is /home/test/amct_acl/amct_log_{pid}_timestamp. amct_acl is automatically created during model conversion, and {pid} indicates the process ID.

      The preceding log files, record files, and graph files will be overwritten when quantization is performed again. You need to save them as required. In addition, the size of the generated log file is related to the number of layers of the model to be quantized. Ensure that the server where ATC is installed has sufficient space.

      Take the ResNet-101 model as an example. If the log level is set to info, the log file size is about 12 KB and the size of the temporary file is about 260 MB. If the log level is set to debug, the log file size is about 390 KB and the size of the temporary file is about 430 MB.
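
The calibration data referenced by input_data_dir must be raw binary dumps of the preprocessed input tensors. The following is a minimal sketch of how such .bin files could be generated, assuming NumPy is available. The file names (data.bin, d2.bin), the 16,16 shapes, and the float32 dtype only mirror the sample configuration above; replace the random arrays with real, representative preprocessed calibration data for your model.

# Minimal sketch: dump preprocessed calibration samples to .bin files
# that input_data_dir can reference. Shapes, file names, and dtype here
# follow the sample configuration above; the actual dtype and layout
# must match the model inputs.
import numpy as np

in_data = np.random.rand(16, 16).astype(np.float32)   # input "in"
in1_data = np.random.rand(16, 16).astype(np.float32)  # input "in1"

# tofile() writes the raw tensor buffer with no header, which is the
# .bin layout expected for calibration data.
in_data.tofile("./data.bin")
in1_data.tofile("./d2.bin")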

Default: None

Suggestions and Benefits

None

Example

Assume that the compression configuration file is named compression_optimize.cfg. The configuration example is as follows:

calibration: 
{
    input_data_dir: ./data.bin,d2.bin
    input_shape: in:16,16;in1:16,16
    config_file: simple_config.cfg
    infer_soc: xxxxxx
    infer_aicore_num: 10
    infer_device_id: 0 
    log: info
}

Upload the file to any directory (for example, $HOME/module) on the server where ATC is located.

--compression_optimize_conf=$HOME/module/compression_optimize.cfg
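
For reference, a complete ATC command that enables this option might look as follows. The model file, framework type (5 indicates ONNX), and output name are placeholders used for illustration only; replace them, together with the SoC version, with the values that apply to your model and environment.

atc --model=$HOME/module/model.onnx \
    --framework=5 \
    --output=$HOME/module/model_quant \
    --soc_version=xxxxxx \
    --compression_optimize_conf=$HOME/module/compression_optimize.cfg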

If "build_main build graph[infer_graph_info] failed" is displayed during model conversion when the quantization function is enabled, you can handle the fault by referring to What Should I Do If "build_main build graph[infer_graph_info] failed" Is Displayed During Model Conversion When the Quantization Function Is Enabled.

Applicability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Dependencies and Restrictions

  • When the enable_first_layer_quantization feature is used, ensure that the model in use is the deployable model generated after quantization performed by AMCT.
  • When the calibration feature in the configuration file is used, only installation scenarios with NPU-equipped devices are supported. For details, refer to the CANN Software Installation Guide to set up the product environment.
  • When the calibration feature in the configuration file is used, the ATC tool calls the AMCT quantization APIs to perform related operations. The following figure shows the schematic diagram.
    Figure 1 Post-training quantization principle