--compression_optimize_conf

Description

Sets the path (including the file name) of the compression configuration file. This option enables the compression optimization feature specified in the configuration file to improve network performance.

See Also

If this option is used to configure the calibration quantization feature, the high-precision features cannot be used. For example, force_fp32 or must_keep_origin_dtype (fp32 input of the source image) cannot be configured by --precision_mode, and high_precision cannot be configured by --op_precision_mode. Setting quantization parameters in high-precision mode provides neither the performance benefits of quantization nor those of the high-precision mode.

Arguments

Argument: path of the configuration file, including the file name.

Format: The path (including the file name) can contain letters, digits, underscores (_), hyphens (-), periods (.), and Chinese characters.

Restrictions: Currently, the configuration file supports only the calibration feature, which enables the graph compression function. In the configuration file, the information before the colon (:) specifies the model compression feature, and the information after the colon (:) specifies the parameters of that feature. For details, see Examples.

Default: none.

Suggestions and Benefits

None

Examples

Assume that the configuration file of the model compression function is named compression_optimize.cfg. A configuration example is as follows:

calibration: 
{
    input_data_dir: ./data.bin,d2.bin
    input_shape: in:16,16;in1:16,16
    config_file: simple_config.cfg
    infer_soc: xxxxxx
    infer_device_id: 0 
    infer_ip: x.x.x.x
    infer_port: 8000
    log: info
}
  • calibration: post-training quantization (PTQ). PTQ quantizes the weights of a trained model from float32 to int8 and quantizes the activations by using a small calibration dataset, accelerating model inference. PTQ is easy to use and requires only a small calibration dataset, making it suitable for scenarios where ease of use and resource saving take priority. To obtain the PTQ sample, click here.
    The options are described as follows.
    • input_data_dir (required): directory of the .bin files of the model input calibration data. If the model has multiple .bin data files, separate them with commas (,). The calibration dataset is used to calculate the quantization parameters and must be representative; a subset of the test dataset is recommended. For details about how to generate the binary files of the calibration data, click here (a minimal sketch is also shown after this list).
    • input_shape (required): shape information of the model input calibration data, for example, input_name1:n1,c1,h1,w1;input_name2:n2,c2,h2,w2. Use semicolons (;) to separate the nodes.
    • config_file (optional): simplified configuration file for post-training quantization. For details about the configuration example and parameter description of this file, see Simplified Configuration File.
    • infer_soc (required): name of the SoC used for quantization calibration and inference. To query the name, perform the following step:
      • Run the npu-smi info command on the server where the Ascend AI Processor is installed to obtain the Chip Name information. The actual value is Ascend followed by the Chip Name. For example, if Chip Name is xxxyy, the actual value is Ascendxxxyy.
    • infer_device_id (optional): ID of the device powered by the Ascend AI Processor, which is used for post-training quantization calibration and inference. The default value is 0.
    • infer_ip: IP address of the server where the NCS software package is located. This parameter is used in the Ascend RC scenario. If the --ip option is specified during AOE tuning, do not configure infer_ip; otherwise, the system reports an error.
    • infer_port: port number of the server where the NCS software package is located. This parameter is used in the Ascend RC scenario. If the --port option is specified during AOE tuning, do not configure infer_port; otherwise, the system reports an error.
    • log (optional): level of logs printed during post-training quantization. The default log level is info.
      • debug: outputs debug, info, warning, error, and event logs.
      • info: outputs info, warning, error, and event logs.
      • warning: outputs warning, error, and event logs.
      • error: outputs error and event logs.
      In addition, the AMCT_LOG_DUMP environment variable controls the log printing and flushing information during the post-training quantization process.
      • export AMCT_LOG_DUMP=1: prints logs on the screen and does not save the quantization factor record file and graph file.
      • export AMCT_LOG_DUMP=2: flushes logs to the amct_log_$timestamp/amct_acl.log file in the current path and saves the quantization factor record file.
      • export AMCT_LOG_DUMP=3: flushes logs to the amct_log_$timestamp/amct_acl.log file in the current path and saves the quantization factor record file and graph file.

      To prevent the disk from filling up as log files, record files, and graph files are continuously flushed, delete these files in a timely manner.

      If the ASCEND_WORK_PATH environment variable is configured, the preceding logs, quantization factor record files, and graph files are stored in the path specified by this environment variable. For example, if ASCEND_WORK_PATH is set to /home/test, the storage path is /home/test/amct_acl/amct_log_{pid}_$timestamp. amct_acl is automatically created during model conversion, and {pid} indicates the process ID.
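
The following is a minimal sketch (not the official sample code) of how the calibration .bin files referenced by input_data_dir might be produced with NumPy. The input shapes (16,16) and float32 data type are assumptions taken from the example configuration above; the actual data type, shape, and preprocessing must match the model inputs, and the random data below is only a placeholder for representative samples.

# Minimal sketch: write preprocessed calibration samples as raw .bin files.
# Assumption: two float32 inputs of shape 16x16, matching the example
# configuration above; replace the random placeholder data with representative,
# preprocessed samples from the test dataset.
import numpy as np

def dump_calibration_sample(array, path):
    # tofile() writes the raw values without a header, which is the layout
    # expected for a .bin calibration file; dtype and shape must match the
    # model input exactly.
    array.astype(np.float32).tofile(path)

dump_calibration_sample(np.random.rand(16, 16), "./data.bin")
dump_calibration_sample(np.random.rand(16, 16), "./d2.bin")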

Upload the files to any directory (for example, ${HOME}/module) on the server where AOE is located.

--compression_optimize_conf=${HOME}/module/compression_optimize.cfg
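
If the configuration file is generated by a script, the following minimal sketch writes a compression_optimize.cfg in the layout shown in the example above. All values are placeholders copied from that example (assumptions), not recommended settings, and must be replaced with the actual paths, SoC name, and network information.

# Minimal sketch: generate compression_optimize.cfg in the layout shown above.
# All values are placeholders and must be adapted to the actual environment.
calibration_options = {
    "input_data_dir": "./data.bin,d2.bin",
    "input_shape": "in:16,16;in1:16,16",
    "config_file": "simple_config.cfg",
    "infer_soc": "xxxxxx",       # SoC name obtained from npu-smi info
    "infer_device_id": "0",
    "infer_ip": "x.x.x.x",       # Ascend RC scenario only
    "infer_port": "8000",        # Ascend RC scenario only
    "log": "info",
}

with open("compression_optimize.cfg", "w") as cfg:
    cfg.write("calibration:\n{\n")
    for key, value in calibration_options.items():
        cfg.write(f"    {key}: {value}\n")
    cfg.write("}\n")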

Dependencies and Restrictions

  • When the calibration feature in the configuration file is used, only installation scenarios with NPU-equipped devices are supported. For details, see the corresponding documentation to set up the product environment.
  • When the calibration feature in the configuration file is used, the AOE tool calls the AMCT quantization APIs to perform related operations. The following figure shows the schematic diagram.
    Figure 1 Post-training quantization principle