Simplified Configuration File for Auto Channel Pruning Search

The basic_info.proto file is located at /amct_tensorflow/proto/basic_info.proto under the AMCT installation directory. The messages and parameters it defines are as follows:

| Message | Required | Type | Parameter | Description |
| --- | --- | --- | --- | --- |
| AutoMixedPrecisionConfig | - | - | - | AMCT simplified configuration for automatic mixed precision search. The current version does not support this feature. |
| AutoMixedPrecisionConfig | optional | float | compress_ratio | Compression ratio, expressed as a compression multiple relative to the computation amount of all quantizable layers. |
| AutoMixedPrecisionConfig | repeated | QuantBitLimit | quant_bit_limit | Quantization bit width search range of some layers. |
| AutoMixedPrecisionConfig | optional | string | ptq_cfg | Simplified PTQ configuration file, used during calibration to obtain the quantization factors for the INT4 and INT8 bit widths. If this parameter is not set, the default PTQ configuration is used. Currently, only INT8 quantization is supported. |
| AutoMixedPrecisionConfig | optional | int64 | test_iteration | Number of batches of dump data. The data is used to measure the quantization impact and the computation amount, so it should be representative. |
| AutoMixedPrecisionConfig | optional | string | override_qat_cfg | Simplified configuration file for QAT. The output of the automatic mixed precision search overwrites the bit width of each layer; other parameters remain unchanged. If this parameter is not set, the simplified quantization aware training configuration file (in .proto format) is used to generate a .cfg configuration file with quantization bit width information. |
| AutoChannelPruneConfig | - | - | - | AMCT simplified configuration for auto channel pruning search. |
| AutoChannelPruneConfig | required | float | compress_ratio | Compression ratio, expressed as a compression multiple relative to the computation amount of all quantizable layers. |
| AutoChannelPruneConfig | optional | bool | ascend_optimized | Whether to adapt the pruning to Ascend platforms. If the pruned model will be deployed on an Ascend AI Processor, setting this parameter to true is recommended. |
| AutoChannelPruneConfig | optional | float | max_prune_ratio | Maximum sparsity rate of a single layer, that is, the highest sparsity rate that can appear in the sparsity configuration output by the API. The default value is 1. |
| AutoChannelPruneConfig | optional | int64 | test_iteration | Number of batches of input test data. |
| AutoChannelPruneConfig | optional | string | override_prune_cfg | Simplified channel sparsity configuration file for specified layers. It can contain only skip and override configurations. A layer configured here uses the specified configuration and is not overridden by the automatic channel sparsity search API. |
| QuantBitLimit | - | - | - | Quantization bit width search range of some layers. |
| QuantBitLimit | optional | string | layer_name | Layer name. |
| QuantBitLimit | repeated | DataType | data_range | Quantization bit width range. |
| DataType | - | - | - | Quantization bit width range (enumeration type). Currently, only INT8 quantization is supported. |
| DataType | - | - | FLOAT | Floating point, not quantized. |
| DataType | - | - | INT8 | INT8 quantization. |
| DataType | - | - | INT4 | INT4 quantization. |
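As a rough illustration of what a compress_ratio value means, the sketch below reads "compression multiple" as the reference computation divided by the computation left after compression. That reading is an assumption based on the description above, not a confirmed AMCT definition. The function name and the FLOPs figure are hypothetical.

```python
def remaining_compute_fraction(compress_ratio):
    """Fraction of the reference computation left after compression.

    Assumption (not confirmed by AMCT): compress_ratio is
    reference computation / computation after compression.
    """
    if compress_ratio <= 0:
        raise ValueError("compress_ratio must be positive")
    return 1.0 / compress_ratio


# Hypothetical reference computation of the layers counted by the search.
reference_flops = 3.0e9

# With compress_ratio = 1.5, about two thirds of the computation remains.
target_flops = reference_flops * remaining_compute_fraction(1.5)
print(f"target computation: {target_flops:.2e} FLOPs")
```

Under this reading, a larger compress_ratio asks the search for a smaller model, and a value of 1 asks for no reduction at all.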

The following is an example of the simplified configuration file (amc.cfg) for auto channel pruning search:

compress_ratio: 1.5
ascend_optimized: true
max_prune_ratio: 0.8
test_iteration: 1
override_prune_cfg: 'your/path/to/override_channel_prune.cfg'
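Before passing such a file to AMCT, it can be helpful to check it against the AutoChannelPruneConfig rows of the table. The following is a minimal sketch in plain Python, not AMCT's own parser: it handles only flat `name: value` lines like the example above, and the function names and checks are illustrative assumptions.

```python
# Field names and types are copied from the AutoChannelPruneConfig table rows.
# The checks are illustrative and are not AMCT's own validation logic.
FIELDS = {
    "compress_ratio": float,
    "ascend_optimized": bool,
    "max_prune_ratio": float,
    "test_iteration": int,
    "override_prune_cfg": str,
}
REQUIRED = {"compress_ratio"}


def parse_value(kind, raw):
    """Convert one raw text value to the Python type listed for its field."""
    if kind is bool:
        if raw not in ("true", "false"):
            raise ValueError(f"expected true or false, got {raw!r}")
        return raw == "true"
    if kind is str:
        if len(raw) < 2 or raw[0] != raw[-1] or raw[0] not in "'\"":
            raise ValueError(f"expected a quoted string, got {raw!r}")
        return raw[1:-1]
    return kind(raw)  # float or int


def check_prune_cfg(text):
    """Parse flat `name: value` lines and check them against FIELDS."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, raw = line.partition(":")
        name, raw = name.strip(), raw.strip()
        if not sep or name not in FIELDS:
            raise ValueError(f"unknown or malformed line: {line!r}")
        cfg[name] = parse_value(FIELDS[name], raw)
    missing = REQUIRED - cfg.keys()
    if missing:
        raise ValueError(f"missing required field(s): {sorted(missing)}")
    return cfg


EXAMPLE = """\
compress_ratio: 1.5
ascend_optimized: true
max_prune_ratio: 0.8
test_iteration: 1
override_prune_cfg: 'your/path/to/override_channel_prune.cfg'
"""

cfg = check_prune_cfg(EXAMPLE)
print(cfg)
```

A file that omits compress_ratio, the only required field, makes check_prune_cfg raise ValueError, as does a misspelled parameter name.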