Simplified Configuration File for Auto Channel Pruning Search
The basic_info.proto file is located at /amct_tensorflow/proto/basic_info.proto under the AMCT installation directory. Its messages and parameters are described below:
| Message | Required | Type | Parameter | Description |
|---|---|---|---|---|
| AutoMixedPrecisionConfig | - | - | - | AMCT simplified configuration for automatic mixed precision search. The current version does not support this feature. |
| | optional | float | compress_ratio | Compression ratio, given as a multiple. The total computation amount of all quantizable layers serves as the reference. |
| | repeated | QuantBitLimit | quant_bit_limit | Quantization bit width search range of some layers. |
| | optional | string | ptq_cfg | Simplified PTQ configuration file, which is used to obtain the quantization factors under the quantization bit widths of INT4 and INT8 during calibration. If this parameter is not set, the default PTQ configuration is used. Currently, only INT8 quantization is supported. |
| | optional | int64 | test_iteration | Number of batches of data to dump. The dumped data is used to measure the quantization impact and the computation amount, so it should be a representative sample. |
| | optional | string | override_qat_cfg | Simplified configuration file for QAT. The output of the automatic mixed precision search overwrites only the bit width of each layer; other parameters remain unchanged. If this parameter is not set, the simplified quantization aware training configuration file (in .proto format) is used to generate a .cfg configuration file with quantization bit width information. |
| AutoChannelPruneConfig | - | - | - | AMCT simplified configuration for auto channel pruning search. |
| | required | float | compress_ratio | Compression ratio, given as a multiple. The total computation amount of all quantizable layers serves as the reference. |
| | optional | bool | ascend_optimized | Whether to adapt the result to Ascend platforms. If the pruned model is to be deployed on an Ascend AI Processor, you are advised to set this parameter to true. |
| | optional | float | max_prune_ratio | Maximum sparsity rate of a single layer, that is, the maximum sparsity rate in the sparsity configuration output by the API. The default value is 1. |
| | optional | int64 | test_iteration | Number of batches of input test data. |
| | optional | string | override_prune_cfg | Simplified configuration file for channel sparsity of specified layers. It can contain only skip and override configurations. A layer configured in this file uses the specified configuration and is not overridden by the automatic channel sparsity search API. |
| QuantBitLimit | - | - | - | Quantization bit width search range of some layers. |
| | optional | string | layer_name | Layer name. |
| | repeated | DataType | data_range | Quantization bit width range. |
| DataType | - | - | - | Quantization bit width range (enumeration type). Currently, only INT8 quantization is supported. |
| | - | - | FLOAT | Floating point, not quantized. |
| | - | - | INT8 | INT8 quantization. |
| | - | - | INT4 | INT4 quantization. |

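The AutoChannelPruneConfig parameters listed above can be mirrored in a small Python structure to show how the required and optional fields map onto `name: value` lines. This is only an illustrative sketch: the class `AutoChannelPruneSketch` and its `to_text` method are hypothetical helpers and are not part of AMCT; only the field names and types come from the table.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AutoChannelPruneSketch:
    """Hypothetical mirror of the AutoChannelPruneConfig message (not an AMCT class)."""
    compress_ratio: float                       # required in the table
    ascend_optimized: Optional[bool] = None     # optional
    max_prune_ratio: Optional[float] = None     # optional; documented default is 1
    test_iteration: Optional[int] = None        # optional
    override_prune_cfg: Optional[str] = None    # optional path to an override file

    def to_text(self) -> str:
        """Render the fields that are set as `name: value` lines."""
        lines = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None:
                continue  # unset optional fields are omitted
            if isinstance(value, bool):
                # Check bool before other types: the config uses lowercase literals.
                rendered = "true" if value else "false"
            elif isinstance(value, str):
                rendered = f"'{value}'"  # assumes the string has no single quote
            else:
                rendered = str(value)
            lines.append(f"{f.name}: {rendered}")
        return "\n".join(lines)


cfg = AutoChannelPruneSketch(compress_ratio=1.5, ascend_optimized=True,
                             max_prune_ratio=0.8, test_iteration=1)
print(cfg.to_text())
```

Because `compress_ratio` has no default, constructing the object without it raises a `TypeError`, which loosely mirrors the `required` marking in the table.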
The following is an example of the simplified configuration file (amc.cfg) for auto channel pruning search:

    compress_ratio: 1.5
    ascend_optimized: true
    max_prune_ratio: 0.8
    test_iteration: 1
    override_prune_cfg: 'your/path/to/override_channel_prune.cfg'
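To sanity-check such a file before passing it to AMCT, the flat `name: value` pairs can be read back with a few lines of Python. The sketch below is an assumption-laden illustration, not the AMCT parser: `parse_flat_cfg` and `require_compress_ratio` are hypothetical helpers, and the regular expression handles only flat scalar fields like those in the example (no nested messages or comments).

```python
import re

EXAMPLE = """
compress_ratio: 1.5
ascend_optimized: true
max_prune_ratio: 0.8
test_iteration: 1
override_prune_cfg: 'your/path/to/override_channel_prune.cfg'
"""

# Matches `name: value`, where value is a quoted string or a bare token.
_PAIR = re.compile(r"(\w+)\s*:\s*('[^']*'|\"[^\"]*\"|\S+)")


def parse_flat_cfg(text):
    """Parse flat scalar `name: value` pairs into a dict (hypothetical helper)."""
    result = {}
    for name, raw in _PAIR.findall(text):
        if raw[0] in "'\"":
            result[name] = raw[1:-1]          # strip the surrounding quotes
        elif raw in ("true", "false"):
            result[name] = raw == "true"      # boolean literal
        else:
            try:
                result[name] = int(raw)       # integer such as test_iteration
            except ValueError:
                result[name] = float(raw)     # float such as compress_ratio
    return result


def require_compress_ratio(cfg):
    """compress_ratio is the only field marked required for AutoChannelPruneConfig."""
    if "compress_ratio" not in cfg:
        raise ValueError("compress_ratio is required")


parsed = parse_flat_cfg(EXAMPLE)
require_compress_ratio(parsed)
print(parsed)
```

If the Python classes generated from basic_info.proto are available in your environment, parsing with `google.protobuf.text_format.Parse` against the generated message type would likely be a more robust choice than a regular expression.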