Record Files

A record file is a serialized data structure file based on Protobuf. It records the scale and offset factors for quantization and the cascade correlations between pruned layers. You can generate a compressed model file by using the record file, compression configuration file, and original network model file.

Record Prototype Definition

The Protobuf prototype is defined as follows (find the code in the /amct_pytorch/proto/scale_offset_record_pytorch.proto file under the AMCT installation directory).

syntax = "proto2";

message SingleLayerRecord {
    optional float scale_d = 1;
    optional int32 offset_d = 2;
    repeated float scale_w = 3;
    repeated int32 offset_w = 4;
    repeated uint32 shift_bit = 5;
    repeated float tensor_balance_factor = 6;
    optional bool skip_fusion = 9 [default = true];
    optional string dst_type = 10 [default = 'INT8'];
    optional string act_type = 11 [default = 'INT8'];
    optional string wts_type = 12 [default = 'INT8'];
}

message MapFiledEntry {
    optional string key = 1;
    optional SingleLayerRecord value = 2;
    optional SingleLayerKVCacheRecord kv_cache_value = 3;

}

message ScaleOffsetRecord {
    repeated MapFiledEntry record = 1;
    repeated PruneRecord prune_record = 2;
}

message PruneRecord {
    repeated PruneNode producer = 1;
    repeated PruneNode consumer = 2;
    optional PruneNode selective_prune = 3;
}

message PruneNode {
    required string name = 1;
    repeated AMCTProto.AttrProto attr = 2;
}

The parameters are described as follows.

| Message | Field Rule | Type | Parameter | Description |
|---|---|---|---|---|
| SingleLayerRecord | - | - | - | Quantization factors. |
| | optional | float | scale_d | Scale factor for activation quantization. Only unified activation quantization is supported. |
| | optional | int32 | offset_d | Offset factor for activation quantization. Only unified activation quantization is supported. |
| | repeated | float | scale_w | Scale factor for weight quantization. Scalar mode (the weight of the current layer is quantized in a unified manner) and vector mode (the weight of the current layer is quantized channel-wise) are supported. Only the Conv2d type supports channel-wise quantization. |
| | repeated | int32 | offset_w | Offset factor for weight quantization. Like scale_w, it supports scalar and vector modes, and its dimension must match that of scale_w. Weight quantization with offset is currently not supported, so offset_w must be 0. |
| | repeated | uint32 | shift_bit | Shift factor. This parameter is reserved and is not written to the record file. |
| | optional | bool | skip_fusion | Whether to skip Conv+BN fusion at the current layer. Defaults to false, meaning that the fusion is performed. |
| | optional | string | dst_type | Quantization bit width, either INT8 or INT4. This field is used only for QAT. |
| | repeated | float | tensor_balance_factor | Balance factor for quantization. This field is used only for activation quantization balance preprocessing. |
| | optional | string | act_type | Activation quantization bit width: INT8 or INT16. Currently, only INT8 quantization is supported. |
| | optional | string | wts_type | Weight quantization bit width. Currently, the quantization factors produced by INT6 and INT7 quantization are still saved as the INT8 type. |
| SingleLayerKVCacheRecord | - | - | - | kv-cache quantization factor configuration. |
| | repeated | float32 | scale | Scale quantization factor. |
| | repeated | int32 | offset | Offset quantization factor. |
| ScaleOffsetRecord | - | - | - | Map structure. The map is expressed as discrete key-value entries (MapFiledEntry) to ensure compatibility. |
| | repeated | MapFiledEntry | record | Per-layer quantization factor record. Each entry has a key (the layer name) and a value (the quantization factors defined by SingleLayerRecord). |
| | repeated | PruneRecord | prune_record | Sparsity records. |
| MapFiledEntry | - | - | - | - |
| | optional | string | key | Layer name. |
| | optional | SingleLayerRecord | value | Quantization factor configuration. |
| | optional | SingleLayerKVCacheRecord | kv_cache_value | kv-cache quantization factor configuration. |
| PruneRecord | - | - | - | Sparsity records. |
| | repeated | PruneNode | producer | Sparsity producer, which is the root node of the cascade correlations between sparsifiable nodes. For example, the chain conv1>bn>relu>conv2 is sparsifiable, and bn, relu, and conv2 are affected by the sparsity of conv1. In this example, conv1 is the producer, and bn, relu, and conv2 are its consumers. |
| | repeated | PruneNode | consumer | Sparsity consumer, which is a downstream node of the cascade correlations between sparsifiable nodes. In the conv1>bn>relu>conv2 example, bn, relu, and conv2 are the consumers of conv1. |
| | optional | PruneNode | selective_prune | 2:4 structured sparsity node. |
| PruneNode | - | - | - | Node to be sparsified. |
| | required | string | name | Node name. |
| | repeated | AMCTProto.AttrProto | attr | Node attributes. |

Note that Protobuf does not report an error if an optional field is set more than once. In that case, the most recent setting takes effect.
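To make the roles of scale_d, offset_d, and scale_w more concrete, the following Python sketch applies them to plain lists of numbers. It assumes the common affine convention q = round(x / scale) + offset, clamped to the INT8 range; this document does not state the exact formula AMCT uses, so treat the arithmetic as an illustration only. The function names are hypothetical and are not part of AMCT.

```python
from typing import List, Sequence

INT8_MIN, INT8_MAX = -128, 127


def _clamp_int8(value: int) -> int:
    """Clamp an integer to the INT8 range."""
    return max(INT8_MIN, min(INT8_MAX, value))


def quantize_activation(values: Sequence[float], scale_d: float, offset_d: int) -> List[int]:
    """Quantize activations with a single scale/offset pair (unified mode).

    Assumed convention: q = round(x / scale_d) + offset_d.
    Python's round() rounds halves to the nearest even integer.
    """
    return [_clamp_int8(round(v / scale_d) + offset_d) for v in values]


def dequantize_activation(q_values: Sequence[int], scale_d: float, offset_d: int) -> List[float]:
    """Invert the assumed convention: x ~= (q - offset_d) * scale_d."""
    return [(q - offset_d) * scale_d for q in q_values]


def quantize_weights(weights: Sequence[Sequence[float]], scale_w: Sequence[float]) -> List[List[int]]:
    """Quantize weights given as one row per output channel.

    scale_w holds one entry (scalar mode) or one entry per channel
    (vector mode). No offset is applied, because offset_w must be 0.
    """
    if len(scale_w) not in (1, len(weights)):
        raise ValueError("scale_w must have 1 entry or one entry per channel")
    quantized = []
    for channel, row in enumerate(weights):
        scale = scale_w[0] if len(scale_w) == 1 else scale_w[channel]
        quantized.append([_clamp_int8(round(w / scale)) for w in row])
    return quantized


print(quantize_activation([0.0, 0.5, -100.0], scale_d=0.5, offset_d=1))  # [1, 2, -128]
```

With a one-element scale_w every channel shares the same factor, whereas a scale_w with one entry per channel corresponds to the channel-wise mode described for Conv2d.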

Record Files

A generated record file is saved in text format (record.txt). Depending on the compression feature, record files fall into the following types:

  • Quantization record file

    For common quantization layers, the scale_d, offset_d, scale_w, and offset_w parameters must be included. The following is an example:

    record {
      key: "conv1"
      value {
        scale_d: 0.0798481479
        offset_d: 1
        scale_w: 0.00297622895
        offset_w: 0
        skip_fusion: true
        dst_type: "INT8"
      }
    }
    record {
      key: "layer1.0.conv1"
      value {
        scale_d: 0.00392156886
        offset_d: -128
        scale_w: 0.00106807391
        scale_w: 0.00104224426
        offset_w: 0
        offset_w: 0
        dst_type: "INT8"
      }
    }
  • Activation quantization balance preprocessing record file. The following is an example:
    record {
      key: "linear_1"
      value {
        scale_d: 0.00784554612
        offset_d: -1
        scale_w: 0.00778095098
        offset_w: 0
        tensor_balance_factor: 0.948409557
        tensor_balance_factor: 0.984379828
      }
    }
    record {
      key: "conv_1"
      value {
        scale_d: 0.00759239076
        offset_d: -4
        scale_w: 0.0075149606
        offset_w: 0
        tensor_balance_factor: 1.04744744
        tensor_balance_factor: 1.44586647
      }
    }
  • Filter-level sparsity record file, which records the cascade correlations between pruned layers. The following is an example:
    prune_record {
      producer {
        name: "conv1"
        attr {
          name: "type"
          type: STRING
          s: "Conv2d"
        }
        attr {
          name: "begin"
          type: INT
          i: 0
        }
        attr {
          name: "end"
          type: INT
          i: 64
        }
      }
      consumer {
        name: "BN_1"
        attr {
          name: "type"
          type: STRING
          s: "FusedBatchNormV3"
        }
        attr {
          name: "begin"
          type: INT
          i: 0
        }
        attr {
          name: "end"
          type: INT
          i: 64
        }
      }
    }
  • Structured sparsity record file. The following is an example:
    prune_record {
      selective_prune {
        name: "conv1"
        attr {
          name: "mask_shape"
          type: INTS
          ints: 3
          ints: 3
          ints: 3
          ints: 32
        }
      }
    }
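The constraints stated earlier for quantization records (the four required factors, matching scale_w and offset_w dimensions, and offset_w equal to 0) can be checked with a short script. The following is a minimal sketch, not an AMCT tool: it parses only the subset of the text format that appears in the examples above, and parse_text_record and check_quant_records are hypothetical helper names. In practice you would more likely load the file with the Protobuf library and the classes generated from the .proto file.

```python
import re
from typing import Any, Dict, List

# Tokens: quoted strings, braces, or runs of other non-space characters.
_TOKEN = re.compile(r'"[^"]*"|[{}]|[^\s{}]+')


def _scalar(token: str) -> Any:
    """Convert a scalar token to str, bool, int, or float where possible."""
    if token.startswith('"'):
        return token.strip('"')
    if token in ("true", "false"):
        return token == "true"
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token  # bare word such as STRING or INTS


def parse_text_record(text: str) -> Dict[str, List[Any]]:
    """Parse `name: value` and `name { ... }` entries into nested dicts.

    Every field maps to a list, so repeated fields keep all their values.
    """
    tokens = _TOKEN.findall(text)
    pos = 0

    def parse_block() -> Dict[str, List[Any]]:
        nonlocal pos
        block: Dict[str, List[Any]] = {}
        while pos < len(tokens) and tokens[pos] != "}":
            name = tokens[pos]
            pos += 1
            if name.endswith(":"):  # scalar field
                block.setdefault(name[:-1], []).append(_scalar(tokens[pos]))
                pos += 1
            else:  # nested message
                pos += 1  # skip "{"
                block.setdefault(name, []).append(parse_block())
                pos += 1  # skip "}"
        return block

    return parse_block()


def check_quant_records(parsed: Dict[str, List[Any]]) -> List[str]:
    """Return a list of problems found in the quantization records."""
    problems = []
    for entry in parsed.get("record", []):
        if "value" not in entry:
            continue  # for example, an entry that only has kv_cache_value
        key, value = entry["key"][0], entry["value"][0]
        for field in ("scale_d", "offset_d", "scale_w", "offset_w"):
            if field not in value:
                problems.append(f"{key}: missing {field}")
        scale_w, offset_w = value.get("scale_w", []), value.get("offset_w", [])
        if len(scale_w) != len(offset_w):
            problems.append(f"{key}: scale_w and offset_w differ in length")
        if any(offset != 0 for offset in offset_w):
            problems.append(f"{key}: offset_w must be 0")
    return problems


EXAMPLE = """
record {
  key: "layer1.0.conv1"
  value {
    scale_d: 0.00392156886
    offset_d: -128
    scale_w: 0.00106807391
    scale_w: 0.00104224426
    offset_w: 0
    offset_w: 0
    dst_type: "INT8"
  }
}
"""
print(check_quant_records(parse_text_record(EXAMPLE)))  # []
```

The same parser also reads the prune_record examples, since they use the same nesting and `name: value` layout.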