Record Files
A record file is a serialized data structure based on Protobuf. It records the scale and offset factors for quantization and the cascade correlations between pruned layers. You can generate a compressed model file from the record file, the compression configuration file, and the original network model file.
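The scale and offset factors stored in a record file follow the usual linear quantization convention: q = round(x / scale) + offset, clamped to the signed integer range. A minimal sketch of that arithmetic under this assumption (the helper names are illustrative, not part of the AMCT API):

```python
# Illustrative linear quantization/dequantization with a scale/offset pair.
# These helpers are hypothetical, not AMCT code; they only show the arithmetic
# that recorded scale_d/offset_d (and scale_w/offset_w) factors encode.
def quantize(x, scale, offset, num_bits=8):
    """q = round(x / scale) + offset, clamped to the signed integer range."""
    qmin = -(2 ** (num_bits - 1))      # -128 for INT8
    qmax = 2 ** (num_bits - 1) - 1     #  127 for INT8
    return max(qmin, min(qmax, round(x / scale) + offset))

def dequantize(q, scale, offset):
    """x ~= (q - offset) * scale recovers an approximate float value."""
    return (q - offset) * scale

q = quantize(0.1234, scale=0.001, offset=-1)
x_hat = dequantize(q, scale=0.001, offset=-1)
```

The reconstruction error of such a round trip stays within one quantization step (one unit of scale), which is why the recorded factors fully determine the fixed-point mapping.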
Prototype Definition of the convert_model API Record File
syntax = "proto2";
package AMCTTensorflow;

// this proto is designed for convert_model API
message SingleLayerRecord {
  optional float scale_d = 1;
  optional int32 offset_d = 2;
  repeated float scale_w = 3;
  repeated int32 offset_w = 4;
  // convert_model does not support this field [shift_bit] yet
  repeated uint32 shift_bit = 5;
  optional bool skip_fusion = 9 [default = false];
  optional string act_type = 14 [default = 'INT8'];
  optional string wts_type = 15 [default = 'INT8'];
}

message MapFiledEntry {
  optional string key = 1;
  optional SingleLayerRecord value = 2;
}

message ScaleOffsetRecord {
  repeated MapFiledEntry record = 1;
}
The parameters in this scenario are described as follows.
| Message | Required | Type | Parameter | Description |
|---|---|---|---|---|
| SingleLayerRecord | - | - | - | Quantization factors. |
| | optional | float | scale_d | Scale factor for activation quantization. Only unified activation quantization is supported. |
| | optional | int32 | offset_d | Offset factor for activation quantization. Only unified activation quantization is supported. |
| | repeated | float | scale_w | Scale factor for weight quantization. Two modes are supported: scalar (quantizing the weights of the current layer uniformly) and vector (quantizing the weights of the current layer channel-wise). Channel-wise quantization applies only to the Conv2D, DepthwiseConv2dNative, and Conv2DBackpropInput layers. |
| | repeated | int32 | offset_w | Offset factor for weight quantization. Like scale_w, it supports scalar and vector modes, and its dimensions must match those of scale_w. Weight quantization with offset is not currently supported, so offset_w must be 0. |
| | repeated | uint32 | shift_bit | Shift factor. Reserved for the convert_model API. |
| | optional | bool | skip_fusion | Whether to skip Conv+BN, Depthwise_Conv+BN, Group_conv+BN, and BatchNorm fusion at the current layer. Defaults to false, meaning these fusion types are performed. |
| | optional | string | act_type | Data (activation) quantization bit width. Currently, only INT8 is supported. |
| | optional | string | wts_type | Weight quantization bit width. Currently, only INT8 is supported. |
| ScaleOffsetRecord | - | - | - | Map structure. A discrete map structure is used for compatibility. |
| | repeated | MapFiledEntry | record | Per-layer quantization factor record, consisting of two members: key and value. |
| MapFiledEntry | optional | string | key | Layer name. |
| | optional | SingleLayerRecord | value | Quantization factor configuration. |
Note that Protobuf does not report an error if an optional field is set more than once; in that case, the most recent value takes effect.
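As described for scale_w above, vector mode carries one scale per output channel. A minimal sketch of how such a scale_w/offset_w pair would be applied, illustrative only (not AMCT code; the helper name is hypothetical, and the scales reuse the values from the PTQ record example later in this document):

```python
# Hypothetical sketch (not AMCT code): applying a vector-mode scale_w, i.e.
# one scale per output channel, with offset_w fixed at 0 as the record
# file requires.
def quantize_weights_per_channel(weights, scale_w, offset_w):
    """weights: one list of floats per output channel;
    scale_w/offset_w: one entry per channel."""
    assert len(weights) == len(scale_w) == len(offset_w)
    assert all(o == 0 for o in offset_w)  # offset_w must be 0 per the record-file rules
    quantized = []
    for channel, scale, offset in zip(weights, scale_w, offset_w):
        quantized.append(
            [max(-128, min(127, round(w / scale) + offset)) for w in channel]
        )
    return quantized

# Two output channels, each quantized with its own scale.
q = quantize_weights_per_channel(
    [[0.0078, -0.0157], [0.0077, 0.0039]],
    [0.007854792, 0.0077705383],
    [0, 0],
)
```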
Prototype Definition of the Quantization or Sparsity Record File
syntax = "proto2";
import "amct_tensorflow/proto/basic_info.proto";
package AMCTTensorflow;

// this proto is designed for amct tools
message InnerSingleLayerRecord {
  optional float scale_d = 1;
  optional int32 offset_d = 2;
  repeated float scale_w = 3;
  repeated int32 offset_w = 4;
  repeated uint32 shift_bit = 5;
  // the cluster of nuq, only nuq layer has this field;
  repeated int32 cluster = 6;
  optional bool skip_fusion = 9 [default = false];
  optional string dst_type = 10 [default = 'INT8'];
  repeated string prune_producer = 11;
  repeated string prune_consumer = 12;
  repeated float tensor_balance_factor = 13;
  optional string act_type = 14 [default = 'INT8'];
  optional string wts_type = 15 [default = 'INT8'];
}

message InnerMapFiledEntry {
  optional string key = 1;
  optional InnerSingleLayerRecord value = 2;
}

message InnerScaleOffsetRecord {
  repeated InnerMapFiledEntry record = 1;
  repeated PruneRecord prune_record = 2;
}

message PruneRecord {
  repeated PruneNode producer = 1;
  repeated PruneNode consumer = 2;
  optional PruneNode selective_prune = 3;
}

message PruneNode {
  required string name = 1;
  repeated AMCTProto.AttrProto attr = 2;
}
The parameters in this scenario are described as follows.
| Message | Required | Type | Parameter | Description |
|---|---|---|---|---|
| InnerSingleLayerRecord | - | - | - | Quantization factors. |
| | optional | float | scale_d | Scale factor for activation quantization. Only unified activation quantization is supported. |
| | optional | int32 | offset_d | Offset factor for activation quantization. Only unified activation quantization is supported. |
| | repeated | float | scale_w | Scale factor for weight quantization. Two modes are supported: scalar (quantizing the weights of the current layer uniformly) and vector (quantizing the weights of the current layer channel-wise). Channel-wise quantization applies only to the Conv2D, DepthwiseConv2dNative, and Conv2DBackpropInput layers. |
| | repeated | int32 | offset_w | Offset factor for weight quantization. Like scale_w, it supports scalar and vector modes, and its dimensions must match those of scale_w. Weight quantization with offset is not currently supported, so offset_w must be 0. |
| | repeated | uint32 | shift_bit | Shift factor. shift_bit is written to the record file only when joint_quant is configured in the simplified PTQ configuration file. |
| | repeated | int32 | cluster | Cluster centers. Required only in the NUQ scenario. This field is not supported. |
| | optional | bool | skip_fusion | Whether to skip Conv+BN, Depthwise_Conv+BN, Group_conv+BN, and BatchNorm fusion at the current layer. Defaults to false, meaning these fusion types are performed. |
| | optional | string | dst_type | Quantization bit width, either INT8 or INT4. This field is used only for QAT. |
| | repeated | string | prune_producer | Sparsity producer, that is, the root node of the cascade correlations between sparsifiable nodes. This field is used only in sparsity scenarios. |
| | repeated | string | prune_consumer | Sparsity consumer, that is, a downstream node of the cascade correlations between sparsifiable nodes. This field is used only in sparsity scenarios. |
| | repeated | float | tensor_balance_factor | Balanced quantization factor. This field is used only in pre-balancing activation quantization. |
| | optional | string | act_type | Activation quantization bit width: INT8 or INT16. This field is used only in PTQ. Currently, only INT8 quantization is supported. |
| | optional | string | wts_type | Weight quantization bit width. This field is used only in PTQ scenarios. Currently, quantization factors after INT6 and INT7 quantization are still saved as the INT8 type. |
| InnerScaleOffsetRecord | - | - | - | Map structure. A discrete map structure is used for compatibility. |
| | repeated | InnerMapFiledEntry | record | Per-layer quantization factor record, consisting of two members: key and value. |
| | repeated | PruneRecord | prune_record | Sparsity records. |
| InnerMapFiledEntry | optional | string | key | Layer name. |
| | optional | InnerSingleLayerRecord | value | Quantization factor configuration. |
| PruneRecord | - | - | - | Sparsity records. |
| | repeated | PruneNode | producer | Sparsity producer, that is, the root node of the cascade correlations between sparsifiable nodes. For example, if the structure conv1 > bn > relu > conv2 is sparsifiable, then bn, relu, and conv2 are affected by the sparsity of conv1: they are consumers of conv1, and conv1 is their producer. |
| | repeated | PruneNode | consumer | Sparsity consumer, that is, a downstream node of the cascade correlations between sparsifiable nodes. In the preceding example, bn, relu, and conv2 are the consumers of conv1. |
| | optional | PruneNode | selective_prune | 2:4 structured sparsity node. Due to hardware restrictions, the Atlas 200/300/500 Inference Product and Atlas Training Series Product do not support the 2:4 structured sparsity feature; enabling it yields little performance benefit on these products. |
| PruneNode | - | - | - | Node to be sparsified. |
| | required | string | name | Node name. |
| | repeated | AMCTProto.AttrProto | attr | Node attributes. |
Note that Protobuf does not report an error if an optional field is set more than once; in that case, the most recent value takes effect.
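The producer/consumer cascade that a PruneRecord encodes can be sketched as follows. The dictionary layout and helper are hypothetical; they loosely mirror the begin/end channel attributes shown in the filter-level sparsity record example:

```python
# Hypothetical sketch of the cascade a PruneRecord encodes: when the producer's
# output channels [begin, end) are pruned, every consumer must drop the same
# channel range. Node names mirror the filter-level sparsity example.
prune_record = {
    "producer": [{"name": "conv_1", "begin": 0, "end": 64}],
    "consumer": [{"name": "BN_1", "begin": 0, "end": 64}],
}

def affected_channels(record):
    """Channel range each node loses when the producer is pruned."""
    return {
        node["name"]: range(node["begin"], node["end"])
        for node in record["producer"] + record["consumer"]
    }

spans = affected_channels(prune_record)
```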
Record Files
A generated record file is named record.txt. Depending on the feature that produces them, record files are classified as follows:
- PTQ record file

  record {
    key: "conv2d/Conv2D"
    value {
      scale_d: 1.541161146e-05
      offset_d: -32768
      scale_w: 0.007854792
      scale_w: 0.0077705383
      offset_w: 0
      offset_w: 0
      shift_bit: 12  // shift_bit is recorded only when joint_quant is set in the simplified PTQ configuration file.
      shift_bit: 13
      act_type: "INT8"
      wts_type: "INT8"
    }
  }

- QAT record file

  For general quantization layers, the scale_d, offset_d, scale_w, offset_w, and shift_bit parameters need to be configured. The scale_w and offset_w parameters are unavailable for AvgPool because the layer has no weights. An example quantization factor record file corresponding to inner_scale_offset_record.proto is shown below.

  record {
    key: "fc4/Tensordot/MatMul"
    value {
      scale_d: 0.0798481479
      offset_d: 1
      scale_w: 0.00297622895
      offset_w: 0
      shift_bit: 1
      dst_type: "INT8"
    }
  }
  record {
    key: "conv2d/Conv2D"
    value {
      scale_d: 0.00392156886
      offset_d: -128
      scale_w: 0.00106807391
      scale_w: 0.00104224426
      offset_w: 0
      offset_w: 0
      shift_bit: 1
      shift_bit: 1
      dst_type: "INT8"
    }
  }

- Activation quantization balance preprocessing record file. For example:

  record {
    key: "matmul_1"
    value {
      scale_d: 0.00784554612
      offset_d: -1
      scale_w: 0.00778095098
      offset_w: 0
      shift_bit: 2
      tensor_balance_factor: 0.948409557
      tensor_balance_factor: 0.984379828
    }
  }
  record {
    key: "conv_1"
    value {
      scale_d: 0.00759239076
      offset_d: -4
      scale_w: 0.0075149606
      offset_w: 0
      shift_bit: 1
      tensor_balance_factor: 1.04744744
      tensor_balance_factor: 1.44586647
    }
  }

- Filter-level sparsity record file, which records the cascade correlations between sparse layers. For example:

  prune_record {
    producer {
      name: "conv_1"
      attr { name: "type" type: STRING s: "Conv2D" }
      attr { name: "begin" type: INT i: 0 }
      attr { name: "end" type: INT i: 64 }
    }
    consumer {
      name: "BN_1"
      attr { name: "type" type: STRING s: "FusedBatchNormV3" }
      attr { name: "begin" type: INT i: 0 }
      attr { name: "end" type: INT i: 64 }
    }
  }

- Structured sparsity record file. For example:

  prune_record {
    selective_prune {
      name: "conv2d/Conv2D"
      attr { name: "mask_shape" type: INTS ints: 3 ints: 3 ints: 3 ints: 32 }
    }
  }
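The 2:4 pattern that a selective_prune node marks can be illustrated on a flattened weight vector: in every group of four consecutive weights, the two smallest-magnitude values are masked out. A hypothetical sketch (not AMCT code; a real mask follows the recorded mask_shape, e.g. 3x3x3x32, while a flat 1-D vector is used here for brevity):

```python
# Hypothetical illustration of 2:4 structured sparsity: keep the 2
# largest-magnitude weights in every group of 4, zeroing the rest.
def mask_2_of_4(weights):
    """Return a 0/1 keep-mask with exactly 2 ones in every group of 4."""
    mask = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest magnitudes in this group are kept
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]), reverse=True)[:2]
        mask.extend(1 if j in keep else 0 for j in range(len(group)))
    return mask

m = mask_2_of_4([0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.01, 0.4])
```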