Record Files

A record file is a serialized data structure file based on Protobuf. It records the quantization scenarios and the scale and offset factors for quantization. You can generate a compressed model file by using the record file, quantization configuration file, and original network model file.

Prototype Definition of the Quantization Record File

The Protobuf prototype of the PTQ record file is defined as follows (find it in the /amct_tensorflow/proto/inner_scale_offset_record.proto file under the AMCT installation directory):
syntax = "proto2";
import "amct_tensorflow/proto/basic_info.proto";
package AMCTTensorflow;

// this proto is designed for amct tools
message InnerSingleLayerRecord {
    optional float scale_d = 1;
    optional int32 offset_d = 2;
    repeated float scale_w = 3;
    repeated int32 offset_w = 4;
    repeated uint32 shift_bit = 5;
    // the cluster of nuq, only nuq layer has this field;
    repeated int32 cluster = 6;
    optional bool skip_fusion = 9 [default = false];
    optional string dst_type = 10 [default = 'INT8'];
    repeated string prune_producer = 11;
    repeated string prune_consumer = 12;
    repeated float tensor_balance_factor = 13;
    optional string act_type = 14 [default = 'INT8'];
    optional string wts_type = 15 [default = 'INT8'];
}

message InnerMapFiledEntry {
    optional string key = 1;
    optional InnerSingleLayerRecord value = 2;
}

message InnerScaleOffsetRecord {
    repeated InnerMapFiledEntry record = 1;
    repeated PruneRecord prune_record = 2;
}

message PruneRecord {
    repeated PruneNode producer = 1;
    repeated PruneNode consumer = 2;
    optional PruneNode selective_prune = 3;
}

message PruneNode {
    required string name = 1;
    repeated AMCTProto.AttrProto attr = 2;
}

The messages and fields are described as follows.

InnerSingleLayerRecord: quantization factors for a single layer.

  • scale_d (optional, float): Scale factor for activation quantization. Only unified activation quantization is supported.
  • offset_d (optional, int32): Offset factor for activation quantization. Only unified activation quantization is supported.
  • scale_w (repeated, float): Scale factor for weight quantization. Two quantization modes are supported: scalar (one factor uniformly quantizing the weights of the current layer) and vector (one factor per channel, quantizing the weights of the current layer channel-wise). Channel-wise quantization applies only to the Conv2D, DepthwiseConv2dNative, and Conv2DBackpropInput layers.
  • offset_w (repeated, int32): Offset factor for weight quantization. Like scale_w, it supports scalar and vector modes, and its dimension must match that of scale_w. Weight quantization with offset is currently not supported, so offset_w must be 0.
  • shift_bit (repeated, uint32): Shift factor. Reserved field; not supported currently and does not need to be configured.
  • cluster (repeated, int32): Cluster centers. Required only in the NUQ scenario. This field is not supported currently.
  • skip_fusion (optional, bool): Whether to skip Conv+BN fusion, Depthwise_Conv+BN fusion, Group_conv+BN fusion, and BatchNorm fusion at the current layer. Defaults to false, meaning the preceding fusion types are performed.
  • dst_type (optional, string): Quantization bit width, either INT8 or INT4. This field is not supported in this version.
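AMCT's internal calibration algorithm is not documented here, but the scale_w semantics above can be illustrated with a common symmetric INT8 scheme: in vector mode, each output channel gets its own scale of max|w|/127, and weights are quantized as q = clamp(round(w/scale) + offset_w, -128, 127) with offset_w fixed at 0. The sketch below is illustrative only, not AMCT's actual implementation.

```python
# Illustrative symmetric INT8 weight quantization, vector (channel-wise) mode.
# Produces one scale_w per channel and offset_w = 0, mirroring the record-file
# semantics; AMCT's actual calibration may differ.

def channel_scales(weights_per_channel, num_bits=8):
    """One scale per channel: map max|w| onto the signed integer range."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    return [max(abs(w) for w in ch) / qmax for ch in weights_per_channel]

def quantize(weights_per_channel, scales):
    """q = clamp(round(w / scale), -128, 127); offset_w is always 0."""
    return [[max(-128, min(127, round(w / s))) for w in ch]
            for ch, s in zip(weights_per_channel, scales)]

# Two output channels with very different dynamic ranges: channel-wise
# scales preserve precision in the small-magnitude channel.
w = [[0.5, -1.0, 0.25], [0.02, -0.01, 0.03]]
scales = channel_scales(w)
q = quantize(w, scales)
```

Scalar mode corresponds to computing a single scale over all channels at once.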

InnerScaleOffsetRecord: top-level record. A discrete map structure is used to ensure compatibility.

  • record (repeated, InnerMapFiledEntry): Per-layer quantization factor record, consisting of two members:

      • key: layer name.
      • value: quantization factors defined by InnerSingleLayerRecord.

InnerMapFiledEntry: map entry.

  • key (optional, string): Layer name.
  • value (optional, InnerSingleLayerRecord): Quantization factor configuration.

PruneRecord: sparsity records. This feature is not supported.

  • producer (repeated, PruneNode): Sparsity producer, the root node of the cascade correlations between sparsifiable nodes.
  • consumer (repeated, PruneNode): Sparsity consumer, a downstream node of the cascade correlations between sparsifiable nodes.

    For example, if the composite conv1 > bn > relu > conv2 is sparsifiable, then bn, relu, and conv2 are affected by the sparsity of conv1. Here, bn, relu, and conv2 are consumers of conv1, and conv1 is the producer of bn, relu, and conv2.

  • selective_prune (optional, PruneNode): 2:4 structured sparsity node.

    Due to hardware restrictions, the Atlas 200/300/500 Inference Product and Atlas Training Series Product do not support the 2:4 structured sparsity feature; enabling it yields little performance benefit.
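For reference, 2:4 structured sparsity means that in every group of four consecutive weights, at most two are nonzero (that is, at least two of every four are pruned to zero). A minimal pattern check, purely illustrative and not an AMCT API:

```python
# Check the 2:4 structured sparsity pattern: in every group of 4
# consecutive values, at most 2 may be nonzero.

def is_2_4_sparse(values):
    for i in range(0, len(values), 4):
        group = values[i:i + 4]
        if sum(1 for v in group if v != 0) > 2:
            return False
    return True

ok = is_2_4_sparse([0.5, 0.0, -0.3, 0.0, 0.0, 0.0, 1.2, 0.7])  # both groups have 2 nonzeros
bad = is_2_4_sparse([0.5, 0.1, -0.3, 0.0])                     # 3 nonzeros in one group
```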

PruneNode: node to be sparsified. This feature is not supported.

  • name (required, string): Node name.
  • attr (repeated, AMCTProto.AttrProto): Node attributes.

Note that the Protobuf protocol does not report an error if an optional field is set more than once; the most recent setting takes effect.

Record Files

A generated record file is named record.txt and uses the Protobuf text format.

For general quantization layers, the scale_d, offset_d, scale_w, and offset_w parameters need to be configured. The scale_w and offset_w parameters are not applicable to AvgPool because the layer has no weights. An example quantization factor record file corresponding to inner_scale_offset_record.proto follows.

record {
  key: "fc4/Tensordot/MatMul"
  value {
    scale_d: 0.0798481479
    offset_d: 1
    scale_w: 0.00297622895
    offset_w: 0
  }
}
record {
  key: "depthwise"
  value {
    scale_d: 0.00962011795
    offset_d: 1
    scale_w: 0.00787108205
    scale_w: 0.00787108205
    scale_w: 0.00787108205
    offset_w: 0
    offset_w: 0
    offset_w: 0
    skip_fusion: true
  }
}
record {
  key: "conv2d/Conv2D"
  value {
    scale_d: 0.00392156886
    offset_d: -128
    scale_w: 0.00106807391
    scale_w: 0.00104224426
    scale_w: 0.0010603976
    offset_w: 0
    offset_w: 0
    offset_w: 0
  }
}
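Once the .proto file has been compiled with protoc, such a file is best loaded through google.protobuf.text_format with the generated InnerScaleOffsetRecord class. Without the generated classes at hand, the flat text layout shown above (repeated fields appear once per value, each record block starting with its key) can also be read with a small stdlib-only sketch; this is illustrative, not a replacement for a proper text-format parse:

```python
import re

def collect_scale_w(record_text):
    """Map each record key to its list of scale_w values.

    Relies on the flat text-format layout shown above: each record
    block starts with its key, and repeated fields appear once per value.
    """
    result = {}
    current = None
    for line in record_text.splitlines():
        line = line.strip()
        m = re.match(r'key:\s*"(.+)"', line)
        if m:
            current = m.group(1)
            result[current] = []
            continue
        m = re.match(r'scale_w:\s*([0-9.eE+-]+)', line)
        if m and current is not None:
            result[current].append(float(m.group(1)))
    return result

sample = '''record {
  key: "conv2d/Conv2D"
  value {
    scale_d: 0.00392156886
    offset_d: -128
    scale_w: 0.00106807391
    scale_w: 0.00104224426
    offset_w: 0
    offset_w: 0
  }
}'''
scales = collect_scale_w(sample)
```

Here the vector-mode scale_w values come back in order, one per channel, matching how the repeated field is laid out in the file.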