QAT Model Adaptation to CANN Format
A source TensorFlow model that has already been quantized through quantization-aware training is referred to as a QAT model. Before generating an offline model adapted to the Ascend AI Processor with ATC, you need to use the function provided in this section to convert the QAT model into the CANN format, and then use the ATC tool to convert the CANN quantized model into an adapted offline model.
Note the following restrictions:
- The source QAT model must contain FakeQuant layers, including FakeQuantWithMinMaxVars and FakeQuantWithMinMaxVarsPerChannel (weights only).
- Only the Conv2D, MatMul, DepthwiseConv2dNative, Conv2DBackpropInput, and AvgPool layers can match fake_quant nodes, which means only these layers are adaptable. Uniform Quantization shows the layer restrictions.
Adaptation Principles
Figure API call sequence for uniform quantization shows the adaptation principles. The user implements the operations in blue, while those in gray are implemented by AMCT's convert_qat_model API. Specifically, import the package into the TensorFlow network inference code and call the API where appropriate for adaptation. QAT models do not support Automatic Quantization. For the adaptation example, see Sample List.
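Conceptually, each FakeQuant node clamps a tensor to a learned [min, max] range, snaps every value to a uniform 8-bit grid, and maps it back to float. The following is a minimal illustrative sketch of that round trip in plain Python; it is not AMCT's or TensorFlow's exact arithmetic (for example, it ignores the zero-point nudging that FakeQuantWithMinMaxVars performs):

```python
def fake_quant(x, min_val, max_val, num_bits=8):
    """Simulate a FakeQuant node: clamp x to [min_val, max_val], snap it to a
    uniform grid of 2**num_bits levels, and map it back to float."""
    levels = 2 ** num_bits - 1              # 255 steps for 8 bits
    scale = (max_val - min_val) / levels    # width of one quantization step
    clamped = min(max(x, min_val), max_val)
    q = round((clamped - min_val) / scale)  # integer grid index
    return q * scale + min_val              # back to float ("fake" quantization)

# Values inside the range are rounded to the nearest grid point;
# values outside the range are clamped to its boundaries.
print(fake_quant(0.5, 0.0, 1.0))   # close to 0.5
print(fake_quant(2.0, 0.0, 1.0))   # clamped to 1.0
```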

Examples
This example details how to use AMCT to convert a TensorFlow QAT model to a CANN representation.
Take the following steps to get started. Update the sample code based on your situation, and tweak the arguments passed to AMCT API calls as required.
- Import the AMCT package and set the log level.

```python
import amct_tensorflow as amct
amct.set_logging_level(print_level='info', save_level='info')
```
- (Optional) Run inference on the source model in the TensorFlow environment based on the test dataset to validate the inference script and environment setup. (Update the sample code based on your situation.)
This step is recommended because it verifies that the source model runs correctly and delivers acceptable accuracy before conversion. You can use a subset of the test dataset to improve efficiency.
```python
user_do_inference(ori_qat_model, test_data)
```
- Call AMCT's convert_qat_model API. This API parses the model in .pb format into a graph, preprocesses the graph, modifies the parsed graph structure, inserts operators such as AscendQuant and AscendDequant, and saves the quantized model.
```python
quant_model_path = './result/user_model'
record_file = './result/record.txt'
amct.convert_qat_model(pb_model=ori_qat_model,
                       outputs=ori_qat_model_outputs,
                       save_path=quant_model_path,
                       record_file=record_file)
```
- (Optional) Run inference on the fake-quantized model user_model_quantized.pb in the TensorFlow environment based on the test dataset to test the accuracy. (Update the sample code based on your situation.)
Check the accuracy loss of the fake-quantized model by comparing its accuracy with that of the source model (obtained in step 2).
```python
quant_model = './result/user_model_quantized.pb'
user_do_inference(quant_model, test_data)
```
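In the steps above, user_do_inference is a placeholder for your own evaluation routine. As an illustration of the comparison between the source model (step 2) and the fake-quantized model (this step), a hypothetical top-1 accuracy helper (not part of AMCT) might look like:

```python
def top1_accuracy(predicted_labels, true_labels):
    # Fraction of samples whose predicted class matches the ground truth.
    correct = sum(int(p == t) for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)

# Hypothetical label lists standing in for the two inference runs
# (source model vs. fake-quantized model) on the same test subset:
baseline = top1_accuracy([1, 0, 2, 1], [1, 0, 2, 2])   # source model
quantized = top1_accuracy([1, 0, 1, 1], [1, 0, 2, 2])  # fake-quantized model
print('accuracy drop:', baseline - quantized)
```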