Workflow

This section describes the layers that QAT supports, the API call sequence, and an example.

For the quantization example, see Sample List. The layers that support QAT as well as their restrictions are listed as follows.

**Table 1** Layers that support QAT as well as their restrictions
| Supported Layer Type | Restriction |
| --- | --- |
| InnerProduct | transpose = false, axis = 1 |
| Convolution | 4 × 4 filter |
| Deconvolution | 1-dilated 4 × 4 filter with group = 1 |
| AVE Pooling | Global Pooling is not supported. |

API Call Sequence

Figure 1 shows the API call sequence of QAT. The training environment uses the CPU/GPU environment of the Caffe framework. Based on the inference script of the open-source framework, the AMCT API is called to compress the model. The compressed model needs to be converted into an offline model that adapts to the Ascend AI Processor using the ATC before it can be used for inference on the Ascend AI Processor.

Figure 1 API call sequence for QAT

The user implements the operations in blue, while AMCT APIs implement those in gray. Specifically, import the AMCT package into the source Caffe network inference code and call the APIs where appropriate for quantization.

1. Build an original Caffe model and then generate a quantization configuration file by using the create_quant_retrain_config API.
2. Using the create_quant_retrain_model API, optimize the original Caffe model by inserting activation and weight quantization operators for quantization parameter calculation. Run model retraining on the test and calibration datasets provided by AMCT in the Caffe environment to obtain the quantization factors.
3. Start the Caffe training process and configure the solver. Add the test phase to the training process and set the number of test iterations to a value greater than batch_num in the quantization configuration file.
4. Using the save_quant_retrain_model API, insert operators including AscendQuant and AscendDequant and save the quantized model file (including its weight file) that is either suitable for accuracy simulation in the Caffe environment or deployable on the Ascend AI Processor.

Example

Suggestions on the training process in the QAT scenario:
Retraining after quantization uses the same configuration as the initial training process, except for the following differences:
- Epoch number: 1/4–1/3 of the initial epoch number.
- Learning rate: 1/100 of the initial learning rate.
Take the following steps to get started. Update the sample code based on your situation.
Tweak the arguments passed to AMCT API calls as required. QAT relies on the user training result. Ensure that a Caffe training script that yields satisfactory training accuracy is available.
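
The retraining suggestions above reduce to simple arithmetic. The following is a minimal sketch, assuming a hypothetical helper named `suggest_retrain_hyperparams` (not an AMCT API) and illustrative values for the initial training run (90 epochs, learning rate 0.1).

```python
import math


def suggest_retrain_hyperparams(initial_epochs, initial_lr):
    """Hypothetical helper (not part of AMCT): derive retraining settings
    from the initial training settings, following the suggestions above."""
    # Epoch number: between 1/4 and 1/3 of the initial epoch number.
    min_epochs = max(1, math.ceil(initial_epochs / 4))
    max_epochs = max(min_epochs, initial_epochs // 3)
    return {
        'min_epochs': min_epochs,
        'max_epochs': max_epochs,
        # Learning rate: 1/100 of the initial learning rate.
        'base_lr': initial_lr / 100,
    }


# Illustrative values only; substitute those of your own training script.
params = suggest_retrain_hyperparams(initial_epochs=90, initial_lr=0.1)
print(params)
```

Any epoch count between `min_epochs` and `max_epochs` falls within the suggested range. In Caffe the solver counts iterations rather than epochs, so the chosen value still has to be converted to max_iter based on the dataset and batch size.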

Import the AMCT package and set the log level (see Setting Environment Variables for details).
import amct_caffe as amct
Set the run mode and target device.
AMCT runs on the CPU or GPU. To run the tool on the GPU, you first need to configure the Caffe run mode and target device before configuring AMCT's run mode. Since the target device has already been specified here, you do not need to configure the target device in the model inference function.
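import caffe

# run_mode and gpu_id are set by the user, for example run_mode = 'gpu' and gpu_id = 0.
if run_mode == 'gpu':
    caffe.set_mode_gpu()
    caffe.set_device(gpu_id)
    amct.set_gpu_mode()
else:
    caffe.set_mode_cpu()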
(Optional) Run inference on the original model in the Caffe environment based on the test dataset to validate the inference script and environment setup. (Update the sample code based on your situation.)
This step is recommended as it guarantees a properly functioning original model for inference with acceptable accuracy. You can use a subset from the test dataset to improve the efficiency.
user_test_model(ori_model_file, ori_weights_file, test_data, test_iterations)
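
The inference function `user_test_model` is supplied by the user, and its body is not given here. The following is one possible sketch, not the reference implementation. It assumes that the input blob is named `data`, that the output blob `prob` holds per-class scores of shape (batch, classes), and that `test_data` is an iterable yielding (images, labels) batches. All three are assumptions that depend on your model and data pipeline. The helper `top1_accuracy` is likewise hypothetical.

```python
from itertools import islice


def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=lambda i: row[i])
        correct += int(predicted == label)
    return correct / len(labels)


def user_test_model(model_file, weights_file, test_data, test_iterations):
    """Hypothetical inference function; adapt blob names and data loading."""
    import caffe  # imported lazily so that top1_accuracy works without Caffe

    net = caffe.Net(model_file, weights_file, caffe.TEST)
    accuracies = []
    for images, labels in islice(test_data, test_iterations):
        net.blobs['data'].data[...] = images   # assumed input blob name
        output = net.forward()
        # 'prob' is an assumed output blob name.
        accuracies.append(top1_accuracy(output['prob'], labels))
    return sum(accuracies) / len(accuracies)


# The accuracy helper can be checked without a Caffe installation.
demo_accuracy = top1_accuracy([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]], [1, 0, 0])
print(demo_accuracy)
```

Returning the mean accuracy makes it straightforward to compare the original model with the quantized one later.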

Run AMCT to quantize the model.

Generate a quantization configuration file.

config_file = './tmp/config.json'
# Use the alias defined by "import amct_caffe as amct".
amct.create_quant_retrain_config(config_file=config_file,
                                 model_file=ori_model_file,
                                 weights_file=ori_weights_file)
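
The generated configuration file is JSON, so it can be inspected before retraining, for example to read the batch_num value that the solver settings must respect. The following is a minimal sketch, assuming that batch_num is stored as a top-level key (the exact layout may differ between AMCT versions). `read_batch_num` is a hypothetical helper, and the temporary file only stands in for the real config.json.

```python
import json
import os
import tempfile


def read_batch_num(config_file):
    """Hypothetical helper: return batch_num from a quantization config file.
    Assumes batch_num is a top-level key of the JSON document."""
    with open(config_file) as f:
        config = json.load(f)
    return int(config['batch_num'])


# Stand-in for './tmp/config.json' so that the sketch runs without AMCT.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_config = os.path.join(tmp_dir, 'config.json')
    with open(demo_config, 'w') as f:
        json.dump({'batch_num': 1}, f)
    batch_num = read_batch_num(demo_config)

print(batch_num)
```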

Modify the model by inserting fake-quantization layers, and save the new model file.

Modify the model based on the quantization configuration file and insert activation and weight quantization operators to calculate quantization parameters.

modified_model_file = './tmp/modified_model.prototxt'
modified_weights_file = './tmp/modified_model.caffemodel'
scale_offset_record_file = './tmp/record.txt'
# Use the alias defined by "import amct_caffe as amct".
amct.create_quant_retrain_model(model_file=ori_model_file,
                                weights_file=ori_weights_file,
                                config_file=config_file,
                                modified_model_file=modified_model_file,
                                modified_weights_file=modified_weights_file,
                                scale_offset_record_file=scale_offset_record_file)

Implement gradient descent optimization on the modified model, train the model on the training dataset, and calculate quantization factors. (Update the sample code based on your situation.)
1. Add a test phase (test_interval > 0, test_iter > 0) to solver.prototxt so that the shift factor N is searched for in the test phase. Turn off the precheck (test_initialization = false) to prevent an unintended search for the shift factor N.
  The following provides a template of the solver.prototxt file.
```
test_iter: 1
test_interval: 4
base_lr: 9.999999747378752e-05
max_iter: 4
lr_policy: "step"
gamma: 0.10000000149011612
momentum: 0.8999999761581421
weight_decay: 9.999999747378752e-05
stepsize: 10
snapshot: 4
net: "$HOME/amct_path/sample/resnet50/tmp/modified_model.prototxt"
test_initialization: false
```
  - test_iter: (repeated) number of test iterations. Must be greater than or equal to batch_num of the shift factor N. Otherwise, the calculation of shift factor N fails due to data insufficiency. test_iter × batch_size equals the number of images in each test.
  - test_interval: interval between tests (in training iterations). Defaults to 0. You are advised to set this parameter to a factor of max_iter. In the sample, test_interval equals max_iter, so only one test is performed, after the training phase.
  - max_iter: maximum number of training iterations.
  - net: Caffe model to train. This model will be reused in the training phase and test phase. The operators to run in each phase are specified by using the phase field of each operator. Alternatively, you can specify the model to train and the model to test by using train_net and test_net, respectively. As AMCT has generated only one model, net is used here.
  - test_initialization: a bool. If it is set to True (default), the original model will be prechecked before training. In this case, however, the parameters will be all initialized to 0 resulting in a calculation error of the shift factor N. To prevent this, set test_initialization to False.
  - The base_lr, lr_policy, and gamma parameters control the learning strategy, that is, how learning_rate changes.
  - momentum: momentum coefficient, that is, the weight given to the previous weight update when computing the current one.
  - weight_decay: weight decay, which prevents overfitting.
  - snapshot: snapshot interval. The trained model and solver state are saved every snapshot training iterations.
2. Train the model.
  user_train_model(modified_model_file, modified_weights_file, train_data)
  During training, the activation quantization operator learns the upper and lower quantization bounds, clip_max and clip_min, which are saved to the operator BLOB. The weight quantization operator learns quantization parameters and saves the updated parameters to the model. Quantization factors are generated after batch_num training iterations. If the number of training iterations is less than batch_num, the training fails.
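
The solver constraints described above can be checked before training starts. The following is a minimal sketch, assuming a flat solver.prototxt made of `key: value` lines, as in the template. `parse_solver` and `check_solver_for_qat` are hypothetical helpers, not AMCT or Caffe APIs, and the parser does not handle nested or repeated fields.

```python
def parse_solver(text):
    """Parse flat `key: value` lines of a solver.prototxt into a dict of strings."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or ':' not in line:
            continue
        key, value = line.split(':', 1)
        params[key.strip()] = value.strip().strip('"')
    return params


def check_solver_for_qat(params, batch_num):
    """Return a list of problems with respect to the constraints above."""
    problems = []
    test_iter = int(params.get('test_iter', 0))
    test_interval = int(params.get('test_interval', 0))
    max_iter = int(params.get('max_iter', 0))
    if test_iter < batch_num:
        problems.append('test_iter must be >= batch_num')
    if test_interval <= 0:
        problems.append('test_interval must be > 0 to add a test phase')
    elif max_iter % test_interval != 0:
        problems.append('test_interval should be a factor of max_iter')
    # test_initialization defaults to true when it is not set.
    if params.get('test_initialization', 'true').lower() != 'false':
        problems.append('test_initialization must be false')
    return problems


SOLVER_TEXT = '''
test_iter: 1
test_interval: 4
max_iter: 4
net: "modified_model.prototxt"
test_initialization: false
'''

problems = check_solver_for_qat(parse_solver(SOLVER_TEXT), batch_num=1)
print(problems)
```

An empty list means that none of the checked constraints is violated. The check does not validate the remaining solver fields.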

Save the model.

Call the save_quant_retrain_model API to insert operators such as AscendQuant and AscendDequant and save the resultant deployable model and fake-quantized model based on the quantization factors and the retrained model.

quant_model_path = './result/user_model'
amct.save_quant_retrain_model(retrained_model_file=modified_model_file,
                              retrained_weights_file=modified_weights_file,
                              save_type='Both',
                              save_path=quant_model_path,
                              scale_offset_record_file=scale_offset_record_file,
                              config_file=config_file)

(Optional) Run inference on the fake-quantized models fake_quant_model and fake_quant_weights in the Caffe environment based on the test dataset to test the accuracy. (Update the sample code based on your situation.)

Compare the accuracy of the fake-quantized model with that of the original model (see the optional step above that runs inference on the original model).

fake_quant_model = './result/user_model_fake_quant_model.prototxt'
fake_quant_weights = './result/user_model_fake_quant_weights.caffemodel'
user_test_model(fake_quant_model, fake_quant_weights, test_data, test_iterations)
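
The comparison itself can be as simple as a difference between two accuracy values. The following is a minimal sketch, assuming that user_test_model returns a top-1 accuracy as a float between 0 and 1 (its return value is not specified here). The helper names, the 0.01 threshold, and the accuracy values are illustrative placeholders, not AMCT recommendations or measured results.

```python
def accuracy_drop(original_accuracy, quantized_accuracy):
    """Accuracy lost by quantization (positive means the quantized model is worse)."""
    return original_accuracy - quantized_accuracy


def is_acceptable(original_accuracy, quantized_accuracy, max_drop=0.01):
    """Hypothetical acceptance check; max_drop is a user-chosen tolerance."""
    return accuracy_drop(original_accuracy, quantized_accuracy) <= max_drop


# Placeholder values; in practice use the results of user_test_model.
ori_accuracy = 0.761
quant_accuracy = 0.755
acceptable = is_acceptable(ori_accuracy, quant_accuracy)
print(acceptable)
```

If the drop exceeds your tolerance, adjusting the retraining settings (for example the number of iterations or the learning rate) is one option to consider.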
