"Check scale and offset record file record.txt failed" Is Displayed During Quantization

Symptom

When the save_model API is called to save the quantized model, AMCT needs to read the activation quantization parameters scale_d and offset_d computed in the calibration phase. If these parameters cannot be found in the corresponding record file, the quantized model cannot be saved, the preceding AMCT error is thrown, and the process terminates. Symptoms of this problem are as follows:

2021-05-11 09:29:18,138 - INFO - [AMCT]:[Optimizer]: Do <class 'amct_caffe.optimizer.check_record_scale_offset.CheckRecordScaleOffsetPass'>
2021-05-11 09:29:18,138 - ERROR - [AMCT]:[AMCT]: Cannot find scale_d,offset_d,channels,height,width of layer:Convolution1
2021-05-11 09:29:18,138 - ERROR - [AMCT]:[AMCT]: Check scale and offset record file xx/data/AMCT_CAFFE_GPU_GPU_FAQ/AMCT_CAFFE_GPU_GPU_FAQ_record.txt failed
2021-05-11 09:29:18,138 - ERROR - [AMCT]:[AMCT]: There may be something wrong while doing calibration
2021-05-11 09:29:18,138 - ERROR - [AMCT]:[AMCT]: Please check caffe log
Traceback (most recent call last):
  File "test_caffe_faq.py", line 32, in <module>
    main()
  File "test_caffe_faq.py", line 28, in main
    gen_amct_model(case,json_flag = True) 
  File "../caffe_lib/calibration_test/code/gen_amct_model.py", line 102, in gen_amct_model
    save_model(graph, mode, AMCT_PATH+case_name)
  File "xx/amct/lib/python3.7/site-packages/amct_caffe/common/utils/check_params.py", line 43, in wrapper
    return func(*args, **kwargs)
  File "xx/amct/lib/python3.7/site-packages/amct_caffe/quantize_tool.py", line 190, in save_model
    optimizer.do_optimizer(graph)
  File "xx/amct/lib/python3.7/site-packages/amct_caffe/optimizer/graph_optimizer.py", line 60, in do_optimizer
    graph_pass.run(graph)
  File "xx/amct/lib/python3.7/site-packages/amct_caffe/optimizer/check_record_scale_offset.py", line 103, in run
    raise RuntimeError('Check file {} failed.'.format(record_file))
RuntimeError: Check file xx/data/AMCT15_CAFFE_GPU_GPU_FAQ/AMCT15_CAFFE_GPU_GPU_FAQ_record.txt failed.
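The check that raises this error verifies that every quantized layer has a complete set of activation quantization fields in the record file. The sketch below illustrates the idea with a hypothetical record format (a `[layer_name]` section header followed by one `key value` pair per line); the actual AMCT record file layout may differ.

```python
# Hypothetical sketch: scan an AMCT-style record file and report which
# layers lack one of the required activation quantization fields.
# The section-based record format used here is an assumption for
# illustration, not the documented AMCT format.

REQUIRED_KEYS = {"scale_d", "offset_d", "channels", "height", "width"}

def find_incomplete_layers(record_text):
    """Return the names of layers whose sections miss a required key."""
    layers = {}
    current = None
    for line in record_text.splitlines():
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]          # start a new layer section
            layers[current] = set()
        elif current and line:
            layers[current].add(line.split()[0])  # record the key name
    return [name for name, keys in layers.items()
            if not REQUIRED_KEYS <= keys]
```

A layer such as `Convolution1` in the log above would be reported because its section lacks `scale_d` and `offset_d`, which is exactly what the `CheckRecordScaleOffsetPass` complains about.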

Possible Cause

The scale_d and offset_d parameters are saved by the IFMR layer that is inserted into the calibration model when the user runs calibration (that is, when the Caffe framework performs forward computation on the calibration model). The IFMR layer must first accumulate the amount of data specified by batch_num and only then perform the quantization calculation that yields scale_d and offset_d. The possible causes of the RuntimeError: Check file record.txt failed error are as follows:

  1. Caffe inference error. Possible causes: the compiled Caffe build is faulty, the calibration model is faulty, or the corresponding dataset cannot be found. Check the exception information thrown by the Caffe framework.
  2. The data volume of the calibration set provided by the user does not meet the data volume required by batch_num. For example, if the user provides only one batch of data as the calibration set but sets batch_num to 2, the IFMR layer does not have sufficient data to perform quantization during calibration. As a result, scale_d and offset_d cannot be calculated, and the preceding error is triggered. To locate the fault, check the progress information printed while the IFMR layer performs quantization; the IFMR layer displays the accumulated data volume.

    When the specified amount of data is accumulated, the quantization operation is triggered.

    The quantization of the current layer is complete only when the message Do layer:"conv1" activation calibration success! is displayed.
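Cause 2 can be caught before calibration starts by comparing the available data against batch_num. The helper below is a minimal pre-flight sketch; the names (num_available_batches, batch_num) describe the quantities discussed above and are not part of the AMCT API.

```python
# Hypothetical pre-flight check: fail fast with a clear message instead
# of letting the IFMR layer run out of data mid-calibration.

def check_calibration_data(num_available_batches, batch_num):
    """Raise ValueError if the calibration set cannot satisfy batch_num."""
    if num_available_batches < batch_num:
        raise ValueError(
            "Calibration set supplies only {} batch(es) but batch_num is "
            "{}; add calibration data or reduce batch_num.".format(
                num_available_batches, batch_num))
```

Running such a check before invoking calibration turns the late, indirect record-file failure into an immediate, actionable error.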

Solution

  1. Rectify the fault based on the error reported by the Caffe framework.
  2. Increase the data volume of the calibration set, or reduce the batch_num value of the quantization algorithm, so that the calibration set supplies at least batch_num batches of data. Note that reducing batch_num may lower the precision of the quantized model, so exercise caution when reducing it.
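Solution 2 can be automated as a guarded fallback: prefer the user's batch_num, but clamp it to the available data with an explicit warning about the precision trade-off. This is an illustrative sketch; the function name and parameters are hypothetical, not AMCT API.

```python
# Illustrative helper: clamp batch_num to the calibration set size and
# warn, since a smaller batch_num may reduce quantized-model precision.
import warnings

def effective_batch_num(requested, available_batches):
    """Return a batch_num the calibration set can actually satisfy."""
    if available_batches >= requested:
        return requested
    warnings.warn(
        "Reducing batch_num from {} to {} to match the calibration set; "
        "quantization precision may drop.".format(requested,
                                                  available_batches))
    return available_batches
```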