Accuracy Problem Diagnosis Process

If an accuracy problem occurs after the accuracy comparison is complete, analyze the problem and find a solution. Figure 1 uses a Caffe inference application scenario as an example to illustrate how to analyze accuracy problems.

Figure 1 Process of accuracy problem analysis

The detailed operations in Figure 1 are described as follows:

Check whether the input data of the first data layer is correct. Find an image that shows the accuracy problem and use it for compilation, instruction simulation, and Caffe-based inference. Compare the simulation output with the Caffe-based inference output layer by layer and obtain the similarity of each layer. Then, locate the problem by following the sequence in Figure 1.
Check whether the similarity of the data layer is greater than or equal to 0.999.
- Yes: Go to the next step.
- No: Check whether the inputs at the data layer are consistent. Verify that the mean value [mean_file], scaling [data_scale], and preprocessing mode [norm_type] are the same as those used with Caffe. Networks trained with MxNet or DarkNet (YOLO) use RGB by default, so set [RGB--order] to RGB during model conversion.
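The text does not state which similarity metric the comparison uses. The following minimal sketch assumes cosine similarity over flattened layer outputs and shows how the 0.999 data-layer check could be applied. The names `cosine_similarity` and `data_layer_ok` are hypothetical helpers for illustration, not part of ATC or Caffe.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two flattened tensors given as sequences of floats.

    Assumption: the similarity in this document is cosine similarity.
    """
    if len(a) != len(b):
        raise ValueError("tensors must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Two all-zero tensors count as identical; a single zero tensor as a mismatch.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def data_layer_ok(sim_output, caffe_output, threshold=0.999):
    """Apply the data-layer check: similarity greater than or equal to 0.999."""
    return cosine_similarity(sim_output, caffe_output) >= threshold
```

If `data_layer_ok` returns False for the data layer, the input settings (mean value, scaling, preprocessing mode, channel order) are the first things to compare.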
Check whether the similarity at the data layer is greater than 0.99 but decreases gradually through the following layers and falls below 0.95 at the last layer.
- Yes: Check whether the problem is caused by quantization errors.
  Modify the configuration options of ATC, set [dump_data] to 1, and output the calibrated data to the mapper_quant directory.
  
  Set [forward_quantization_option] to 1, indicating that only activation quantization is performed. Compare the similarity between mapper_quant and caffe. If the similarity meets the requirement, the problem is caused by a weight quantization error.
  
  Set [forward_quantization_option] to 2, indicating that only weight quantization is performed. Compare the similarity between mapper_quant and caffe. If the similarity meets the requirement, the problem is caused by an activation quantization error.
  - Yes: Use the AMCT tool to perform calibration or retraining.
  - No: Report the problem to the technical support.
- No: Go to the next step.
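The decision logic of this step can be sketched as follows. This is an illustration under stated assumptions, not an ATC feature: the per-layer similarities are assumed to be available as a list in network order, and the function names are hypothetical. The mapping from each [forward_quantization_option] run to the suspected error follows the description above.

```python
def looks_like_quantization_drift(sims, first_min=0.99, last_max=0.95):
    """Return True if the data layer matches well but the last layer does not.

    sims: per-layer similarities in network order, data layer first.
    Only the two endpoints are checked; whether the decline in between is
    gradual still has to be judged by inspecting the values.
    """
    if len(sims) < 2:
        return False
    return sims[0] > first_min and sims[-1] < last_max


def quantization_suspects(activation_only_ok, weight_only_ok):
    """Map the results of the two isolation runs to the suspected error source.

    activation_only_ok: similarity met the requirement with option 1
                        (only activation quantization performed).
    weight_only_ok:     similarity met the requirement with option 2
                        (only weight quantization performed).
    """
    suspects = []
    if activation_only_ok:
        # Activations alone are fine, so the weights are the suspect.
        suspects.append("weight")
    if weight_only_ok:
        # Weights alone are fine, so the activations are the suspect.
        suspects.append("activation")
    return suspects
```

An empty result means that neither run met the requirement, which corresponds to the branch that reports the problem to technical support.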
Check whether the similarity at the last layer is 0.99 and that of some middle layers is less than 0.90.
- Yes: Check the layer matching.
  ATC optimizes the network structure to adapt it to hardware execution. As a result, the layers may not match those of Caffe during similarity comparison.
  1. Check whether the Inplace mode is used, in which the top name is the same as the bottom name. Some layers do not support Inplace and need to be partitioned. Take conv + tanh as an example: Caffe supports Inplace for this pair and outputs only the tanh data, whereas ATC does not support the fusion of conv and tanh and outputs the conv and tanh data separately.
  2. Check whether the network is modified by ATC. Compare the cnn_net_tree.dot file (generated during ATC compilation) with the original .prototxt file to see whether the network structure has changed. For example, the SPP layer is partitioned into Pooling and Concat. In that case, compare the Caffe result with the Concat result, or directly observe the similarity of subsequent layers (that is, perform the next step).
  3. If the layers are matched but the similarity is low, report the problem to the technical support.
- No: Go to the next step.
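The Inplace check in item 1 can be illustrated with a short sketch. It assumes the layers of the .prototxt file have already been read into a list of dictionaries with `name`, `bottom`, and `top` keys. That structure and the function name `find_inplace_layers` are hypothetical; parsing the .prototxt file itself is outside the scope of this example.

```python
def find_inplace_layers(layers):
    """Return the names of layers whose top name equals one of their bottom names.

    layers: list of dicts such as
            {"name": "tanh1", "bottom": ["conv1"], "top": ["conv1"]}
            (an assumed, pre-parsed view of the .prototxt layers).
    """
    inplace = []
    for layer in layers:
        shared = set(layer["bottom"]) & set(layer["top"])
        if shared:
            inplace.append(layer["name"])
    return inplace


# Example: conv + tanh, where tanh overwrites the conv output in place.
example_layers = [
    {"name": "conv1", "bottom": ["data"], "top": ["conv1"]},
    {"name": "tanh1", "bottom": ["conv1"], "top": ["conv1"]},
]
inplace_names = find_inplace_layers(example_layers)
```

For the layers returned here, the Caffe dump contains only the final in-place result, so the ATC output of the preceding layer has no direct counterpart to compare against.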
If the similarity of all layers is greater than 0.99 and the absolute error is small, the postprocessing may be abnormal. Check whether the postprocessing is normal.
Assume that Caffe postprocessing produces correct bounding boxes or classifications from the Caffe result. Apply the same Caffe postprocessing to the simulation result and check whether the result can also be framed or classified properly.

Normal: The postprocessing on the board is abnormal. Compare the postprocessing code of the board with that of Caffe.

Abnormal: If the data similarity is 0.99 and the absolute error is small, the Caffe postprocessing code is sensitive to small differences in the data. Review the postprocessing code.
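The precondition of this step (all layers above 0.99 and a small absolute error) can be sketched as follows. The text does not define what counts as a small absolute error, so the `abs_err_max` default below is only a placeholder assumption, and both function names are hypothetical.

```python
def max_abs_error(a, b):
    """Largest element-wise absolute difference between two flattened tensors."""
    if len(a) != len(b):
        raise ValueError("tensors must have the same number of elements")
    if not a:
        return 0.0
    return max(abs(x - y) for x, y in zip(a, b))


def postprocessing_suspected(layer_sims, layer_abs_errors,
                             sim_min=0.99, abs_err_max=1e-2):
    """Return True if every layer matches closely, pointing at postprocessing.

    layer_sims:       similarity of each layer.
    layer_abs_errors: maximum absolute error of each layer.
    abs_err_max:      placeholder threshold; choose it to suit the data range.
    """
    sims_ok = all(s > sim_min for s in layer_sims)
    errors_ok = all(e <= abs_err_max for e in layer_abs_errors)
    return sims_ok and errors_ok
```

When `postprocessing_suspected` returns True, the layer outputs themselves are unlikely to be the cause, and the comparison moves on to the postprocessing code.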
Report the problem to the technical support.
Provide the following information to the technical support:
- For Caffe, provide the .prototxt and .caffemodel files. For PyTorch, provide the ONNX model, .py definition, and .pth file. If the model cannot be provided, provide the .prototxt, .py, .pth, weights, and input/output data corresponding to the problem layer.
- Provide the parameters, images, and mean value files used for compilation.
- Provide the ATC version number printed during compilation, for example, Mapper Version 1.0.0.0_B010(PICO_1.0) 2110161033840ed952(CPU)(INST_2.0.9).
