Overview

Check the Comparison Result Description for this scenario.

In Caffe scenarios, accuracy comparison of both quantized and non-quantized models is supported.

  • In non-quantization scenarios, prepare the input data listed in the following table.
    Table 1 Input data requirements for comparison between a non-quantized original model and a non-quantized offline model

    | File | Description | How to Obtain |
    | --- | --- | --- |
    | .npy file of the non-quantized original model | Benchmark data | Preparing .npy Data of a Caffe Model |
    | .json file generated by converting the offline model file using ATC | Used to obtain the operator mapping | Network-wide Information File |
    | Dump data file of the non-quantized offline model running on the Ascend AI Processor | Data to be compared | In the offline inference scenario, the methods for obtaining NPU dump data are the same across frameworks. For details, see Preparing Dump Data of an Offline Model. |
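To illustrate what the comparison does with these inputs, a benchmark .npy output of an operator can be checked against the corresponding NPU dump data (after it has been converted to .npy) using cosine similarity. This is a minimal sketch, not the comparison tool itself; the file names in the comments are hypothetical.

```python
import numpy as np

def cosine_similarity(benchmark: np.ndarray, npu: np.ndarray) -> float:
    """Cosine similarity between flattened benchmark and NPU outputs."""
    a = benchmark.astype(np.float64).ravel()
    b = npu.astype(np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Both all-zero tensors are treated as identical.
        return 1.0 if not a.any() and not b.any() else 0.0
    return float(np.dot(a, b) / denom)

# Hypothetical file names; the real names come from the .npy and dump
# preparation steps referenced in the table above.
# benchmark = np.load("conv1_output.npy")       # original-model output
# npu = np.load("conv1_output_dump.npy")        # offline-model dump as .npy
# print(cosine_similarity(benchmark, npu))
```

A similarity close to 1.0 for every operator suggests the offline model reproduces the original model's outputs; a sharp drop at one operator points to the layer where the error is introduced.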

  • In quantization scenarios, to analyze in detail the accuracy issues arising from quantization and model conversion, note the following before performing the comparison:
    • The quantization process reduces the precision of model weights and activations to make the model lighter, improving compute efficiency and lowering transfer latency. For details, see the AMCT Instructions.
    • When quantizing the original model, AMCT inserts quant and dequant operators, and ATC fuses these inserted operators during model conversion. As a result, a direct comparison between the dump result of the quantized offline model and that of the original model may be inaccurate. Therefore, if you want to use an AMCT-quantized model for accuracy comparison, pass --fusion_switch_file to disable the relevant fusion rules, and after the accuracy comparison, perform the operations in Comparison Between NPU and NPU (Offline Inference) to check whether accuracy differences exist before and after fusion.

    In offline inference scenarios of Caffe, model quantization is used to improve performance, but it can cause accuracy degradation. Typically, a certain degree of degradation, usually within 1%, is acceptable in exchange for the performance gain. If the error exceeds this range, however, you need to locate the root cause and address it. The possible causes of large accuracy errors are as follows:

    • The quantization process causes accuracy degradation.
    • An accuracy issue arises when the quantized offline model, which is converted from the quantized original model through ATC, runs on the NPU due to changes in operators.
    Based on the above causes, you need to locate an accuracy issue step by step.
    1. Locate the accuracy issue in the quantization process.

      Compare the accuracy of a non-quantized original model (GPU/CPU) with that of a quantized original model (GPU/CPU).

      1. Refer to Preparing .npy Data of a Caffe Model to obtain the dump data files of the non-quantized original model files resnet50.prototxt and resnet50.caffemodel.
      2. Refer to Quantized Original Model and Quantization Information File to obtain the quantized original model files resnet50_deploy_model.prototxt, resnet50_deploy_weights.caffemodel, resnet50_fake_quant_model.prototxt, and resnet50_fake_quant_weights.caffemodel, and the quantization information file resnet50_quant.json.
      3. Refer to Preparing .npy Data of a Caffe Model to obtain the dump data files of the quantized original model files resnet50_fake_quant_model.prototxt and resnet50_fake_quant_weights.caffemodel.
      4. Refer to Comparison Operation and Analysis to compare the accuracy of the non-quantized original model with that of the quantized original model.
      5. After the accuracy comparison is complete, check the accuracy error of the quantized model. Generally, this error is measured by the top N accuracy of the model over a batch of data, for example, 10,000 to 100,000 images. If the error is within 1%, it is generally acceptable and requires no further investigation. If it exceeds 1%, or even reaches 3%, the accuracy issue caused by quantization cannot be ignored.
      6. Check whether the quantization process is properly designed. For example, check whether the selected quantization dataset is representative and meets the business needs, and whether the number of calibration samples is too small; 10 to 100 samples is the normal range.
      7. If the accuracy error in quantization is large, you are advised to manually dequantize the input and output layers of the model, which can generally improve accuracy to some extent.
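The top N accuracy check described in step 5 can be sketched as follows. This is an illustrative sketch only; the logits and label arrays are placeholders for the outputs of the non-quantized and quantized original models over the evaluation batch.

```python
import numpy as np

def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose highest-scoring class matches the label."""
    return float(np.mean(np.argmax(logits, axis=1) == labels))

def quantization_accuracy_drop(fp_logits: np.ndarray,
                               q_logits: np.ndarray,
                               labels: np.ndarray) -> float:
    """Accuracy drop, in percentage points, of the quantized model
    relative to the non-quantized original model."""
    return 100.0 * (top1_accuracy(fp_logits, labels)
                    - top1_accuracy(q_logits, labels))

# drop = quantization_accuracy_drop(fp_logits, q_logits, labels)
# Per the guidance above, a drop within 1% is generally acceptable;
# a drop above 1% (or approaching 3%) warrants further investigation.
```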
    2. Locate the accuracy issue arising in model conversion, specifically the accuracy issue of the quantized offline model when running on the NPU.

      Compare the accuracy of a quantized original model (GPU/CPU) with that of a quantized offline model (NPU, fusion pattern disabled).

      Operator fusion is enabled by default for ATC-based model conversion. Because fused operators cannot be directly compared against the operators of the original model, disable operator fusion before performing model conversion.

      1. Disable operator fusion and perform ATC-based model conversion to obtain the quantized offline model (fusion pattern disabled).
        atc --model=$HOME/module/resnet50_deploy_model.prototxt --weight=$HOME/module/resnet50_deploy_weights.caffemodel --framework=0 --output=$HOME/module/out/caffe_resnet50_off --soc_version=<soc_version> --fusion_switch_file=$HOME/module/fusion_switch.cfg

        To disable operator fusion, use the --fusion_switch_file option to specify the operator fusion pattern configuration file (for example, fusion_switch.cfg) and disable operator fusion in the file. The configuration for disabling operator fusion in the operator fusion pattern configuration file is as follows:

        {
            "Switch":{
                "GraphFusion":{
                    "ALL":"off"
                },
                "UBFusion":{
                    "ALL":"off"
                 }
            }
        }
        You should see information similar to the following if the conversion is successful.
        ATC run success

        After the command is successfully executed, an offline model (for example, caffe_resnet50.om) is generated in the $HOME/module/out/caffe_resnet50_off directory.

      2. Generate a .json file.
        atc --mode=1 --om=$HOME/module/out/caffe_resnet50_off/caffe_resnet50.om --json=$HOME/data/resnet50.json
      3. Refer to Preparing Dump Data of an Offline Model to obtain the dump data file of the quantized offline model caffe_resnet50.om (fusion pattern disabled).
      4. Refer to Comparison Operation and Analysis to compare the accuracy of the quantized original model with that of the quantized offline model (fusion pattern disabled).
      5. If the accuracy error in model conversion is large, determine from the comparison result whether it is an accumulated error or a single-node error. If the error is large on a specific node, collect the logs and contact technical support; to work around it, you are advised to manually dequantize that node and then quantize the model again. If it is an accumulated error, check whether it is caused by quantization: because of the accuracy loss in quantization, many operators may show small accuracy errors. You can repeat the quantization process with adjusted configurations until the accuracy requirements are met.
      6. Note that cosine similarity alone cannot fully measure the accuracy. If the above methods cannot resolve the issue, perform Single-Operator Comparison.
    3. (Optional) Compare the accuracy of a quantized offline model with fusion pattern enabled by default with that of a quantized offline model with fusion pattern disabled. For details, see "Accuracy Comparison Based on Model Conversion with Operator Fusion Enabled and Disabled" in Comparison Between NPU and NPU (Offline Inference).
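Because cosine similarity can remain high even when a few elements deviate badly, it may help to supplement it with absolute and relative error metrics when triaging a suspect node, as in this illustrative sketch (the metric names and thresholds are not part of the official tool):

```python
import numpy as np

def error_metrics(benchmark: np.ndarray, npu: np.ndarray) -> dict:
    """Supplementary per-operator error metrics beyond cosine similarity."""
    a = benchmark.astype(np.float64).ravel()
    b = npu.astype(np.float64).ravel()
    abs_err = np.abs(a - b)
    # Relative error, guarded against division by zero.
    rel_err = abs_err / np.maximum(np.abs(a), 1e-12)
    return {
        "max_abs_error": float(abs_err.max()),
        "mean_abs_error": float(abs_err.mean()),
        "max_rel_error": float(rel_err.max()),
    }

# A large max_abs_error alongside a small mean_abs_error suggests an
# outlier in a few elements, which cosine similarity tends to hide.
```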
  • In quantization scenarios, if you only need to preliminarily determine the accuracy issue arising in the process of converting a non-quantized original model (GPU/CPU) to a quantized offline model (NPU), prepare the input data listed in the following table.
    Table 2 Input data requirements for comparison between a non-quantized original model and a quantized offline model

    | File | Description | How to Obtain |
    | --- | --- | --- |
    | .npy file of the non-quantized original model | Benchmark data | Preparing .npy Data of a Caffe Model |
    | Either of the following: • .json file generated by converting the offline model file using ATC • Quantization information file (.json) after AMCT-based model compression | Used to obtain the operator mapping | Preparation of Model Files and Quantization Information Files |
    | Dump data file of the quantized offline model running on the Ascend AI Processor | Data to be compared | Preparing Dump Data of an Offline Model |