Comparison Operation and Analysis
Description
- The .json file and directory names in this section are only examples. Replace them with the actual ones. Ensure that the operation user has the read and write permissions on the result path specified by --out.
- If "MemoryError" is displayed during comparison, the process has run out of memory because too much data was loaded at once. Split the dump files on the NPU into different directories and compare them one directory at a time.
- If the size of the specified data file for comparison exceeds 1 GB or the size of the .json file exceeds 100 MB, the comparison may take a long time and the system displays the message "The size ( %d) of %s more than the XX, it needs more time to run."
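The split suggested for the "MemoryError" case could be scripted along the following lines. This is only a sketch: the function name split_dump_dir, the batch_NNN directory naming, and the batch size are assumptions made for illustration and are not part of the comparison tool. Note that the files are moved, not copied.

```python
import shutil
from pathlib import Path


def split_dump_dir(src_dir, dest_root, batch_size):
    """Move the files in src_dir into numbered subdirectories of dest_root.

    Hypothetical helper: the batch_NNN naming is an arbitrary choice.
    Returns the list of batch directories that were created.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Collect the file list up front so that directories created below
    # are never picked up as input.
    files = sorted(p for p in Path(src_dir).iterdir() if p.is_file())
    batch_dirs = []
    for start in range(0, len(files), batch_size):
        batch_dir = Path(dest_root) / f"batch_{start // batch_size:03d}"
        batch_dir.mkdir(parents=True, exist_ok=True)
        for path in files[start:start + batch_size]:
            shutil.move(str(path), str(batch_dir / path.name))
        batch_dirs.append(batch_dir)
    return batch_dirs
```

Each returned directory can then be passed to the comparison command separately. A suitable batch size depends on the available memory and has to be found by trial.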
Prerequisites
- Ensure that operations in Before You Start have been completed.
- Prepare the comparison file based on the scenario specified in Overview.
Procedure
This section describes how to compare the dump data of a non-quantized model running on the Ascend AI Processor and the .npy file of a non-quantized ONNX model. The following parameters are based on this example. You can replace them as required.
- Log in to the CANN tool installation environment.
- Go to the ${INSTALL_DIR}/tools/operator_cmp/compare directory. Replace ${INSTALL_DIR} with the actual CANN component directory. If the Ascend-CANN-Toolkit package is installed as the root user, the CANN component directory is /usr/local/Ascend/ascend-toolkit/latest.
- Run the comparison command.
There are multiple dump and .npy data files used for comparison. Therefore, the -m and -g options in the following steps must specify the parent directory of the data files, for example, $HOME/MyApp/resnet50. The resnet50 folder is used to store the comparison data files.
The directory structure is as follows:
root@xxx:$HOME/MyApp/resnet50# tree
.
├── BatchMatMul.bert_encoder_layer_0_attention_self_MatMul_1.24.1614717261785536
├── BatchMatMul.bert_encoder_layer_0_attention_self_MatMul.21.1614717261768864
├── BatchMatMul.bert_encoder_layer_10_attention_self_MatMul_1.235.1614717263664916
# This is only an example. The remaining file names are omitted here.
python3 msaccucmp.py compare -m $HOME/MyApp/npu_dump/20230216155330/0/resnet50/1/0/ -g $HOME/MyApp/onnx_dump/ -f $HOME/module/out/onnx_resnet50.json -out $HOME/result -advisor
- The preceding command provides only examples of parameters required in the current scenario. For example, if the range of potential accuracy issues is known or the output file size of a large network model is too large, you can configure related parameters to reduce the output data volume. For details about more parameters, see Command Syntax.
- Install pandas 1.3 or later. Otherwise, the -advisor option cannot run and no expert suggestions are output.
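Before using -advisor, the pandas requirement can be checked with a short script such as the following. It is a minimal sketch that uses only the Python standard library (importlib.metadata, available in Python 3.8 and later). The helper names major_minor and pandas_meets_minimum are assumptions made for illustration.

```python
import re
from importlib import metadata


def major_minor(version):
    """Return (major, minor) parsed from a version string such as '1.3.5'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def pandas_meets_minimum(minimum=(1, 3)):
    """Return True if pandas is installed and is at least the given version."""
    try:
        installed = metadata.version("pandas")
    except metadata.PackageNotFoundError:
        # pandas is not installed in this Python environment.
        return False
    return major_minor(installed) >= minimum


if __name__ == "__main__":
    if pandas_meets_minimum():
        print("pandas 1.3 or later is installed.")
    else:
        print("pandas 1.3 or later is required for the -advisor option.")
```

Run the script with the same python3 interpreter that runs msaccucmp.py, because the check applies only to the environment it is executed in.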
Table 1 Command-line options for network-wide comparison

| Option | Description | Mandatory (Yes/No) |
|---|---|---|
| -m, --my_dump_path | Directory for storing the data file generated during model running on the Ascend AI Processor. | Yes |
| -g, --golden_dump_path | Directory for storing the data file of the original network running on the GPU/CPU. | Yes |
| -f, --fusion_rule_file | Network-wide information file. It is a .json file converted from the .om model file using ATC, which contains the mapping between network-wide operators and is used for operator matching during accuracy comparison. | No |
| -out, --output | Path of the comparison result. Defaults to the current path. To avoid privilege escalation risks, do not specify a directory that belongs to a user other than the current user. | No |
| -advisor | After tensor comparison is complete, analyzes the comparison result and provides expert suggestions. For details, see Advisor Suggestions on Comparison Results. | No |
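When the comparison is run from a script, the options in Table 1 can be assembled into an argument list. The following is a sketch under stated assumptions: the function build_compare_command and the example paths are hypothetical, and only the options listed in Table 1 are used.

```python
import shlex
from pathlib import Path


def build_compare_command(my_dump_path, golden_dump_path,
                          fusion_rule_file=None, output=None,
                          advisor=False, script="msaccucmp.py"):
    """Build the argument list for 'msaccucmp.py compare' from Table 1 options."""
    cmd = ["python3", script, "compare",
           "-m", str(my_dump_path),      # mandatory: NPU dump directory
           "-g", str(golden_dump_path)]  # mandatory: golden data directory
    if fusion_rule_file is not None:
        cmd += ["-f", str(fusion_rule_file)]
    if output is not None:
        cmd += ["-out", str(output)]
    if advisor:
        cmd.append("-advisor")
    return cmd


if __name__ == "__main__":
    home = Path.home()
    # Example paths only; replace them with the actual ones.
    cmd = build_compare_command(
        home / "MyApp" / "resnet50_npu",
        home / "MyApp" / "resnet50_onnx",
        output=home / "result",
        advisor=True,
    )
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the directory that contains msaccucmp.py. Passing a list avoids shell quoting problems with paths that contain spaces.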
Figure 1 shows the comparison result.
Parameters in a Complete Model Comparison Result describes the fields in the comparison result.
- Analyze the comparison result.
For details, see Comparison Result Analysis. If the comparison fails or exceptions occur (for example, NaN in the results), see Comparison Result Description.
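As a first pass over a large result, NaN values can be located with a short script. The sketch below assumes that the comparison result has been saved as a CSV file with a header row. This section does not state the result file format, so treat that as an assumption and adjust the reader if the actual format differs. The function name rows_with_nan is hypothetical.

```python
import csv
import math


def rows_with_nan(csv_path):
    """Return (row_number, column_name) pairs for cells that parse as NaN.

    Row numbers count data rows from 1 (the header row is not counted).
    """
    hits = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        for row_number, row in enumerate(reader, start=1):
            for column, value in row.items():
                # Skip cells outside the header columns and missing cells.
                if column is None or value is None:
                    continue
                try:
                    number = float(value)
                except ValueError:
                    continue  # not a numeric cell
                if math.isnan(number):
                    hits.append((row_number, column))
    return hits
```

The returned pairs identify the rows to look up in Comparison Result Description. The script only locates NaN values and does not interpret them.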
