Debugging Cases
Symptom
In the Zipformer speech model migration scenario, the first x frames are correct during streaming inference, but an accuracy issue occurs at frame x+1. The environment configuration self-check shows no abnormalities, which rules out intermittent issues, hardware environment problems, and software version issues.
Environment:
- Hardware platform: Atlas 300I DUO inference server
- Software platform: Linux Ubuntu 4.15.0-29-generic, AArch64
- Environment version: CANN (7.5.0.1.129:8.0.RC3) 8.0.RC3 commercial edition
Analysis Procedure
- Run the following command to use ATC to convert the ONNX model:
atc --input_shape="x:4,77,40;cached_key_0:128,4,128" \
    --precision_mode=force_fp32 \
    --soc_version=Ascendxxx \
    --framework=5 \
    --output=modeloutputpath \
    --model=/modelpath/model.onnx
- Use the msit debug compare tool to obtain the comparison result. Because it is difficult to capture the intermediate inputs during streaming inference, randomly generated input data is used for debug compare. The command is as follows:
msit debug compare -gm ${modelpath}/model.onnx -om ${modelpath}/model.om -o ${output_path} --input-shape "x:4,77,40;cached_key_0:128,4,128"
Check the comparison result file (Figure 1). The suspicious operator (mul_sub_sub) outputs NaN and Inf values, which most likely indicates an overflow or underflow.
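The analysis below walks upstream by hand to find the first operator whose output turns NaN or Inf. That search can be sketched as follows, assuming the dump values have already been loaded into a list of (operator name, values) pairs in execution order. The `dumps` structure, the function name `first_non_finite`, and the sample values are hypothetical; loading real dump files is not shown.

```python
import math

def first_non_finite(dumps):
    """Return (op_name, index, value) for the first NaN/Inf element, or None.

    `dumps` is a hypothetical list of (op_name, values) pairs, ordered by
    execution, e.g. values previously loaded from OM dump files.
    """
    for op_name, values in dumps:
        for i, v in enumerate(values):
            # math.isfinite is False for +inf, -inf and nan.
            if not math.isfinite(v):
                return op_name, i, v
    return None

# Hand-written illustrative values, not real dump data.
dumps = [
    ("sub_6200", [1.5, float("inf"), -2.0]),
    ("mul_sub6211_sub6213", [float("nan"), float("inf"), 0.25]),
]
print(first_non_finite(dumps))  # ('sub_6200', 1, inf)
```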
- Analyze the issue based on the comparison result.
Manually compare the dump data of the sub_6213 operator. The OM dump contains inf, while the ONNX output is normal. (In the OM model, sub_6213 is part of the fused operator mul_sub6211_sub6213, in which the overflow occurs.) Tracing further upstream in the OM model shows that the output of sub_6200 is already inf: its input values are so large that the subtraction overflows, as shown in Figure 2.
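The overflow in a subtraction such as sub_6200 can be reproduced in isolation. A minimal sketch, assuming the operator computes in float32 (the model is converted with `--precision_mode=force_fp32`) and using hypothetical operand values: the exact difference of two large operands of opposite sign exceeds the float32 maximum (about 3.4028e38) and rounds to inf. Any later operation that combines two infinities, such as inf - inf, then yields NaN, which is one way NaN can appear alongside Inf in the comparison result. The helper `to_f32` is introduced here for illustration only.

```python
import math
import struct

def to_f32(x):
    # Round a Python float (a double) to the nearest IEEE 754 float32 value.
    # struct raises OverflowError when a finite value rounds past the float32
    # range; map that case to +/-inf, which is what fp32 hardware produces.
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

# Hypothetical operands, each a valid (finite) float32 value.
a = to_f32(3.0e38)
b = to_f32(-3.0e38)

diff = to_f32(a - b)        # exact result ~6e38 exceeds float32 max -> inf
print(diff)                 # inf
print(to_f32(diff - diff))  # inf - inf -> nan
```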
A fused operator does not expose the results of its intermediate operators, which makes it difficult to locate the issue. To work around this, disable the fusion pattern and run the comparison again. For details about how to disable fusion, see Disabling Fusion Pattern Comparison. Figure 3 shows the comparison result after fusion is disabled.
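Why fusion hides the culprit can be illustrated with a toy model. In the sketch below, the hypothetical function `fused_mul_sub_sub` computes (a * b - c) - d in one step and returns only the final value, as a fused operator does in the OM dump. `unfused_mul_sub_sub` performs the same float32 computation as separate steps and records every intermediate, similar to what becomes visible once fusion is disabled. The function names, step names, and inputs are assumptions for illustration, not the actual kernels.

```python
import math
import struct

def to_f32(x):
    # Round a double to float32; values beyond the float32 range become +/-inf.
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

def fused_mul_sub_sub(a, b, c, d):
    # Only the final result is observable, like a fused operator's dump.
    return to_f32(to_f32(to_f32(a * b) - c) - d)

def unfused_mul_sub_sub(a, b, c, d):
    # Same computation as separate ops; every intermediate is recorded.
    trace = {}
    trace["mul"] = to_f32(a * b)
    trace["sub_1"] = to_f32(trace["mul"] - c)
    trace["sub_2"] = to_f32(trace["sub_1"] - d)
    return trace

# Hypothetical float32 inputs; a * b is about 4e38 and overflows.
args = tuple(to_f32(v) for v in (2.0e19, 2.0e19, -1.0, 0.5))
print(fused_mul_sub_sub(*args))    # inf, but which step overflowed is hidden
print(unfused_mul_sub_sub(*args))  # {'mul': inf, 'sub_1': inf, 'sub_2': inf}
```

In the unfused trace, the first entry that turns inf ("mul" here) pinpoints the overflowing step, which the fused result alone cannot show.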
As shown in Figure 3, the exp operator outputs inf in both the OM and ONNX models, although its inputs are normal and match between the two. The exp operator is therefore the likely source of the accuracy issue. Figure 4 shows the local topology after the fusion patterns are disabled.
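A normal, matching input can still make exp return inf, because in float32 exp(x) overflows as soon as x exceeds ln(FLT_MAX), roughly 88.72. The sketch below assumes float32 computation and uses hypothetical sample inputs; `to_f32` is an illustrative helper, not part of any Ascend tool.

```python
import math
import struct

def to_f32(x):
    # Round a double to float32; values beyond the float32 range become +/-inf.
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

# Largest finite float32 value, decoded from its bit pattern 0x7F7FFFFF.
FLT_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]

# exp(x) overflows float32 once x exceeds ln(FLT_MAX).
print(round(math.log(FLT_MAX), 4))  # 88.7228

for x in (10.0, 88.0, 89.0):
    # Every input is an ordinary finite value; only x = 89.0 yields inf.
    print(x, to_f32(math.exp(x)))
```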
The values in the figure are the outputs of each operator. Manual verification of these local operators shows that, in both the ONNX and OM models, one of the three inputs of the Where operator contains inf. The ONNX Where output remains normal, but the OM Where output is inf, so the output of the next operator is inf as well. In other words, the inf does not reach the output in the ONNX model, whereas the OM model propagates the overflowed value downstream. This confirms that the accuracy issue is caused by overflow in the exp operator.
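The ONNX behavior follows from the semantics of Where: each output element is taken from X where the condition is true and from Y otherwise, so an inf in the unselected branch never reaches the output. The sketch contrasts this with a hypothetical arithmetic blend, c * x + (1 - c) * y, in which 0 * inf evaluates to NaN. The blend is included only to show that a selection implemented arithmetically does not isolate the unselected branch; it is an assumption for illustration, not a description of the Ascend Where kernel, and it yields NaN rather than the inf observed in the OM dump. All values are hypothetical.

```python
import math

def where_select(cond, x, y):
    # ONNX Where semantics: x[i] if cond[i] else y[i]; the unselected
    # branch never contributes to the output.
    return [xi if c else yi for c, xi, yi in zip(cond, x, y)]

def where_blend(cond, x, y):
    # Hypothetical arithmetic formulation; 0 * inf is nan in IEEE 754,
    # so a non-finite value in the unselected branch leaks into the output.
    return [c * xi + (1 - c) * yi for c, xi, yi in zip(cond, x, y)]

cond = [1.0, 0.0, 1.0]
x = [0.5, 2.0, -1.0]
y = [math.inf, 3.0, 4.0]  # y carries an inf such as one from an overflowed exp

print(where_select(cond, x, y))  # [0.5, 3.0, -1.0]
print(where_blend(cond, x, y))   # [nan, 3.0, -1.0]
```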
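- Modify the graph to insert a Clip operator that limits the value range (for example, on the input of the exp operator) so that the overflow cannot occur. This is value clipping during inference, not gradient clipping. The msit debug surgeon tool is recommended for modifying the graph structure. For details about how to use the tool, see the msit debug surgeon User Guide.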
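The effect of clipping the exp input can be checked numerically before editing the graph. A minimal sketch, assuming float32 computation and a hypothetical upper bound `EXP_INPUT_MAX = 80.0`; `clip` mirrors the element-wise ONNX Clip definition min(max(x, min), max). The actual graph modification is done with msit debug surgeon and is not shown here.

```python
import math
import struct

def to_f32(x):
    # Round a double to float32; values beyond the float32 range become +/-inf.
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

def clip(v, lo, hi):
    # Element-wise ONNX Clip semantics: min(max(v, lo), hi).
    return min(max(v, lo), hi)

# Hypothetical upper bound for the exp input; exp(80.0) is about 5.5e34,
# comfortably below the float32 maximum of about 3.4e38.
EXP_INPUT_MAX = 80.0

xs = [1.0, 88.0, 95.0, 200.0]  # hypothetical exp inputs
unclipped = [to_f32(math.exp(v)) for v in xs]
clipped = [to_f32(math.exp(clip(v, -math.inf, EXP_INPUT_MAX))) for v in xs]

print(sum(not math.isfinite(v) for v in unclipped))  # 2
print(all(math.isfinite(v) for v in clipped))        # True
```

Clipping changes the result for every input above the bound, so the bound should be chosen from the value range the model actually produces, large enough that normal inputs pass through unchanged.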



