If info.txt gives the following conclusion, the failure is a single-operator execution error or a pass exception.
**********************Root cause conclusion******************
Single op test aicore error, please check op.
In this case, use the following information from info.txt and handle the problem by reproducing it with a single-operator test.
***********************1. Basic information********************
error time : 2023-06-09-06:55:34.798.772
device id : 0
core id : 0
task id : 6
stream id : 7
node name : GatherV2
kernel name : te_gatherv2_657cb48fa1743a43209d7bc779fe8c294760a5b09b3079a3323fdf18376fc408_1
***********************2. AICERROR code***********************
error code : 0x10
error bits :
CCU_ERR_INFO: 0x2c6290000324442
ccu_err_addr bit[22:8]=011001001000100 meaning:CCU Error Address [17:3] approximate:0x19220
ccu_illegal_instr, illegal execution: 1. the instruction binary is wrong 2. the instruction address is not aligned
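The ccu_err_addr line above can be checked by hand. The following is a minimal sketch, assuming only what the output itself states: bits [22:8] of CCU_ERR_INFO hold bits [17:3] of the error address. The function name decode_ccu_err_addr is hypothetical and is not part of msaicerr.py.

```python
from typing import Tuple


def decode_ccu_err_addr(ccu_err_info: int) -> Tuple[str, int]:
    """Return (bits [22:8] as a bit string, approximate error address)."""
    # Bits [22:8] are a 15-bit field, so shift right by 8 and mask 15 bits.
    field = (ccu_err_info >> 8) & 0x7FFF
    bit_string = format(field, "015b")
    # The field carries address bits [17:3]; the low 3 bits are unknown,
    # so the address is only approximate.
    approx_addr = field << 3
    return bit_string, approx_addr


bits, addr = decode_ccu_err_addr(0x2C6290000324442)
print(bits, hex(addr))
```

For the sample value, this reproduces the bit string and the approximate address 0x19220 reported in info.txt.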
***********************3. Instructions************************
start pc : 0x1000124080064000
current pc : 0x124080067d2c
instruction : Error occured most likely at line: 3d08
/home/HwHiAiUser/tf/info_20230609065654/aicerror_0_20230609065534/te_gatherv2_657cb48fa1743a43209d7bc779fe8c294760a5b09b3079a3323fdf18376fc408_1.o.txt:3d08
te_gatherv2_657cb48fa1743a43209d7bc779fe8c294760a5b09b3079a3323fdf18376fc408_1.cce:1364
/usr/local/Ascend/latest/opp/built-in/op_impl/ai_core/tbe/impl/dynamic/gather_v2.py:1214
/usr/local/Ascend/latest/opp/built-in/op_impl/ai_core/tbe/impl/dynamic/gather_v2.py:1214

related instructions (error occured before the mark *):
  3d08: <not available>
  3d0c: <not available>
  3d10: <not available>
  3d14: <not available>
  3d18: <not available>
  3d1c: <not available>
  3d20: <not available>
  3d24: <not available>
  3d28: <not available>
* 3d2c: <not available>

For complete instructions, please view /home/liuzhenyu/tf/info_20230609065654/aicerror_0_20230609065534/te_gatherv2_657cb48fa1743a43209d7bc779fe8c294760a5b09b3079a3323fdf18376fc408_1.o.txt
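The offset marked with * in the related instructions can be related to the two pc values. The following is a minimal sketch, under the assumption (inferred from this sample only, not from any documented format) that the upper 16 bits of start pc are not address bits and that the offset into the kernel is the difference of the lower 48 bits. ADDR_MASK and pc_offset are hypothetical names.

```python
# Assumption: only the low 48 bits of each pc value form the address.
ADDR_MASK = (1 << 48) - 1


def pc_offset(start_pc: int, current_pc: int) -> int:
    """Return the byte offset of current_pc from the start of the kernel."""
    return (current_pc & ADDR_MASK) - (start_pc & ADDR_MASK)


offset = pc_offset(0x1000124080064000, 0x124080067D2C)
print(format(offset, "x"))
```

For the sample values the result is 3d2c, which matches the line marked with * in the instruction list.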
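This problem may have the following causes:
Problems of this kind can be located and handled by reproducing them with a single-operator test: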
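When msaicerr.py finishes its analysis, it generates a single-operator test script (test_single_op.py, named on the "Generate case file" line of the output below). Developers can run this script to reproduce the AI Core error.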
2023-06-09 06:56:58 (101494) - [INFO] The find single op log /home/HwHiAiUser/tf/single_op_log_20230609065654/debug/plog/plog-101494_20230609065657791.log
2023-06-09 06:56:58 (101494) - [INFO] Generate case file /home/HwHiAiUser/AicoreError/tools/msaicerr/test_single_op.py
2023-06-09 06:56:58 (101494) - [INFO] ##################################################
2023-06-09 06:56:58 (101494) - [INFO] single op test failed! Please Check OP or input data!
2023-06-09 06:56:58 (101494) - [INFO] Run 'python3 /home/HwHiAiUser/AicoreError/tools/msaicerr/test_single_op.py' can test op!
2023-06-09 06:56:58 (101494) - [INFO] ##################################################
2023-06-09 06:56:58 (101494) - [INFO] The ai core error info for No.0 is saved in /home/HwHiAiUser/tf/info_20230609065654/aicerror_0_20230609065534/info.txt
2023-06-09 06:56:58 (101494) - [INFO] Finish to analyze each ai core error.
2023-06-09 06:56:58 (101494) - [INFO] The summary info is saved in /home/HwHiAiUser/tf/info_20230609065654/README.txt
2023-06-09 06:56:58 (101494) - [INFO] Analysis finished, please check /home/HwHiAiUser/tf/info_20230609065654, you can view README.txt first.
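If the tool output has been saved to a file, the path of the generated test script can be pulled out of it automatically. The following is a minimal sketch, assuming the "Generate case file" line keeps the wording shown in the output above; the helper name find_case_file is hypothetical and is not part of msaicerr.py.

```python
import re
from typing import Optional

# Matches the path that follows "Generate case file" in the tool output.
CASE_FILE_RE = re.compile(r"\[INFO\] Generate case file (\S+)")


def find_case_file(log_text: str) -> Optional[str]:
    """Return the generated test script path, or None if it is not in the log."""
    match = CASE_FILE_RE.search(log_text)
    return match.group(1) if match else None


sample = (
    "2023-06-09 06:56:58 (101494) - [INFO] Generate case file "
    "/home/HwHiAiUser/AicoreError/tools/msaicerr/test_single_op.py"
)
print(find_case_file(sample))
```

The returned path can then be run with python3, as the tool output itself suggests.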
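For this problem, collect the analysis results produced by msaicerr.py (all files in the info_<timestamp> directory). If you cannot locate or resolve the problem from these files, submit an issue at https://gitee.com/ascend to get help.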
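Collecting the analysis results can be scripted before opening an issue. The following is a minimal sketch that packs one analysis directory into a .tar.gz archive using the Python standard library; the function name pack_analysis_dir is hypothetical, and the directory path is whatever msaicerr.py reported in its final "Analysis finished" line.

```python
import shutil
from pathlib import Path


def pack_analysis_dir(info_dir: str) -> str:
    """Pack an info_<timestamp> directory into <info_dir>.tar.gz and return its path."""
    src = Path(info_dir).resolve()
    if not src.is_dir():
        raise NotADirectoryError(f"not a directory: {src}")
    # The archive is written next to the directory and keeps the directory
    # name as the top-level entry, so every file inside it is included.
    return shutil.make_archive(
        str(src), "gztar", root_dir=str(src.parent), base_dir=src.name
    )
```

For example, calling pack_analysis_dir("/home/HwHiAiUser/tf/info_20230609065654") would write the archive next to that directory.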