Locating AI Core Errors
You can analyze and locate AI Core errors based on the log printed in the Run window. A sample log follows.
********************Root cause conclusion*********************
# Gives the root cause if the error matches known error patterns.
***********************1. Basic information********************
# Gives basic information about the device on which the AI Core error occurred.
# kernel name: operator kernel name
# op address: address of the operator code in the DDR
# args address: address of the operator arguments in the DDR
error time : 2020-08-26-11:24:07
device id : 0
core id : 0
task id : 60
stream id : 517
node name : trans_TransData_167
kernel name : te_transdata_16b6e15e2a5cc7f70_33e5fb7ae8478ddb
op address : 0x101000120000
args address : 0X101000053000
***********************2. AICERROR code***********************
# Gives the AI Core error code and description.
code : 0x10
CCU_ERR_INFO: 0xb166486200070074
ccu_err_addr bit[22:8]=000011100000000 meaning:CCU Error Address [17:3] approximate:0x3800
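The ccu_err_addr line above can be reproduced by hand. The following is a minimal sketch, assuming only what the sample log states: bits [22:8] of CCU_ERR_INFO hold bits [17:3] of the error address. The function name decode_ccu_err_addr is hypothetical and is not part of any tool.

```python
def decode_ccu_err_addr(ccu_err_info):
    """Return (15-bit field as a bit string, approximate error address).

    Assumption taken from the sample log: bits [22:8] of CCU_ERR_INFO
    carry bits [17:3] of the CCU error address.
    """
    # Drop the low 8 bits, then keep 15 bits (bit 8 through bit 22).
    field = (ccu_err_info >> 8) & 0x7FFF
    # The field starts at address bit 3, so shift left by 3 to rebuild the
    # address. The low 3 bits are unknown, hence "approximate".
    approx_addr = field << 3
    return format(field, "015b"), approx_addr


bits, addr = decode_ccu_err_addr(0xB166486200070074)
print(bits, hex(addr))  # 000011100000000 0x3800
```

The printed values match the bit string and the approximate address 0x3800 shown in the sample log.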
***********************3. Instructions************************
# Gives the instructions related to the error.
start pc : 0x101000120000
current pc : 0x1010001201e0
Error occured most likely at line: 1d0
/{IDE path}/aicerror_xxxx/te_transdata_16b6e15e2a5cc7f70_33e5fb7ae8478ddb.o.txt:1d0
{IDE path}/collection/compile/kernel_meta/te_transdata_16b6e15e2a5cc7f70_33e5fb7ae8478ddb.cce:32 //CCE code line number of the error operator
/{Python script path}/nz_2_nd.py:4486 //Python code line number of the error operator
related instructions (error occured before the mark *):
1bc: <not available>
1c0: <not available>
1c4: <not available>
1c8: <not available>
1cc: <not available>
1d0: <not available>
1d4: <not available>
1d8: <not available>
1dc: <not available>
* 1e0: <not available>
For complete instructions, please see /{IDE path}/aicerror_xxxx/te_transdata_16b6e15e2a5cc7f70_33e5fb7ae8478ddb.o.txt
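In this sample, the offset marked with * (1e0) equals current pc minus start pc, which suggests that the offsets in the instruction list are byte offsets from the start of the operator code. The sketch below computes that offset so you can look it up in the .o.txt file. It is an illustration based on the sample values only; the function name pc_offset is hypothetical.

```python
def pc_offset(start_pc, current_pc):
    """Return the offset of current_pc from start_pc as lowercase hex."""
    offset = current_pc - start_pc
    if offset < 0:
        # A current pc below the start pc would mean the values do not
        # belong to the same kernel, so refuse to produce an offset.
        raise ValueError("current pc is lower than start pc")
    return format(offset, "x")


# Values copied from the sample log.
print(pc_offset(0x101000120000, 0x1010001201E0))  # 1e0
```

Note that the log reports the most likely error line as 1d0, which is before the marked offset, consistent with the statement that the error occurred before the mark.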
****************4. Input and output of node*******************
# Gives the input and output information.
# The input and output addresses are parsed from the IMAS log of GE and the size is parsed from the build graph.
# In the case of memory zero copy, the new address (new addr) can also be parsed from the log.
# If the address is not within the range of the RTS allocation log, an overflow flag is added.
# If device memory data is collected, it is also checked for NaN and INF values. The collected data is accurate only when the device is suspended.
# If the detected input and output counts are inconsistent with those defined in the kernel function, a WARNING is reported. In this case, the arguments provided by GE are very likely misaligned with those expected by the operator.
input[0] addr: 0x100801126600 size: 32288
output[0] addr: 0x100801157c00 size: 2048
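The range check behind the overflow flag can be sketched as follows. This is not the tool's implementation: the allocation list is a hypothetical stand-in for ranges parsed from the RTS allocation log, and checking the whole buffer (address plus size) rather than the start address alone is an assumption made for this example.

```python
def is_within_allocations(addr, size, allocations):
    """Return True if [addr, addr + size) lies inside one allocation.

    allocations is a list of (base, length) pairs. In this sketch it is
    hypothetical data, not values read from a real RTS log.
    """
    end = addr + size
    return any(base <= addr and end <= base + length
               for base, length in allocations)


# Hypothetical allocation covering 1 MiB starting at 0x100801100000.
allocations = [(0x100801100000, 0x100000)]

# Addresses and sizes copied from the sample log.
for name, addr, size in [("input[0]", 0x100801126600, 32288),
                         ("output[0]", 0x100801157C00, 2048)]:
    flag = "" if is_within_allocations(addr, size, allocations) else " overflow"
    print(f"{name} addr: {hex(addr)} size: {size}{flag}")
```

With the hypothetical range above, both buffers fall inside the allocation, so no flag is printed for either line.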
***********************5. Op in graph*************************
# Gives information about the error operator.
# The operator information is taken from the build graph for viewing convenience.
***********************6. Dump info***************************
# Gives the dump file of the error operator.
# This information is available only in the training scenario.