EJ0001 Error Printed to the Screen

Symptom

In the plog logs, the first error reported by TDT is: [DeviceMsgProcess][tid:1241254] [TsdClient] DeviceMsgProc errcode[EJ0001]

[ERROR] TDT(685010,all_reduce_test):2023-11-29-11:55:41.334.702 [process_mode_manager.cpp:587][DeviceMsgProcess][tid:685010] [TsdClient] DeviceMsgProc  errcode[EJ0001]
[ERROR] TDT(685010,all_reduce_test):2023-11-29-11:55:41.334.873 [process_mode_manager.cpp:269][WaitRsp][tid:685010] tsd client wait response fail, device response code[1]. unknown device error.
[ERROR] TDT(685010,all_reduce_test):2023-11-29-11:55:41.334.893 [process_mode_manager.cpp:123][OpenProcess][tid:685010] Wait open response from device failed.
[ERROR] TDT(685010,all_reduce_test):2023-11-29-11:55:41.334.897 [tsd_client.cpp:31][TsdOpen][tid:685010] TsdOpen failed, deviceId[4].
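The check described above can be sketched as a short script. This is a minimal sketch, assuming the plog lines follow the layout of the excerpt above; the function name summarize_tdt_errors and both regular expressions are illustrative and not part of any Ascend tool.

```python
import re

# Patterns derived only from the log excerpt above; other plog layouts
# are not covered (assumption).
ERRCODE_RE = re.compile(
    r"\[ERROR\] TDT\(.*?\[TsdClient\] DeviceMsgProc\s+errcode\[(\w+)\]"
)
DEVICE_RE = re.compile(r"TsdOpen failed, deviceId\[(\d+)\]")


def summarize_tdt_errors(lines):
    """Return (errcode, device_id) from the first matching TDT error lines.

    Either value is None if no matching line is found.
    """
    errcode = None
    device_id = None
    for line in lines:
        if errcode is None:
            match = ERRCODE_RE.search(line)
            if match:
                errcode = match.group(1)
        if device_id is None:
            match = DEVICE_RE.search(line)
            if match:
                device_id = int(match.group(1))
    return errcode, device_id
```

Passing the lines of a plog file to summarize_tdt_errors gives the error code from the first DeviceMsgProc line and the device ID from the TsdOpen failure, which identifies the device whose logs need to be exported.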

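The EJ0001 error only indicates that starting the HCCP process on the device failed. The specific cause has to be determined from the errors reported on the device. In the debug directory, run /usr/local/Ascend/driver/tools/msnpureport -f to export the device logs, then run grep -rn ERROR * | grep HCCP to find the first HCCP error. The device error falls into one of the following two scenarios.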
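Once the device logs have been exported, matching the HCCP error against the two scenarios can be sketched as follows. This is a minimal sketch under stated assumptions: the device log line layout is not shown in this document, so the code relies only on substring matching (the same filter as the grep command), and "first" means first in the order the lines are supplied, not earliest by timestamp. The names first_hccp_error and classify_hccp_error are hypothetical.

```python
# The two device-side error messages named in this document.
KNOWN_SIGNATURES = (
    "Create Server failed, ret(61)",
    "certificate is not yet valid",
)


def first_hccp_error(lines):
    """Return the first line containing both ERROR and HCCP, or None.

    Mirrors `grep -rn ERROR * | grep HCCP`; ordering is the input order.
    """
    for line in lines:
        if "ERROR" in line and "HCCP" in line:
            return line
    return None


def classify_hccp_error(line):
    """Return the known signature found in the line, or None if unrecognized."""
    if line is None:
        return None
    for signature in KNOWN_SIGNATURES:
        if signature in line:
            return signature
    return None
```

A result of None from classify_hccp_error means the first HCCP error matches neither scenario covered here and needs separate analysis.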

Cause Analysis and Solution (Device Reports "Create Server failed, ret(61)")

If the device reports "Create Server failed, ret(61)", there are two possible causes:

Cause Analysis and Solution (Device Reports "certificate is not yet valid")