Atomic add precision overflow
Analysis result
If info.txt contains the following conclusion, the AI Core error was caused by a precision overflow.
Analysis result: success. "**********************Root cause conclusion******************" "dha status 1" found in log. It means Atomic accumulation exception, please check the input data and network accuracy. Attention please, if multiple tasks are running on the same device at the same time, false positives may be generated. You are advised to pull up only one task and collect it .
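A quick way to check for this conclusion is to search the analysis output for the marker strings quoted above. The following is a minimal sketch, assuming the content of info.txt has already been read into a string. The function name `is_atomic_overflow_conclusion` is hypothetical and not part of any Ascend tool.

```python
def is_atomic_overflow_conclusion(info_text: str) -> bool:
    """Return True if the text contains the atomic-accumulation conclusion.

    Hypothetical helper: it only looks for the two marker strings quoted in
    the message above and does not parse the rest of info.txt.
    """
    return (
        "dha status 1" in info_text
        and "Atomic accumulation exception" in info_text
    )


if __name__ == "__main__":
    # Shortened sample built from the message quoted in this section.
    sample = (
        'Analysis result: success. "dha status 1" found in log. '
        "It means Atomic accumulation exception, please check the input data."
    )
    print(is_atomic_overflow_conclusion(sample))
```

As the message itself warns, a match may be a false positive if several tasks were running on the same device at the same time.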
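Root cause
This problem occurs when extreme data values reach an atomic accumulation instruction during operator computation. If the atomic accumulation overflows, error 0x800000 is reported.
On Atlas A2 training series products, this problem is a false positive reported by the system, and users can ignore it.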
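To illustrate how an accumulation can overflow, the sketch below emulates a running sum in half precision (float16) using only the Python standard library. The choice of float16, the helper names `to_fp16` and `accumulate_fp16`, and the sample values are assumptions made for illustration. The source does not state which data type the failing operator uses, and this code does not reproduce the hardware atomic instruction.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to the nearest float16 value; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack finite values beyond the float16 range,
        # so map them to infinity, as an overflowing accumulation would.
        return math.copysign(math.inf, x)


def accumulate_fp16(values):
    """Sum values, rounding the running total to float16 after each add."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total


if __name__ == "__main__":
    # Each input fits in float16, but their sum exceeds FP16_MAX.
    print(accumulate_fp16([60000.0, 60000.0]))
```

In this sketch each input of 60000.0 is representable in float16 on its own, but the running total exceeds 65504 and becomes infinite. Inspecting the input data for such extreme values is the kind of check the log message recommends.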