AICORE Diagnosis
Function
Diagnose AICORE errors and output the diagnosis result.
| Item | Time Required | Whether NPU Training or Inference Is Affected | Application Scenario |
|---|---|---|---|
| AICORE diagnosis | 3 to 8 minutes | Yes | During the inspection and rollout of a training or inference job, perform AICORE diagnosis three times. If all three rounds pass, the diagnosis result is normal. If EMERGENCY_WARN is displayed in any round, the processor is faulty and the hardware needs to be replaced. |
- The AICORE stress test and AICORE diagnosis apply to different scenarios. For details, see Table 1. Perform the AICORE stress test and AICORE diagnosis as required.
- If you want to conduct the AICORE, full on-chip memory, and P2P stress tests at the same time, refer to One-Click Diagnosis.
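The three-round rule in the Application Scenario column can be sketched in Python as follows. This is only an illustration: the function name `overall_verdict` and the `INCONCLUSIVE` value are hypothetical, and the handling of rounds that are neither PASS nor EMERGENCY_WARN is an assumption, because this document does not define that case.

```python
from typing import Sequence


def overall_verdict(round_results: Sequence[str]) -> str:
    """Combine per-round AICORE diagnosis results into one verdict."""
    if not round_results:
        raise ValueError("at least one round result is required")
    # EMERGENCY_WARN in any round: the processor is treated as faulty.
    if "EMERGENCY_WARN" in round_results:
        return "EMERGENCY_WARN"
    # Every round passed: the diagnosis result is normal.
    if all(result == "PASS" for result in round_results):
        return "PASS"
    # Assumption: mixed results (for example a FAIL or SKIP round) are not
    # covered by the rule above, so they are flagged for manual follow-up.
    return "INCONCLUSIVE"


print(overall_verdict(["PASS", "PASS", "PASS"]))
print(overall_verdict(["PASS", "EMERGENCY_WARN", "PASS"]))
```

Checking for EMERGENCY_WARN first means a single warning outweighs any number of passing rounds, which matches the rule that a warning in any round indicates a faulty processor.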
Parameters
Table 2 lists only test-specific parameters. For details about other common parameters, see Common Parameters.
| Parameter | Description | Mandatory |
|---|---|---|
| [-i, --items] | Specifies the diagnosis check item. | Yes |
| [-sc, --sc, --stress-count] | Specifies the number of AICORE diagnoses. | No |
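As a rough sketch, a script that launches the diagnosis could assemble the argument list like this. The base arguments come from the example command in this document (`ascend-dmi -dg -i aicore -q`). The helper name `build_aicore_command` is hypothetical, and passing the count as a separate value after `-sc` is an assumption: this section names the parameter but does not show its value syntax or accepted range.

```python
from typing import List, Optional


def build_aicore_command(stress_count: Optional[int] = None) -> List[str]:
    """Build an argv list for an AICORE diagnosis run."""
    # Base command as shown in this document's example.
    argv = ["ascend-dmi", "-dg", "-i", "aicore", "-q"]
    if stress_count is not None:
        if stress_count < 1:
            raise ValueError("stress_count must be a positive integer")
        # Assumption: -sc takes the number of diagnoses as its next argument.
        argv += ["-sc", str(stress_count)]
    return argv


print(build_aicore_command())
print(build_aicore_command(3))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting.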
Example
ascend-dmi -dg -i aicore -q
    [***@***]# ascend-dmi -dg -i aicore -q
    Stress test is being performed, please wait.
    Summary:
    Arch: aarch64
    Mode: ******
    Time: 20250529-19:35:34
    Hardware:
    aicore: PASS
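For automated checks, the status on the `aicore:` line of the output above can be extracted with a small parser. The sketch below assumes the output keeps the `aicore: <STATUS>` form shown in the example. The function name `parse_aicore_status` is hypothetical, and the exact whitespace of real output may differ.

```python
import re
from typing import Optional


def parse_aicore_status(output: str) -> Optional[str]:
    """Return the status reported on the 'aicore:' line, or None if absent."""
    # Match a line such as "aicore: PASS", allowing leading indentation.
    match = re.search(r"^\s*aicore:\s*(\S+)\s*$", output, flags=re.MULTILINE)
    return match.group(1) if match else None


sample = """Stress test is being performed, please wait.
Summary:
Arch: aarch64
Mode: ******
Time: 20250529-19:35:34
Hardware:
aicore: PASS
"""
print(parse_aicore_status(sample))
```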
Fault Check Items
| Command Output | Description |
|---|---|
| PASS | The diagnosis result is normal. |
| SKIP | The product or scenario does not support AICORE diagnosis. |
| EMERGENCY_WARN | Emergency warning. You are advised to replace the hardware. |
| FAIL | The AICORE diagnosis fails. Contact Huawei technical support or locate the fault by referring to FAQs. |
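In a monitoring script, the table above can be encoded as a simple lookup. This is a minimal sketch: the names `ACTIONS` and `recommended_action` are hypothetical, the action texts paraphrase the table, and raising an error for an unlisted status is an assumption.

```python
# Command outputs and follow-up actions, paraphrased from the table above.
ACTIONS = {
    "PASS": "No action needed; the diagnosis result is normal.",
    "SKIP": "No action; the product or scenario does not support AICORE diagnosis.",
    "EMERGENCY_WARN": "Emergency warning; replacing the hardware is advised.",
    "FAIL": "Diagnosis failed; contact Huawei technical support or check the FAQs.",
}


def recommended_action(status: str) -> str:
    """Look up the follow-up action for a diagnosis command output."""
    key = status.strip().upper()
    if key not in ACTIONS:
        # Assumption: statuses outside the table are treated as errors.
        raise ValueError(f"unrecognized diagnosis status: {status!r}")
    return ACTIONS[key]


print(recommended_action("EMERGENCY_WARN"))
```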