AICORE Diagnosis

Function

Diagnose AICORE errors and output the diagnosis result.

Table 1 Diagnostic items

| Item | Time Required | Whether NPU Training or Inference Is Affected | Application Scenario |
|---|---|---|---|
| AICORE diagnosis | 3 to 8 minutes | Yes | During the inspection and rollout of a training or inference job, run the AICORE diagnosis three times. If all three rounds pass, the diagnosis result is normal. If EMERGENCY_WARN is displayed in any round, the processor is faulty and the hardware needs to be replaced. |

  • The AICORE stress test and AICORE diagnosis apply to different scenarios. For details, see Table 1. Perform the AICORE stress test and AICORE diagnosis as required.
  • If you want to conduct the AICORE, full on-chip memory, and P2P stress tests at the same time, refer to One-Click Diagnosis.
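The three-round rule in Table 1 can be sketched as a small helper that combines per-round results into one verdict. This is a minimal illustration, not part of ascend-dmi: the function name and the verdict strings NORMAL, REPLACE_HARDWARE, and INCONCLUSIVE are hypothetical, and only PASS and EMERGENCY_WARN are taken from this page.

```python
from typing import Iterable


def summarize_rounds(results: Iterable[str], required_rounds: int = 3) -> str:
    """Combine per-round AICORE diagnosis results into one verdict.

    The verdict strings returned here are hypothetical labels chosen for
    this sketch; ascend-dmi itself does not print them.
    """
    rounds = [r.strip().upper() for r in results]

    # Any EMERGENCY_WARN means the processor is faulty, regardless of
    # how the other rounds ended.
    if "EMERGENCY_WARN" in rounds:
        return "REPLACE_HARDWARE"

    # The result is normal only if enough rounds ran and every one passed.
    if len(rounds) >= required_rounds and all(r == "PASS" for r in rounds):
        return "NORMAL"

    # Too few rounds, or a SKIP/FAIL result: this rule gives no verdict.
    return "INCONCLUSIVE"
```

For example, `summarize_rounds(["PASS", "PASS", "PASS"])` yields the normal verdict, while a single EMERGENCY_WARN in any position yields the replacement verdict.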

Parameters

Table 2 lists only test-specific parameters. For details about other common parameters, see Common Parameters.

Table 2 Parameter description

| Parameter | Description | Mandatory |
|---|---|---|
| [-i, --items] | Specifies the diagnosis check item. aicore: AICORE error diagnosis. | Yes |
| [-sc, --sc, --stress-count] | Specifies the number of AICORE diagnoses. This parameter takes effect only when items is set to aicore. If this parameter is not specified, the default value 1 is used. The value range is [1, 100]. | No |
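As an illustration of the parameter constraints in Table 2, the following sketch builds the argument list for a script that wraps the command. It assumes the count is passed as a separate argument after -sc, which this page does not show explicitly. The function name is hypothetical, and the -q flag is copied from the example on this page; its meaning is covered under Common Parameters.

```python
from typing import List, Optional


def build_aicore_command(stress_count: Optional[int] = None,
                         with_q_flag: bool = True) -> List[str]:
    """Build an argument list for an AICORE diagnosis run.

    Hypothetical helper: it only assembles arguments and does not run
    anything. Passing the count as "-sc N" is an assumption.
    """
    cmd = ["ascend-dmi", "-dg", "-i", "aicore"]

    if stress_count is not None:
        # Table 2 gives the value range as [1, 100]; the tool uses 1
        # when the parameter is omitted.
        if not 1 <= stress_count <= 100:
            raise ValueError("stress count must be in the range [1, 100]")
        cmd += ["-sc", str(stress_count)]

    if with_q_flag:
        cmd.append("-q")

    return cmd
```

The resulting list can be handed to `subprocess.run` on a host where ascend-dmi is installed.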

Example

ascend-dmi -dg -i aicore -q

[***@***]# ascend-dmi -dg -i aicore  -q
Stress test is being performed, please wait.
Summary:
    Arch: aarch64
    Mode: ******
    Time: 20250529-19:35:34
 
Hardware:
    aicore:
        PASS
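For scripted checks, the output above can be reduced to a per-item result. The sketch below is one way to do that, assuming the layout is always an unindented "Hardware:" heading, an indented item name ending in a colon, and the result on the following indented line, as in this single example. The function name and the sample constant are hypothetical.

```python
from typing import Dict

# Sample text copied from the example output on this page.
SAMPLE_OUTPUT = """\
Stress test is being performed, please wait.
Summary:
    Arch: aarch64
    Mode: ******
    Time: 20250529-19:35:34

Hardware:
    aicore:
        PASS
"""


def parse_hardware_results(output: str) -> Dict[str, str]:
    """Return {item: result} for entries under the "Hardware:" heading."""
    results: Dict[str, str] = {}
    in_hardware = False
    current_item = None

    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank or whitespace-only lines

        # Unindented lines are section headings or status messages.
        if not raw.startswith((" ", "\t")):
            in_hardware = (line == "Hardware:")
            current_item = None
            continue

        if not in_hardware:
            continue

        if line.endswith(":"):
            current_item = line[:-1]      # for example "aicore"
        elif current_item is not None:
            results[current_item] = line  # for example "PASS"
            current_item = None

    return results
```

Calling `parse_hardware_results(SAMPLE_OUTPUT)` on the sample gives a single entry for aicore. Real output on other products or versions may be laid out differently, so treat the parsing rules as a starting point.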

Fault Check Items

Table 3 Fault check items

| Command Output | Description |
|---|---|
| PASS | The diagnosis result is normal. |
| SKIP | The product or scenario does not support AICORE diagnosis. |
| EMERGENCY_WARN | Emergency warning. You are advised to replace the hardware. |
| FAIL | The AICORE diagnosis fails. Contact Huawei technical support or locate the fault by referring to FAQs. |
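A monitoring script might turn the values in Table 3 into operator guidance with a simple lookup. The following is a minimal sketch: the function name is hypothetical, and the message texts paraphrase the descriptions in Table 3.

```python
# Result strings are those listed in Table 3; the messages paraphrase
# the table descriptions.
RECOMMENDED_ACTION = {
    "PASS": "No action needed: the diagnosis result is normal.",
    "SKIP": "No result: the product or scenario does not support AICORE diagnosis.",
    "EMERGENCY_WARN": "Replace the hardware.",
    "FAIL": "Contact Huawei technical support or consult the FAQs.",
}


def recommended_action(result: str) -> str:
    """Map one command output value to the guidance for an operator."""
    key = result.strip().upper()
    if key not in RECOMMENDED_ACTION:
        # Values not listed in Table 3 are rejected rather than guessed at.
        raise ValueError(f"unknown diagnosis result: {result!r}")
    return RECOMMENDED_ACTION[key]
```

Rejecting unknown values keeps the script from silently treating an unexpected output as a pass.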