AICORE Stress Test

Function

Runs a stress test on the AI Core to detect AICORE errors and outputs the diagnosis result.

Table 1 Diagnostic items

Item | Time Required | Affects NPU Training or Inference | Application Scenario
AICORE stress test | 9–24 minutes | Yes | An AICORE error occurs when a training or inference job is executed.

  • The AICORE stress test and AICORE diagnosis apply to different scenarios; see Table 1 for details. Run either one as your scenario requires.
  • If you want to conduct the AICORE, full on-chip memory, and P2P stress tests at the same time, refer to One-Click Diagnosis.

Parameters

Table 2 lists only test-specific parameters. For details about other common parameters, see Common Parameters.

Table 2 Parameter description

Parameter | Description | Mandatory
[-i, --items] | Specifies the diagnosis check item. Value: aicore (AI Core error stress test). | Yes
[-s, --stress] | Performs a stress test. | Yes
[-sc, --sc, --stress-count] | Specifies the number of AICORE stress tests. Takes effect only when items is set to aicore. Value range: [1, 100]. | No
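
The parameters in Table 2 can also be assembled by a script. The following is a minimal Python sketch, not part of ascend-dmi: the helper name build_aicore_stress_cmd is hypothetical, and the -dg and -q flags are copied from the example in this section rather than from Table 2. It only builds the argument list and enforces the documented [1, 100] range; it does not run the tool.

```python
from typing import List, Optional


def build_aicore_stress_cmd(stress_count: Optional[int] = None,
                            quiet: bool = True) -> List[str]:
    """Build the argument list for an AICORE stress test.

    Hypothetical helper: it uses only flags that appear in this section.
    """
    # -i aicore and -s are mandatory per Table 2; -dg comes from the example.
    cmd = ["ascend-dmi", "-dg", "-i", "aicore", "-s"]

    if stress_count is not None:
        # Reject non-integers (bool is a subclass of int, so exclude it too).
        if isinstance(stress_count, bool) or not isinstance(stress_count, int):
            raise TypeError("stress_count must be an integer")
        # Table 2 gives the value range as [1, 100].
        if not 1 <= stress_count <= 100:
            raise ValueError("stress_count must be in the range [1, 100]")
        cmd += ["-sc", str(stress_count)]

    if quiet:
        # -q appears in the example; it is a common parameter, not described here.
        cmd.append("-q")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_aicore_stress_cmd(3)))
```

The returned list can be passed to subprocess.run on a host where ascend-dmi is installed. Validating the count before launching avoids starting a test that takes several minutes with an out-of-range argument.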

Example

Example of setting the number of stress tests to 3:

ascend-dmi -dg -i aicore -s -sc 3 -q

[***@***]# ascend-dmi -dg -i aicore -s -sc 3 -q
Stress test is being performed, please wait.
Summary:
    Arch: aarch64
    Mode: ******
    Time: 20250529-19:51:09
 
Hardware:
    aicore:
        PASS
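
If the result needs to be consumed by a script, the output shown above can be scanned for the aicore entry. The following is a minimal Python sketch under the assumption that the output keeps the layout of this example (an "aicore:" line followed by the result on the next non-empty line); the function name parse_aicore_result is hypothetical and other output layouts are not handled.

```python
from typing import Optional

# Sample text copied from the example output in this section.
SAMPLE_OUTPUT = """\
Stress test is being performed, please wait.
Summary:
    Arch: aarch64
    Mode: ******
    Time: 20250529-19:51:09

Hardware:
    aicore:
        PASS
"""


def parse_aicore_result(output: str) -> Optional[str]:
    """Return the text on the first non-empty line after 'aicore:', or None."""
    lines = [line.strip() for line in output.splitlines()]
    for index, line in enumerate(lines):
        if line == "aicore:":
            # The result is assumed to be the next non-empty line.
            for following in lines[index + 1:]:
                if following:
                    return following
    return None


if __name__ == "__main__":
    print(parse_aicore_result(SAMPLE_OUTPUT))
```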

Fault Check Items

Table 3 Fault check items

Command Output | Description
PASS | The stress test result is normal.
SKIP | The product or scenario does not support the AICORE stress test.
EMERGENCY_WARN | Emergency warning. Replace the hardware.
FAIL | The AICORE stress test fails. Contact Huawei technical support.
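
For automated handling, Table 3 can be encoded as a lookup table. The following is a minimal Python sketch; the names RESULT_ACTIONS and describe_result are hypothetical, the descriptions are taken from Table 3, and the fallback message for unlisted values is an assumption of this sketch rather than documented behavior.

```python
# Result strings and descriptions as listed in Table 3.
RESULT_ACTIONS = {
    "PASS": "The stress test result is normal.",
    "SKIP": "The product or scenario does not support the AICORE stress test.",
    "EMERGENCY_WARN": "Emergency warning. Replace the hardware.",
    "FAIL": "The AICORE stress test fails. Contact Huawei technical support.",
}


def describe_result(result: str) -> str:
    """Map a command output value to its Table 3 description."""
    key = result.strip()
    # Fallback text is an assumption for values that Table 3 does not list.
    return RESULT_ACTIONS.get(key, "Unrecognized result: " + repr(key))


if __name__ == "__main__":
    for value in ("PASS", "EMERGENCY_WARN"):
        print(value, "->", describe_result(value))
```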