On-Chip Memory High-Risk Address Stress Test

Function

Performs a stress test on the high-risk addresses of on-chip memory and outputs the diagnosis result.

Table 1 Diagnostic items

| Item | Time Required | Whether NPU Training or Inference Is Affected | Application Scenario |
| --- | --- | --- | --- |
| On-chip memory high-risk address stress test | ≤ 22 minutes | Yes | A single-bit or multi-bit error occurs in the on-chip memory diagnosis result. |

  • The on-chip memory stress test and on-chip memory diagnosis apply to different scenarios. For details, see Table 1. Perform the on-chip memory stress test or on-chip memory diagnosis as required.
  • If you want to conduct the on-chip memory diagnosis, on-chip memory stress test, and on-chip memory high-risk address stress test at the same time, refer to One-Click On-Chip Memory Stress Test.

Parameters

Table 2 lists only test-specific parameters. For details about other common parameters, see Common Parameters.

Table 2 Parameters

| Parameter | Description | Mandatory |
| --- | --- | --- |
| [-i, --items] | Specifies the diagnosis check item. Currently, the check item can be hbm or chipMemory. However, they cannot be specified at the same time. | Yes |
| [-qs, --qs, --quick-stress] | Specifies the range for a fast stress test on high-risk addresses of on-chip memory. The value range of this parameter is 0 to 100. The recommended value is 100. If the value is 0, a fast stress test is performed on all on-chip memory addresses by default. This parameter must be used together with [-s, --stress] when hbm is included. It cannot be used together with [-st, --st, --stress-time] or [--sc, --stress-count]. | Yes |
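The parameter rules above can be checked before the tool is launched. The following is a minimal sketch, not part of ascend-dmi: the function name `quick_stress_argv` and its `stress` argument are hypothetical, and the `-dg` and `-q` options are copied from the example in this section without being described here (see Common Parameters). The sketch only builds an argument list and never emits `-st` or `--sc`, because those options cannot be combined with `-qs`.

```python
from __future__ import annotations

# Check items named in Table 2; only one may be given per run.
VALID_ITEMS = ("hbm", "chipMemory")


def quick_stress_argv(item: str, quick_stress: int = 100, stress: bool = True) -> list[str]:
    """Build an ascend-dmi argument list for the high-risk address stress test.

    Hypothetical helper: it encodes only the rules stated in Table 2.
    """
    if item not in VALID_ITEMS:
        raise ValueError(f"item must be one of {VALID_ITEMS}, got {item!r}")
    # -qs accepts an integer from 0 to 100 (100 is the recommended value).
    if isinstance(quick_stress, bool) or not isinstance(quick_stress, int):
        raise ValueError("quick_stress must be an integer")
    if not 0 <= quick_stress <= 100:
        raise ValueError("quick_stress must be in the range 0 to 100")
    # With hbm, -qs must be accompanied by -s.
    if item == "hbm" and not stress:
        raise ValueError("-qs must be used together with -s when the item is hbm")

    argv = ["ascend-dmi", "-dg", "-i", item]
    if stress:
        argv.append("-s")
    # -dg and -q are taken from the documented example, not from Table 2.
    argv += ["-qs", str(quick_stress), "-q"]
    return argv


if __name__ == "__main__":
    print(" ".join(quick_stress_argv("hbm", 100)))
```

The returned list can be passed to `subprocess.run` on a host where ascend-dmi is installed. Whether `-s` is also needed for chipMemory is not stated in Table 2, so the sketch leaves that choice to the caller.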

Example

Example of setting the stress test range to 100:

ascend-dmi -dg -i hbm -s -qs 100 -q

[***@***]# ascend-dmi -dg -i hbm -s -qs 100 -q
Stress test is being performed, please wait.
Summary:
    Arch: aarch64
    Mode: ******
    Time: 20250529-19:37:16
 
Hardware:
    hbm:
        PASS

Fault Check Items

Table 3 Fault check items

| Command Output | Description |
| --- | --- |
| PASS | The on-chip memory high-risk address stress test passed, and no new isolated pages were added. |
| SKIP | The product or scenario does not support the on-chip memory high-risk address stress test. |
| FAIL | The on-chip memory high-risk address stress test failed, and new isolated pages were added. |
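When the test is run from a script, the result can be read from the "Hardware:" block of the command output. The sketch below is an illustration only: `parse_hardware_results` is a hypothetical helper, and it assumes that SKIP and FAIL results are printed with the same indentation-based layout as the PASS example in this section, which this section does not confirm.

```python
from __future__ import annotations

# Result keywords listed in Table 3.
STATUSES = {"PASS", "SKIP", "FAIL"}

# Output copied from the example in this section (prompt line omitted).
SAMPLE = """\
Stress test is being performed, please wait.
Summary:
    Arch: aarch64
    Mode: ******
    Time: 20250529-19:37:16

Hardware:
    hbm:
        PASS
"""


def parse_hardware_results(output: str) -> dict[str, str]:
    """Map each check item under "Hardware:" to its PASS/SKIP/FAIL keyword."""
    results: dict[str, str] = {}
    in_hardware = False
    current_item = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not raw[0].isspace():
            # Unindented lines are top-level headings or messages.
            in_hardware = line == "Hardware:"
            current_item = None
            continue
        if not in_hardware:
            continue
        if line.endswith(":"):
            # An indented "name:" line introduces a check item such as hbm.
            current_item = line[:-1]
        elif current_item is not None and line in STATUSES:
            results[current_item] = line
    return results


if __name__ == "__main__":
    print(parse_hardware_results(SAMPLE))  # {'hbm': 'PASS'}
```

A caller could treat FAIL as the trigger for follow-up handling of the newly isolated pages, and treat SKIP as "not applicable" rather than as an error.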