One-Click On-Chip Memory Stress Test
Function
Ascend DMI supports a one-click on-chip memory stress test: a single command runs the on-chip memory diagnosis, the on-chip memory stress test, and the on-chip memory high-risk address stress test, and then outputs the test results.
| Item | Time Required | Whether NPU Training or Inference Is Affected | Application Scenario |
|---|---|---|---|
| One-click on-chip memory stress test | < 1.5 hours | Yes | During a training or inference job, an on-chip memory ECC error occurs on the NPU and a new isolation page is added. |
The on-chip memory stress test and on-chip memory diagnosis apply to different scenarios. For details, see Table 1. Perform the on-chip memory stress test or on-chip memory diagnosis as required.
- If you want to conduct the on-chip memory diagnosis, on-chip memory stress test, and on-chip memory high-risk address stress test at the same time, refer to One-Click On-Chip Memory Stress Test.
Parameters
Table 2 lists only test-specific parameters. For details about other common parameters, see Common Parameters.
| Parameter | Description | Mandatory |
|---|---|---|
| [-at, --at, --auto-test] | Performs an automatic stress test. This parameter takes effect only when [-i, --items] contains hbm and -s is specified. | Yes |
| [-st, --st, --stress-time] | Specifies the duration of the on-chip memory stress test. The combined stress test command also runs on-chip memory diagnosis and the high-risk address stress test, so the actual execution time is longer than the specified time. | No |
Example
ascend-dmi -dg -i hbm -s --auto-test -q
```
[***@***]# ascend-dmi -dg -i hbm -s --auto-test -q
Stress test is being performed, please wait.
Summary:
Arch: aarch64
Mode: ******
Time: 20250529-19:08:50
Hardware:
hbm: PASS
```
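For scripted runs, the `hbm:` result can be extracted from the captured output. The following is a minimal sketch, assuming the summary keeps the `hbm: <RESULT>` layout shown in the example above. The function name `hbm_result` is hypothetical and is not part of Ascend DMI.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ascend DMI): read ascend-dmi output on
# stdin and print the value of the first "hbm:" line, for example PASS.
# Assumes the summary uses the "hbm: <RESULT>" layout shown in the example.
# Returns 1 if no "hbm:" line is found.
hbm_result() {
    awk '$1 == "hbm:" { print $2; found = 1; exit }
         END { if (!found) exit 1 }'
}

# Usage sketch (requires ascend-dmi on an NPU host):
#   ascend-dmi -dg -i hbm -s --auto-test -q | hbm_result
```

Matching on the first field rather than the whole line keeps the helper tolerant of leading indentation in the summary.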
Fault Check Items
| Parameter | Description |
|---|---|
| PASS | The one-click on-chip memory stress test is successful and no exception occurs. |
| EMERGENCY_WARN | |
| SKIP | The product or scenario does not support the one-click on-chip memory stress test. |
| FAIL | |
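In automation, the result values listed above can be mapped to an exit status. This is a sketch under assumptions: the function name `check_hbm_result` and the exit codes are hypothetical choices for illustration, not Ascend DMI behavior. This section gives no description for `EMERGENCY_WARN` and `FAIL`, so the sketch only flags them for follow-up without interpreting them.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ascend DMI): map a result string to an
# exit status chosen for this sketch: 0 = PASS, 2 = SKIP, 1 = anything else.
check_hbm_result() {
    case "$1" in
        PASS)
            echo "hbm: test passed, no exception occurred"
            return 0 ;;
        SKIP)
            echo "hbm: test not supported on this product or scenario"
            return 2 ;;
        EMERGENCY_WARN|FAIL)
            # No description is given for these values in this section,
            # so they are only reported, not interpreted.
            echo "hbm: $1 reported, follow-up required" >&2
            return 1 ;;
        *)
            echo "hbm: unrecognized result '$1'" >&2
            return 1 ;;
    esac
}
```

Using a distinct status for `SKIP` lets a calling script tell an unsupported product or scenario apart from a result that needs attention.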