On-Chip Memory Stress Test
Function
Perform a stress test on the on-chip memory and output the diagnosis result.
| Item | Diagnosis Duration (Atlas Inference Products) | Diagnosis Duration (Other Products) | Whether NPU Training or Inference Is Affected | Application Scenario |
|---|---|---|---|---|
| On-chip memory stress test | 6–7 hours | < 1 hour | Yes | The stress test is performed before a training or inference job is rolled out, or when an ECC error on the NPU on-chip memory is detected during job execution. |
- The on-chip memory stress test and on-chip memory diagnosis apply to different scenarios. For details, see Table 1. Perform the on-chip memory stress test or on-chip memory diagnosis as required.
- If you want to conduct the on-chip memory diagnosis, on-chip memory stress test, and on-chip memory high-risk address stress test at the same time, refer to One-Click On-Chip Memory Stress Test.
Parameters
Table 2 lists only test-specific parameters. For details about other common parameters, see Common Parameters.
| Parameter | Description | Mandatory |
|---|---|---|
| [-i, --items] | Specifies the diagnosis check item. | Yes |
| [-st, --st, --stress-time] | Specifies the duration of the on-chip memory stress test. | No |
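
As a minimal sketch of how the two parameters combine, the Python helper below assembles the argument list for a stress test run. The function name `build_stress_command` is hypothetical. The `-dg`, `-s`, and `-q` flags are not described in Table 2; they are copied from the example invocations in this section, and the stress time is assumed to be given in seconds, as in those examples.

```python
from typing import List, Optional


def build_stress_command(item: str, stress_time: Optional[int] = None) -> List[str]:
    """Assemble an ascend-dmi argument list for an on-chip memory stress test.

    item        -- value for the mandatory -i option, e.g. "hbm" or "chipMemory"
    stress_time -- optional value for -st (assumed to be seconds)
    """
    if not item:
        raise ValueError("-i/--items is mandatory")
    # -dg, -s and -q mirror the documented example invocations (assumption:
    # they are required for this test in the same way on every product).
    cmd = ["ascend-dmi", "-dg", "-i", item, "-s"]
    if stress_time is not None:
        cmd += ["-st", str(stress_time)]
    cmd.append("-q")
    return cmd


print(" ".join(build_stress_command("hbm", 60)))
# prints: ascend-dmi -dg -i hbm -s -st 60 -q
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting.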
Example
- Example of the hbm test on the Atlas 800I A2 inference server with the test duration set to 60s:
    ascend-dmi -dg -i hbm -s -st 60 -q

    [***@***]# ascend-dmi -dg -i hbm -s -st 60 -q
    Stress test is being performed, please wait.
    Summary:
    Arch: aarch64
    Mode: ******
    Time: 20250529-19:36:47
    Hardware:
    hbm: PASS
- Example of the chipMemory test on the Atlas 300I Duo inference card with the test duration set to 60s:
    ascend-dmi -dg -i chipMemory -s -st 60 -q

    [***@***]# ascend-dmi -dg -i chipMemory -s -st 60 -q
    Stress test is being performed, please wait.
    Summary:
    Arch: aarch64
    Mode: ******
    Time: 20250529-19:25:25
    Hardware:
    chipMemory: PASS
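
When the test is run from a script, the result can be picked out of the command output. The following is a rough sketch, not part of the tool: `parse_results` is a hypothetical helper, and it assumes each check item is reported as `<item>: PASS`, `<item>: SKIP`, or `<item>: FAIL`, as in the sample output above. The exact spacing and line layout of real output may differ, so the pattern does not depend on line breaks.

```python
import re
from typing import Dict

# Matches "<item>: <RESULT>" anywhere in the text (assumed output shape).
_RESULT = re.compile(r"(\w+):\s*(PASS|SKIP|FAIL)\b")


def parse_results(output: str) -> Dict[str, str]:
    """Map each check item found in the output to its reported result."""
    return {item: status for item, status in _RESULT.findall(output)}


# Sample text taken from the hbm example in this section.
SAMPLE = (
    "Stress test is being performed, please wait.\n"
    "Summary:\n"
    "Arch: aarch64\n"
    "Mode: ******\n"
    "Time: 20250529-19:36:47\n"
    "Hardware:\n"
    "hbm: PASS\n"
)

print(parse_results(SAMPLE))
# prints: {'hbm': 'PASS'}
```

In practice the text would come from the captured standard output of the `ascend-dmi` command, for example via `subprocess.run(..., capture_output=True, text=True)`.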
Fault Check Items
| Command Output | Description |
|---|---|
| PASS | The on-chip memory stress test passed. |
| SKIP | The product or scenario does not support the on-chip memory stress test. |
| FAIL | Refer to On-Chip Memory Stress Test Fails Due to Insufficient Device Memory. |
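
The table above can be expressed as a small lookup for use in automated pre-job checks. This is an illustrative sketch only: the names `RESULT_MEANING` and `items_needing_attention` are hypothetical, and treating any value other than PASS or SKIP as requiring follow-up is an assumption, not documented tool behavior.

```python
from typing import Dict, List

# Meaning of each result value, paraphrased from the Fault Check Items table.
RESULT_MEANING = {
    "PASS": "The on-chip memory stress test passed.",
    "SKIP": "The product or scenario does not support the stress test.",
    "FAIL": "See the topic on stress test failures due to insufficient device memory.",
}


def items_needing_attention(results: Dict[str, str]) -> List[str]:
    """Return the check items whose result is neither PASS nor SKIP.

    Assumption: anything else (FAIL or an unrecognized value) needs follow-up.
    """
    return sorted(item for item, status in results.items()
                  if status not in ("PASS", "SKIP"))


results = {"hbm": "PASS", "chipMemory": "FAIL"}
for item in items_needing_attention(results):
    # Fall back to a generic message for values the table does not list.
    print(item, "->", RESULT_MEANING.get(results[item], "Unrecognized result."))
```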
