NPU Environment Restoration
Function
This function resets the Ascend AI Processor through the standard PCIe hot reset process. Restore the NPU environment in the following scenarios:
- After the AICORE stress test and diagnosis are complete, the AICORE and bus voltages are abnormal.
- An NPU is disconnected during AICORE stress testing and diagnosis, that is, the NPU is not detected when you run the npu-smi info command to query basic device information. In this case, power off and restart the device, then restore the NPU environment after the restart.
- An NPU is disconnected during AICPU stress testing, that is, the NPU is not detected when you run the npu-smi info command to query basic device information. In this case, power off and restart the device, then restore the NPU environment after the restart.
Preparations
Before resetting the NPU, stop all services that are using it. You can use fuser to find the processes that belong to these services. For details, see Querying NPU Service Processes.
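The check can be sketched as follows. This is a minimal example under stated assumptions, not the documented procedure: it assumes the NPU device nodes are exposed as /dev/davinci* and that fuser (from psmisc) is installed. The helper names pids_from_fuser and list_npu_pids are hypothetical. Querying NPU Service Processes remains the authoritative reference.

```shell
# Convert fuser's stdout (whitespace-separated PIDs) into one PID per line.
pids_from_fuser() {
    tr -s ' \t' '\n\n' | grep -E '^[0-9]+$' || true
}

# Print the PIDs of processes that hold the given device nodes open.
# Assumption: with no arguments, the NPU nodes are taken to be /dev/davinci*.
list_npu_pids() {
    if [ "$#" -eq 0 ]; then
        set -- /dev/davinci*
    fi
    if ! command -v fuser >/dev/null 2>&1; then
        echo "fuser not found" >&2
        return 2
    fi
    # fuser writes PIDs to stdout and file names/access flags to stderr.
    fuser "$@" 2>/dev/null | pids_from_fuser
}
```

Running list_npu_pids prints one PID per line. Empty output suggests that no process is holding the device nodes. The sketch only lists processes, so stopping the services is left to the operator.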
Parameters
You can run either of the following commands to view the parameters of the NPU restoration command:
ascend-dmi -r -h
ascend-dmi --reset --help
Table 1 lists only the parameter that is specific to this function. For details about the common parameters, see Common Parameters.
| Parameter | Description | Mandatory |
|---|---|---|
| [-r, --reset] | Resets the NPU. | Yes |
Note:
Example
ascend-dmi -r -d
[***@***]# ascend-dmi -r -d 0,1,2 -q
Status : PASS
Message : Reset server successfully.
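When the reset is run from a script, the Status and Message lines shown above can be extracted from the command output. The following is a minimal sketch, assuming the output always uses the "Name : value" layout of the example. The -d and -q options are copied from the example as-is, and the helper names field_value and reset_npus are hypothetical.

```shell
# Print the value of a "Name : value" line from command output on stdin.
# $1 is the field name, for example Status or Message.
field_value() {
    awk -v key="$1" '
        {
            sep = index($0, ":")
            if (sep == 0) next
            name = substr($0, 1, sep - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name != key) next
            value = substr($0, sep + 1)
            gsub(/^[ \t]+|[ \t\r]+$/, "", value)
            print value
            exit
        }'
}

# Reset the NPUs in a comma-separated ID list (for example 0,1,2) and
# succeed only if the reported Status is PASS.
reset_npus() {
    case "$1" in
        ''|*[!0-9,]*) echo "invalid device list: $1" >&2; return 2 ;;
    esac
    # The exit code of ascend-dmi is not relied on here; only Status is checked.
    output=$(ascend-dmi -r -d "$1" -q 2>&1) || true
    printf '%s\n' "$output"
    [ "$(printf '%s\n' "$output" | field_value Status)" = "PASS" ]
}
```

For example, reset_npus 0,1,2 prints the command output and returns success only when the Status line reads PASS. The device list is validated first so that no unexpected characters reach the command line.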
Fault Check Items
| Parameter | Command Output | Description |
|---|---|---|
| status | PASS | The environment is restored successfully. |
| | SKIP | The product or scenario does not support NPU environment restoration. |
| | FAIL | Failed to restore the environment. The failure causes are as follows: |
| Message | - | Lists NPU environment restoration details. |
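The three status values in the table can be handled in a script with a simple case statement. The following is a minimal sketch: the function name handle_reset_status and the exit codes 0, 1, 2, and 3 are conventions of this example only and are not defined by ascend-dmi.

```shell
# Map a Status value to a log line and an exit code.
# Exit codes are this sketch's own convention:
#   0 = PASS, 1 = FAIL, 2 = SKIP, 3 = anything else.
handle_reset_status() {
    case "$1" in
        PASS)
            echo "NPU environment restored"
            return 0
            ;;
        SKIP)
            echo "NPU environment restoration is not supported here"
            return 2
            ;;
        FAIL)
            echo "NPU environment restoration failed; check Message" >&2
            return 1
            ;;
        *)
            echo "unexpected status: $1" >&2
            return 3
            ;;
    esac
}
```

Treating SKIP separately from FAIL lets a calling script tell an unsupported product or scenario apart from a genuine restoration failure.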