Overview

The ascend-dmi tool provides bandwidth, computing power, and power consumption tests, among other functions, for Atlas products in the form of standard PCIe cards, boards, and modules. Table 1 lists the functions. To run the tests, the tool calls the underlying device control and management interface (DCMI) and the interfaces of the Ascend Computing Language (AscendCL). System-level information is queried through the common library provided by the system. You select the test function by configuring the parameters passed to the tool.

Table 1 Functions provided by the ascend-dmi tool

| Function | Description |
| --- | --- |
| Bandwidth test | Measures bus bandwidth, memory bandwidth, and latency. |
| Computing power test | Measures the computing power of the AI Cores of an entire card or a single processor, and the real-time power consumption at full computing power. |
| Power consumption test | Measures the power consumption of a processor or an entire card. |
| Real-time device status query | Queries the real-time status of a running device. |
| Fault diagnosis | Diagnoses software and hardware faults and outputs the diagnosis results. Software check items: driver compatibility and health diagnosis, compatibility among the CANN software layers, and compatibility between CANN and the driver. Hardware check items: device health status, network health status, local bandwidth, and computing power. |
| Software and hardware compatibility test | Checks software and hardware compatibility based on the collected hardware information, architecture, driver version, firmware version, and software version. |
| Driver and firmware compatibility test | Obtains the driver version of the current environment and the firmware version of each Ascend AI Processor, and checks whether the driver and firmware versions are compatible. |
| Device topology check | Displays the topology among the cards in a device. |

  • If an error is reported while you use any of the preceding functions, an error code is written to the corresponding log. To look up the error code, see aclError and DCMI API Return Codes.
  • When you use the preceding functions, wait until the current process finishes before you start the next step. Do not terminate the process while it is running.
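
The second note above matters most when the tool is driven from a script. The following is a minimal sketch, not part of the tool, of one way to run steps strictly one after another without ever terminating a step early. The helper names run_to_completion and run_steps are hypothetical, and the demo uses stand-in commands instead of real ascend-dmi invocations because no command-line parameters are listed in this overview.

```python
import subprocess
import sys
from typing import List, Sequence


def run_to_completion(command: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command and block until it exits on its own.

    No timeout is set, so the child process is never terminated early.
    """
    return subprocess.run(command, capture_output=True, text=True, check=False)


def run_steps(steps: Sequence[Sequence[str]]) -> List[subprocess.CompletedProcess]:
    """Run each step in order, starting the next only after the previous one ends."""
    results = []
    for command in steps:
        result = run_to_completion(command)
        results.append(result)
        if result.returncode != 0:
            # Stop here and inspect the tool's log for the error code
            # before attempting anything else.
            break
    return results


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere (assumption: replace
    # these with the real ascend-dmi invocations for your environment).
    demo = [
        [sys.executable, "-c", "print('step 1 done')"],
        [sys.executable, "-c", "print('step 2 done')"],
    ]
    for completed in run_steps(demo):
        print(completed.returncode, completed.stdout.strip())
```

Because subprocess.run is called without a timeout, each step is left to finish by itself, and a non-zero exit status stops the sequence so that the log can be checked first.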