Querying Real-Time Device Status
Function
Check the running status of the device in real time.
Commands for Querying Test Parameters
Run either of the following commands to list the parameters available for querying the real-time device status:
ascend-dmi -i -h
ascend-dmi -i --help
Table 1 describes the parameters.
| Parameter | Description | Mandatory |
|---|---|---|
| [-i, --info] | Displays the real-time status of a device. | Yes |
| [-b, --brief] | Displays basic information about a processor. | No |
| [-dt, --dt, --detail] | Displays detailed information about a processor. | No |
| Leave --dt and -b unspecified | Displays basic information about the processor by default. | No |
| [-fmt, --fmt, --format] | Specifies the output format. The value can be normal or json. If this parameter is not specified, the default value normal is used. | No |
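The parameter combinations in Table 1 can be sketched as a small helper that assembles the command line before it is run. This is only an illustration: `build_info_command` is a hypothetical name, passing the format value as a separate argument (`--fmt json`) is an assumption about the CLI syntax, and treating the brief and detailed views as alternatives is likewise an assumption rather than documented behavior.

```python
import shlex

# Format values listed in Table 1.
VALID_FORMATS = ("normal", "json")


def build_info_command(detail=False, brief=False, fmt=None):
    """Return the argv list for an `ascend-dmi -i` query (hypothetical helper)."""
    if detail and brief:
        # Assumption: the two views are alternatives, so request only one.
        raise ValueError("choose either detail or brief, not both")
    if fmt is not None and fmt not in VALID_FORMATS:
        raise ValueError(f"format must be one of {VALID_FORMATS}")

    argv = ["ascend-dmi", "-i"]  # -i is the only mandatory parameter
    if detail:
        argv.append("--dt")
    elif brief:
        argv.append("-b")
    if fmt is not None:
        # Assumption: the value follows the flag as a separate argument.
        argv += ["--fmt", fmt]
    return argv


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # on a machine where the tool is not installed.
    print(shlex.join(build_info_command()))
    print(shlex.join(build_info_command(detail=True, fmt="json")))
```

The returned list can be handed to `subprocess.run` on a host where the tool is installed. Leaving both `detail` and `brief` unset yields the plain `ascend-dmi -i` call, which Table 1 says displays basic information by default.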
Example
- Query detailed information about a processor.
The following figures show the processor details returned on each type of server. If the corresponding information is returned, the tool is running properly.
- Inference server (The Ascend 310 AI Processor is used as an example.)
Figure 1 Example of querying real-time device status (inference server)
- Training server
Figure 2 Example of querying real-time device status (training server)
- Atlas 300T training card (model 9000)
Figure 3 Example of querying real-time device status (Atlas 300T training card (model 9000))
- Atlas 200 AI accelerator module
Figure 4 Example of querying real-time device status (Atlas 200 AI accelerator module (RC))
Figure 5 Example of querying real-time device status (Atlas 200 AI accelerator module (EP))
Table 2 describes the server parameters in the preceding figures.
Table 2 Parameters
| Parameter | Description | Product |
|---|---|---|
| Type | Indicates the processor model. | Training server |
| NPU Count | Indicates the number of NPUs. | Training server |
| Card Quantity | Indicates the number of cards. | Standard card |
| Type | Indicates the standard card model. | Standard card |
| Card Manufacturer | Indicates the card manufacturer. | Standard card |
| Card Serial Number | Indicates the serial number of the card. | Standard card |
| Card ID | Indicates the ID of the card. | Standard card |
| Real-time Card Power (W) | Indicates the real-time power consumption in W. | Standard card |
| Device Count | Indicates the number of devices (NPUs). | Standard card |
| Chip Name | Indicates the processor name. | Standard card and training server |
| Device ID | Indicates the ID of the device. | Standard card and training server |
| Chip ID | Indicates the processor ID. | Standard card and training server |
| DIE ID | Indicates the DIE ID of a processor. | Standard card and training server |
| AI Core Information | Displays the AI Core information, which includes the following: AI Core Count: number of AI Cores; AI Core Usage (%): AI Core usage; Cube Count: number of cubes; Vector Count: number of vectors | Standard card and training server |
| CPU Information | Displays the CPU information, which includes the following: AI CPU Count: number of AI CPUs; AI CPU Usage (%): AI CPU usage; Control CPU Count: number of Ctrl CPUs; Control CPU Usage (%): Ctrl CPU usage; Control CPU Frequency (MHz): frequency of the Ctrl CPU | Standard card and training server |
| Memory Information | Displays the memory information, which includes the following: Total (MB): total memory capacity in MB; Used (MB): used memory in MB; Bandwidth Usage (%): memory bandwidth usage; Frequency (MHz): memory frequency in MHz | Standard card and training server |
| Power Information | Displays the power consumption information, which includes the following: Real-time Power (W): real-time power consumption (available only when the command is executed on a training server) | Standard card and training server |
| Temperature (°C) | Indicates the processor temperature. | Standard card and training server |
| PCIe Information | Displays the PCIe information, which includes the following: Domain: PCIe domain; Bus: PCIe bus number; Device: PCIe device ID; Bus ID: PCIe bus address; Subvendor ID: subsystem vendor ID; Subdevice ID: subdevice ID; LnkCap Speed: maximum link speed; LnkCap Width: maximum link width; LnkSta Speed: current speed of the link; LnkSta Width: current width of the link; CPU Affinity: CPU affinity | Standard card and training server |
| Error Information | Displays error information. | Standard card and training server |
| Error Count | Indicates the number of errors. | Standard card and training server |
| ECC Information | Displays ECC information. | Standard card and training server |
| DDR | Memory type of the card. The options are DDR, SRAM, HBM, and NPU. The following information is also contained: Single-Bit Error Count: number of single-bit errors; Double-Bit Error Count: number of double-bit errors | Standard card and training server |
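The PCIe fields in Table 2 are related: a PCIe bus address conventionally combines the domain, bus, and device numbers with a function number in the form `domain:bus:device.function`, written in hexadecimal. The sketch below splits such an address into its parts. It assumes the Bus ID is reported in that conventional layout; `split_bus_id` is a hypothetical helper and the sample address is invented for illustration, not taken from real tool output.

```python
import re

# Conventional PCI address layout: domain:bus:device.function (hexadecimal).
_BDF_PATTERN = re.compile(
    r"^(?P<domain>[0-9a-fA-F]{4}):"
    r"(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<device>[0-9a-fA-F]{2})\."
    r"(?P<function>[0-7])$"
)


def split_bus_id(bus_id):
    """Split a PCIe bus address into integer domain/bus/device/function."""
    match = _BDF_PATTERN.match(bus_id.strip())
    if match is None:
        raise ValueError(f"not a domain:bus:device.function address: {bus_id!r}")
    # Each component is hexadecimal, so convert with base 16.
    return {name: int(value, 16) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    # Invented sample address, used only to demonstrate the parsing.
    print(split_bus_id("0000:3B:00.0"))
```

Splitting the address this way makes it easy to cross-check the Bus ID against the separate Domain, Bus, and Device values reported for the same device.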
When you run the ascend-dmi -i --dt command, the following situations may occur:
- If you run this command as a non-root user, "<Access denied. Please switch to root and try again.>" is displayed for some check items. To obtain the information, switch to the root user and run the command again.
- If you run this command in a container, "Unknown" is displayed for some check items. To obtain the information, exit the container and run the command again.
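The two situations above can be detected automatically by scanning captured output for the placeholder strings. The following is a minimal sketch under stated assumptions: it assumes each check item appears on its own line, `find_restricted_items` is a hypothetical helper, and the sample text is invented rather than real `ascend-dmi` output.

```python
import re

# Placeholder strings quoted in the notes above.
ACCESS_DENIED = "<Access denied. Please switch to root and try again.>"
UNKNOWN = re.compile(r"\bUnknown\b")


def find_restricted_items(output):
    """Return (denied, unknown): lines that need root, lines that need the host."""
    denied, unknown = [], []
    for line in output.splitlines():
        text = line.strip()
        if ACCESS_DENIED in text:
            denied.append(text)    # rerun as the root user to fill these in
        elif UNKNOWN.search(text):
            unknown.append(text)   # rerun outside the container to fill these in
    return denied, unknown


if __name__ == "__main__":
    # Invented sample; the real output layout may differ.
    sample = (
        "Chip Name : example\n"
        "DIE ID : <Access denied. Please switch to root and try again.>\n"
        "CPU Affinity : Unknown\n"
    )
    denied, unknown = find_restricted_items(sample)
    print("needs root:", denied)
    print("needs host:", unknown)
```

A wrapper script could use the two lists to decide whether to prompt for root privileges or to warn that the query was made from inside a container.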
- Query basic information about a processor.
The following figures show the basic processor information returned on each type of server. If the corresponding information is returned, the tool is running properly.
- Inference server (The Ascend 310 AI Processor is used as an example.)
Figure 6 Example of querying real-time device status (inference server)
- Training server
Figure 7 Example of querying real-time device status (training server)
- Atlas 300T Pro training card (model 9000)
Figure 8 Example of querying real-time device status (Atlas 300T Pro training card (model 9000))
- Atlas 200 AI accelerator module
Figure 9 Example of querying real-time device status (Atlas 200 AI accelerator module)
Table 3 describes the server parameters in the preceding figures.
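Table 3 Parameter description
| Parameter | Description | Product |
|---|---|---|
| Type | Indicates the standard card model. | Standard card |
| Card | Card ID | Standard card |
| NPU Count | Indicates the number of NPUs. | Standard card |
| Real-time Card Power | Indicates the actual power consumption of the card. | Standard card |
| Chip | Indicates the processor number. | Standard card |
| Name | Indicates the processor name. | Standard card |
| Type | Indicates the processor model. | Training server |
| NPU Count | Indicates the number of NPUs. | Training server |
| Chip Name | Indicates the processor name. | Training server |
| Power | Indicates the power consumption. | Training server |
| Health | Indicates the processor health status. | Standard card and training server |
| Used Memory | Indicates the memory used. | Standard card and training server |
| Temperature | Indicates the current temperature of the processor. | Standard card and training server |
| Voltage | Indicates the current voltage of the processor. | Standard card and training server |
| Device ID | Indicates the processor device number. | Standard card and training server |
| Bus ID | PCIe bus address | Standard card and training server |
| AI Core Usage | Indicates the AI Core usage of the processor. | Standard card and training server |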