Performing a Power Consumption Test

Function

The power consumption test runs a single-operator model to measure the power consumption of the entire card or processor.

Precautions

To prevent frequent log output from affecting the test result, ensure that log levels on the host and device are set to ERROR before the test. The method is as follows:

  1. Check the log level.
    • Host: Run the echo $GLOBAL_LOG_LEVEL command. If the query result is invalid or empty, the log level is ERROR (corresponding to the value 3).
    • Device: Check the global log level, the module log level, and whether the event log function is enabled by referring to "Appendixes > msnpureport Instructions" in the CANN Log Reference.
  2. If the log level is not ERROR, set the log level on the host and device by referring to "Setting Log Levels" in the CANN Log Reference.
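The host-side check in step 1 can be scripted. The following is a minimal sketch, not part of the tool: the function name host_log_level_is_error is hypothetical, and treating only the digits 0 to 4 as valid values of GLOBAL_LOG_LEVEL is an assumption. As stated above, an empty or invalid value is treated as ERROR (3).

```shell
# Hypothetical helper: succeeds (exit status 0) when the host log level is ERROR.
# Assumption: the valid values of GLOBAL_LOG_LEVEL are the single digits 0-4.
host_log_level_is_error() {
    level="${1-}"
    case "$level" in
        3) return 0 ;;          # explicitly set to ERROR
        0|1|2|4) return 1 ;;    # another valid level is configured
        *) return 0 ;;          # empty or invalid: treated as ERROR
    esac
}

if host_log_level_is_error "${GLOBAL_LOG_LEVEL-}"; then
    echo "Host log level is ERROR"
else
    echo "Host log level is not ERROR; adjust it before testing"
fi
```

The device-side check is not covered by this sketch; it still has to be done as described in the CANN Log Reference.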

Commands for Querying Test Parameters

You can run either of the following commands to list the parameters of the power consumption test command:

ascend-dmi -p -h

ascend-dmi -p --help

Table 1 describes the parameters.

Table 1 Parameter description

| Parameter | Description | Mandatory |
|---|---|---|
| [-p, --power] | Measures the power consumption of the entire card or processor. | Yes |
| [-t, --type] | Specifies the operator operation type, which can be fp16 or int8. If this parameter is not specified, fp16 is used by default. | No |
| [-dur, --dur, --duration] | Specifies the running time. If this parameter is not specified, the default value 600 is used. The value ranges from 60 to 604800 in seconds. | No |
| [-it, --it, --interval-times] | Interval for refreshing screen information. If this parameter is not specified, the default value 5 is used. The value ranges from 1 to 5 in seconds. | No |
| [-pm, --pm, --print-mode] | Specifies the print mode of the screen output. If this parameter is not specified, the default value refresh is used. Value: refresh (clears historical printing information each time) or history (displays the saved historical information). | No |
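The value ranges in Table 1 can be checked before a long test is started. The following is a minimal sketch under stated assumptions: validate_power_args is a hypothetical helper, not part of ascend-dmi, and it only mirrors the documented ranges (type fp16 or int8, duration 60 to 604800, interval 1 to 5, print mode refresh or history).

```shell
# Hypothetical helper: checks power-test arguments against the ranges in Table 1.
# Usage: validate_power_args TYPE DURATION INTERVAL PRINT_MODE
validate_power_args() {
    op_type="$1"; dur="$2"; interval="$3"; mode="$4"

    case "$op_type" in
        fp16|int8) ;;
        *) echo "invalid type: $op_type" >&2; return 1 ;;
    esac

    # Both numeric arguments must be non-empty strings of digits.
    case "$dur" in
        ''|*[!0-9]*) echo "duration must be an integer" >&2; return 1 ;;
    esac
    case "$interval" in
        ''|*[!0-9]*) echo "interval must be an integer" >&2; return 1 ;;
    esac

    if [ "$dur" -lt 60 ] || [ "$dur" -gt 604800 ]; then
        echo "duration out of range (60-604800): $dur" >&2; return 1
    fi
    if [ "$interval" -lt 1 ] || [ "$interval" -gt 5 ]; then
        echo "interval out of range (1-5): $interval" >&2; return 1
    fi

    case "$mode" in
        refresh|history) ;;
        *) echo "invalid print mode: $mode" >&2; return 1 ;;
    esac
    return 0
}

validate_power_args fp16 600 5 refresh && echo "arguments OK"
```

The tool performs its own argument checking; a pre-check like this is only a convenience for wrapper scripts.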

  • Power consumption data is collected periodically, with an interval between two collections. There is therefore a low probability that the actual power consumption is not captured, in which case the displayed value is lower than expected.
  • To ensure the correctness and accuracy of the test result, run the power consumption test separately from other tests.
  • Because of execution overhead, the actual number of printouts in a power consumption test may differ from the theoretical value. For example, if the test runs for 60s and the print information is refreshed every 5s, the theoretical number of printouts is 12, but the actual number is less than 12.
  • If multiple level-2 parameters such as --dur and --it follow ascend-dmi -p, they can be specified in any order. The order does not affect the command output. For example, the output of ascend-dmi -p --dur 60 --it 5 --pm refresh is the same as that of ascend-dmi -p --it 5 --dur 60 --pm refresh.
  • The int8 mode uses integer operations. Compared with the floating-point arithmetic of the fp16 mode, fewer operation units are involved, so the final power consumption value is relatively low. In addition, a performance threshold is preset for hardware devices. In fp16 mode, the threshold is easily reached, after which protection mechanisms, such as active frequency reduction and voltage adjustment, are triggered to prevent the chip power consumption from exceeding the threshold for a long time. In int8 mode, the power consumption is relatively low and the threshold may not be reached, in which case the power consumption of different chips may differ significantly.
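The theoretical number of printouts mentioned in the notes above is the running time divided by the refresh interval. A minimal sketch of that arithmetic follows; theoretical_prints is a hypothetical helper name, and the result is only an upper bound because the actual count can be lower.

```shell
# Hypothetical helper: theoretical number of printouts for one test run.
# $1 = running time in seconds (--dur), $2 = refresh interval in seconds (--it)
theoretical_prints() {
    echo $(( $1 / $2 ))
}

theoretical_prints 60 5    # prints 12, matching the example in the notes
```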

Example

The following examples show the power consumption test output for each type of server.

  • Inference server (The Ascend 310 AI Processor is used as an example.)
    1. Use default parameters to conduct the power consumption test. (The operator operation type is fp16.)

      ascend-dmi -p

      Figure 1 Power consumption test example 1 (inference server)
    2. Set the operator operation type to int8. Retain the default values for other parameters.

      ascend-dmi -p -t int8

      Figure 2 Power consumption test example 2 (inference server)
  • Training server

    Run a power consumption test with a running time of 60s, an information printing interval of 5s, and the refresh print mode, which clears historical output on each refresh.

    ascend-dmi -p --dur 60 --it 5 --pm refresh

    Figure 3 Power consumption test example (training server)
  • Atlas 300T Pro training card (model 9000)

    Run a power consumption test with a running time of 60s, an information printing interval of 5s, and the refresh print mode, which clears historical output on each refresh.

    ascend-dmi -p --dur 60 --it 5 --pm refresh

    Figure 4 Power consumption test example (Atlas 300T Pro training card (model 9000))
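The example commands above differ only in which optional parameters are spelled out. As an illustration, the sketch below assembles a command line from positional arguments and falls back to the defaults listed in Table 1. build_power_cmd is a hypothetical helper that only prints the command; it does not run ascend-dmi.

```shell
# Hypothetical helper: prints an ascend-dmi power-test command line.
# Missing or empty arguments fall back to the defaults documented in Table 1.
# Usage: build_power_cmd [TYPE] [DURATION] [INTERVAL] [PRINT_MODE]
build_power_cmd() {
    op_type="${1:-fp16}"
    dur="${2:-600}"
    interval="${3:-5}"
    mode="${4:-refresh}"
    echo "ascend-dmi -p -t $op_type --dur $dur --it $interval --pm $mode"
}

build_power_cmd                      # all defaults
build_power_cmd int8                 # int8 operators, other defaults
build_power_cmd fp16 60 5 refresh    # the 60s example with every option explicit
```

Printing the command first makes it easy to review the parameters before starting a run that can last up to 604800 seconds.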

Table 2 describes the server parameters in the preceding figures.

Table 2 Parameters

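| Parameter | Description | Product |
|---|---|---|
| Type | Indicates the standard card model. | Standard card |
| Card | Card ID | Standard card |
| Chip | Indicates the processor number. | Standard card |
| Name | Indicates the processor name. | Standard card |
| Type | Indicates the processor model. | Training server |
| Chip Name | Indicates the processor name. | Training server |
| NPU Count | Indicates the number of NPUs. | Standard card and training server |
| Power | Indicates the actual power consumption of the processor or entire card. | Standard card and training server |
| Health | Indicates the processor health status. | Standard card and training server |
| Temperature | Indicates the current temperature of the processor. | Standard card and training server |
| Device ID | Indicates the processor device number. | Standard card and training server |
| AI Core Usage | Indicates the AI Core usage of the processor. | Standard card and training server |
| Voltage | Indicates the current voltage of the processor. | Standard card and training server |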