Power Consumption Test

Function

The power consumption test runs a single-operator model to measure the power consumption of the entire NPU.

Parameters

You can run either of the following commands to list the parameters of the power consumption test command:

ascend-dmi -p -h

ascend-dmi -p --help

Table 1 lists only the parameters specific to this test. For details about the common parameters, see Common Parameters.

Table 1 Parameter description

Parameter

Description

Mandatory

[-p, --power]

Measures the power consumption of the entire NPU.

Yes

[-t, --type]

Specifies the operator operation type, which can be fp16 or int8. If this parameter is not specified, fp16 is used by default.

Only fp16 is supported on the Atlas A2 training product, A200I A2 Box heterogeneous component, Atlas A3 training product, and Atlas A3 inference products.

No

[-pt, --pt, --pressure-type]

Specifies the stress test type.

  • Currently, the following types are supported:
    • edp (Estimated Design Power)
    • tdp (Thermal Design Power)
  • It can be used together with --dur, --it, --pm, and -q.
  • It cannot be used together with -t.
  • If this parameter is not specified, the TDP is tested by default.
  • This parameter is available only on the Atlas A3 training product, Atlas A3 inference products, Atlas A2 training product, Atlas 800I A2 inference product, and A200I A2 Box heterogeneous component.

No

[-dur, --dur, --duration]

Specifies the running time of the test, in seconds. If this parameter is not specified, the default value 600 is used.

The value ranges from 60 to 604800 (7 days).

No

[-it, --it, --interval-times]

Specifies the interval for refreshing screen information. If this parameter is not specified, the default value 5 is used.

The value ranges from 1 to 5, in seconds.

No

[--skip-check]

Skips the device health check.

If this parameter is not passed, the device health status is checked by default.

No

[-pm, --pm, --print-mode]

Specifies the print mode of the screen output. If this parameter is not specified, the default value refresh is used.

Value:

  • refresh: clears the previously printed information each time the screen is refreshed.
  • history: retains and displays the previously printed information.
    NOTE:

    In refresh mode, when there are a large number of chips, you are advised to decrease the font size so that all results can be displayed on the same screen. Otherwise, the display may be abnormal and some content may be printed repeatedly.

No
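The value constraints in Table 1 can be checked before a long test is started. The following is a minimal sketch, assuming a POSIX-compatible shell. The functions in_range and build_power_cmd are hypothetical helpers, not part of ascend-dmi. The sketch only covers the options that take a value (--dur, --it, --pm, -t, and -pt), and it prints the resulting command line instead of running it.

```shell
#!/bin/sh
# Hypothetical helpers, not part of ascend-dmi. They check option values
# against the ranges in Table 1 and print the command line without running it.

# Succeeds only if $1 is a decimal integer within [$2, $3].
in_range() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# The body runs in a subshell, so its variables do not leak to the caller.
build_power_cmd() (
    cmd="ascend-dmi -p"
    op_type=""
    pt=""
    while [ $# -gt 0 ]; do
        # Every option handled here takes exactly one value.
        [ $# -ge 2 ] || { echo "missing value for $1" >&2; return 2; }
        case "$1" in
            --dur)
                in_range "$2" 60 604800 ||
                    { echo "--dur must be 60 to 604800" >&2; return 2; } ;;
            --it)
                in_range "$2" 1 5 ||
                    { echo "--it must be 1 to 5" >&2; return 2; } ;;
            --pm)
                case "$2" in
                    refresh|history) ;;
                    *) echo "--pm must be refresh or history" >&2; return 2 ;;
                esac ;;
            -t)
                case "$2" in
                    fp16|int8) op_type=$2 ;;
                    *) echo "-t must be fp16 or int8" >&2; return 2 ;;
                esac ;;
            -pt)
                case "$2" in
                    edp|tdp) pt=$2 ;;
                    *) echo "-pt must be edp or tdp" >&2; return 2 ;;
                esac ;;
            *)
                echo "unsupported option: $1" >&2; return 2 ;;
        esac
        cmd="$cmd $1 $2"
        shift 2
    done
    # Table 1: -pt cannot be used together with -t.
    if [ -n "$op_type" ] && [ -n "$pt" ]; then
        echo "-pt cannot be used together with -t" >&2
        return 2
    fi
    echo "$cmd"
)

# Example: prints "ascend-dmi -p --dur 60 --it 5 --pm refresh".
build_power_cmd --dur 60 --it 5 --pm refresh
```

The helper does not check product support (for example, which products accept -pt or int8), because that depends on the hardware in use.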

Example

The following examples show the power consumption test on each type of server.

  • Inference server
    1. Use default parameters to conduct the power consumption test. (The operator operation type is fp16.)

      ascend-dmi -p

      Figure 1 Power consumption test example 1 (inference server)
    2. Set the operator operation type to int8. Retain the default values for other parameters.

      ascend-dmi -p -t int8

      Figure 2 Power consumption test example 2 (inference server)
  • Training server

    Perform a power consumption test with an execution time of 60s, an information printing interval of 5s, and the refresh screen output mode, which clears historical records.

    ascend-dmi -p --dur 60 --it 5 --pm refresh

    Figure 3 Power consumption test example (training server)
  • Power consumption test of the edp type
    ascend-dmi -p -pt edp -q
    Command output:
    |=======================+==================+=======================|
    | Type                  | NPU Count                                |
    +-----------------------+------------------+-----------------------+
    | Device ID             | Health           | Temperature   Voltage |
    | Chip Name             | AI Core Usage    | Power        Frequency|
    |=======================+==================+=======================|
    | Ascend ***            | 8                                        |
    +-----------------------+------------------+-----------------------+
    | 0                     | OK               | 49C           0.79V   |
    | Ascend ***            | 100%             | 350.1W        1500MHZ |
    +-----------------------+------------------+-----------------------+
    | 1                     | OK               | 55C           0.79V   |
    | Ascend ***            | 100%             | 350.4W        1550MHZ |
    +-----------------------+------------------+-----------------------+
    | 2                     | OK               | 50C           0.78V   |
    | Ascend ***            | 100%             | 349.9W        1600MHZ |
    +-----------------------+------------------+-----------------------+
    | 3                     | OK               | 55C           0.78V   |
    | Ascend ***            | 100%             | 350.0W        1550MHZ |
    +-----------------------+------------------+-----------------------+
    | 4                     | OK               | 49C           0.77V   |
    | Ascend ***            | 100%             | 350.2W        1500MHZ |
    +-----------------------+------------------+-----------------------+
    | 5                     | OK               | 54C           0.77V   |
    | Ascend ***            | 100%             | 350.1W        1500MHZ |
    +-----------------------+------------------+-----------------------+
    | 6                     | OK               | 49C           0.78V   |
    | Ascend ***            | 100%             | 349.8W        1550MHZ |
    +-----------------------+------------------+-----------------------+
    | 7                     | OK               | 53C           0.75V   |
    | Ascend ***            | 100%             | 350.2W        1600MHZ |
    |=======================+==================+=======================|

Table 2 describes the fields displayed in the preceding figures and command output.

Table 2 Parameters

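Parameter        Description                                             Product
Type             Standard card model.                                    Standard card
Card             Card ID.                                                Standard card
Chip             Chip ID.                                                Standard card
Name             Chip name.                                              Standard card
Type             Chip model.                                             Training server
Chip Name        Chip name.                                              Training server
NPU Count        Number of NPUs.                                         Standard card and training server
Power            Actual power consumption of the entire NPU or device.   Standard card and training server
Health           Chip health status.                                     Standard card and training server
Temperature      Chip temperature.                                       Standard card and training server
Device ID        Logical device ID.                                      Standard card and training server
AI Core Usage    AI Core usage of the chip.                              Standard card and training server
Voltage          Current voltage of the chip.                            Standard card and training server
Frequency        Current frequency of the chip.                          Standard card and training server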
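
The Device ID and Power fields can be pulled out of saved command output for logging or comparison. The following is a minimal sketch, assuming the output has the same layout as the edp example above and is supplied on standard input. The function parse_power and the file name power.log are hypothetical and are not part of ascend-dmi.

```shell
#!/bin/sh
# Hypothetical helper, not part of ascend-dmi. Reads a table in the layout of
# the edp example on standard input and prints "<Device ID> <Power in W>".
parse_power() {
    awk -F'|' '
        # Remove leading and trailing blanks from a table cell.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

        # Rows with three cells split into five fields, because the text
        # before the first "|" and after the last "|" also counts.
        NF == 5 {
            first = trim($2)
            if (first ~ /^[0-9]+$/) {
                # Device row: the first cell is the numeric Device ID.
                dev = first
                have_dev = 1
                next
            }
            if (have_dev) {
                # Chip row after a device row: the third cell holds
                # Power followed by Frequency, for example 350.1W 1500MHZ.
                split(trim($4), cols, " ")
                power = cols[1]
                sub(/W$/, "", power)
                print dev, power
                have_dev = 0
            }
        }
    '
}

# Example usage with a saved log (power.log is a hypothetical file name):
#   parse_power < power.log
```

Header and separator rows are skipped because they either have a different number of cells or do not start with a numeric Device ID.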