Computing Power Test

Function

The test creates a matrix multiplication A(m,k)*B(k,n) and executes it multiple times. It then calculates the AI Core computing power of the entire card, chip, or server, and the real-time power at full computing power, based on the computation amount and the time taken to perform the matrix multiplications. For Atlas A2 training products, these values are calculated from the computation amount and the time taken to perform both matrix multiplication and vector multiplication multiple times.
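The relationship between computation amount, elapsed time, and computing power described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's actual implementation: it assumes the conventional count of 2*m*k*n operations per matrix multiplication (one multiply and one add per inner-product term), and the function names and the sample run are hypothetical.

```python
def matmul_ops(m: int, k: int, n: int) -> int:
    """Operations in one A(m,k)*B(k,n) product.

    Assumed convention: one multiply and one add per inner-product term.
    """
    return 2 * m * k * n


def tera_ops_per_second(total_ops: int, duration_ms: float) -> float:
    """Convert an operation count and an elapsed time in ms to tera-operations per second."""
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    return total_ops / (duration_ms / 1000.0) / 1e12


# Hypothetical run: 1,000,000 products with the FP16 shape from Table 1
# (m=256, k=32, n=128) completed in 500 ms.
ops_per_product = matmul_ops(256, 32, 128)
print(ops_per_product)
print(tera_ops_per_second(ops_per_product * 1_000_000, 500.0))
```

For integer types such as int8, the same arithmetic yields tera-operations per second rather than floating-point operations per second.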

The parameters involved are listed in Table 1 and Table 2.
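Table 1 Matrix multiplication parameters

Operator operation types:
  • FP16 (all device models)
  • BF16 (Atlas A2 training product, Atlas A3 training product, Atlas 800I A2 inference server)
  • INT8 (Atlas 200/300/500 inference products)

------------------------------------------------------------------------
  Parameter   Description                                 Value
------------------------------------------------------------------------
  m           Rows in matrix A                            256
  k           Columns in matrix A and rows in matrix B    32
  n           Columns in matrix B                         128
------------------------------------------------------------------------

Operator operation types:
  • INT8 (Atlas 200I/500 A2 inference products, Atlas training products, Atlas A2 training products, Atlas A2 inference products, Atlas A3 training products, Atlas A3 inference products)
  • HF32 (Atlas A2 training products, Atlas A2 inference products, Atlas A3 training products, Atlas A3 inference products)

------------------------------------------------------------------------
  Parameter   Description                                 Value
------------------------------------------------------------------------
  m           Rows in matrix A                            256
  k           Columns in matrix A and rows in matrix B    64
  n           Columns in matrix B                         128
------------------------------------------------------------------------

Operator operation types:
  • FP32 (Atlas A2 training products, Atlas A2 inference products, Atlas A3 training products, Atlas A3 inference products)

------------------------------------------------------------------------
  Parameter   Description                                 Value
------------------------------------------------------------------------
  m           Rows in matrix A                            128
  k           Columns in matrix A and rows in matrix B    32
  n           Columns in matrix B                         64
------------------------------------------------------------------------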

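Table 2 Vector multiplication parameters

------------------------------------------------------------------------
  Operator Operation Type   Parameter   Description     Value
------------------------------------------------------------------------
  FP16                      n           Vector length   32760
  FP32, HF32, BF16          n           Vector length   16380
------------------------------------------------------------------------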

Settings Before the Test

  • The Ascend AI Processor presets performance thresholds. You are advised to perform the computing power test when the device temperature is stable and lower than 90°C. Otherwise, a high device temperature may trigger frequency reduction and affect the computing power test result.

Parameters

You can run either of the following commands to list the parameters of the computing power test command:

ascend-dmi -f -h

ascend-dmi -f --help

Table 3 lists only the parameters specific to this test. For details about other common parameters, see Common Parameters.

Table 3 Parameter description

------------------------------------------------------------------------
  Parameter                      Mandatory   Description
------------------------------------------------------------------------
  [-f, --flops]                  Yes         Measures the computing power of the entire card, chip, or server.
  [-t, --type]                   No          Specifies the operator operation type: fp16, fp32, hf32, bf16, or int8. If this parameter is not specified, fp16 is used.
  [--all]                        No          Tests the computing power of the entire server, that is, the sum of the computing power of all NPUs. This parameter cannot be used with -d.
  [-et, --et, --execute-times]   No          Specifies the number of times that matrix multiplication is performed on a single AI Core on the specified chip, in units of 100,000. The value ranges from 10 to 80. If this parameter is not specified, the default value 60 (6 million times) is used.
------------------------------------------------------------------------
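The unit handling of the execute-times parameter can be sketched as follows. This minimal example only restates the rule from Table 3 (unit of 100,000, range 10 to 80, default 60). The constant and function names are hypothetical and are not part of ascend-dmi.

```python
ET_UNIT = 100_000        # one --et unit, per Table 3
ET_MIN, ET_MAX = 10, 80  # allowed range, per Table 3
ET_DEFAULT = 60          # default value, per Table 3


def executions_per_core(et: int = ET_DEFAULT) -> int:
    """Translate an --et value into matrix multiplications per AI Core."""
    if not ET_MIN <= et <= ET_MAX:
        raise ValueError(f"--et must be between {ET_MIN} and {ET_MAX}, got {et}")
    return et * ET_UNIT


print(executions_per_core())    # default: 6000000
print(executions_per_core(80))  # maximum: 8000000
```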

Note:

  • In this document, the device ID used in input and output is the logical chip ID.
  • To obtain the logical ID, run the npu-smi info -m command and check the value of Chip Logic ID in the command output. The NPU ID is the physical chip ID.
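When the test is launched from a script, the parameter rules above can be checked before the command is run. The following is a sketch only: build_flops_command is a hypothetical helper, and it uses only the flags that appear in this section (-f, -t, -d, --all, --et). It assembles the argument list without executing anything.

```python
from typing import List, Optional

VALID_TYPES = ("fp16", "fp32", "hf32", "bf16", "int8")


def build_flops_command(op_type: Optional[str] = None,
                        device: Optional[int] = None,
                        all_devices: bool = False,
                        execute_times: Optional[int] = None) -> List[str]:
    """Assemble an ascend-dmi computing power test command as an argument list."""
    if all_devices and device is not None:
        raise ValueError("--all cannot be used with -d")
    if op_type is not None and op_type not in VALID_TYPES:
        raise ValueError(f"unsupported operator operation type: {op_type}")

    cmd = ["ascend-dmi", "-f"]
    if op_type is not None:
        cmd += ["-t", op_type]   # fp16 is used when -t is omitted
    if device is not None:
        cmd += ["-d", str(device)]
    if all_devices:
        cmd.append("--all")
    if execute_times is not None:
        cmd += ["--et", str(execute_times)]
    return cmd


print(" ".join(build_flops_command(op_type="int8", device=2, execute_times=60)))
```

The returned list could be passed to subprocess.run on a host where ascend-dmi is installed.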

Example

  • Perform a computing power test on device 2 by executing a matrix multiplication 6 million times. The default operator operation type is fp16.

    ascend-dmi -f -d 2 --et 60

  • Perform a computing power test on device 2 by executing a matrix multiplication 6 million times, with the operator operation type set to int8.

    ascend-dmi -f -t int8 -d 2 --et 60

  • Perform a computing power test on device 3 by executing a matrix multiplication 8 million times.

    ascend-dmi -f -d 3 --et 80

  • Perform a computing power test on Atlas A2 training products with the computing power test type set to hf32.

    ascend-dmi -f -t hf32

  • Perform a computing power test on Atlas A3 training products with the computing power test type set to fp32.

    ascend-dmi -f -t fp32 -q

  • Perform a computing power test on Atlas A3 training products with the computing power test type set to bf16.

    ascend-dmi -f -t bf16 -q

  • Specify --all to test the computing power of the entire server (FP16 used as an example).

    ascend-dmi -f -q --all

    If --all is specified, Execute Times, Duration(ms), and Power(W) are average values for the entire server. Device is set to all, indicating all NPUs. TFLOPS@FP16 indicates the sum of the computing power of all devices.

    ------------------------------------------------------------------------
      Device      Execute Times     Duration(ms)    TFLOPS@FP16     Power(W)
    ------------------------------------------------------------------------
      all         360000000         1702            2509.719      206.625015  
    ------------------------------------------------------------------------
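If the result needs to be processed by a script, the table can be parsed as sketched below. This assumes the output keeps the layout shown in the sample above: dashed rule lines, one header line, and columns separated by two or more spaces. Other tool versions may print a different layout, and parse_flops_table is a hypothetical helper.

```python
import re
from typing import Dict, List

SAMPLE = """\
------------------------------------------------------------------------
  Device      Execute Times     Duration(ms)    TFLOPS@FP16     Power(W)
------------------------------------------------------------------------
  all         360000000         1702            2509.719      206.625015
------------------------------------------------------------------------
"""


def parse_flops_table(text: str) -> List[Dict[str, str]]:
    """Return one dict per data row, keyed by the column names in the header line."""
    header = None
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or set(stripped) == {"-"}:
            continue  # skip blank lines and dashed rules
        # Split on runs of 2+ spaces so "Execute Times" stays one column name.
        fields = re.split(r"\s{2,}", stripped)
        if header is None:
            header = fields
        else:
            rows.append(dict(zip(header, fields)))
    return rows


row = parse_flops_table(SAMPLE)[0]
print(row["Device"], float(row["TFLOPS@FP16"]))  # prints: all 2509.719
```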

Table 4 describes the columns in the command output shown above.

Table 4 Parameter description

------------------------------------------------------------------------
  Parameter       Description
------------------------------------------------------------------------
  Device          Indicates the device ID.
  Execute Times   • For Atlas A2 training products, Atlas A2 inference products, Atlas A3 training products, and Atlas A3 inference products, Execute Times is the number of matrix multiplications on a single AI Core multiplied by the number of AI Cores, plus the number of vector multiplications on a single vector core multiplied by the number of vector cores.
                  • For Atlas training products and other inference products, Execute Times is the number of matrix multiplications on a single AI Core multiplied by the number of AI Cores.
  Duration(ms)    Indicates the time taken to complete the matrix multiplication computation.
  TFLOPS@FP16     Indicates the calculated computing power. FP16 is the specified operator operation type.
  Power(W)        Indicates the real-time power at full computing power.
------------------------------------------------------------------------

Note:

The chip power reported during the computing power test is for reference only. Power consumption data is collected periodically, with an interval between two collections, so the reported value fluctuates when the computing power test is too short. To measure power consumption, use the dedicated power consumption test option.

To ensure the correctness and accuracy of the test result, run the computing power test separately from other tests.
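The two Execute Times rules in Table 4 can be sketched as a single function. The core counts and per-core counts below are hypothetical values chosen for illustration; they do not describe any specific Atlas product, and total_execute_times is not part of the tool.

```python
def total_execute_times(matmul_times_per_core: int,
                        ai_cores: int,
                        vector_times_per_core: int = 0,
                        vector_cores: int = 0) -> int:
    """Execute Times as described in Table 4.

    Leave the vector arguments at 0 for products that run only matrix
    multiplication.
    """
    return (matmul_times_per_core * ai_cores
            + vector_times_per_core * vector_cores)


# Hypothetical device with 8 AI Cores, matrix multiplication only.
print(total_execute_times(6_000_000, ai_cores=8))

# Hypothetical device with 8 AI Cores and 16 vector cores.
print(total_execute_times(6_000_000, ai_cores=8,
                          vector_times_per_core=6_000_000, vector_cores=16))
```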