Performing a Computing Power Test

Function

Construct a matrix multiplication A(m,k)*B(k,n) and execute it multiple times. Then, based on the amount of computation and the time taken to perform the multiplications, calculate the computing power of the AI Cores of the entire card or processor, together with the real-time power of the processor at full computing load.

Table 1 lists the matrix multiplication parameters used in the test.
Table 1 Matrix multiplication parameters

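| Operator Operation Type | Parameter | Description | Value |
|---|---|---|---|
| fp16 (inference and training servers); int8 (inference servers) | m | Rows in matrix A | 256 |
| fp16 (inference and training servers); int8 (inference servers) | k | Columns in matrix A and rows in matrix B | 32 |
| fp16 (inference and training servers); int8 (inference servers) | n | Columns in matrix B | 128 |
| int8 (training servers) | m | Rows in matrix A | 256 |
| int8 (training servers) | k | Columns in matrix A and rows in matrix B | 64 |
| int8 (training servers) | n | Columns in matrix B | 128 |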
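The calculation described under Function can be sketched as follows. This is a minimal sketch, assuming the common convention that an A(m,k)*B(k,n) multiplication costs 2*m*k*n operations (one multiply and one add per inner-product term). The exact formula used by ascend-dmi is not stated here, and the function names, execution count, and duration are hypothetical.

```shell
# Operations in one A(m,k)*B(k,n) multiplication, assuming the common
# 2*m*k*n convention (one multiply plus one add per inner-product term).
matmul_ops() {
    echo $(( 2 * $1 * $2 * $3 ))
}

# Throughput in tera-operations per second.
# Arguments: m k n total_executions duration_ms
# (total_executions and duration_ms are sample inputs, not tool output).
tera_ops_per_second() {
    awk -v m="$1" -v k="$2" -v n="$3" -v times="$4" -v ms="$5" \
        'BEGIN { printf "%.2f\n", (2 * m * k * n * times) / (ms / 1000) / 1e12 }'
}

matmul_ops 256 32 128   # fp16, and int8 on inference servers: prints 2097152
matmul_ops 256 64 128   # int8 on training servers: prints 4194304
```

Under this convention, the int8 multiplication used on training servers carries twice the work of the fp16 one, because k is 64 instead of 32.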

Precautions

To prevent frequent log output from affecting the test result, ensure that log levels on the host and device are set to ERROR before the test. The method is as follows:

  1. Check the log level.
    • Host: Run the echo $GLOBAL_LOG_LEVEL command. If the query result is invalid or empty, the log level is ERROR (corresponding to the value 3).
    • Device: Check the global log level, the module log level, and whether the event log function is enabled by referring to "Appendixes > msnpureport Instructions" in the CANN Log Reference.
  2. If the log level is not ERROR, set the log level on the host and device by referring to "Setting Log Levels" in the CANN Log Reference.
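The host-side check in step 1 can be scripted. The following is a minimal sketch: the function name is hypothetical, and it assumes that the valid values of GLOBAL_LOG_LEVEL are 0 to 4, so that anything else counts as "invalid" and is treated as ERROR (3), as described above. It does not cover the device-side check.

```shell
# Report whether the host log level is effectively ERROR.
# An empty or invalid value means ERROR (3), per the precaution above.
# Assumption: valid levels are 0 to 4; anything else is "invalid".
host_log_level_is_error() {
    case "$1" in
        3) return 0 ;;          # explicitly ERROR
        0|1|2|4) return 1 ;;    # another valid level
        *) return 0 ;;          # empty or invalid: treated as ERROR
    esac
}

if host_log_level_is_error "${GLOBAL_LOG_LEVEL:-}"; then
    echo "Host log level is ERROR; ready to test."
else
    echo "Host log level is ${GLOBAL_LOG_LEVEL}; set it to ERROR (3) first."
fi
```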

Commands for Querying Test Parameters

You can run either of the following commands to list the parameters of the computing power test command:

ascend-dmi -f -h

ascend-dmi -f --help

Table 2 describes the parameters.

Table 2 Parameter description

| Parameter | Description | Mandatory |
|---|---|---|
| [-f, --flops] | Measures the computing power of the entire card or processor. | Yes |
| [-t, --type] | Specifies the operator operation type, which can be fp16 or int8. If this parameter is not specified, fp16 is used by default. | No |
| [-d, --device] | Specifies the ID of the device whose computing power is to be tested. The device ID is the ID of the Ascend AI Processor. Run the ascend-dmi --info command and read the Chip parameter to obtain the number of processors. For example, if an Atlas 300I inference card is configured with four Ascend AI Processors, the device ID ranges from 0 to 3. If this parameter is not specified, device 0 is tested by default. Training scenario: tests the computing power of the processor corresponding to the device ID. Inference scenario: tests the computing power of the entire card that contains the device corresponding to the device ID. | No |
| [-et, --et, --execute-times] | Specifies the number of times that matrix multiplication is performed on a single AI Core of the specified processor. The value range is 10 to 80 in both scenarios. Training scenario: the unit is 100,000, and the default value is 60. Inference scenario: the unit is one million, and the default value is 10. | No |
| [-fmt, --fmt, --format] | Specifies the output format, which can be normal or json. If this parameter is not specified, normal is used by default. | No |

  • For the same number of matrix multiplications: on an inference card, int8 mode delivers twice the computing power of fp16 mode and takes half the execution time. On a training card, int8 mode instead doubles the size of a single matrix multiplication (k is 64 rather than 32, see Table 1) so that the processor is fully loaded with data. As a result, the execution time is the same as in fp16 mode, but the computing power is still doubled.
  • If you need to perform the computing power test for a long time, see Computing Power Test Script for Cyclical Calling. If you also need to collect the output of the ascend-dmi -i command when the AI Core usage is 100% during the computing power test, see Script for Querying Real-time Device Status.
  • If multiple level-2 parameters such as -d and --et follow ascend-dmi -f, they can be given in any order. The order does not affect the command output. For example, the output of ascend-dmi -f -d 2 --et 60 is the same as that of ascend-dmi -f --et 60 -d 2.
  • The int8 mode uses integer arithmetic, which engages fewer operation units than the floating-point arithmetic of the fp16 mode. Therefore, the measured power consumption is relatively low.
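Because the unit of --et differs between scenarios, it can help to convert a value to the actual number of multiplications per AI Core before running a test. The following is a minimal sketch based only on the units and the 10 to 80 range given in Table 2. The function name and the scenario labels are hypothetical.

```shell
# Convert an --et value to the number of multiplications per AI Core.
# Arguments: scenario ("training" or "inference"), et value (10 to 80).
et_to_count() {
    case "$2" in
        ''|0*|*[!0-9]*) echo "et must be a positive integer" >&2; return 1 ;;
    esac
    if [ "$2" -lt 10 ] || [ "$2" -gt 80 ]; then
        echo "et must be between 10 and 80" >&2
        return 1
    fi
    case "$1" in
        training)  echo $(( $2 * 100000 )) ;;    # unit: 100,000
        inference) echo $(( $2 * 1000000 )) ;;   # unit: one million
        *) echo "unknown scenario: $1" >&2; return 1 ;;
    esac
}

et_to_count inference 60   # prints 60000000 (60 million)
et_to_count training 80    # prints 8000000 (8 million)
```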

Example

  • Perform a computing power test on device 2 (an inference server) by executing a matrix multiplication 60 million times. The default operator operation type is fp16.

    ascend-dmi -f -d 2 --et 60

    If information shown in Figure 1 is displayed, the tool is running properly.

    Figure 1 Example 1 of the inference server computing power test
  • Perform a computing power test on device 2 (an inference server) by executing a matrix multiplication 60 million times. The operator operation type is int8.

    ascend-dmi -f -t int8 -d 2 --et 60

    If information shown in Figure 2 is displayed, the tool is running properly.

    Figure 2 Example 2 of the inference server computing power test
  • Perform a computing power test on device 3 (a training server) by executing a matrix multiplication 8 million times.

    ascend-dmi -f -d 3 --et 80

    If information shown in Figure 3 is displayed, the tool is running properly.

    Figure 3 Example of the training server computing power test
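To repeat such a test across several processors of a training server, the commands can be generated in a loop and run one device at a time. The following is a minimal sketch that uses only the options documented in Table 2. The device IDs 0 to 3, the function name, and the DRY_RUN variable are hypothetical, and by default the loop only prints the commands instead of running them.

```shell
# Build the test command from the options documented in Table 2.
# Arguments: device ID, --et value, operator operation type (fp16 or int8).
flops_cmd() {
    echo "ascend-dmi -f -t $3 -d $1 --et $2"
}

# Handle the devices one at a time so the tests do not overlap.
# DRY_RUN is a hypothetical helper variable; it defaults to 1,
# which prints each command without executing it.
for dev in 0 1 2 3; do
    cmd=$(flops_cmd "$dev" 60 fp16)
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
done
```

Set DRY_RUN=0 to execute the commands. On an inference card, each device ID tests the entire card, so looping over all devices of one card would repeat the same measurement.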

Table 3 describes the server parameters in the preceding figures.

Table 3 Parameter description

| Parameter | Description |
|---|---|
| Device | Indicates the device ID. |
| Execute Times | Indicates the number of times that matrix multiplication is actually performed. Training scenario: the number of times that matrix multiplication is performed on each AI Core multiplied by the number of AI Cores of the processor (for example, 32). Inference scenario: the number of times that matrix multiplication is performed on each AI Core multiplied by the number of AI Cores of the processor (for example, 8) and then by the number of processors. |
| Duration(ms) | Indicates the time taken to complete the matrix multiplication computation. |
| TFLOPS@FP16 | Indicates the computing power of the processor when tested with FP16 data. |
| Power(W) | Indicates the real-time power of the processor at full computing load. NOTE: The processor power reported during the computing power test can be disregarded. Power consumption data is collected periodically, with an interval between two collections, so the reading fluctuates when the computing power test is too short. To measure power consumption, use the dedicated power consumption test option instead. |
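The Execute Times value in Table 3 can be reproduced by hand. The following is a minimal sketch, assuming that the per-core count is the --et value after unit conversion (80 x 100,000 for training, 60 x 1,000,000 for inference). The function name is hypothetical, and the count of four processors is borrowed from the Atlas 300I example in Table 2.

```shell
# Execute Times as described in Table 3.
# Arguments: multiplications per AI Core, AI Cores per processor,
# processors on the card (omit or use 1 for the training scenario).
execute_times() {
    echo $(( $1 * $2 * ${3:-1} ))
}

execute_times 8000000 32      # training, --et 80, 32 AI Cores: prints 256000000
execute_times 60000000 8 4    # inference, --et 60, 8 AI Cores, 4 processors (assumed): prints 1920000000
```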

  • To ensure the correctness and accuracy of the test result, run the computing power test on its own, without other tests running at the same time.
  • To test the physical computing power of the entire cluster, use the cluster computing power test tool. If you do not have the permission to access the tool, contact technical support to join the ascend-toolbox group within the Ascend organization.