Computing Power Test
Function
This test constructs a matrix multiplication A(m,k)*B(k,n) and executes it multiple times. Based on the computation amount and the time taken by the matrix multiplications, it calculates the computing power of the AI Cores of the entire card, server, or chip, as well as the real-time power at full computing power. The following table lists the matrix dimensions used for each operator operation type.
| Operator Operation Type | Parameter | Description | Value |
|---|---|---|---|
| FP16 (all device models) BF16 ( INT8 ( | m | Rows in matrix A | 256 |
| | k | Columns in matrix A and rows in matrix B | 32 |
| | n | Columns in matrix B | 128 |
| INT8 ( HF32 ( | m | Rows in matrix A | 256 |
| | k | Columns in matrix A and rows in matrix B | 64 |
| | n | Columns in matrix B | 128 |
| FP32 ( | m | Rows in matrix A | 128 |
| | k | Columns in matrix A and rows in matrix B | 32 |
| | n | Columns in matrix B | 64 |
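As a rough illustration of how a computing power figure can be derived from these dimensions: a matrix multiplication A(m,k)*B(k,n) is conventionally counted as 2*m*k*n operations (one multiply and one add per inner-product term). The sketch below applies that convention. The function names, the `ai_cores` parameter, and the formula itself are assumptions for illustration only. This document does not state the exact formula that ascend-dmi uses, so the result is not expected to match the tool's report.

```python
def matmul_ops(m: int, k: int, n: int) -> int:
    """Operations in one A(m,k)*B(k,n) product, counting one multiply
    and one add per inner-product term (conventional 2*m*k*n count)."""
    return 2 * m * k * n


def estimate_tops(m: int, k: int, n: int, execute_times: int,
                  duration_ms: float, ai_cores: int = 1) -> float:
    """Estimate tera-operations per second.

    Assumed formula for illustration, not necessarily what ascend-dmi does:
    total operations across all cores divided by elapsed seconds.
    """
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    total_ops = matmul_ops(m, k, n) * execute_times * ai_cores
    return total_ops / (duration_ms / 1000.0) / 1e12


# FP16 dimensions from the table above: m=256, k=32, n=128.
print(matmul_ops(256, 32, 128))  # 2097152
```

For example, under these assumptions, 6 million FP16 multiplications on one core completed in 1000 ms would correspond to about 12.58 tera-operations per second.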
Settings Before the Test
- The Ascend AI Processor presets performance thresholds. You are advised to perform the computing power test when the device temperature is stable and lower than 90°C. Otherwise, the high device temperature may trigger frequency reduction and affect the computing power test result.
Parameters
You can run either of the following commands to list the parameters of the computing power test command:
ascend-dmi -f -h
ascend-dmi -f --help
Table 3 lists only the test-specific parameters. For details about other common parameters, see Common Parameters.
| Parameter | Description | Mandatory |
|---|---|---|
| [-f, --flops] | Measures the computing power of the entire card, chip, or server. | Yes |
| [-t, --type] | Specifies the operator operation type, which can be fp16, fp32, hf32, bf16, or int8. If this parameter is not specified, fp16 is used by default. | No |
| [--all] | If this parameter is specified, the computing power of the entire server is tested, that is, the sum of the computing power of all NPUs is calculated. This parameter cannot be used with -d. | No |
| [-et, --et, --execute-times] | Specifies the number of times that matrix multiplication is performed on a single AI Core on a specified chip, in units of 100,000. The value ranges from 10 to 80 (1 million to 8 million executions). If this parameter is not specified, the default value 60 (6 million executions) is used. | No |
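The constraints described for these parameters (allowed operator operation types, the 10 to 80 range of the execute-times value, and the rule that --all cannot be combined with -d) can be checked before the tool is invoked. The following is a minimal sketch of such a check. The helper names `build_flops_command` and `total_executions` are hypothetical and not part of ascend-dmi; only the flags described in this document are used. Note that not every operator operation type is available on every device model, which this sketch does not check.

```python
VALID_TYPES = ("fp16", "fp32", "hf32", "bf16", "int8")


def total_executions(execute_times: int) -> int:
    """Convert the --et value (unit: 100,000) into an execution count."""
    return execute_times * 100_000


def build_flops_command(op_type="fp16", device=None,
                        all_devices=False, execute_times=60):
    """Build an argument list for a computing power test.

    Hypothetical helper: it only validates the rules stated in the
    parameter table and does not run anything.
    """
    if op_type not in VALID_TYPES:
        raise ValueError(f"unsupported operator operation type: {op_type}")
    if not 10 <= execute_times <= 80:
        raise ValueError("execute times must be between 10 and 80")
    if all_devices and device is not None:
        raise ValueError("--all cannot be used with -d")

    argv = ["ascend-dmi", "-f", "-t", op_type]
    if all_devices:
        argv.append("--all")
    elif device is not None:
        argv += ["-d", str(device)]
    argv += ["--et", str(execute_times)]
    return argv


print(" ".join(build_flops_command("int8", device=2, execute_times=60)))
print(total_executions(60))  # 6000000
```

The returned list could be passed to `subprocess.run` on a host where ascend-dmi is installed. Building a list rather than a single string avoids shell quoting issues.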
Example
- Perform a computing power test on device 2 by executing a matrix multiplication 6 million times. The default operator operation type is fp16.
ascend-dmi -f -d 2 --et 60

- Perform a computing power test on device 2 by executing a matrix multiplication 6 million times, with the operator operation type set to int8.
ascend-dmi -f -t int8 -d 2 --et 60

- Perform a computing power test on device 3 by executing a matrix multiplication 8 million times.

- Perform a computing power test on Atlas A2 training products with the computing power test type set to hf32.
ascend-dmi -f -t hf32

- Perform a computing power test on Atlas A3 training products with the computing power test type set to fp32.
ascend-dmi -f -t fp32 -q

- Perform a computing power test on Atlas A3 training products with the computing power test type set to bf16.
ascend-dmi -f -t bf16 -q

- Specify --all to test the computing power of the entire server (FP16 used as an example).
ascend-dmi -f -q --all
If --all is specified, Device is displayed as all, indicating all NPUs, and TFLOPS@FP16 indicates the sum of the computing power of all devices. The other indicators, namely Execute Times, Duration(ms), and Power(W), are average values for the entire server.
------------------------------------------------------------------------
Device    Execute Times    Duration(ms)    TFLOPS@FP16    Power(W)
------------------------------------------------------------------------
all       360000000        1702            2509.719       206.625015
------------------------------------------------------------------------
Table 4 describes the parameters in the preceding command output.
| Parameter | Description |
|---|---|
| Device | Indicates the device ID. |
| Execute Times | |
| Duration(ms) | Indicates the time used to complete the matrix multiplication computation. |
| TFLOPS@FP16 | Indicates the calculated computing power. FP16 is the specified operator operation type. |
| Power(W) | Indicates the real-time power at full computing power. NOTE: You can disregard the chip power reported during the computing power test. Power consumption data is collected periodically, with an interval between two collections, so the data fluctuates when the computing power test period is too short. To test the power consumption, use the dedicated power consumption test option instead. |
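When the results need to be recorded or compared across runs, the report can be turned into structured data. The following is a minimal sketch that assumes the report always has the five-column layout of the sample shown earlier. The names `FlopsRow` and `parse_flops_report` are hypothetical, and the layout on other device models or tool versions may differ.

```python
from typing import List, NamedTuple


class FlopsRow(NamedTuple):
    device: str          # device ID, or "all" when --all is used
    execute_times: int
    duration_ms: float
    compute: float       # value of the computing power column
    power_w: float


def parse_flops_report(text: str) -> List[FlopsRow]:
    """Extract data rows from a report laid out like the sample above."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Separator lines have one field and the header has six
        # ("Execute Times" is two words), so both are skipped here.
        if len(fields) != 5 or not fields[1].isdigit():
            continue
        rows.append(FlopsRow(fields[0], int(fields[1]), float(fields[2]),
                             float(fields[3]), float(fields[4])))
    return rows


SAMPLE = """\
------------------------------------------------------------------------
Device    Execute Times    Duration(ms)    TFLOPS@FP16    Power(W)
------------------------------------------------------------------------
all       360000000        1702            2509.719       206.625015
------------------------------------------------------------------------
"""

for row in parse_flops_report(SAMPLE):
    print(row.device, row.compute, row.power_w)
```

Rows are recognized by field count rather than by column position, so the sketch tolerates changes in column spacing but not added or removed columns.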
To ensure the correctness and accuracy of the test result, perform the computing power test separately.