Result Description
After the HCCL Performance Tester is executed, the following information is displayed.
- data_size: the size of data used in collective communication on one NPU (unit: byte)
- aveg_time: execution duration of the collective communication operator (unit: μs)
- alg_bandwidth: execution bandwidth of the collective communication operator (unit: GB/s)
Note: The execution bandwidth of the collective communication operator is the algorithm bandwidth, that is, the communication data size divided by the time taken to perform the collective communication operation.
- check_result: result of the correctness check on the output of the collective communication operator. Possible values: success, failed, and NULL.
  - If -c is set to 0 (result check disabled) when the tool is executed, check_result is NULL.
  - If the operator computation result overflows or exceeds the range in which values can be represented accurately, the result check is skipped and check_result is NULL.
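The algorithm bandwidth described in the note above can be recomputed from data_size and aveg_time. The following is a minimal sketch, not the tool's own code: the function name is hypothetical, and it assumes that 1 GB means 10^9 bytes, which the text above does not state.

```python
def alg_bandwidth_gb_per_s(data_size_bytes: int, aveg_time_us: float) -> float:
    """Return data size divided by time, expressed in GB/s.

    Assumption (not confirmed by the tool's documentation): 1 GB = 1e9 bytes.
    """
    if aveg_time_us <= 0:
        raise ValueError("aveg_time_us must be positive")
    # bytes per microsecond equals 1e6 bytes per second,
    # so dividing by 1e3 converts the ratio to 1e9 bytes per second (GB/s).
    return data_size_bytes / aveg_time_us / 1e3


if __name__ == "__main__":
    # Illustrative numbers only: 1 GiB of data moved in 50 ms.
    print(alg_bandwidth_gb_per_s(1024**3, 50_000.0))
```

If the tool reports bandwidth in binary units instead, replace the final divisor accordingly.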
The HCCL Performance Tester initializes the operator input to a fixed value and checks whether the operator output matches the expected value to determine whether the communication result is correct. Because the value range and precision of each data type are limited, the product and summation operations of reduction operators may overflow or exceed the accurately representable range when the number of NICs is large. In that case the check would be inaccurate, so check_result is NULL. The following table lists the maximum number of NICs for which the result check of product and summation operations is supported, by operator type and data type.
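| Operation Type | Operator Type | INT8 | INT16 | INT32 | INT64 | FP32 | FP16 | BF16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Product (prod) | AllReduce | 6 | 14 | 30 | 62 | 127 | 15 | 127 |
| Product (prod) | Reduce | 6 | 14 | 30 | 62 | 127 | 15 | 127 |
| Product (prod) | ReduceScatter | 6 | 14 | 30 | 62 | 127 | 15 | 127 |
| Summation (sum) | AllReduce | 63 | 16383 | ~1e9 | ~1e18 | ~1e6 | 511 | 63 |
| Summation (sum) | Reduce | 63 | 16383 | ~1e9 | ~1e18 | ~1e6 | 511 | 63 |
| Summation (sum) | ReduceScatter | 11 | 181 | 46340 | ~1e9 | 2896 | 31 | 11 |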
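To illustrate why such limits arise, the sketch below counts how many equal factors can be multiplied before the product no longer fits in a data type. It assumes, purely for illustration, that every participant contributes the value 2; the tool's actual initialization value is not stated here. The function name and the MAX_VALUE dictionary are hypothetical, and the maxima are the standard format limits of each type.

```python
# Largest finite value of each data type (standard format limits).
MAX_VALUE = {
    "INT8": 2**7 - 1,
    "INT16": 2**15 - 1,
    "INT32": 2**31 - 1,
    "INT64": 2**63 - 1,
    "FP32": (2 - 2**-23) * 2.0**127,
    "FP16": 65504.0,
    "BF16": (2 - 2**-7) * 2.0**127,
}


def max_count_for_product(max_value, per_participant_value=2):
    """Largest n such that per_participant_value**n <= max_value.

    Assumption: every participant contributes the same value (2 by default).
    """
    count = 0
    product = 1  # exact Python integer, so no rounding in the loop
    while product * per_participant_value <= max_value:
        product *= per_participant_value
        count += 1
    return count


if __name__ == "__main__":
    for dtype, limit in MAX_VALUE.items():
        print(dtype, max_count_for_product(limit))
```

Under this assumption the computed counts coincide with the product rows of the table, which suggests, but does not confirm, that the limits follow from the representable range of each type.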