Result Description

After the HCCL Performance Tester is executed, the following information is displayed.

Figure 1 Example of the execution result of the HCCL Performance Tester
The fields are described as follows:
  • data_size: the size of data used in collective communication on one NPU (unit: byte)
  • aveg_time: execution duration of the collective communication operator (unit: μs)
  • alg_bandwidth: execution bandwidth of the collective communication operator (unit: GB/s)

    Note: The execution bandwidth of the collective communication operator is the algorithm bandwidth, that is, the communication data size divided by the time taken to perform the collective communication operation.

  • check_result: result of the correctness check on the output of the collective communication operator. Possible values: success, failed, and NULL.
    • If the tool is run with -c set to 0 (result check disabled), check_result is NULL.
    • If the operator computation result overflows or exceeds the range that the data type can represent exactly, the result check is skipped and check_result is NULL.

      The HCCL Performance Tester initializes the operator input to a fixed value and compares the operator output with the expected value to determine whether the communication result is correct. Each data type has a limited value range and precision. For the product and summation operations of reduction operators, a large number of NICs can push the computation result past that range or precision. The check would then be unreliable, so check_result is NULL. The following table lists, for each operator type and data type, the maximum number of NICs for which the result of product and summation operations can be checked.

      Operation Type    Operator Type                      Data Type
                                                           INT8   INT16   INT32   INT64   FP32   FP16   BF16
      Product (prod)    AllReduce, Reduce, ReduceScatter   6      14      30      62      127    15     127
      Summation (sum)   AllReduce, Reduce                  63     16383   ~1e9    ~1e18   ~1e6   511    63
      Summation (sum)   ReduceScatter                      11     181     46340   ~1e9    2896   31     11
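
As an illustration of the algorithm bandwidth definition given for alg_bandwidth, the following minimal Python sketch recomputes the value from data_size and aveg_time. The function name is hypothetical, and the conversion assumes that 1 GB means 10^9 bytes, which this section does not state.

```python
def alg_bandwidth_gb_per_s(data_size_bytes: int, aveg_time_us: float) -> float:
    """Return data size / elapsed time in GB/s.

    Assumption: 1 GB = 1e9 bytes. Bytes per microsecond equals 1e6 bytes
    per second, so dividing by a further 1e3 yields GB/s.
    """
    if aveg_time_us <= 0:
        raise ValueError("aveg_time_us must be positive")
    return data_size_bytes / aveg_time_us / 1e3


if __name__ == "__main__":
    # Hypothetical sample values, not taken from a real run:
    # 1e9 bytes transferred in 50,000 microseconds.
    print(alg_bandwidth_gb_per_s(1_000_000_000, 50_000.0))
```

Comparing this value with the alg_bandwidth column can serve as a quick sanity check of a result line, provided the unit assumption above holds.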
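
The integer limits for AllReduce can be approximated with a short calculation. The sketch below assumes that every NIC contributes the same fixed input value of 2 and that the check remains valid as long as the result still fits in the signed integer type. Neither assumption is stated in this section, although together they reproduce the product entries for INT8 through INT64 and the summation entries for INT8 and INT16. The function names are hypothetical, and floating-point types and ReduceScatter are not modeled.

```python
def signed_int_max(bits: int) -> int:
    """Largest value representable by a signed integer with the given width."""
    return (1 << (bits - 1)) - 1


def max_count_for_product(bits: int, value: int = 2) -> int:
    """Largest n such that value**n still fits in the signed type.

    Assumption: each participant contributes the same input `value`.
    """
    if value < 2:
        raise ValueError("value must be at least 2")
    limit = signed_int_max(bits)
    n, acc = 0, 1
    while acc * value <= limit:
        acc *= value
        n += 1
    return n


def max_count_for_sum(bits: int, value: int = 2) -> int:
    """Largest n such that n * value still fits in the signed type."""
    if value < 1:
        raise ValueError("value must be positive")
    return signed_int_max(bits) // value


if __name__ == "__main__":
    for bits in (8, 16, 32, 64):
        print(bits, max_count_for_product(bits), max_count_for_sum(bits))
```

Under a different fixed input value the computed limits change, so the output is a plausibility check on the table rather than a description of how the tool derives its limits.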