Result Description

After the HCCL Performance Tester is executed, the following information is displayed:

Figure 1 Example of the execution result of the HCCL Performance Tester
The fields are described as follows:
  • data_size: size of the data used in the collective communication on a single NPU (unit: byte)
  • aveg_time: execution duration of the collective communication operator (unit: μs)
  • alg_bandwidth: execution bandwidth of the collective communication operator (unit: GB/s)

    Note: The execution bandwidth of the collective communication operator is the algorithm bandwidth, that is, the communication data size divided by the time consumed by the collective communication operation.

  • check_result: outcome of the correctness check on the output of the collective communication operator. Possible values: success, failed, and NULL.
    • If result check is disabled by setting -c to 0 when the tool is run, check_result is NULL.
    • If the computation result of the operator overflows or exceeds the range that can be represented accurately, result check is skipped and check_result is NULL.
      The HCCL Performance Tester initializes the operator input to a fixed value and determines whether the communication result is correct by checking whether the operator output matches the expected value. Because the value range and precision of each data type are limited, the computation result of some operations can overflow or exceed the accurately representable range when a large number of NICs participate, which would make the check unreliable. In this case, check_result is NULL.
      • The following table lists, for the product (Prod) and summation (Sum) reduction operations, the maximum number of NICs for which result check is supported, by operator type and data type.

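        Operation Type        | Operator Type  | INT8 | INT16 | INT32 | INT64 | FP32 | FP16 | BF16
        ----------------------+----------------+------+-------+-------+-------+------+------+-----
        Multiplication (Prod) | AllReduce      | 6    | 14    | 30    | 62    | 127  | 15   | 127
        Multiplication (Prod) | Reduce         | 6    | 14    | 30    | 62    | 127  | 15   | 127
        Multiplication (Prod) | ReduceScatter  | 6    | 14    | 30    | 62    | 127  | 15   | 127
        Summation (Sum)       | AllReduce      | 63   | 16383 | ~1e9  | ~1e18 | ~1e6 | 511  | 63
        Summation (Sum)       | Reduce         | 63   | 16383 | ~1e9  | ~1e18 | ~1e6 | 511  | 63
        Summation (Sum)       | ReduceScatter  | 11   | 181   | 46340 | ~1e9  | 2896 | 31   | 11
        Summation (Sum)       | ReduceScatterV | 11   | 181   | 46340 | ~1e9  | 2896 | 31   | 11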

      • For the int8 and uint8 data types, result check of the AllGather, AllGatherV, AlltoAll, AlltoAllV, AlltoAllVC, and Scatter operators supports a maximum of 127 NICs.
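The alg_bandwidth field described above can be reproduced from data_size and aveg_time. The following is a minimal sketch, assuming that 1 GB = 10^9 bytes; the unit convention the tool itself uses for GB is not stated in this section. The function name alg_bandwidth_gbps is hypothetical and is not part of the HCCL Performance Tester.

```python
def alg_bandwidth_gbps(data_size_bytes: int, aveg_time_us: float) -> float:
    """Algorithm bandwidth: communication data size divided by time consumed.

    data_size is in bytes and aveg_time is in microseconds, as in the tool output.
    Assumption: 1 GB = 10**9 bytes (decimal); the tool's own convention is not stated.
    """
    if aveg_time_us <= 0:
        raise ValueError("aveg_time must be positive")
    bytes_per_us = data_size_bytes / aveg_time_us
    # 1 byte/us = 10**6 bytes/s = 10**-3 GB/s (decimal GB)
    return bytes_per_us / 1e3


if __name__ == "__main__":
    # Hypothetical numbers, not from a real run: 256,000,000 bytes in 10,000 us
    print(f"{alg_bandwidth_gbps(256_000_000, 10_000):.2f} GB/s")  # 25.60 GB/s
```

If the tool reports binary gigabytes instead, only the final conversion needs to change (bytes per second divided by 2**30).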
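The conditions under which check_result is NULL can be encoded as a small lookup to predict, before a run, whether result check will be performed. This is a sketch under several assumptions: the function expect_null_check, the check_enabled parameter, and the reduction strings "prod" and "sum" are hypothetical names for illustration, not the tool's command-line options; the approximate table values ("~1e9", "~1e18", "~1e6") are encoded as powers of ten, so thresholds near those values are not exact; and the 127-NIC int8/uint8 rule is treated as a limit on result check. Combinations not covered by the table return None instead of a guess.

```python
from typing import Dict, Optional, Tuple

# Limits transcribed from the table above. "~1e9", "~1e18", and "~1e6" are
# approximate in the source and are encoded as powers of ten (an approximation).
_PROD = {"INT8": 6, "INT16": 14, "INT32": 30, "INT64": 62,
         "FP32": 127, "FP16": 15, "BF16": 127}
_SUM_FULL = {"INT8": 63, "INT16": 16383, "INT32": 10**9, "INT64": 10**18,
             "FP32": 10**6, "FP16": 511, "BF16": 63}
_SUM_SCATTER = {"INT8": 11, "INT16": 181, "INT32": 46340, "INT64": 10**9,
                "FP32": 2896, "FP16": 31, "BF16": 11}

# (reduction, operator) -> {data type -> maximum number of NICs for result check}
MAX_CHECKED_NICS: Dict[Tuple[str, str], Dict[str, int]] = {
    ("prod", "AllReduce"): _PROD,
    ("prod", "Reduce"): _PROD,
    ("prod", "ReduceScatter"): _PROD,
    ("sum", "AllReduce"): _SUM_FULL,
    ("sum", "Reduce"): _SUM_FULL,
    ("sum", "ReduceScatter"): _SUM_SCATTER,
    ("sum", "ReduceScatterV"): _SUM_SCATTER,
}

# Operators whose result check is limited to 127 NICs for int8/uint8 data.
_INT8_LIMITED_OPS = {"AllGather", "AllGatherV", "AlltoAll", "AlltoAllV",
                     "AlltoAllVC", "Scatter"}


def expect_null_check(operator: str, dtype: str, nics: int,
                      reduction: Optional[str] = None,
                      check_enabled: bool = True) -> Optional[bool]:
    """Predict whether check_result would be NULL.

    Returns True (NULL expected), False (a check is expected), or None when the
    documented limits do not cover this combination.
    """
    if not check_enabled:  # corresponds to running the tool with -c 0
        return True
    dtype = dtype.upper()
    if operator in _INT8_LIMITED_OPS and dtype in ("INT8", "UINT8"):
        return nics > 127
    if reduction is None:
        return None
    limits = MAX_CHECKED_NICS.get((reduction.lower(), operator))
    if limits is None or dtype not in limits:
        return None
    return nics > limits[dtype]


if __name__ == "__main__":
    print(expect_null_check("AllReduce", "int8", 8, reduction="prod"))        # True: 8 > 6
    print(expect_null_check("ReduceScatter", "fp32", 1024, reduction="sum"))  # False: 1024 <= 2896
    print(expect_null_check("Broadcast", "fp32", 8))                          # None: not in the table
```

Returning None for uncovered combinations keeps the sketch from claiming limits that the documentation does not state.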