Introduction

Application Scenario

Use the HCCL Performance Tester to measure the performance of the Huawei Collective Communication Library (HCCL) in distributed training scenarios.

Obtaining the Source Package of the Tool

After the CANN Toolkit software package is installed, you can find the source code of the HCCL Performance Tester in ${INSTALL_DIR}/tools/hccl_test. Replace ${INSTALL_DIR} with the actual CANN component directory. If the Ascend-CANN-Toolkit package is installed as the root user, the CANN component directory is /usr/local/Ascend/ascend-toolkit/latest.
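
The path substitution described above can be sketched in Python. This is a minimal illustration, not part of the tool: the helper name hccl_test_dir is hypothetical, and reading INSTALL_DIR from an environment variable is an assumption (the text only uses ${INSTALL_DIR} as a placeholder). The fallback is the root-user directory quoted above.

```python
import os
from pathlib import PurePosixPath

# Directory quoted in the text for a root-user installation of Ascend-CANN-Toolkit.
ROOT_INSTALL_DIR = "/usr/local/Ascend/ascend-toolkit/latest"


def hccl_test_dir(install_dir=None):
    """Return the expected source directory of the HCCL Performance Tester.

    Hypothetical helper: `install_dir` stands in for ${INSTALL_DIR}. If it is
    not given, an INSTALL_DIR environment variable is consulted (an assumption
    made for this sketch), then the root-user directory is used as a fallback.
    """
    base = install_dir or os.environ.get("INSTALL_DIR") or ROOT_INSTALL_DIR
    return PurePosixPath(base) / "tools" / "hccl_test"


if __name__ == "__main__":
    print(hccl_test_dir())
```

If the toolkit is installed elsewhere, pass the actual CANN component directory explicitly, for example hccl_test_dir("/home/user/Ascend/ascend-toolkit/latest") (an illustrative path only).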

Compile the tool before using it.

Restrictions

Atlas Training Series Product: In the current version, the HCCL Performance Tester supports performance tests on clusters of up to 4096 cards.

Background Knowledge

  • Bandwidth for collective communication

    The collective communication bandwidth refers to the algorithm bandwidth, that is, the data volume of a collective communication operation divided by the time taken to complete it.

    For example, if the AllReduce operation is performed on eight cards on a single server, the algorithm bandwidth of the AllReduce operator is the data volume divided by the time required for completing the AllReduce operation.

    The bandwidth reported by the HCCL Performance Tester is the algorithm bandwidth.

    The algorithm bandwidth is affected by the following factors:
    • RDMA bandwidth between servers (RoCE link)
    • SDMA communication bandwidth between cards in a server (HCCS link)
    • PCIe link bandwidth
    • Implementation of communication algorithm orchestration
  • Physical bandwidth

    The physical bandwidth in a cluster comprises the physical bandwidth of the HCCS links and the RoCE links. It is one of the factors that affect the algorithm bandwidth.
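
The definition of algorithm bandwidth given above (data volume divided by the time taken) can be sketched as a short calculation. This is an illustration only: the function name is hypothetical, the figures are made up for the example, and reporting in GB/s with 1 GB = 10^9 bytes is an assumption, not a statement about the units the HCCL Performance Tester prints.

```python
def algorithm_bandwidth_gb_per_s(data_bytes, elapsed_seconds):
    """Algorithm bandwidth = data volume / time taken by the collective operation.

    Hypothetical helper; the result is in GB/s, assuming 1 GB = 1e9 bytes.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return data_bytes / elapsed_seconds / 1e9


if __name__ == "__main__":
    # Illustrative figures only: an AllReduce over 4e9 bytes that completes in 0.25 s.
    print(algorithm_bandwidth_gb_per_s(4_000_000_000, 0.25))  # 16.0
```

Because the result depends only on the data volume and the elapsed time, it reflects the combined effect of the link bandwidths and the communication algorithm rather than the physical bandwidth of any single link.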