HCCL_RDMA_PCIE_DIRECT_POST_NOSTRICT

Description

Specifies whether RDMA tasks are submitted in PCIe Direct mode in multi-server communication scenarios where the host OS uses a memory page size other than 4 KB and communication operator delivery is host-bound. Enabling it can improve communication operator delivery performance.

Possible values are:
  • TRUE: RDMA tasks are submitted in PCIe Direct mode (a high-speed communication interface between the host and the device).
  • FALSE (default): RDMA tasks are submitted in host-device communication (HDC) mode.

This environment variable takes effect only when the small-page size in the host memory page table is not 4 KB. If the small-page size is 4 KB, RDMA tasks are always submitted in PCIe Direct mode, regardless of the value of this environment variable.
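A quick way to see whether this variable matters on a given host is to check the host page size. The sketch below assumes that `getconf PAGESIZE` reports the same small-page size this page refers to; the function name `page_size_is_not_4k` is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Sketch: report whether HCCL_RDMA_PCIE_DIRECT_POST_NOSTRICT would take effect.
# Assumption: `getconf PAGESIZE` returns the host small-page size in bytes.

# Hypothetical helper: succeeds (returns 0) if the page size is not 4 KB.
page_size_is_not_4k() {
    [ "$1" -ne 4096 ]
}

host_page_size=$(getconf PAGESIZE)
if page_size_is_not_4k "$host_page_size"; then
    echo "Page size is ${host_page_size} bytes: HCCL_RDMA_PCIE_DIRECT_POST_NOSTRICT takes effect."
else
    echo "Page size is 4 KB: RDMA tasks already use PCIe Direct mode; the variable has no effect."
fi
```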

  • When this environment variable is set to TRUE, additional huge-page memory is occupied on the device: each communication link uses an extra 1 MB of huge-page memory.
  • To improve communication operator delivery performance with this environment variable while limiting huge-page memory usage on the device, you can use HCCL_ALGO to set the inter-server communication algorithm to ring, which controls the number of communication links:
    export HCCL_ALGO="level0:NA;level1:ring"
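The extra device memory scales linearly with the number of links, at the documented 1 MB per link. The sketch below only does that multiplication; the link counts are hypothetical inputs that you would determine for your own topology and HCCL_ALGO setting, and `extra_hugepage_mb` is an illustrative helper, not part of HCCL.

```shell
#!/bin/sh
# Sketch: estimate extra device huge-page memory when
# HCCL_RDMA_PCIE_DIRECT_POST_NOSTRICT=TRUE, using 1 MB per communication link.

EXTRA_MB_PER_LINK=1

# Hypothetical helper: prints the extra huge-page memory in MB for N links.
extra_hugepage_mb() {
    echo $(( $1 * $EXTRA_MB_PER_LINK ))
}

# Hypothetical link counts, for illustration only.
for links in 8 32; do
    echo "${links} links -> $(extra_hugepage_mb "$links") MB extra huge-page memory"
done
```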

Example

export HCCL_RDMA_PCIE_DIRECT_POST_NOSTRICT=TRUE

Restrictions

This environment variable can be used only in the scenario described in Description, that is, when both of the following conditions are met:
  • The job uses multi-server communication.
  • The small-page size of the memory managed by the host OS is not 4 KB.
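A launch script could set the variable only when both restrictions hold. This is a minimal sketch under stated assumptions: `SERVER_COUNT` is a hypothetical variable that your launcher is assumed to set to the number of servers, `getconf PAGESIZE` is assumed to report the relevant small-page size, and the HCCL_ALGO line is the optional ring setting from the notes above.

```shell
#!/bin/sh
# Sketch: enable PCIe Direct submission only when both documented
# restrictions are met. SERVER_COUNT is a hypothetical, launcher-provided value.

# Hypothetical helper: succeeds if there is more than one server and the
# host page size is not 4096 bytes.
restrictions_met() {
    server_count=$1
    page_size=$2
    [ "$server_count" -gt 1 ] && [ "$page_size" -ne 4096 ]
}

if restrictions_met "${SERVER_COUNT:-1}" "$(getconf PAGESIZE)"; then
    export HCCL_RDMA_PCIE_DIRECT_POST_NOSTRICT=TRUE
    # Optional: ring for inter-server communication to limit the number of links.
    export HCCL_ALGO="level0:NA;level1:ring"
fi
```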

Applicability

Atlas A2 training products/Atlas A2 inference products (only the Atlas 800T A2 training server, Atlas 900 A2 PoD cluster basic unit, and Atlas 200T A2 Box16 heterogeneous subrack are supported)