HcclGetRootInfo

Applicability

Product

Supported

Atlas A3 training products / Atlas A3 inference products

Atlas A2 training products / Atlas A2 inference products

Atlas 200I/500 A2 inference products

Atlas inference products

Atlas training products

For Atlas A2 training products / Atlas A2 inference products, only the Atlas 800T A2 training server, Atlas 900 A2 PoD cluster basic unit, and Atlas 200T A2 Box16 heterogeneous subrack are supported.

For Atlas inference products, only the Atlas 300I Duo inference card is supported.

Description

Call this API on the root rank only, before calling the HCCL initialization API HcclCommInitRootInfo or HcclCommInitRootInfoConfig. It generates the rank identification information (HcclRootInfo) of the root rank.

  • This API can be called repeatedly in a loop within a single thread. That is, by specifying a different device on each iteration of a for loop, one thread can obtain the rootInfo information of several devices.

    Assume that an AI server has eight devices, which are divided into four communicators. The two devices in each communicator communicate with each other, as shown in the following figure.

    Figure 1 Example of communicator division

    Figure 2 shows the process of obtaining rootInfo information and initializing collective communication. In a thread, create four pieces of rootInfo information by switching devices and store the information in an array with a length of 4. After the rootInfo information is obtained, start four threads to call the HcclCommInitRootInfo or HcclCommInitRootInfoConfig API (HcclCommInitRootInfo is used as an example in Figure 2) and initialize the communicator based on different rootInfo information.

    Figure 2 Single-threaded loop calling
  • (Optional) In the multi-server collective communication scenario, perform the following operations before calling HcclGetRootInfo:
    • Configure the environment variable HCCL_IF_IP or HCCL_SOCKET_IFNAME to specify the root NIC for HCCL initialization, by IP address or by NIC name respectively. (HCCL_IF_IP takes precedence over HCCL_SOCKET_IFNAME. If neither of them is specified, the root NIC is selected in ascending lexicographical order of NIC names by default.)
    • Configure the environment variable HCCL_WHITELIST_DISABLE to enable trustlist verification, and use HCCL_WHITELIST_FILE to specify the communication trustlist configuration file. (If HCCL_WHITELIST_DISABLE is not set, communication trustlist verification is disabled by default.)
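
The root NIC precedence described in the list above can be sketched as follows. This is only an illustration of the documented order (HCCL_IF_IP first, then HCCL_SOCKET_IFNAME, then the default selection). The helper names RootNicSource and SetRootNicIp are hypothetical and not part of HCCL, and the sketch assumes that a variable counts as "specified" whenever it is present in the environment. It uses the POSIX setenv call, so the variable is set for the current process only and must be set before HcclGetRootInfo is called.

```cpp
#include <stdlib.h>  // getenv, setenv (POSIX)
#include <string>

// Hypothetical helper: reports which setting would decide the root NIC,
// following the precedence stated in the text. It does not call HCCL.
std::string RootNicSource() {
    if (getenv("HCCL_IF_IP") != nullptr) {
        return "HCCL_IF_IP";
    }
    if (getenv("HCCL_SOCKET_IFNAME") != nullptr) {
        return "HCCL_SOCKET_IFNAME";
    }
    // Neither variable is present: the text says the root NIC is then
    // chosen in ascending lexicographical order of NIC names.
    return "default";
}

// Hypothetical helper: sets HCCL_IF_IP for the current process.
// Call it before HcclGetRootInfo so that initialization can see the value.
bool SetRootNicIp(const std::string &ip) {
    return setenv("HCCL_IF_IP", ip.c_str(), 1) == 0;
}
```

For instance, calling SetRootNicIp with the root server's address and then checking RootNicSource() would report "HCCL_IF_IP" even if HCCL_SOCKET_IFNAME is also set.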

Prototype

HcclResult HcclGetRootInfo(HcclRootInfo *rootInfo)

Parameters

Parameter | Input/Output | Description
rootInfo | Output | Rank identification information, including the device IP address and device ID. The information needs to be broadcast to all ranks in the cluster for HCCL initialization. For details about the definition of the HcclRootInfo type, see HcclRootInfo.

Returns

HcclResult: HCCL_SUCCESS is returned on success. Any other value indicates failure.
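
Because every value other than HCCL_SUCCESS indicates failure, callers may want to check each return value. The following is a minimal sketch of such a check. To stay compilable without the CANN toolkit, it declares a stand-in HcclResult enum inside a stub namespace: only the name HCCL_SUCCESS comes from this page, while STUB_FAILURE and the helper HcclSucceeded are hypothetical. With the real headers, the stand-in would be replaced by the actual HcclResult type.

```cpp
#include <cstdio>

// Stand-in for the real HcclResult type, for illustration only.
// STUB_FAILURE is a hypothetical placeholder for any non-success code.
namespace stub {
enum HcclResult { HCCL_SUCCESS = 0, STUB_FAILURE = 1 };
}

// Hypothetical helper: returns true on HCCL_SUCCESS; otherwise logs the
// name of the failed call together with the raw result code.
bool HcclSucceeded(stub::HcclResult ret, const char *callName) {
    if (ret == stub::HCCL_SUCCESS) {
        return true;
    }
    std::fprintf(stderr, "%s failed with HcclResult %d\n",
                 callName, static_cast<int>(ret));
    return false;
}
```

A caller could wrap each HCCL call in this helper and stop initialization as soon as it returns false.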

Constraints

None

Example

uint32_t rankSize = 8;
uint32_t deviceId = 0;
// Generate the identification information of the root rank.
HcclRootInfo rootInfo;
HcclGetRootInfo(&rootInfo);
// Initialize the communicator.
HcclComm hcclComm;
HcclCommInitRootInfo(rankSize, &rootInfo, deviceId, &hcclComm);
// Destroy the communicator.
HcclCommDestroy(hcclComm);
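
The single-threaded loop calling flow from the Description section (eight devices, four communicators) can be sketched as follows. This is an illustrative sketch, not the reference implementation: everything inside the stub namespace is a stand-in so that the code compiles without the CANN toolkit, and the stand-ins only mimic the call shapes of aclrtSetDevice, HcclGetRootInfo, and HcclCommInitRootInfo. The sketch also assumes that communicator i contains devices 2i and 2i+1 with device 2i as the root, which the original figure may define differently, and it shows only the root-rank side of each communicator.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// ---- Stand-ins only; NOT the real HCCL or AscendCL definitions. ----
namespace stub {
enum HcclResult { HCCL_SUCCESS = 0, STUB_FAILURE = 1 };
struct HcclRootInfo { int32_t device; };   // the real type is opaque
using HcclComm = void *;
thread_local int32_t currentDevice = -1;   // device bound to this thread

int aclrtSetDevice(int32_t deviceId) { currentDevice = deviceId; return 0; }

HcclResult HcclGetRootInfo(HcclRootInfo *rootInfo) {
    if (rootInfo == nullptr || currentDevice < 0) return STUB_FAILURE;
    rootInfo->device = currentDevice;      // record the current device
    return HCCL_SUCCESS;
}

HcclResult HcclCommInitRootInfo(uint32_t nRanks, const HcclRootInfo *rootInfo,
                                uint32_t rank, HcclComm *comm) {
    static int dummyComm = 0;              // any non-null handle will do
    if (rootInfo == nullptr || comm == nullptr || rank >= nRanks) return STUB_FAILURE;
    *comm = &dummyComm;
    return HCCL_SUCCESS;
}
}  // namespace stub

constexpr std::size_t kCommCount = 4;      // four communicators
constexpr uint32_t kRanksPerComm = 2;      // two devices in each

// Assumed layout: communicator i uses devices 2i and 2i+1, root is 2i.
int32_t RootDeviceOf(std::size_t commIndex) {
    return static_cast<int32_t>(commIndex * kRanksPerComm);
}

// Step 1: one thread switches devices in a loop and stores one rootInfo
// per communicator in an array of length 4.
bool CollectRootInfos(std::array<stub::HcclRootInfo, kCommCount> &rootInfos) {
    for (std::size_t i = 0; i < kCommCount; ++i) {
        if (stub::aclrtSetDevice(RootDeviceOf(i)) != 0) return false;
        if (stub::HcclGetRootInfo(&rootInfos[i]) != stub::HCCL_SUCCESS) return false;
    }
    return true;
}

// Step 2: four threads, each initializing one communicator from its own
// rootInfo. Returns the number of communicators that initialized.
std::size_t InitCommunicators(const std::array<stub::HcclRootInfo, kCommCount> &rootInfos) {
    std::array<stub::HcclResult, kCommCount> results;
    results.fill(stub::STUB_FAILURE);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < kCommCount; ++i) {
        workers.emplace_back([&rootInfos, &results, i] {
            stub::aclrtSetDevice(RootDeviceOf(i));
            stub::HcclComm comm = nullptr;
            results[i] = stub::HcclCommInitRootInfo(kRanksPerComm, &rootInfos[i], 0, &comm);
        });
    }
    for (std::thread &worker : workers) worker.join();
    std::size_t ok = 0;
    for (stub::HcclResult r : results) ok += (r == stub::HCCL_SUCCESS) ? 1 : 0;
    return ok;
}
```

In a real program, the non-root rank of each communicator would also need the corresponding rootInfo before calling the initialization API, since the rootInfo must be broadcast to all ranks.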