HcclGetRootInfo
Applicability
| Product | Supported |
|---|---|
|  | √ |
|  | √ |
|  | ☓ |
|  | √ |
|  | √ |
For
For the
Description
Call this API on the root rank only, before calling the HCCL initialization API HcclCommInitRootInfo or HcclCommInitRootInfoConfig. It generates the rank identification information (HcclRootInfo) of the root rank.
- This API must be used in conjunction with the initialization API HcclCommInitRootInfo or HcclCommInitRootInfoConfig.
- This API supports single-threaded loop calling. That is, you can call this API in a for loop by specifying different devices to obtain the rootInfo information of different devices within a thread.
Assume that an AI server has eight devices, which are divided into four communicators. The two devices in each communicator communicate with each other, as shown in the following figure.
Figure 1 Example of communicator division
Figure 2 shows the process of obtaining rootInfo information and initializing collective communication. In a single thread, obtain four rootInfo values by switching devices and store them in an array of length 4. After the rootInfo values are obtained, start four threads that call the HcclCommInitRootInfo or HcclCommInitRootInfoConfig API (HcclCommInitRootInfo is used as an example in Figure 2), each initializing a communicator from a different rootInfo value.
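The single-thread loop described above can be sketched as follows. This is a minimal sketch, not the reference flow from Figure 2. The HcclRootInfo, HcclResult, aclrtSetDevice, and HcclGetRootInfo definitions are stand-ins so that the snippet compiles without the CANN toolkit, and it assumes that the device is switched with aclrtSetDevice. The device layout (communicator i owns devices 2i and 2i+1, with device 2i hosting the root rank) and the helper names RootDeviceOf and CollectRootInfos are assumptions made for illustration.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// --- Stand-ins so the sketch compiles without the CANN toolkit. ---
// Real code would include the ACL and HCCL headers instead of this section.
struct HcclRootInfo { char internal[64]; };           // opaque blob; size is arbitrary here
enum HcclResult { HCCL_SUCCESS = 0, HCCL_E_PARA = 1 };  // illustrative values
static int32_t g_currentDevice = -1;                  // remembers the last selected device

int aclrtSetDevice(int32_t deviceId) { g_currentDevice = deviceId; return 0; }

HcclResult HcclGetRootInfo(HcclRootInfo *rootInfo) {
    if (rootInfo == nullptr || g_currentDevice < 0) return HCCL_E_PARA;
    std::memset(rootInfo, 0, sizeof(*rootInfo));
    // Tag the blob with the device so the demo can tell the results apart.
    rootInfo->internal[0] = static_cast<char>(g_currentDevice);
    return HCCL_SUCCESS;
}
// --- End of stand-ins. ---

constexpr int kNumComms = 4;        // eight devices split into four communicators
constexpr int kDevicesPerComm = 2;

// Hypothetical layout: communicator i owns devices 2i and 2i+1; device 2i hosts the root rank.
int32_t RootDeviceOf(int commIdx) { return commIdx * kDevicesPerComm; }

// One thread switches devices in a loop and collects one rootInfo per communicator.
// Returns false as soon as any call fails.
bool CollectRootInfos(std::array<HcclRootInfo, kNumComms> &rootInfos) {
    for (int i = 0; i < kNumComms; ++i) {
        if (aclrtSetDevice(RootDeviceOf(i)) != 0) return false;
        if (HcclGetRootInfo(&rootInfos[i]) != HCCL_SUCCESS) return false;
    }
    return true;
}
```

After CollectRootInfos returns true, each element of the array would be handed to the thread that initializes the corresponding communicator.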
- (Optional) In the multi-server collective communication scenario, perform the following operations before calling HcclGetRootInfo:
  - Configure the environment variable HCCL_IF_IP or HCCL_SOCKET_IFNAME to specify the IP address of the root NIC for HCCL initialization. (HCCL_IF_IP takes precedence over HCCL_SOCKET_IFNAME. If neither of them is specified, the root NIC is selected in ascending lexicographical order of NIC names by default.)
  - Configure the environment variable HCCL_WHITELIST_DISABLE to enable trustlist verification and use HCCL_WHITELIST_FILE to specify the communication trustlist configuration file. (If this environment variable is not set, communication trustlist verification is disabled by default.)
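The precedence between HCCL_IF_IP, HCCL_SOCKET_IFNAME, and the default NIC choice can be pictured with the sketch below. It only models the rule as stated above and is not HCCL's internal code. The struct RootNicChoice and the function ChooseRootNic are hypothetical names, and the value of HCCL_SOCKET_IFNAME is passed through without interpreting any matching syntax it may support.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Illustrative model of the documented precedence; not taken from HCCL itself.
struct RootNicChoice {
    std::string source;  // which setting decided the choice; empty if nothing could be chosen
    std::string value;   // IP address for HCCL_IF_IP, otherwise a NIC name or pattern
};

// ifIp and socketIfname mirror what std::getenv would return: nullptr when unset.
RootNicChoice ChooseRootNic(const char *ifIp, const char *socketIfname,
                            const std::vector<std::string> &nicNames) {
    // 1. HCCL_IF_IP wins whenever it is set.
    if (ifIp != nullptr && ifIp[0] != '\0') return {"HCCL_IF_IP", ifIp};
    // 2. Otherwise HCCL_SOCKET_IFNAME is used.
    if (socketIfname != nullptr && socketIfname[0] != '\0') {
        return {"HCCL_SOCKET_IFNAME", socketIfname};
    }
    // 3. Otherwise take the first NIC name in ascending lexicographical order.
    if (nicNames.empty()) return {"", ""};
    return {"default", *std::min_element(nicNames.begin(), nicNames.end())};
}
```

In a real process the first two arguments would come from std::getenv("HCCL_IF_IP") and std::getenv("HCCL_SOCKET_IFNAME"), and the list of NIC names from the operating system.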
Prototype
```cpp
HcclResult HcclGetRootInfo(HcclRootInfo *rootInfo)
```
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| rootInfo | Output | Rank identification information, including the device IP address and device ID. The information needs to be broadcast to all ranks in the cluster for HCCL initialization. For details about the definition of the HcclRootInfo type, see HcclRootInfo. |
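Because rootInfo has to reach every rank, one common approach is to treat it as an opaque block of bytes and hand those bytes to whatever transport the job already uses (for example, an MPI broadcast). The sketch below shows only the packing and unpacking step. The HcclRootInfo definition is a stand-in with an arbitrary size, and PackRootInfo and UnpackRootInfo are hypothetical helper names, not HCCL APIs.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Stand-in for the real type from the HCCL headers; treated as an opaque, fixed-size blob.
struct HcclRootInfo { char internal[64]; };  // size chosen arbitrarily for this sketch
static_assert(std::is_trivially_copyable<HcclRootInfo>::value,
              "byte-wise copying must be valid for this approach");

// Root rank: copy the rootInfo into a byte buffer that any transport can carry.
std::vector<std::uint8_t> PackRootInfo(const HcclRootInfo &info) {
    std::vector<std::uint8_t> bytes(sizeof(info));
    std::memcpy(bytes.data(), &info, sizeof(info));
    return bytes;
}

// Other ranks: rebuild the rootInfo from the received bytes.
// Returns false if the output pointer is null or the size does not match.
bool UnpackRootInfo(const std::vector<std::uint8_t> &bytes, HcclRootInfo *out) {
    if (out == nullptr || bytes.size() != sizeof(HcclRootInfo)) return false;
    std::memcpy(out, bytes.data(), sizeof(HcclRootInfo));
    return true;
}
```

Every rank would then pass its rebuilt rootInfo to the initialization API, so all ranks of one communicator start from identical bytes.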
Returns
HcclResult: HCCL_SUCCESS on success; any other value indicates failure.
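Since any value other than HCCL_SUCCESS indicates failure, callers usually check the result of every call. One possible pattern is a small checking macro, sketched below. HCCL_CHECK and PrepareRootInfo are hypothetical names, and the HcclResult, HcclRootInfo, and HcclGetRootInfo definitions are stand-ins with illustrative values so that the snippet compiles without the HCCL headers.

```cpp
#include <cstdio>

// Stand-ins so the sketch compiles without the HCCL headers.
enum HcclResult { HCCL_SUCCESS = 0, HCCL_E_PARA = 1 };  // illustrative values
struct HcclRootInfo { char internal[64]; };
HcclResult HcclGetRootInfo(HcclRootInfo *rootInfo) {
    return rootInfo != nullptr ? HCCL_SUCCESS : HCCL_E_PARA;
}

// Hypothetical helper: log the failing call and return its result to the caller.
#define HCCL_CHECK(call)                                              \
    do {                                                              \
        HcclResult ret_ = (call);                                     \
        if (ret_ != HCCL_SUCCESS) {                                   \
            std::fprintf(stderr, "%s failed with code %d\n", #call,   \
                         static_cast<int>(ret_));                     \
            return ret_;                                              \
        }                                                             \
    } while (0)

// Example caller: stops at the first failing HCCL call.
HcclResult PrepareRootInfo(HcclRootInfo *rootInfo) {
    HCCL_CHECK(HcclGetRootInfo(rootInfo));
    return HCCL_SUCCESS;
}
```

The macro can only be used inside functions that themselves return HcclResult, which keeps error propagation explicit at each call site.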
Constraints
None
Example
```cpp
uint32_t rankSize = 8;
uint32_t deviceId = 0;
// Generate the identification information of the root rank.
HcclRootInfo rootInfo;
HcclGetRootInfo(&rootInfo);
// Initialize the communicator.
HcclComm hcclComm;
HcclCommInitRootInfo(rankSize, &rootInfo, deviceId, &hcclComm);
// Destroy the communicator.
HcclCommDestroy(hcclComm);
```
