HcclCommInitRootInfoConfig

Applicability

Product

Supported

Atlas A3 training products/Atlas A3 inference products

Atlas A2 training products/Atlas A2 inference products

Atlas 200I/500 A2 inference products

Atlas inference products

Atlas training products

For Atlas A2 training products/Atlas A2 inference products, only the Atlas 800T A2 training server, Atlas 900 A2 PoD cluster basic unit, and Atlas 200T A2 Box16 heterogeneous subrack are supported.

For the Atlas inference products, only the Atlas 300I Duo inference card is supported.

Description

Initializes HCCL based on rootInfo and creates an HCCL communicator with the specified configuration.

This API can be called concurrently by multiple threads within the same process, provided that each device is driven by only one thread. Concurrent calls from multiple threads on the same device are not supported.

As shown in the following figure, step 0 and step 1 cannot be called concurrently. Step 1 must be executed serially after step 0.
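The threading rule above (at most one calling thread per device) can be sketched as follows. This is an illustration only: initOneThreadPerDevice and the InitFn callback are hypothetical names that are not part of HCCL, and the callback stands in for whatever per-device setup and HcclCommInitRootInfoConfig call the application performs.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-device work. In a real application this
// is where the device would be selected and HcclCommInitRootInfoConfig called.
using InitFn = bool (*)(uint32_t deviceId);

// Launches exactly one worker thread per device and waits for all of them.
// Returns the number of devices whose initialization reported success.
inline uint32_t initOneThreadPerDevice(uint32_t deviceCount, InitFn initFn) {
    assert(initFn != nullptr);
    std::vector<char> ok(deviceCount, 0);  // one result slot per device
    std::vector<std::thread> workers;
    workers.reserve(deviceCount);
    for (uint32_t dev = 0; dev < deviceCount; ++dev) {
        // Each thread owns one device ID and writes only its own slot.
        workers.emplace_back([dev, initFn, &ok] { ok[dev] = initFn(dev) ? 1 : 0; });
    }
    for (auto &worker : workers) {
        worker.join();
    }
    uint32_t succeeded = 0;
    for (char flag : ok) {
        succeeded += static_cast<uint32_t>(flag);
    }
    return succeeded;
}
```

Because each device has exactly one worker, no two threads initialize a communicator on the same device at the same time.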

Prototype

HcclResult HcclCommInitRootInfoConfig(uint32_t nRanks, const HcclRootInfo *rootInfo, uint32_t rank, const HcclCommConfig *config, HcclComm *comm)

Parameters

Parameter

Input/Output

Description

nRanks

Input

Number of ranks in a cluster.

rootInfo

Input

Root rank information, including the IP address and ID of the root rank, which is generated by HcclGetRootInfo.

rank

Input

ID of the current rank.

config

Input

Communicator configuration options, including the buffer size, the deterministic computing switch, the communicator name, and the location where the orchestration of the communication algorithm is expanded. Each configuration parameter must be within its valid value range. For details on the parameters in HcclCommConfig and their priorities, see HcclCommConfig.

Note that the input config must be initialized by calling HcclCommConfigInit first.

comm

Output

Pointer to the initialized communicator.

For details about the definition of the HcclComm type, see HcclComm.

Returns

HcclResult: HCCL_SUCCESS on success. Any other value indicates failure.
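The example in this section wraps the call in HCCLCHECK, whose definition is not shown on this page. A minimal sketch of such a wrapper is given below, assuming that throwing an exception on failure is acceptable. DemoResult, checkResult, and DEMO_CHECK are hypothetical names used only so that the sketch compiles without the HCCL headers. Real code would test HcclResult against HCCL_SUCCESS instead.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in result type for this sketch only. It is not the HCCL enum.
enum DemoResult { DEMO_SUCCESS = 0, DEMO_FAILURE = 1 };

// Throws if `ret` is anything other than the success code, recording the
// text of the failing call and the numeric code that came back.
inline void checkResult(DemoResult ret, const char *what) {
    assert(what != nullptr);
    if (ret != DEMO_SUCCESS) {
        throw std::runtime_error(std::string(what) + " failed, ret=" +
                                 std::to_string(static_cast<int>(ret)));
    }
}

// Passes both the value and the source text of the call to checkResult.
#define DEMO_CHECK(call) checkResult((call), #call)
```

Stopping at the first non-success code keeps later calls from running against a communicator that was never created.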

Constraints

All ranks in the same communicator must pass the same values of nRanks, rootInfo, and config.
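A pre-launch sanity check for this constraint might look like the sketch below. It assumes that the values each rank intends to pass have already been collected in one place, which HCCL does not do for you. DemoRankArgs and its fields are hypothetical and do not mirror the HcclCommConfig layout, and rootInfo is left out because its contents are opaque here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical snapshot of the values one rank intends to pass.
struct DemoRankArgs {
    uint32_t nRanks;
    uint32_t bufferSizeMB;
    uint32_t deterministic;
    std::string commName;
};

// Field-by-field equality of two snapshots.
inline bool sameArgs(const DemoRankArgs &a, const DemoRankArgs &b) {
    return a.nRanks == b.nRanks && a.bufferSizeMB == b.bufferSizeMB &&
           a.deterministic == b.deterministic && a.commName == b.commName;
}

// Returns true when there is one entry per rank and every entry matches the first.
inline bool ranksAgree(const std::vector<DemoRankArgs> &perRank) {
    if (perRank.empty()) {
        return false;
    }
    if (perRank.size() != perRank.front().nRanks) {
        return false;  // the number of entries must equal the declared nRanks
    }
    for (const auto &args : perRank) {
        if (!sameArgs(args, perRank.front())) {
            return false;
        }
    }
    return true;
}
```

A mismatch caught this way is easier to diagnose than an initialization failure reported separately by each rank.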

Example

// This example assumes that the rank ID of each process equals its device ID.
uint32_t rankSize = 8;
uint32_t deviceId = 0;
// Generate the identification information of the root rank.
// All ranks in the communicator must use the same rootInfo.
HcclRootInfo rootInfo;
HCCLCHECK(HcclGetRootInfo(&rootInfo));

// Create and initialize the configuration options of the communicator.
HcclCommConfig config;
HcclCommConfigInit(&config);
// Modify the communicator configuration as required.
// Size of the buffer for storing shared data, in MB. The value must be greater than or equal to 1. The default value is 200.
config.hcclBufferSize = 1024;
// Whether to enable deterministic computing for reduction communication operators. The default value is 0 (disabled).
config.hcclDeterministic = 1;
// Communicator name. std::strcpy is declared in <cstring>.
std::strcpy(config.hcclCommName, "comm_1");
// Initialize the collective communicator.
HcclComm hcclComm;
HCCLCHECK(HcclCommInitRootInfoConfig(rankSize, &rootInfo, deviceId, &config, &hcclComm));

// Destroy the communicator.
HCCLCHECK(HcclCommDestroy(hcclComm));