HcclCreateSubCommConfig

Description

Splits an existing global communicator into sub-communicators with specific configurations.

Because the sub-communicator is derived from an existing global communicator, it can be created without socket link setup or rank information exchange. This makes the API suitable for quickly creating a communicator after a service fault.

If the load is unbalanced across devices on the network, link setup for a sub-communicator created using this API may time out because the devices do not reach link setup at the same time. In this case, use the environment variable HCCL_CONNECT_TIMEOUT to increase the timeout, in seconds, for link setup between devices. For example:

export HCCL_CONNECT_TIMEOUT=600

Prototype

HcclResult HcclCreateSubCommConfig(HcclComm *comm, uint32_t rankNum, uint32_t *rankIds, uint64_t subCommId, uint32_t subCommRankId, HcclCommConfig *config, HcclComm *subComm)

Parameters

Parameter

Input/Output

Description

comm

Input

Global communicator to be split.

For details about the definition of the HcclComm type, see HcclComm.

rankNum

Input

Number of ranks in the sub-communicator to be created.

rankIds

Input

Array of the global-communicator rank IDs of the ranks that form the sub-communicator.

Note: The array should be ordered. The index of each rank in the array is its rank ID in the sub-communicator.

subCommId

Input

ID of the sub-communicator being created.

The value is user-defined and must be unique in the global communicator.

subCommRankId

Input

Rank ID of the current rank in the sub-communicator.

Set this parameter to the index of the current rank in the rankIds array.

config

Input

Configuration options of the communicator, including the buffer size, deterministic computing switch, and communicator name. The configuration parameters must be within the valid value range. For details, see HcclCommConfig.

Notes:

  • The input config must be initialized by calling HcclCommConfigInit.
  • If you use config to specify a communicator name, ensure that the name is unique.
  • In config, the configuration option hcclBufferSize takes precedence over the environment variable HCCL_BUFFSIZE, and the configuration option hcclDeterministic takes precedence over the environment variable HCCL_DETERMINISTIC. For details about environment variables, see Environment Variables.

subComm

Output

Pointer to the initialized sub-communicator, returned through the last argument of the function.

For details about the definition of the HcclComm type, see HcclComm.
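The relationship between rankIds and subCommRankId described above can be sketched as a small lookup. This is a minimal illustration, not part of HCCL: FindSubCommRankId is a hypothetical helper name, and it assumes the caller already knows its own rank ID in the global communicator.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not an HCCL API): looks up globalRankId in rankIds.
// On success, stores the index in *subCommRankId and returns true; that index
// is the value to pass as subCommRankId. Returns false if the rank is not a
// member of the sub-communicator.
bool FindSubCommRankId(const std::vector<uint32_t>& rankIds,
                       uint32_t globalRankId,
                       uint32_t* subCommRankId) {
    for (uint32_t i = 0; i < rankIds.size(); ++i) {
        if (rankIds[i] == globalRankId) {
            *subCommRankId = i;
            return true;
        }
    }
    return false;
}
```

For example, with rankIds = {4, 5, 6, 7}, global rank 6 maps to subCommRankId 2, and global rank 1 is reported as a non-member.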

Returns

HcclResult: HCCL_SUCCESS on success; any other value indicates failure.
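The example on this page wraps calls in an HCCLCHECK macro that is not defined here. One possible definition is sketched below, assuming the enclosing function itself returns HcclResult. The enum is a stand-in so that the sketch compiles without the HCCL headers, and STANDIN_FAILURE, FakeCallOk, FakeCallFail, and RunSteps are hypothetical names used only for illustration.

```cpp
#include <cstdio>

// Stand-in so the sketch compiles on its own. In real code, include the HCCL
// header instead, which defines HcclResult and HCCL_SUCCESS.
enum HcclResult { HCCL_SUCCESS = 0, STANDIN_FAILURE = 1 /* hypothetical */ };

// Evaluates an HCCL call once. If the result is not HCCL_SUCCESS, logs the
// location and returns the error code from the enclosing function.
#define HCCLCHECK(call)                                                    \
    do {                                                                   \
        HcclResult ret_ = (call);                                          \
        if (ret_ != HCCL_SUCCESS) {                                        \
            std::fprintf(stderr, "HCCL call failed at %s:%d, code %d\n",   \
                         __FILE__, __LINE__, static_cast<int>(ret_));      \
            return ret_;                                                   \
        }                                                                  \
    } while (0)

// Hypothetical stand-ins for real HCCL calls.
inline HcclResult FakeCallOk() { return HCCL_SUCCESS; }
inline HcclResult FakeCallFail() { return STANDIN_FAILURE; }

// Stops at the first failing call and propagates its return code.
HcclResult RunSteps(bool injectFailure) {
    HCCLCHECK(FakeCallOk());
    if (injectFailure) {
        HCCLCHECK(FakeCallFail());
    }
    return HCCL_SUCCESS;
}
```

Returning the code rather than aborting lets the caller decide how to clean up; a project may prefer to throw or exit instead.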

Constraints

  • When ranks in the same sub-communicator call this API, the rankNum, rankIds, subCommId, and config parameters passed must be the same.
  • A rank that does not need to create a sub-communicator should pass rankIds = nullptr and subCommId = 0xFFFFFFFF.
  • Within the same global communicator, each subCommId can be used to create only one sub-communicator.
  • Only the global communicator can be split into sub-communicators. Nested splitting of communicators is not supported.
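Some of these constraints can be checked locally before any HCCL call is made. The sketch below is one way to do that, not an HCCL facility: SubCommPlan and ValidatePlans are hypothetical names. Rejecting a real sub-communicator whose ID equals 0xFFFFFFFF, and rejecting duplicate or out-of-range rank IDs, are assumptions inferred from the constraints rather than documented rules.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Value the constraints give as subCommId for a rank that creates no
// sub-communicator.
constexpr uint64_t kNoSubCommId = 0xFFFFFFFF;

// Hypothetical description of one planned sub-communicator (not an HCCL type).
struct SubCommPlan {
    uint64_t subCommId;
    std::vector<uint32_t> rankIds;  // global rank IDs, in sub-communicator order
};

// Returns true if every subCommId is unique and every rankIds array is usable.
bool ValidatePlans(const std::vector<SubCommPlan>& plans, uint32_t globalRankNum) {
    std::set<uint64_t> seenIds;
    for (const SubCommPlan& plan : plans) {
        // Assumption: the non-participant value must not name a real group.
        if (plan.subCommId == kNoSubCommId) return false;
        // Documented: a subCommId can be used only once per global communicator.
        if (!seenIds.insert(plan.subCommId).second) return false;
        if (plan.rankIds.empty()) return false;
        std::set<uint32_t> seenRanks;
        for (uint32_t rank : plan.rankIds) {
            // Assumption: rank IDs must exist in the global communicator
            // and appear at most once per sub-communicator.
            if (rank >= globalRankNum) return false;
            if (!seenRanks.insert(rank).second) return false;
        }
    }
    return true;
}
```

Running the same check on every rank with the same plan list also helps keep rankNum, rankIds, and subCommId identical across the members of a sub-communicator.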

Applicability

Atlas Training Series Product

Example

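A simple code snippet is as follows:

// rankTableFile, devId, and the HCCLCHECK error-checking macro are assumed to be defined elsewhere.
// Create the global communicator.
HcclComm globalHcclComm;
HCCLCHECK(HcclCommInitClusterInfo(rankTableFile, devId, &globalHcclComm));
// Initialize config before overriding individual options.
HcclCommConfig config;
HcclCommConfigInit(&config);
config.hcclBufferSize = 50;  // takes precedence over HCCL_BUFFSIZE
strcpy(config.hcclCommName, "comm_1");  // requires <cstring>; the name must be unique
// Create a sub-communicator from global ranks 0 to 3, with subCommId 1.
// devId is passed as subCommRankId, which assumes it equals this rank's index in rankIds.
HcclComm hcclComm;
uint32_t rankIds[4] = {0, 1, 2, 3};
HCCLCHECK(HcclCreateSubCommConfig(&globalHcclComm, 4, rankIds, 1, devId, &config, &hcclComm));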

For details about the complete code example, see Creating a Sub-Communicator by Using HcclCreateSubCommConfig.
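When the global communicator is split into several equally sized groups of consecutive ranks, each rank can derive its own arguments arithmetically. The following sketch illustrates this under stated assumptions: GroupArgs and ContiguousGroupArgs are hypothetical names, the global rank count is assumed to be a multiple of groupSize, and using the group index plus one as subCommId is an arbitrary choice that merely keeps the IDs unique.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical container for the values a rank would pass as subCommId,
// rankIds (with rankNum = rankIds.size()), and subCommRankId.
struct GroupArgs {
    uint64_t subCommId;
    std::vector<uint32_t> rankIds;
    uint32_t subCommRankId;
};

// Splits the global ranks into consecutive groups of groupSize ranks and
// returns the arguments for the group that contains globalRankId.
// Assumes groupSize > 0.
GroupArgs ContiguousGroupArgs(uint32_t globalRankId, uint32_t groupSize) {
    const uint32_t groupIndex = globalRankId / groupSize;
    GroupArgs args;
    args.subCommId = groupIndex + 1;  // arbitrary, but unique per group
    for (uint32_t i = 0; i < groupSize; ++i) {
        args.rankIds.push_back(groupIndex * groupSize + i);  // ordered
    }
    // Index of this rank within rankIds.
    args.subCommRankId = globalRankId % groupSize;
    return args;
}
```

With eight global ranks and groupSize 4, global rank 6 obtains subCommId 2, rankIds {4, 5, 6, 7}, and subCommRankId 2. Every rank in a group computes the same subCommId and rankIds, which matches the requirement that these arguments be identical within a sub-communicator.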