HcclCreateSubCommConfig
Description
Splits an existing global communicator into sub-communicators with specific configurations.
This API creates a sub-communicator without socket link setup or rank information exchange, so it can be used to quickly create a communicator when a service fault occurs.
If the load is unbalanced between devices on the network, the link setup of the sub-communicator created using this API may time out due to asynchronous communication between devices. In this case, you can use the environment variable HCCL_CONNECT_TIMEOUT to increase the timeout for link setup between devices. For example:
export HCCL_CONNECT_TIMEOUT=600
Prototype
HcclResult HcclCreateSubCommConfig(HcclComm *comm, uint32_t rankNum, uint32_t *rankIds, uint64_t subCommId, uint32_t subCommRankId, HcclCommConfig *config, HcclComm *subComm)
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| comm | Input | Global communicator to be split. For details about the definition of the HcclComm type, see HcclComm. |
| rankNum | Input | Number of ranks in the sub-communicator to be created. |
| rankIds | Input | Array of global-communicator rank IDs, one for each rank in the sub-communicator. Note: The array should be ordered. The index of each rank in the array becomes its rank ID in the sub-communicator. |
| subCommId | Input | ID of the sub-communicator to be created. The value is user-defined and must be unique in the global communicator. |
| subCommRankId | Input | Rank ID of the current rank in the sub-communicator. Set this parameter to the index of the current rank in the rankIds array. |
| config | Input | Configuration options of the communicator, including the buffer size, deterministic computing switch, and communicator name. The configuration parameters must be within the valid value range. For details, see HcclCommConfig. |
| subComm | Output | Pointer to the initialized sub-communicator, returned through the last argument. For details about the definition of the HcclComm type, see HcclComm. |
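The relationship between rankIds and subCommRankId can be illustrated with a small lookup. The following is a minimal sketch, not part of HCCL: FindSubCommRankId is a hypothetical helper name, and it assumes the caller already knows its own rank ID in the global communicator.

```cpp
#include <cstdint>

// Hypothetical helper (not an HCCL API): looks up globalRank in rankIds.
// On success, stores the array index in *subCommRankId and returns true.
// That index is the value to pass as subCommRankId.
// Returns false if globalRank is not a member of the sub-communicator.
bool FindSubCommRankId(const uint32_t *rankIds, uint32_t rankNum,
                       uint32_t globalRank, uint32_t *subCommRankId)
{
    if (rankIds == nullptr || subCommRankId == nullptr) {
        return false;
    }
    for (uint32_t i = 0; i < rankNum; ++i) {
        if (rankIds[i] == globalRank) {
            *subCommRankId = i;
            return true;
        }
    }
    return false;
}
```

For example, with rankIds = {2, 5, 6, 9}, global rank 6 sits at index 2, so it would pass subCommRankId = 2.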
Returns
HcclResult: HCCL_SUCCESS on success; any other value indicates failure.
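Because every value other than HCCL_SUCCESS indicates failure, callers typically wrap each call in a checking macro. The sketch below shows one possible shape. It is an assumption, not the definition of the HCCLCHECK macro used in the example on this page: the name HCCLCHECK_RETURN is hypothetical, and the HcclResult enum, HCCL_STANDIN_FAILURE, and FakeHcclCall are stand-ins that exist only so the sketch compiles without the HCCL headers.

```cpp
#include <cstdio>

// Stand-in for the real HcclResult type from the HCCL headers.
// HCCL_STANDIN_FAILURE is a made-up value used only in this sketch.
enum HcclResult { HCCL_SUCCESS = 0, HCCL_STANDIN_FAILURE = 1 };

// Hypothetical checking macro: evaluates the call once, logs the failing
// expression with its location, and returns the error to the caller.
#define HCCLCHECK_RETURN(call)                                          \
    do {                                                                \
        HcclResult ret_ = (call);                                       \
        if (ret_ != HCCL_SUCCESS) {                                     \
            std::fprintf(stderr, "%s:%d: %s failed with code %d\n",     \
                         __FILE__, __LINE__, #call,                     \
                         static_cast<int>(ret_));                       \
            return ret_;                                                \
        }                                                               \
    } while (0)

// Stand-in for an HCCL API call that either succeeds or fails.
inline HcclResult FakeHcclCall(bool ok)
{
    return ok ? HCCL_SUCCESS : HCCL_STANDIN_FAILURE;
}

// Example of a function that propagates the first failure it sees.
inline HcclResult RunStep(bool ok)
{
    HCCLCHECK_RETURN(FakeHcclCall(ok));
    return HCCL_SUCCESS;
}
```

Returning the error instead of terminating the process leaves the recovery decision to the caller, which fits a fault-recovery flow. A macro that aborts on failure is an equally valid choice.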
Constraints
- When ranks in the same sub-communicator call this API, the rankNum, rankIds, subCommId, and config parameters passed must be the same.
- A rank that does not need to create a sub-communicator should pass rankIds = nullptr and subCommId = 0xFFFFFFFF.
- Within the same global communicator, each subCommId can be used only once. You cannot create another sub-communicator with a subCommId that is already in use.
- Only the global communicator can be split into sub-communicators. Nested splitting of communicators is not supported.
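The constraints on member and non-member ranks can be sketched as a small argument-selection routine. This is an illustration under stated assumptions, not HCCL code: SubCommArgs and MakeSubCommArgs are hypothetical names, and the values chosen for rankNum and subCommRankId on a non-member rank (both 0) are assumptions, because the constraints above specify only rankIds and subCommId for that case.

```cpp
#include <cstdint>

// Value that a non-member rank passes as subCommId (taken from the constraints).
constexpr uint64_t kNoSubCommId = 0xFFFFFFFF;

// Hypothetical bundle of the per-rank arguments to HcclCreateSubCommConfig.
struct SubCommArgs {
    uint32_t rankNum;
    uint32_t *rankIds;
    uint64_t subCommId;
    uint32_t subCommRankId;
};

// Hypothetical helper: member ranks receive the shared rankNum, rankIds, and
// subCommId plus their own index in rankIds; every other rank receives
// rankIds = nullptr and subCommId = 0xFFFFFFFF.
SubCommArgs MakeSubCommArgs(uint32_t *rankIds, uint32_t rankNum,
                            uint64_t subCommId, uint32_t globalRank)
{
    for (uint32_t i = 0; i < rankNum; ++i) {
        if (rankIds[i] == globalRank) {
            return {rankNum, rankIds, subCommId, i};
        }
    }
    // Assumption: rankNum and subCommRankId are set to 0 for non-members.
    return {0, nullptr, kNoSubCommId, 0};
}
```

All member ranks obtain identical rankNum, rankIds, and subCommId values from the same inputs, which matches the first constraint. Only subCommRankId differs from rank to rank.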
Applicability
Example
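```cpp
HcclComm globalHcclComm;
HcclCommInitClusterInfo(rankTableFile, devId, &globalHcclComm);

// Initialize the configuration with default values, then override selected fields.
HcclCommConfig config;
HcclCommConfigInit(&config);
config.hcclBufferSize = 50;
strcpy(config.hcclCommName, "comm_1");

// Create a sub-communicator from global ranks 0 to 3 with subCommId 1.
// devId is passed as subCommRankId, which assumes it equals this rank's index in rankIds.
HcclComm hcclComm;
uint32_t rankIds[4] = {0, 1, 2, 3};
HCCLCHECK(HcclCreateSubCommConfig(&globalHcclComm, 4, rankIds, 1, devId, &config, &hcclComm));
```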
For details about the complete code example, see Creating a Sub-Communicator by Using HcclCreateSubCommConfig.