HcclCreateSubCommConfig
Applicability
| Product | Supported |
|---|---|
| | √ |
| | √ |
| | ☓ |
| | ☓ |
| | √ |
Description
Splits an existing global communicator to create a sub-communicator with the specified configuration.
The sub-communicator is created without socket link setup or rank information exchange. This makes the API suitable for creating a communicator quickly, for example after a service fault.
If the load is unbalanced across devices on the network, the devices may reach the sub-communicator's link setup at different times. As a result, link setup for a sub-communicator created with this API may time out. In this case, use the HCCL_CONNECT_TIMEOUT environment variable to increase the link setup timeout between devices. For example:
```shell
export HCCL_CONNECT_TIMEOUT=600
```
Prototype
```cpp
HcclResult HcclCreateSubCommConfig(HcclComm *comm, uint32_t rankNum, uint32_t *rankIds, uint64_t subCommId,
                                   uint32_t subCommRankId, HcclCommConfig *config, HcclComm *subComm);
```
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| comm | Input | Global communicator to be split. For the definition of the HcclComm type, see HcclComm. |
| rankNum | Input | Number of ranks in the sub-communicator. |
| rankIds | Input | Array of the rank IDs, in the global communicator, of the ranks in the sub-communicator. Note: The array should be ordered. The index of each rank in the array is its rank ID in the sub-communicator. |
| subCommId | Input | User-defined ID of the sub-communicator. |
| subCommRankId | Input | Rank ID of the current rank in the sub-communicator. Set this parameter to the index of the current rank in the rankIds array. |
| config | Input | Communicator configuration options. These include the buffer size, the deterministic computing switch, the communicator name, and the expansion location for communication algorithm orchestration. Every configuration value must fall within its valid range. For details on the HcclCommConfig parameters and their priorities, see HcclCommConfig. Note: config must be initialized by calling HcclCommConfigInit before it is passed in. |
| subComm | Output | Pointer to the initialized sub-communicator. For the definition of the HcclComm type, see HcclComm. |
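The rankIds and subCommRankId parameters define the mapping between the two rank spaces. The sub-communicator rank with ID i has global rank ID rankIds[i]. In the other direction, a rank's subCommRankId is the index of its global rank ID in rankIds. The sketch below shows that lookup. `FindSubCommRankId` is a hypothetical helper, not part of the HCCL API, and it takes a `std::vector` instead of the raw array for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the HCCL API): returns the subCommRankId
// that the rank with global rank ID globalRankId should pass, i.e. its index
// in rankIds, or -1 if the rank is not a member of the sub-communicator.
int32_t FindSubCommRankId(const std::vector<uint32_t> &rankIds, uint32_t globalRankId)
{
    for (size_t i = 0; i < rankIds.size(); ++i) {
        if (rankIds[i] == globalRankId) {
            return static_cast<int32_t>(i);
        }
    }
    return -1;
}
```

For example, with rankIds = {4, 5, 7}, global rank 7 passes subCommRankId = 2, and global rank 6 is not a member.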
Returns
HcclResult: HCCL_SUCCESS on success; any other value indicates failure.
Constraints
- All ranks in the same sub-communicator must pass the same rankNum, rankIds, subCommId, and config values when they call this API.
- Ranks that do not need to create a sub-communicator pass rankIds=nullptr and subCommId=0xFFFFFFFF. In this scenario, the subCommId parameter is not verified.
- Sub-communicators can be created only by splitting the global communicator. A sub-communicator cannot be split further.
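The constraints above mean that each rank chooses its arguments according to whether it belongs to the sub-communicator. The sketch below shows that choice, assuming every rank knows its own global rank ID. Members receive the shared rankIds and subCommId plus their own index. Non-members receive nullptr and 0xFFFFFFFF. `SubCommArgs` and `SelectSubCommArgs` are hypothetical names, not part of the HCCL API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical struct (not part of the HCCL API): the per-rank values to pass as
// the rankIds, subCommId, and subCommRankId arguments of HcclCreateSubCommConfig.
struct SubCommArgs {
    uint32_t *rankIds = nullptr;      // nullptr for ranks outside the sub-communicator
    uint64_t subCommId = 0xFFFFFFFF;  // 0xFFFFFFFF for ranks outside the sub-communicator
    uint32_t subCommRankId = 0;       // Index of this rank in rankIds (members only)
};

// Hypothetical helper: a member gets the shared rankIds and subCommId plus its
// own index in rankIds; a non-member keeps the nullptr / 0xFFFFFFFF defaults.
SubCommArgs SelectSubCommArgs(std::vector<uint32_t> &rankIds, uint64_t subCommId,
                              uint32_t myGlobalRankId)
{
    SubCommArgs args;
    for (size_t i = 0; i < rankIds.size(); ++i) {
        if (rankIds[i] == myGlobalRankId) {
            args.rankIds = rankIds.data();
            args.subCommId = subCommId;
            args.subCommRankId = static_cast<uint32_t>(i);
            break;
        }
    }
    return args;
}
```

Because every member derives its arguments from the same rankIds vector and subCommId, the values that must be identical across the sub-communicator stay identical by construction.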
Example
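```cpp
#include <cstring>
#include "hccl/hccl.h"

// rankTableFile and devId are assumed to be defined elsewhere.
// Initialize the global communicator.
HcclComm globalHcclComm;
HcclCommInitClusterInfo(rankTableFile, devId, &globalHcclComm);

// Configure the communicator. config must be initialized with HcclCommConfigInit first.
HcclCommConfig config;
HcclCommConfigInit(&config);
config.hcclBufferSize = 50;  // Buffer size, in MB
strcpy(config.hcclCommName, "comm_1");

// Initialize the sub-communicator.
HcclComm hcclComm;
uint32_t rankIds[4] = {0, 1, 2, 3};  // Global rank IDs of the sub-communicator members
// subCommId = 1. subCommRankId = 0, so this call is the one made on the rank at rankIds[0];
// each other member passes its own index in rankIds instead.
HcclResult ret = HcclCreateSubCommConfig(&globalHcclComm, 4, rankIds, 1, 0, &config, &hcclComm);
if (ret != HCCL_SUCCESS) {
    // Handle the failure.
}
```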