HcclCommInitClusterInfoConfig

Applicability

Product

Supported

Atlas A3 training products/Atlas A3 inference products

Atlas A2 training products/Atlas A2 inference products

Atlas 200I/500 A2 inference products

Atlas inference products

Atlas training products

For Atlas A2 training products/Atlas A2 inference products, only the Atlas 800T A2 training server, Atlas 900 A2 PoD cluster basic unit, and Atlas 200T A2 Box16 heterogeneous subrack are supported.

For the Atlas inference products, only the Atlas 300I Duo inference card is supported.

Description

Initializes HCCL based on the rank table and creates an HCCL communicator with specific configurations.

Prototype

HcclResult HcclCommInitClusterInfoConfig(const char *clusterInfo, uint32_t rank, HcclCommConfig *config, HcclComm *comm)

Parameters

| Parameter | Input/Output | Description |
| --- | --- | --- |
| clusterInfo | Input | Path (including the file name) of the rank table file. The value is a string of at most 4096 bytes, including the terminating null character. |
| rank | Input | ID of the current rank. The value must be the same as the value of rank_id in the rank table file. |
| config | Input | Communicator configuration options, including the buffer size, deterministic computing switch, communicator name, and the location where the orchestration of the communication algorithm is expanded. Configuration parameters must fall within the valid value range. For details on the parameters and their priorities in HcclCommConfig, see HcclCommConfig. The config passed in must first be initialized by calling HcclCommConfigInit. |
| comm | Output | Pointer to the initialized communicator. For details about the definition of the HcclComm type, see HcclComm. |

Returns

HcclResult: HCCL_SUCCESS on success; any other value indicates failure.
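Since only HCCL_SUCCESS indicates success, it can be convenient to centralize the comparison in one place. The following is a minimal sketch of one way to do that; ThrowIfNotSuccess is a hypothetical helper, not an HCCL function, and it is written as a template so that the sketch compiles without the HCCL headers.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of HCCL): throws if `actual` differs from the
// success value. With the HCCL headers included, a call site might look like:
//   ThrowIfNotSuccess(HcclCommInitClusterInfoConfig(file, rank, &config, &comm),
//                     HCCL_SUCCESS, "HcclCommInitClusterInfoConfig");
template <typename Result>
void ThrowIfNotSuccess(Result actual, Result success, const char *what)
{
    if (actual != success) {
        throw std::runtime_error(std::string(what) + " failed with code " +
                                 std::to_string(static_cast<int>(actual)));
    }
}
```

Throwing an exception is only one design choice; returning early with the error code works equally well in code bases that avoid exceptions.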

Constraints

A communicator that has already been initialized cannot be initialized again.
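One way to respect this constraint in application code is to track whether initialization has already happened and to skip any second attempt. The sketch below is illustrative only: CommInitGuard is a hypothetical class, the callable stands in for the real HcclCommInitClusterInfoConfig call, the guard is not thread-safe, and allowing a retry after a failed attempt is an assumption rather than documented HCCL behavior.

```cpp
#include <functional>

// Hypothetical guard (not part of HCCL) that lets an initialization callable
// succeed at most once.
class CommInitGuard {
public:
    // Runs `init` only if no earlier attempt succeeded. Returns true if this
    // call performed a successful initialization, false otherwise.
    bool InitOnce(const std::function<bool()> &init)
    {
        if (initialized_) {
            return false;  // Already initialized: do not call init again.
        }
        initialized_ = init();
        return initialized_;
    }

    bool IsInitialized() const { return initialized_; }

private:
    bool initialized_ = false;
};
```

In real code, the callable would wrap the HcclCommInitClusterInfoConfig call and return whether it produced HCCL_SUCCESS.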

Example

// Header paths as laid out in a typical CANN installation; adjust them to your environment.
#include <cstring>
#include "acl/acl.h"
#include "hccl/hccl.h"

int main()
{
    // Initialize device resources.
    aclInit(NULL);
    // Path (including the file name) of the rank table configuration file.
    const char *rankTableFile = "/path/to/rank_table.json";
    // Specify the device used for the collective communication operations.
    uint32_t devId = 0;
    aclrtSetDevice(devId);
    // Create and initialize the communicator configuration.
    HcclCommConfig config;
    HcclCommConfigInit(&config);
    // Modify the communicator configuration as required.
    config.hcclBufferSize = 50;  // Size of the buffer for storing shared data, in MB. The value must be greater than or equal to 1. The default value is 200.
    std::strcpy(config.hcclCommName, "comm_1");
    // Initialize the communicator.
    HcclComm hcclComm;
    // In this example, devId is used as the rank ID of the current rank.
    HcclResult ret = HcclCommInitClusterInfoConfig(rankTableFile, devId, &config, &hcclComm);
    if (ret != HCCL_SUCCESS) {
        // Initialization failed: release device resources and exit.
        aclFinalize();
        return 1;
    }
    // Destroy the communicator.
    HcclCommDestroy(hcclComm);
    // Deinitialize device resources.
    aclFinalize();
    return 0;
}
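std::strcpy does not check the size of the destination, so a communicator name longer than the hcclCommName array would overflow it. A minimal sketch of a bounded copy follows; CopyCommName is a hypothetical helper, not an HCCL function, and it assumes hcclCommName is a fixed-size char array, as the std::strcpy call in the example suggests.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of HCCL): copies `name` into the fixed-size
// array `dest` only if it fits together with its terminating '\0'.
// Leaves `dest` untouched and returns false otherwise.
template <std::size_t N>
bool CopyCommName(char (&dest)[N], const char *name)
{
    if (name == nullptr) {
        return false;
    }
    const std::size_t len = std::strlen(name);
    if (len >= N) {
        return false;  // Name plus terminator would not fit in dest.
    }
    std::memcpy(dest, name, len + 1);  // Copy the terminator as well.
    return true;
}
```

Under that assumption, the call in the example could be written as CopyCommName(config.hcclCommName, "comm_1"), with the return value checked before the communicator is initialized.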