HcclCommInitClusterInfo

Applicability

Product

Supported

Atlas A3 training products / Atlas A3 inference products

Atlas A2 training products / Atlas A2 inference products

Atlas 200I/500 A2 inference products

Atlas inference products

Atlas training products

For Atlas A2 training products / Atlas A2 inference products, only the Atlas 800T A2 training server, Atlas 900 A2 PoD cluster basic unit, and Atlas 200T A2 Box16 heterogeneous subrack are supported.

For Atlas inference products, only the Atlas 300I Duo inference card is supported.

Description

Initializes HCCL based on the rank table and creates an HCCL communicator.

The rank table file is a JSON file that describes the NPU resources participating in collective communication. For details about how to configure the rank table file, see Cluster Information Configuration.

Prototype

HcclResult HcclCommInitClusterInfo(const char *clusterInfo, uint32_t rank, HcclComm *comm)

Parameters

Parameter

Input/Output

Description

clusterInfo

Input

Path (including the file name) of the rank table file. The value is a string of at most 4096 bytes, including the terminating null character.

rank

Input

ID of the current rank.

Note that the value of this parameter must be the same as the value of rank_id in the rank table file.

comm

Output

Pointer to the initialized communicator.

For details about the definition of the HcclComm type, see HcclComm.
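The 4096-byte limit on clusterInfo can be checked before the call. The following is a minimal sketch, not part of HCCL: the helper name rank_table_path_fits and the constant RANK_TABLE_PATH_MAX are hypothetical, and the sketch checks only the length stated above, not whether the file exists or is valid JSON.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constant: the documented limit for clusterInfo,
 * in bytes, including the terminating null character. */
#define RANK_TABLE_PATH_MAX 4096

/* Hypothetical helper (not an HCCL API): returns true if the path
 * is non-NULL and its terminating '\0' lies within the first
 * RANK_TABLE_PATH_MAX bytes. */
static bool rank_table_path_fits(const char *path)
{
    if (path == NULL) {
        return false;
    }
    for (size_t i = 0; i < RANK_TABLE_PATH_MAX; ++i) {
        if (path[i] == '\0') {
            return true;
        }
    }
    return false;
}
```

A caller could skip HcclCommInitClusterInfo and report a configuration error when rank_table_path_fits returns false.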

Returns

HcclResult: HCCL_SUCCESS on success; any other value indicates failure.
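Because every value other than HCCL_SUCCESS indicates failure, callers typically compare the return value against HCCL_SUCCESS rather than against individual error codes. The following is a minimal sketch of such a check. The helper call_succeeded is hypothetical and not part of HCCL; it takes the result and the success value as plain int so that it compiles without the HCCL headers.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (not an HCCL API): compares a result code
 * with the expected success value and logs a message to stderr
 * when they differ. Returns true only on success. */
static bool call_succeeded(int result, int success, const char *what)
{
    if (result != success) {
        fprintf(stderr, "%s failed with result code %d\n", what, result);
        return false;
    }
    return true;
}
```

With the HCCL headers included, a call site might pass (int)HcclCommInitClusterInfo(rankTableFile, rank, &comm) as result and (int)HCCL_SUCCESS as success, and skip any further use of the communicator when the helper returns false.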

Constraints

Repeated initialization causes an error.

Example

// Initialize device resources.
aclInit(NULL);
// Path (including the file name) of the rank table configuration file
const char *rankTableFile = "/path/rank_table.json";
// Specify the device used for the collective communication operations.
// devId is the ID of the target device and must be set by the caller.
aclrtSetDevice(devId);
// Create a communicator.
HcclComm hcclComm;
// In this example, devId is used as the rank ID of the current rank.
HcclCommInitClusterInfo(rankTableFile, devId, &hcclComm);
// Destroy the communicator.
HcclCommDestroy(hcclComm);
// Deinitialize device resources.
aclFinalize();