HcclCommInitAll

Applicability

Product

Supported

Atlas A3 training products / Atlas A3 inference products

Atlas A2 training products / Atlas A2 inference products

Atlas 200I/500 A2 inference products

Atlas inference products

Atlas training products

For Atlas A2 training products / Atlas A2 inference products, only the Atlas 800T A2 training server, Atlas 900 A2 PoD cluster basic unit, and Atlas 200T A2 Box16 heterogeneous subrack are supported.

For Atlas inference products, only the Atlas 300I Duo inference card is supported.

Description

In the single-server communication scenario, creates communicators for multiple devices from a single process, with one thread per device. During communicator initialization, devices[0] acts as the root rank and automatically collects cluster information.

Prototype

HcclResult HcclCommInitAll(uint32_t ndev, int32_t* devices, HcclComm* comms)

Parameters

  • ndev (input): Number of devices in the communicator.
  • devices (input): List of devices in the communicator. The values in the list are logical device IDs, which can be queried by running the npu-smi info -m command. HCCL creates the communicators in the order given by devices. The list must not contain duplicate device IDs.
  • comms (output): Array of generated communicator handles. Its size is ndev * sizeof(HcclComm). For details about the HcclComm type, see HcclComm.
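Because the device list must not contain duplicate IDs, a caller may want to validate it before calling HcclCommInitAll. The following is a minimal sketch of such a check. IsValidDeviceList is a hypothetical helper, not part of HCCL, and treating an empty list as invalid is an assumption of this sketch rather than documented behavior.

```cpp
#include <cstdint>
#include <unordered_set>

// Hypothetical helper (not part of HCCL): returns true when the device list
// is non-empty and contains no repeated logical device ID.
// Assumption: an empty or null list is treated as invalid.
bool IsValidDeviceList(const int32_t* devices, uint32_t ndev) {
    if (devices == nullptr || ndev == 0) {
        return false;
    }
    std::unordered_set<int32_t> seen;
    for (uint32_t i = 0; i < ndev; ++i) {
        // insert().second is false when the ID was already in the set.
        if (!seen.insert(devices[i]).second) {
            return false;
        }
    }
    return true;
}
```

A caller could run this check on the same devices and ndev values it is about to pass to HcclCommInitAll, and allocate comms with ndev elements only if the check passes.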

Returns

HcclResult: HCCL_SUCCESS on success; any other value indicates failure.

Constraints

  • This interface applies only to the single-server communication scenario.
  • When multiple threads call collective operation APIs (such as HcclAllReduce), ensure that the time difference between collective operation API calls in different threads does not exceed the value of the environment variable HCCL_CONNECT_TIMEOUT to avoid link setup timeout.
  • A device cannot execute multiple collective operation APIs at the same time.

Example

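const uint32_t rankSize = 2;          // constant, so the arrays below have a fixed size
int32_t devices[rankSize] = {0, 1};   // logical device IDs; devices[0] is the root rank
HcclComm comms[rankSize];
// Initialize the communicators, one per device.
HcclResult ret = HcclCommInitAll(rankSize, devices, comms);
if (ret == HCCL_SUCCESS) {
    // Destroy each communicator once it is no longer needed.
    for (uint32_t i = 0; i < rankSize; i++) {
        HcclCommDestroy(comms[i]);
    }
}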