HcclAllReduce

Applicability

Product

Supported

Atlas A3 training products / Atlas A3 inference products

Atlas A2 training products / Atlas A2 inference products

Atlas 200I/500 A2 inference products

Atlas inference products

Atlas training products

For Atlas A2 training products / Atlas A2 inference products, only the Atlas 800T A2 training server, Atlas 900 A2 PoD cluster basic unit, and Atlas 200T A2 Box16 heterogeneous subrack are supported.

For Atlas inference products, only the Atlas 300I Duo inference card is supported.

Description

Reduces the input data of all ranks in the communicator (for example, by summation) and writes the result to the output buffer of every rank. The reduction operation is specified by the op parameter.
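The semantics described above can be illustrated with a small host-only reference model. This is a sketch for reasoning about expected results, not how HCCL performs the operation on devices. The function name allReduceSumReference and the use of std::vector are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only reference model of AllReduce with the sum operation.
// inputs[r] plays the role of the send buffer of rank r. The return value
// holds one output buffer per rank, and all output buffers are identical.
std::vector<std::vector<float>> allReduceSumReference(
    const std::vector<std::vector<float>> &inputs)
{
    assert(!inputs.empty());
    const std::size_t count = inputs[0].size();

    // Element-wise sum across all ranks.
    std::vector<float> reduced(count, 0.0f);
    for (const auto &rankInput : inputs) {
        assert(rankInput.size() == count);  // every rank uses the same count
        for (std::size_t i = 0; i < count; ++i) {
            reduced[i] += rankInput[i];
        }
    }

    // Every rank receives the same reduced result.
    return std::vector<std::vector<float>>(inputs.size(), reduced);
}
```

With three ranks holding {1, 2}, {3, 4}, and {5, 6}, each rank would end up with {9, 12}. A model like this can be handy for checking device results in a test.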

Prototype

HcclResult HcclAllReduce(void *sendBuf, void *recvBuf, uint64_t count, HcclDataType dataType, HcclReduceOp op, HcclComm comm, aclrtStream stream)

Parameters

Parameter

Input/Output

Description

sendBuf

Input

Address of the send buffer.

recvBuf

Output

Address of the buffer that receives the collective communication result.

count

Input

Number of data elements that participate in the AllReduce operation. For example, if only one int32 element is involved, count=1.

dataType

Input

Data type of the elements in the AllReduce operation, specified as an HcclDataType value.

Atlas A3 training products / Atlas A3 inference products: The supported data types are int8, int16, int32, int64, float16, float32, and bfp16.

Atlas A2 training products / Atlas A2 inference products: The supported data types are int8, int16, int32, int64, float16, float32, and bfp16. Note that performance degrades with the int64 data type.

Atlas training products: The supported data types are int8, int32, int64, float16, and float32.

Atlas 300I Duo inference card: The supported data types are int8, int16, int32, float16, and float32.

op

Input

Reduction operation type. Currently, the following operation types are supported: sum, prod, max, and min.

NOTE:

Atlas A3 training products / Atlas A3 inference products: The prod operation does not support the int16 and bfp16 data types.

Atlas A2 training products / Atlas A2 inference products: The prod operation does not support the int16 and bfp16 data types.

Atlas 300I Duo inference card: The prod, max, and min operations do not support the int16 data type.

comm

Input

Communicator where the operation is performed.

stream

Input

Stream on which the current rank issues the operation.
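The per-product data type and reduction operation rules listed for dataType and op can be collected into a small lookup, which may be useful for validating arguments before a call. The following is a sketch only: the Product, DType, and Op enums and both function names are hypothetical stand-ins, not HCCL types, and the rules encode only what is stated in the table above.

```cpp
// Hypothetical stand-ins for illustration; these are not HCCL types.
enum class Product { AtlasA3, AtlasA2, AtlasTraining, Atlas300IDuo };
enum class DType { Int8, Int16, Int32, Int64, Float16, Float32, Bfp16 };
enum class Op { Sum, Prod, Max, Min };

// Data types listed as supported for each product.
bool isDataTypeSupported(Product product, DType type)
{
    switch (product) {
        case Product::AtlasA3:
        case Product::AtlasA2:
            return true;  // all seven listed types
        case Product::AtlasTraining:
            return type != DType::Int16 && type != DType::Bfp16;
        case Product::Atlas300IDuo:
            return type != DType::Int64 && type != DType::Bfp16;
    }
    return false;
}

// Combines the data type list with the restrictions in the op NOTE.
bool isCombinationSupported(Product product, DType type, Op op)
{
    if (!isDataTypeSupported(product, type)) {
        return false;
    }
    switch (product) {
        case Product::AtlasA3:
        case Product::AtlasA2:
            // prod does not support int16 and bfp16.
            return !(op == Op::Prod &&
                     (type == DType::Int16 || type == DType::Bfp16));
        case Product::Atlas300IDuo:
            // prod, max, and min do not support int16.
            return !(type == DType::Int16 && op != Op::Sum);
        case Product::AtlasTraining:
            return true;  // no op restriction is listed
    }
    return false;
}
```

For instance, isCombinationSupported(Product::Atlas300IDuo, DType::Int16, Op::Max) returns false, while the same data type with Op::Sum returns true.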

Returns

HcclResult: HCCL_SUCCESS on success; any other value indicates failure.
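Because any value other than HCCL_SUCCESS indicates failure, callers may want to check every return value instead of discarding it. The sketch below shows one possible pattern. To stay compilable without the HCCL headers it uses a stand-in enum, StandInResult, and a hypothetical helper, throwOnFailure. In real code the helper would take HcclResult and compare against HCCL_SUCCESS.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for the HCCL result type (an assumption for this sketch).
enum StandInResult { STANDIN_SUCCESS = 0, STANDIN_FAILURE = 1 };

// Throws if the call did not succeed; 'what' names the call for the message.
inline void throwOnFailure(StandInResult result, const std::string &what)
{
    if (result != STANDIN_SUCCESS) {
        throw std::runtime_error(
            what + " failed with code " +
            std::to_string(static_cast<int>(result)));
    }
}
```

A call site would then wrap each API call, for example throwOnFailure(result, "HcclAllReduce"), so that a failure surfaces immediately rather than at a later synchronization point.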

Constraints

  • All ranks must have the same count, dataType, and op.
  • Each rank has only one input.
  • The input and output addresses (sendBuf and recvBuf) must be aligned according to the data type:
    • int8: 1-byte aligned
    • int16, float16, bfp16: 2-byte aligned
    • int32 and float32: 4-byte aligned
    • int64: 8-byte aligned
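In the alignment list above, the required alignment equals the size of one element for every listed type, so an address can be checked with a single modulo operation. The helper below is a minimal sketch. The name isAlignedForElement is hypothetical, and the check treats the address purely as an integer value, which is mainly useful when sendBuf or recvBuf is an offset into a larger allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when ptr is aligned to elemSize bytes, where elemSize is the
// size of one element: 1 (int8), 2 (int16, float16, bfp16),
// 4 (int32, float32), or 8 (int64).
bool isAlignedForElement(const void *ptr, std::size_t elemSize)
{
    assert(elemSize == 1 || elemSize == 2 || elemSize == 4 || elemSize == 8);
    return reinterpret_cast<std::uintptr_t>(ptr) % elemSize == 0;
}
```

For example, an address 4 bytes past an 8-byte-aligned base passes the check for float32 but fails it for int64.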

Example

// Allocate device memory for collective communication.
void *sendBuf = nullptr;
void *recvBuf = nullptr;
uint64_t count = 8;
size_t mallocSize = count * sizeof(float);
aclrtMalloc((void **)&sendBuf, mallocSize, ACL_MEM_MALLOC_HUGE_ONLY);
aclrtMalloc((void **)&recvBuf, mallocSize, ACL_MEM_MALLOC_HUGE_ONLY);

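// Initialize the communicator. rootInfo and deviceId are assumed to be set up
// earlier (for example, rootInfo generated with HcclGetRootInfo on the root
// rank and distributed to all ranks).
uint32_t rankSize = 8;
HcclComm hcclComm;
HcclCommInitRootInfo(rankSize, &rootInfo, deviceId, &hcclComm);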

// Create a stream.
aclrtStream stream;
aclrtCreateStream(&stream);

// Execute AllReduce to add input data of all ranks in the communicator and send the result to the output buffer of all ranks.
HcclAllReduce(sendBuf, recvBuf, count, HCCL_DATA_TYPE_FP32, HCCL_REDUCE_SUM, hcclComm, stream);
// Wait until the collective communication task on the stream is complete.
aclrtSynchronizeStream(stream);

// Free resources.
aclrtFree(sendBuf);          // Free the device memory.
aclrtFree(recvBuf);          // Free the device memory.
aclrtDestroyStream(stream);  // Destroy the stream.
HcclCommDestroy(hcclComm);   // Destroy the communicator.