allreduce

Description

Performs the specified reduction operation on the input data of all ranks in a group and writes the result to the output buffer of every rank. The reduction operation type is specified by the reduction parameter. This API corresponds to the collective communication operator AllReduce.
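The semantics can be sketched in plain Python (illustrative only, not part of the HCCL API — simulated_allreduce and REDUCTIONS are hypothetical names): each rank contributes one input buffer, the buffers are combined element-wise with the chosen reduction, and every rank receives the same result.

```python
from functools import reduce

# Illustrative sketch (not the HCCL implementation): simulate AllReduce
# semantics for a group of ranks, each contributing one input buffer.
REDUCTIONS = {
    "sum":  lambda a, b: [x + y for x, y in zip(a, b)],
    "max":  lambda a, b: [max(x, y) for x, y in zip(a, b)],
    "min":  lambda a, b: [min(x, y) for x, y in zip(a, b)],
    "prod": lambda a, b: [x * y for x, y in zip(a, b)],
}

def simulated_allreduce(rank_inputs, reduction="sum"):
    """Return the output buffer that every rank in the group receives."""
    result = reduce(REDUCTIONS[reduction], rank_inputs)
    # After AllReduce, every rank holds its own copy of the same result.
    return [list(result) for _ in rank_inputs]

outputs = simulated_allreduce([[1, 2], [3, 4], [5, 6]], "sum")
# All three ranks receive [9, 12].
```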

Prototype

def allreduce(tensor, reduction, fusion=1, fusion_id=-1, group="hccl_world_group")

Parameters

tensor (Input)

A TensorFlow tensor.

Atlas Training Series Product: the supported data types are int8, int32, int64, float16, and float32.

reduction (Input)

A string specifying the reduction operation type, which can be max, min, prod, or sum.

fusion (Input)

An int. AllReduce operator fusion flag. The value can be one of the following:

  • 0: The AllReduce operator is not fused with other AllReduce operators during network compilation.
  • 1: The AllReduce operator is fused based on the gradient splitting policy during network compilation.
  • 2: AllReduce operators with the same fusion_id are fused during network compilation.

fusion_id (Input)

An int. Fusion ID. If fusion is set to 2, AllReduce operators with the same fusion_id are fused during network compilation.
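The effect of fusion=2 can be pictured with a small simulation (illustrative only, not the compiler's actual mechanism — fuse_buffers is a hypothetical name): operators that share a fusion_id have their buffers combined, so the group performs one larger communication instead of several small ones.

```python
from collections import defaultdict

# Illustrative sketch (not HCCL's implementation): with fusion=2, AllReduce
# operators sharing a fusion_id are fused, i.e. their buffers are combined
# into a single, larger buffer that is reduced in one communication.
def fuse_buffers(ops):
    """ops: list of (fusion_id, buffer) pairs. Returns one fused buffer per id."""
    fused = defaultdict(list)
    for fusion_id, buf in ops:
        fused[fusion_id].extend(buf)  # concatenate into the fused buffer
    return dict(fused)

fused = fuse_buffers([(0, [1, 2]), (1, [7]), (0, [3])])
# fusion_id 0 -> one AllReduce over [1, 2, 3]; fusion_id 1 -> [7]
```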

group (Input)

A string containing a maximum of 128 bytes, including the terminating character.

Group name, which can be a user-defined value or hccl_world_group.

Returns

The result tensor.

Constraints

  • The caller rank must belong to the group specified by the group argument; otherwise, the API call fails.
  • Each rank has only one input.
  • The upstream node of allreduce must not be a variable.
  • The input tensor size must be less than or equal to 8 GB.
  • The AllReduce operator can be fused only when reduction is set to sum.

Applicability

Atlas Training Series Product

Example

The following is only a code snippet and cannot be executed. For details about how to call the HCCL Python APIs to perform collective communication, see Sample Code.

from npu_bridge.npu_init import *
import tensorflow as tf

# Sum the (1, 3) float32 tensor across all ranks in the default group.
tensor = tf.random_uniform((1, 3), minval=1, maxval=10, dtype=tf.float32)
result = hccl_ops.allreduce(tensor, "sum")