allgather

Description

Gathers the input tensors of all ranks in the communicator, sorts them by rank ID, combines them, and writes the combined result to the output of every rank.

In an AllGather operation, each rank therefore receives the data of all ranks arranged in rank ID order, so the AllGather output is identical on every rank: with two ranks, for example, every rank's output contains rank 0's data followed by rank 1's data.

Prototype

def allgather(tensor, rank_size, group="hccl_world_group", fusion=0, fusion_id=-1)

Parameters

tensor
  Input. A TensorFlow tensor.
  Atlas Training Series Product: the supported data types are int8, uint8, int16, uint16, int32, uint32, int64, uint64, float16, float32, and float64.

rank_size
  Input. An int. The number of devices in the group. The maximum value is 32768.

group
  Input. A string of at most 128 bytes, including the terminating character. The group name, which can be a user-defined value or hccl_world_group.

fusion
  Input. An int. AllGather operator fusion flag. The values are as follows:
  • 0: The AllGather operator is not fused with other AllGather operators during network compilation.
  • 2: AllGather operators with the same fusion_id are fused during network compilation (see the sketch after this table).

fusion_id
  Input. An int. AllGather operator fusion ID. When fusion is set to 2, AllGather operators with the same fusion_id are fused during network compilation.
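
The following sketch shows how the fusion parameters fit together; the tensor shapes and the fusion_id value are illustrative assumptions, not values required by the API. Two AllGather operators built with fusion=2 and the same fusion_id become candidates for fusion during network compilation:

import tensorflow as tf
from npu_bridge.npu_init import *

rank_size = 2
# Two independent tensors; the shapes are illustrative assumptions.
t0 = tf.random_uniform((1, 3), dtype=tf.float32)
t1 = tf.random_uniform((4, 3), dtype=tf.float32)

# fusion=2 with a shared fusion_id (here arbitrarily 7) marks both
# AllGather operators as candidates for fusion during network compilation.
g0 = hccl_ops.allgather(t0, rank_size, fusion=2, fusion_id=7)
g1 = hccl_ops.allgather(t1, rank_size, fusion=2, fusion_id=7)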

Returns

A tensor containing the data gathered from all ranks, concatenated in rank ID order.

Constraints

The calling rank must belong to the group specified by the group argument; otherwise, the API call fails.

Applicability

Atlas Training Series Product

Example

The following is only a code snippet and cannot be executed. For details about how to call the HCCL Python APIs to perform collective communication, see Sample Code.

import tensorflow as tf
from npu_bridge.npu_init import *

# Each rank contributes a (1, 3) float32 tensor.
tensor = tf.random_uniform((1, 3), minval=1, maxval=10, dtype=tf.float32)
rank_size = 2
# Gather the tensors of all ranks; every rank receives the same result.
result = hccl_ops.allgather(tensor, rank_size)
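
With rank_size set to 2 and a (1, 3) input on each rank, the gathered result would be expected to have shape (2, 3), holding the two ranks' inputs concatenated in rank ID order.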