broadcast

Description

Broadcasts a tensor from the root rank to all other ranks in the communicator group.
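Conceptually, after a broadcast every rank holds a copy of the root rank's data. A minimal pure-Python sketch of these semantics (a toy model, not the HCCL implementation; `per_rank_data` models one buffer per rank):

```python
def broadcast(per_rank_data, root_rank):
    """Return a new per-rank view in which every rank holds a copy
    of the root rank's buffer, mirroring collective-broadcast semantics."""
    root_data = per_rank_data[root_rank]
    return [list(root_data) for _ in per_rank_data]

# Ranks 0..2 start with different buffers; after a broadcast from rank 0,
# all ranks hold rank 0's data.
buffers = [[1.0, 2.0], [0.0, 0.0], [9.0, 9.0]]
print(broadcast(buffers, 0))  # [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
```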

Prototype

def broadcast(tensor, root_rank, fusion=2, fusion_id=0, group="hccl_world_group")

Parameters

tensor (Input)

A list of TensorFlow tensors.

Atlas Training Series Product: The supported data types are int8, uint8, int16, uint16, int32, uint32, int64, uint64, float16, float32, and float64.

root_rank (Input)

An int. Rank ID of the root rank; must be a rank ID within the group.

group (Input)

A string containing a maximum of 128 bytes, including the end character. Group name, which can be a user-defined value or hccl_world_group.

fusion (Input)

An int. Broadcast operator fusion flag. The value can be one of the following:

  • 0: The broadcast operator is not fused with other broadcast operators during network compilation.
  • 2: Broadcast operators with the same fusion_id are fused during network compilation.

fusion_id (Input)

An int. Fusion ID of the broadcast operator. Takes effect only when fusion is set to 2: broadcast operators with the same fusion_id are fused during network compilation.

Returns

The result tensor.

Constraints

  • The calling rank must belong to the group specified by the group argument; otherwise, the API call fails.
  • Two broadcast operators whose inputs and outputs depend on each other must not be fused; fusing them may create a loop in the graph.

    As shown in the following figure, broadcast2's input depends on broadcast1's output, so broadcast1 and broadcast2 cannot be fused. That is, when the broadcast API is called for them, the fusion parameter must be set to 0.
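The loop constraint can be checked mechanically: collapsing the fused broadcast operators into a single graph node must not introduce a cycle. A toy illustration of that check (hypothetical node names and helper, not HCCL code):

```python
def fusion_creates_cycle(edges, fused):
    """Collapse the nodes in `fused` into one node, then report whether the
    resulting directed graph contains a cycle (DFS with white/gray/black colors)."""
    FUSED = "__fused__"
    rename = lambda n: FUSED if n in fused else n
    graph = {}
    for src, dst in edges:
        s, d = rename(src), rename(dst)
        graph.setdefault(s, set()).add(d)
        graph.setdefault(d, set())
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}

    def dfs(n):
        color[n] = GRAY
        for m in graph[n]:
            if color[m] == GRAY or (color[m] == WHITE and dfs(m)):
                return True  # back edge found: cycle
        color[n] = BLACK
        return False

    return any(dfs(n) for n in graph if color[n] == WHITE)

# broadcast2 consumes a value computed from broadcast1's output,
# so fusing the two broadcasts would create a graph loop:
dependent = [("broadcast1", "compute"), ("compute", "broadcast2")]
print(fusion_creates_cycle(dependent, {"broadcast1", "broadcast2"}))  # True

# Two independent broadcasts can be fused safely:
independent = [("broadcast1", "loss1"), ("broadcast2", "loss2")]
print(fusion_creates_cycle(independent, {"broadcast1", "broadcast2"}))  # False
```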

Applicability

Atlas Training Series Product

Example

The following is only a code snippet and cannot be executed. For details about how to call the HCCL Python APIs to perform collective communication, see Sample Code.

from npu_bridge.npu_init import *
import tensorflow as tf

# Tensor to be broadcast from the root rank
tensor = tf.random_uniform((1, 3), minval=1, maxval=10, dtype=tf.float32)
inputs = [tensor]  # broadcast takes a list of tensors
root = 0           # rank ID of the root rank
result = hccl_ops.broadcast(inputs, root)