npu.distribute.broadcast

Description

Synchronizes variables between workers in distributed NPU training.
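The semantics can be pictured without NPU hardware: after the collective runs, the root rank's value overwrites every other rank's copy of the variable. A minimal pure-Python sketch of this behavior (the rank values and function name here are illustrative only, not part of the HCCL implementation):

```python
# Illustrative simulation of broadcast semantics (not the actual HCCL code):
# each rank starts with its own value; after the broadcast, all ranks hold
# the root rank's value.

def simulated_broadcast(per_rank_values, root_rank):
    """Return post-broadcast values: every rank gets root_rank's value."""
    root_value = per_rank_values[root_rank]
    return [root_value for _ in per_rank_values]

# Ranks 0..3 each initialize a different variable value.
before = [0.42, -1.3, 0.07, 2.9]
after = simulated_broadcast(before, root_rank=0)
print(after)  # every rank now holds rank 0's value: [0.42, 0.42, 0.42, 0.42]
```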

Prototype

npu.distribute.broadcast(values, root_rank, fusion=2, fusion_id=0, group="hccl_world_group")

Parameters

values (Input)

  A TensorFlow variable or a set of variables.

  For the Atlas Training Series products, the supported data types are int8, int32, float16, float32, int64, and uint64.

root_rank (Input)

  An int. Rank ID of the root node, i.e., the rank ID within the group.

fusion (Input)

  An int. Broadcast operator fusion flag. The value can be one of the following:

  • 0: The broadcast operator is not fused with other broadcast operators during network compilation.
  • 2: Broadcast operators with the same fusion_id are fused during network compilation.

fusion_id (Input)

  An int. Broadcast operator fusion ID. Takes effect only when fusion is set to 2: broadcast operators with the same fusion_id are fused during network compilation.

group (Input)

  A string of up to 128 bytes, including the terminating character. Group name; either a user-defined name or the built-in hccl_world_group.
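How fusion and fusion_id interact can be sketched as a grouping rule: a minimal illustration in plain Python (the op names and helper function are hypothetical; this is not how the graph compiler is implemented):

```python
from collections import defaultdict

def fusion_groups(broadcast_ops):
    """Group broadcast ops the way the fusion flags suggest:
    ops with fusion == 2 are bucketed by fusion_id; fusion == 0 ops stay alone."""
    fused = defaultdict(list)
    solo = []
    for name, fusion, fusion_id in broadcast_ops:
        if fusion == 2:
            fused[fusion_id].append(name)   # same fusion_id -> same fused group
        else:
            solo.append([name])             # fusion == 0 -> never fused
    return list(fused.values()) + solo

ops = [
    ("bcast_a", 2, 0),   # fused with bcast_b (same fusion_id)
    ("bcast_b", 2, 0),
    ("bcast_c", 2, 1),   # different fusion_id -> separate fused group
    ("bcast_d", 0, 0),   # fusion=0 -> compiled on its own
]
print(fusion_groups(ops))
# [['bcast_a', 'bcast_b'], ['bcast_c'], ['bcast_d']]
```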

Returns

None

Example

To broadcast the variables on device 0 to the other devices:

# rank_id = 0  rank_size = 8
import tensorflow as tf
import npu_device as npu

x = tf.Variable(tf.random.normal(shape=()))
print("before broadcast", x)
npu.distribute.broadcast(x, root_rank=0)
print("after broadcast", x)

Before the broadcast, each device holds its own randomly initialized value of x; after the broadcast, x on every device equals the value on device 0 (the root).