npu.distribute.broadcast

Description

Synchronizes variables between workers in distributed NPU training.
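The semantics can be pictured without NPU hardware: after the collective runs, the root rank's value overwrites every other rank's copy of the variable. A minimal pure-Python sketch of this behavior (the rank values and function name here are illustrative only, not part of the HCCL implementation):

```python
# Illustrative simulation of broadcast semantics (not the actual HCCL code):
# each rank starts with its own value; after the broadcast, all ranks hold
# the root rank's value.

def simulated_broadcast(per_rank_values, root_rank):
    """Return post-broadcast values: every rank gets root_rank's value."""
    root_value = per_rank_values[root_rank]
    return [root_value for _ in per_rank_values]

# Ranks 0..3 each initialize a different variable value.
before = [0.42, -1.3, 0.07, 2.9]
after = simulated_broadcast(before, root_rank=0)
print(after)  # every rank now holds rank 0's value: [0.42, 0.42, 0.42, 0.42]
```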

Prototype

npu.distribute.broadcast(values, root_rank, fusion=2, fusion_id=0, group="hccl_world_group")

Parameters

values (Input)

  A TensorFlow variable or a set of variables.

  For the Atlas Training Series products, the supported data types are int8, int32, float16, float32, int64, and uint64.

root_rank (Input)

  An int. Rank ID of the root node, i.e., the rank ID within the group.

fusion (Input)

  An int. Broadcast operator fusion flag. The value can be one of the following:

  • 0: The broadcast operator is not fused with other broadcast operators during network compilation.
  • 2: Broadcast operators with the same fusion_id are fused during network compilation.

fusion_id (Input)

  An int. Broadcast operator fusion ID. Takes effect only when fusion is set to 2: broadcast operators with the same fusion_id are fused during network compilation.

group (Input)

  A string of up to 128 bytes, including the terminating character. Group name; either a user-defined name or the built-in hccl_world_group.
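How fusion and fusion_id interact can be sketched as a grouping rule: a minimal illustration in plain Python (the op names and helper function are hypothetical; this is not how the graph compiler is implemented):

```python
from collections import defaultdict

def fusion_groups(broadcast_ops):
    """Group broadcast ops the way the fusion flags suggest:
    ops with fusion == 2 are bucketed by fusion_id; fusion == 0 ops stay alone."""
    fused = defaultdict(list)
    solo = []
    for name, fusion, fusion_id in broadcast_ops:
        if fusion == 2:
            fused[fusion_id].append(name)   # same fusion_id -> same fused group
        else:
            solo.append([name])             # fusion == 0 -> never fused
    return list(fused.values()) + solo

ops = [
    ("bcast_a", 2, 0),   # fused with bcast_b (same fusion_id)
    ("bcast_b", 2, 0),
    ("bcast_c", 2, 1),   # different fusion_id -> separate fused group
    ("bcast_d", 0, 0),   # fusion=0 -> compiled on its own
]
print(fusion_groups(ops))
# [['bcast_a', 'bcast_b'], ['bcast_c'], ['bcast_d']]
```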

Returns

None

Example

To broadcast the variables on device 0 to the other devices:

# rank_id = 0  rank_size = 8
import tensorflow as tf
import npu_device as npu

x = tf.Variable(tf.random.normal(shape=()))
print("before broadcast", x)
npu.distribute.broadcast(x, root_rank=0)
print("after broadcast", x)

Before the broadcast, each device holds its own randomly initialized value of x; after the broadcast, x on every device equals the value on device 0 (the root).