npu.distribute.all_reduce

Description

Performs an aggregation (reduction) operation on tensors across all workers in distributed NPU training; every worker receives the same reduced result.

Prototype

npu.distribute.all_reduce(values, reduction="mean", fusion=1, fusion_id=-1, group="hccl_world_group")

Parameters

values (Input)
  A TensorFlow tensor.
  For the Atlas Training Series product, the supported data types are int8, int32, float16, and float32.

reduction (Input)
  A string specifying the aggregation operation type. The value can be mean, max, min, prod, or sum.

fusion (Input)
  An int. AllReduce operator fusion flag. The value can be one of the following:
  • 0: The AllReduce operator is not fused with other AllReduce operators during network compilation.
  • 1: The AllReduce operator is fused based on the gradient splitting policy during network compilation.
  • 2: AllReduce operators with the same fusion_id are fused during network compilation.

fusion_id (Input)
  An int. AllReduce operator fusion ID. If fusion is set to 2, AllReduce operators with the same fusion_id are fused during network compilation.

group (Input)
  A string of up to 128 bytes, including the terminating character. The group name, which can be a user-defined value or hccl_world_group.
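The interaction between fusion and fusion_id described above can be illustrated off-device. A minimal pure-Python sketch (the plan_fusion helper is hypothetical, not part of npu_device; fusion == 1 depends on the gradient splitting policy and is not modeled here):

```python
from collections import defaultdict

def plan_fusion(ops):
    """Group pending AllReduce ops the way compilation would fuse them.

    ops: list of (name, fusion, fusion_id) tuples.
    Returns a list of groups; each group becomes one fused AllReduce.
    Hypothetical helper for illustration only.
    """
    by_id = defaultdict(list)  # fusion == 2: bucket by fusion_id
    groups = []
    for name, fusion, fusion_id in ops:
        if fusion == 0:
            groups.append([name])        # never fused with others
        elif fusion == 2:
            by_id[fusion_id].append(name)
    groups.extend(by_id.values())        # one group per shared fusion_id
    return groups

ops = [
    ("grad_a", 2, 7),
    ("grad_b", 2, 7),   # same fusion_id as grad_a -> fused together
    ("grad_c", 0, -1),  # stands alone
]
print(plan_fusion(ops))  # [['grad_c'], ['grad_a', 'grad_b']]
```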

Returns

The reduced result: one output tensor per input tensor, in the same order and of the same type as values.

Example

To aggregate a value on multiple devices:

# Running on 8 devices (rank_size = 8); the comments show the result on every rank.
import tensorflow as tf
import npu_device as npu

v = tf.constant(1.0)
x = npu.distribute.all_reduce([v], 'sum')   # 8.0
y = npu.distribute.all_reduce([v], 'mean')  # 1.0
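For reference, the results in the example can be reproduced off-device. A pure-Python sketch of the five reduction types (simulated_all_reduce is a hypothetical stand-in for the real collective, which runs over HCCL on the devices): each worker contributes one value and every worker receives the same reduced result.

```python
import math

def simulated_all_reduce(per_worker_values, reduction="mean"):
    """Compute the value each worker would receive from all_reduce.

    per_worker_values: the value contributed by each of the rank_size
    workers. Hypothetical helper for illustration only.
    """
    ops = {
        "sum": sum,
        "mean": lambda vs: sum(vs) / len(vs),
        "max": max,
        "min": min,
        "prod": math.prod,
    }
    return ops[reduction](per_worker_values)

# Mirrors the example above: rank_size = 8, each worker holds 1.0.
contributions = [1.0] * 8
print(simulated_all_reduce(contributions, "sum"))   # 8.0
print(simulated_all_reduce(contributions, "mean"))  # 1.0
```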