npu.distribute.all_reduce
Description
Performs an aggregation (all-reduce) operation across workers in distributed NPU training.
Prototype
npu.distribute.all_reduce(values, reduction="mean", fusion=1, fusion_id=-1, group="hccl_world_group")
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| values | Input | TensorFlow tensor type. |
| reduction | Input | A string. Aggregation operation type. The value can be mean, max, min, prod, or sum. |
| fusion | Input | An int. AllReduce operator fusion flag. Defaults to 1. If set to 2, AllReduce operators are fused according to fusion_id. |
| fusion_id | Input | An int. AllReduce operator fusion ID. If fusion is set to 2, AllReduce operators with the same fusion_id are fused during network compilation. |
| group | Input | A string of up to 128 bytes, including the terminating character. Group name, which can be a user-defined value or hccl_world_group. |
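The semantics of each reduction type can be illustrated with a plain-Python sketch that runs without NPU hardware. `simulate_all_reduce` below is a hypothetical helper for illustration only, not part of npu_device: it computes the single value that every rank would receive after the collective completes.

```python
# Plain-Python sketch of all_reduce reduction semantics across ranks.
# simulate_all_reduce is a hypothetical illustration, not part of npu_device.
from math import prod


def simulate_all_reduce(per_rank_values, reduction="mean"):
    """Return the value every rank receives after the collective."""
    ops = {
        "sum": sum,
        "mean": lambda xs: sum(xs) / len(xs),
        "max": max,
        "min": min,
        "prod": prod,
    }
    return ops[reduction](per_rank_values)


# Each of 8 ranks contributes 1.0, matching the example in this page.
ranks = [1.0] * 8
print(simulate_all_reduce(ranks, "sum"))   # 8.0
print(simulate_all_reduce(ranks, "mean"))  # 1.0
```

This makes explicit why, in the example below, summing 1.0 across 8 devices yields 8.0 while the mean stays 1.0.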
Returns
Result tensor, whose values correspond one-to-one with those in the values input, with ordering preserved. Has the same type as values.
Example
To aggregate a value across multiple devices:

```python
# rank_id = 0
# rank_size = 8
import tensorflow as tf
import npu_device as npu

v = tf.constant(1.0)
x = npu.distribute.all_reduce([v], 'sum')   # 8.0
y = npu.distribute.all_reduce([v], 'mean')  # 1.0
```