reduce

Applicability

Product

Supported

Atlas A3 training products/Atlas A3 inference products

Atlas A2 training products/Atlas A2 inference products

Atlas 200I/500 A2 inference products

Atlas inference products

Atlas training products

Description

Reduces the input tensors of all ranks in the group element-wise, using the operation selected by reduction (sum, max, min, or prod), and sends the result to the root rank specified by root_rank.
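To make the semantics concrete, the following pure-Python sketch simulates what the root rank receives: the element-wise reduction of the tensors contributed by every rank. It does not call HCCL. The helper name `simulate_reduce` and the flat-list tensor representation are illustrative assumptions, and the sketch makes no claim about what non-root ranks receive.

```python
from functools import reduce as fold
import operator

# Element-wise combiners for the reduction types accepted by the API.
_COMBINERS = {
    "sum": operator.add,
    "prod": operator.mul,
    "max": max,
    "min": min,
}

def simulate_reduce(per_rank_tensors, reduction):
    """Hypothetical helper: element-wise reduction of equally shaped flat lists,
    one list per rank, as the root rank would receive it."""
    if reduction not in _COMBINERS:
        raise ValueError(f"unsupported reduction: {reduction!r}")
    lengths = {len(t) for t in per_rank_tensors}
    if len(lengths) != 1:
        raise ValueError("all ranks must contribute tensors of the same shape")
    combine = _COMBINERS[reduction]
    # zip(*...) walks the tensors position by position across ranks.
    return [fold(combine, column) for column in zip(*per_rank_tensors)]

# Three ranks, each contributing a 1x3 tensor flattened to a list.
ranks = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(simulate_reduce(ranks, "sum"))  # [12.0, 15.0, 18.0]
```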

Function Prototype

def reduce(tensor, reduction, root_rank, fusion=0, fusion_id=-1, group="hccl_world_group")

Parameters

Parameter

Input/Output

Description

tensor

Input

TensorFlow tensor type.

For the Atlas A3 training products/Atlas A3 inference products, the supported data types are int8, int16, int32, int64, float16, float32, and bfp16.

For the Atlas A2 training products/Atlas A2 inference products, the supported data types are int8, int16, int32, int64, float16, float32, and bfp16. Note that performance degrades with the int64 data type.

For the Atlas training products, the supported data types are int8, int32, int64, float16, and float32.

reduction

Input

String type.

Reduction operation type. Valid values: max, min, prod, and sum.

NOTE:

For the Atlas A3 training products/Atlas A3 inference products, the prod operation does not support the int16 and bfp16 data types in the current version.

For the Atlas A2 training products/Atlas A2 inference products, the prod operation does not support the int16 and bfp16 data types in the current version.

root_rank

Input

Int type.

ID of the root rank in the group, that is, the rank that receives the reduction result.

fusion

Input

Int type.

Reduce operator fusion flag. The values are as follows:

  • 0: disabled. The Reduce operator is not fused with other Reduce operators.
  • 2: enabled. Reduce operators with the same fusion_id are fused.

fusion_id

Input

Int type.

Reduce operator fusion ID.

If fusion is set to 2, Reduce operators with the same fusion_id are fused during network compilation.

group

Input

String type, with a maximum length of 128 bytes including the terminating character.

Group name, which can be a user-defined value or hccl_world_group.
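The per-parameter rules above can be gathered into a small pre-check, for example to validate arguments before building the graph. This is a sketch, not part of the npu_bridge API: the helper `validate_reduce_args`, the shorthand product keys ("Atlas A3", "Atlas A2", "Atlas training"), and the use of plain dtype strings (bfp16 spelled as on this page) are assumptions. Products for which this page lists no data types are simply reported as unsupported by the sketch, and the group name is assumed to be UTF-8 encoded with one extra byte for the terminator.

```python
# Supported input data types per product, transcribed from the parameter table.
SUPPORTED_DTYPES = {
    "Atlas A3": {"int8", "int16", "int32", "int64", "float16", "float32", "bfp16"},
    "Atlas A2": {"int8", "int16", "int32", "int64", "float16", "float32", "bfp16"},
    "Atlas training": {"int8", "int32", "int64", "float16", "float32"},
}
VALID_REDUCTIONS = {"max", "min", "prod", "sum"}
# On these products, prod does not accept int16 or bfp16 in the current version.
PROD_RESTRICTED = {"Atlas A3", "Atlas A2"}
PROD_UNSUPPORTED_DTYPES = {"int16", "bfp16"}
MAX_GROUP_NAME_BYTES = 128  # includes the terminating character

def validate_reduce_args(product, dtype, reduction, group="hccl_world_group"):
    """Hypothetical pre-check; returns a list of problems (empty if none found)."""
    problems = []
    if dtype not in SUPPORTED_DTYPES.get(product, set()):
        problems.append(f"{dtype} is not listed for {product} products")
    if reduction not in VALID_REDUCTIONS:
        problems.append(f"unknown reduction {reduction!r}")
    elif (reduction == "prod" and product in PROD_RESTRICTED
          and dtype in PROD_UNSUPPORTED_DTYPES):
        problems.append(f"prod does not support {dtype} on {product} products")
    # Assumption: UTF-8 encoding plus one byte for the terminating character.
    if len(group.encode("utf-8")) + 1 > MAX_GROUP_NAME_BYTES:
        problems.append("group name exceeds 128 bytes including the terminator")
    return problems

print(validate_reduce_args("Atlas A2", "float32", "sum"))  # []
print(validate_reduce_args("Atlas A3", "bfp16", "prod"))
# ['prod does not support bfp16 on Atlas A3 products']
```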

Returns

A tensor containing the reduction result.

Constraints

  • The calling rank must belong to the group specified by the group argument. Otherwise, the API call fails.
  • The input tensor size must be less than or equal to 8 GB.
  • Reduce operator fusion (fusion set to 2) supports only the sum reduction type.
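The constraints above can be checked before the call is built. The sketch below computes the input size from the shape and element size and compares it with 8 GB, which it interprets as 8 × 1024³ bytes; that interpretation is an assumption, since this page does not say whether GB means 10⁹ or 2³⁰ bytes. The helper name `check_reduce_constraints`, the element-size table, and the representation of a group as a list of rank IDs are likewise hypothetical.

```python
import math

# Bytes per element for the data types listed on this page.
DTYPE_SIZES = {"int8": 1, "int16": 2, "int32": 4, "int64": 8,
               "float16": 2, "float32": 4, "bfp16": 2}
MAX_INPUT_BYTES = 8 * 1024 ** 3  # assumption: "8 GB" read as 8 GiB

def check_reduce_constraints(shape, dtype, reduction, fusion, caller_rank, group_ranks):
    """Hypothetical pre-check; returns a list of violated constraints (empty if none)."""
    problems = []
    if caller_rank not in group_ranks:
        problems.append("caller rank is not a member of the group")
    nbytes = math.prod(shape) * DTYPE_SIZES[dtype]
    if nbytes > MAX_INPUT_BYTES:
        problems.append(f"input is {nbytes} bytes, above the 8 GB limit")
    if fusion == 2 and reduction != "sum":
        problems.append("Reduce fusion supports only the sum reduction")
    return problems

print(check_reduce_constraints((1, 3), "float32", "sum", 0, 0, [0, 1, 2, 3]))  # []
```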

Example

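import tensorflow as tf
from npu_bridge.hccl import hccl_ops
# Each rank creates its own 1x3 float32 input tensor.
tensor = tf.random_uniform((1, 3), minval=1, maxval=10, dtype=tf.float32)
# Sum the tensors of all ranks in the default group (hccl_world_group); rank 0 is the root.
result = hccl_ops.reduce(tensor, "sum", 0)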
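The example above leaves fusion at its default of 0. To illustrate how the fusion and fusion_id parameters interact, the following sketch groups a list of Reduce calls according to the documented rule: calls with fusion=0 stay separate, and calls with fusion=2 are bucketed by fusion_id, subject to the constraint that fusion supports only sum. Each call is modeled as a hypothetical `(name, reduction, fusion, fusion_id)` tuple, and rejecting fusion values other than 0 and 2 is an assumption. This illustrates the rule only; it is not the graph compiler's actual fusion algorithm.

```python
from collections import defaultdict

def plan_fusion(calls):
    """Hypothetical sketch: group Reduce calls the way the fusion flags describe.

    `calls` is a list of (name, reduction, fusion, fusion_id) tuples.
    Returns a list of groups (lists of names); each group would become one operator.
    """
    fused = defaultdict(list)  # fusion_id -> names of calls to fuse
    groups = []
    for name, reduction, fusion, fusion_id in calls:
        if fusion == 0:
            groups.append([name])  # fusion disabled: the call stays on its own
        elif fusion == 2:
            if reduction != "sum":
                raise ValueError(f"{name}: Reduce fusion supports only the sum reduction")
            fused[fusion_id].append(name)
        else:
            # Assumption: only the documented values 0 and 2 are accepted.
            raise ValueError(f"{name}: fusion must be 0 or 2, got {fusion}")
    groups.extend(fused.values())
    return groups

calls = [
    ("grad_a", "sum", 2, 1),
    ("grad_b", "sum", 2, 1),
    ("grad_c", "sum", 2, 7),
    ("metric", "max", 0, -1),
]
print(plan_fusion(calls))  # [['metric'], ['grad_a', 'grad_b'], ['grad_c']]
```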