Reduction Instructions

Reduction instructions reduce a set of input data to a single value or to a smaller set of values. They are classified by the data range that each reduction operation covers, as shown in Figure 1.

  • ReduceMax/ReduceMin/ReduceSum: perform a reduction over all input data to obtain the maximum value and its index, the minimum value and its index, and the data sum, respectively.
  • WholeReduceMax/WholeReduceMin/WholeReduceSum: perform a reduction over the input data in each repeat to obtain the maximum value and its index, the minimum value and its index, and the data sum in each repeat, respectively. The returned index is the position of the element within its repeat.
  • BlockReduceMax/BlockReduceMin/BlockReduceSum: perform a reduction over the input data in each data block to obtain the maximum value, minimum value, and data sum in each data block, respectively.
  • PairReduce: sums each pair of adjacent (odd and even) elements. For example, the input (a1, a2, a3, a4, a5, a6, ...) is reduced to (a1 + a2, a3 + a4, a5 + a6, ...).
Figure 1 Reduction instructions
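
The four categories above differ only in the range of data that each result covers. The following host-side C++ sketch models those ranges on a plain array. It is a reference model for illustration, not the device API: the function names (refReduceSum, refWholeReduceSum, refWholeReduceMax, refBlockReduceSum, refPairReduce) are hypothetical, and it assumes float data, 32-byte data blocks (8 floats), and 8 data blocks (64 floats) per repeat. Returning the first occurrence on ties in the max model is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assumed layout for float data (not taken from the text above):
// one data block = 32 bytes = 8 floats, one repeat = 8 data blocks = 64 floats.
constexpr std::size_t kElemsPerBlock = 8;
constexpr std::size_t kElemsPerRepeat = 8 * kElemsPerBlock;

// Sum of src[begin .. begin + count).
inline float sumRange(const std::vector<float>& src, std::size_t begin, std::size_t count) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < count; ++i) acc += src[begin + i];
    return acc;
}

// Sums fixed-size groups of `group` consecutive elements: one result per group.
inline std::vector<float> sumGroups(const std::vector<float>& src, std::size_t group) {
    assert(group > 0 && src.size() % group == 0);
    std::vector<float> dst;
    for (std::size_t off = 0; off < src.size(); off += group) {
        dst.push_back(sumRange(src, off, group));
    }
    return dst;
}

// ReduceSum-like: all input data -> one value.
inline float refReduceSum(const std::vector<float>& src) {
    return sumRange(src, 0, src.size());
}

// WholeReduceSum-like: one value per repeat.
inline std::vector<float> refWholeReduceSum(const std::vector<float>& src) {
    return sumGroups(src, kElemsPerRepeat);
}

// BlockReduceSum-like: one value per data block.
inline std::vector<float> refBlockReduceSum(const std::vector<float>& src) {
    return sumGroups(src, kElemsPerBlock);
}

// PairReduce-like: (a1 + a2, a3 + a4, ...).
inline std::vector<float> refPairReduce(const std::vector<float>& src) {
    return sumGroups(src, 2);
}

// WholeReduceMax-like: per repeat, the maximum value and its index inside that repeat.
// Ties resolve to the first occurrence (assumption of this sketch).
inline std::vector<std::pair<float, std::size_t>> refWholeReduceMax(const std::vector<float>& src) {
    assert(src.size() % kElemsPerRepeat == 0);
    std::vector<std::pair<float, std::size_t>> dst;
    for (std::size_t off = 0; off < src.size(); off += kElemsPerRepeat) {
        std::size_t best = 0;
        for (std::size_t i = 1; i < kElemsPerRepeat; ++i) {
            if (src[off + i] > src[off + best]) best = i;
        }
        dst.emplace_back(src[off + best], best);  // index is relative to the repeat
    }
    return dst;
}
```

Under these assumptions, 128 input floats yield 1 value from refReduceSum, 2 values from refWholeReduceSum, 16 values from refBlockReduceSum, and 64 values from refPairReduce.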

Like other basic APIs, the reduction instructions also provide a high-dimensional tensor sharding computing API. This API makes full use of the hardware and lets developers control the iteration of instructions and the address strides of operands. However, the units and restrictions of some of its parameters differ slightly from those of other basic APIs, as described below:

  • repeatTimes: number of iterations. Set repeatTimes to execute the instruction over multiple iterations.
    • ReduceMax/ReduceMin/ReduceSum encapsulate WholeReduceMax/WholeReduceMin/WholeReduceSum, respectively, and perform the extra processing that is needed when repeatTimes exceeds 255. Therefore, repeatTimes supports a larger value range for these APIs. Ensure that the value does not exceed the maximum value of int32_t.
    • Similar to other basic APIs, the value of repeatTimes for WholeReduceMax/WholeReduceMin/WholeReduceSum/BlockReduceMax/BlockReduceMin/BlockReduceSum/PairReduce cannot exceed 255.
  • mask: controls the elements involved in computation in each iteration. The usage of this parameter is the same as that of basic APIs.
  • repeatStride: address stride between adjacent iterations.
    • The destination operands of the ReduceMax/ReduceMin/ReduceSum instructions are reduced to the maximum value/minimum value/sum. Therefore, the destination operands do not support repeatStride. Only the source operand supports repeatStride. The meaning and unit (data block) of repeatStride are the same as those in the descriptions of basic APIs.
    • Both the source and destination operands of WholeReduceMax/WholeReduceMin/WholeReduceSum/BlockReduceMax/BlockReduceMin/BlockReduceSum/PairReduce support repeatStride. For the source operand, the meaning and unit (data block) of repeatStride are the same as those in the descriptions of basic APIs. For the destination operand, the meaning and unit of repeatStride differ from the general description of basic APIs because the destination operand becomes shorter after reduction. For example, after WholeReduceSum reduction, each repeat is combined into a single value. Therefore, the stride between iterations of the destination operand is measured not in data blocks but in units of the length of one repeat after reduction.
  • dataBlockStride: address stride of the data block in a single iteration.
    • The destination operands of the ReduceMax/ReduceMin/ReduceSum instructions are reduced to the maximum value/minimum value/sum. Therefore, the destination operand does not support dataBlockStride. The source operand does not support dataBlockStride either.
    • The source operand of WholeReduceMax/WholeReduceMin/WholeReduceSum/BlockReduceMax/BlockReduceMin/BlockReduceSum/PairReduce supports dataBlockStride. Its meaning and unit (data block) are the same as those in the descriptions of basic APIs. The destination operand does not support dataBlockStride because the destination operand becomes shorter after reduction. For example, after WholeReduceSum reduction, each repeat is combined into a single value, so the concepts of a data block and of an address stride within an iteration no longer apply to it.
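
The parameter rules above can be sketched as a host-side C++ model of a WholeReduceSum-style call plus a ReduceSum-style wrapper. This is an illustration under stated assumptions, not the actual implementation or signature: modelWholeReduceSum and modelReduceSum are hypothetical names, the data type is float with 8 elements per 32-byte data block and 8 data blocks per iteration, mask is omitted (all elements take part), and splitting the work into chunks of at most 255 iterations is only one plausible form of the "extra processing" mentioned for repeatTimes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kElemsPerBlock = 8;    // assumed: 32-byte data block of float
constexpr std::size_t kBlocksPerRepeat = 8;  // assumed: 8 data blocks per iteration
constexpr std::int32_t kMaxRepeat = 255;     // per-call limit stated in the text

// Hypothetical model of a WholeReduceSum-style call.
// srcBlkStride (dataBlockStride) and srcRepStride (repeatStride) are in data blocks.
// dstRepStride is in units of one reduced result, which is a single float here.
// srcOffset and dstOffset are element offsets into src and dst.
inline void modelWholeReduceSum(std::vector<float>& dst, const std::vector<float>& src,
                                std::size_t srcOffset, std::size_t dstOffset,
                                std::int32_t repeatTimes, std::size_t srcBlkStride,
                                std::size_t srcRepStride, std::size_t dstRepStride) {
    assert(repeatTimes >= 0 && repeatTimes <= kMaxRepeat);
    const std::size_t reps = static_cast<std::size_t>(repeatTimes);
    for (std::size_t i = 0; i < reps; ++i) {
        float acc = 0.0f;
        for (std::size_t b = 0; b < kBlocksPerRepeat; ++b) {
            // Source addressing: both strides are counted in data blocks.
            const std::size_t blockStart =
                srcOffset + (i * srcRepStride + b * srcBlkStride) * kElemsPerBlock;
            for (std::size_t e = 0; e < kElemsPerBlock; ++e) {
                acc += src.at(blockStart + e);
            }
        }
        // Destination addressing: one result per iteration, no data-block stride.
        dst.at(dstOffset + i * dstRepStride) = acc;
    }
}

// Hypothetical ReduceSum-style wrapper: accepts repeatTimes above 255 by issuing
// several limited calls, then combines the partial sums into one value.
// Only the source operand has a repeatStride; there is no dataBlockStride.
inline float modelReduceSum(const std::vector<float>& src, std::int32_t repeatTimes,
                            std::size_t srcRepStride) {
    assert(repeatTimes >= 0);
    std::vector<float> partial(static_cast<std::size_t>(repeatTimes), 0.0f);
    std::int32_t done = 0;
    while (done < repeatTimes) {
        const std::int32_t chunk = std::min(kMaxRepeat, repeatTimes - done);
        const std::size_t doneSz = static_cast<std::size_t>(done);
        modelWholeReduceSum(partial, src, doneSz * srcRepStride * kElemsPerBlock, doneSz,
                            chunk, 1, srcRepStride, 1);
        done += chunk;
    }
    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

In this model, calling modelWholeReduceSum with repeatTimes = 2, srcRepStride = 8, and dstRepStride = 2 writes the two sums to dst[0] and dst[2], which shows that the destination stride counts reduced results rather than data blocks.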