MrgSort

Function Usage

Merges at most four sorted lists into one. The results are sorted in descending order of the score fields. The layout modes are as follows:

  • Layout mode 1:
    Generally, the data to be processed by MrgSort is the data processed by Sort, or the output of Sort. The list structure is as follows:
    • When the data type is float, each structure occupies eight bytes.

    • When the data type is half, each structure occupies eight bytes, but two bytes in the middle are reserved.

  • Layout mode 2: region proposal

    The input and output data are region proposals. For details, see mode 2 in Sort.

Prototype

1
2
template <typename T, bool isExhaustedSuspension = false>
__aicore__ inline void MrgSort(const LocalTensor<T> &dstLocal, const MrgSortSrcList<T> &sortList, const uint16_t elementCountList[4], uint32_t sortedNum[4], uint16_t validBit, const int32_t repeatTimes)

Parameters

Table 1 Parameters in the template

API

Function

T

Data type of the operand.

isExhaustedSuspension

Whether to stop merging after a list is exhausted (that is, all operations in the list have been sorted to the destination operand list). The type is bool. The options are as follows:

  • false: The merging stops only when all lists are exhausted.
  • true: The merging stops after a list is exhausted.

The default value is false.

Table 2 API parameters

Parameter

Input/Output

Meaning

dstLocal

Output

Destination operand, which stores sorted data.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

sortList

Input

Source operand of the MrgSortSrcList structure type, which can contain two to four sorted lists. For details, see Table 3. The lists to be merged are passed to MrgSortSrcList.

1
2
3
4
5
6
7
template <typename T>
struct MrgSortSrcList {
    LocalTensor<T> src1;
    LocalTensor<T> src2;
    LocalTensor<T> src3; // If the number of lists to be merged is less than 3, the tensor can be empty.
    LocalTensor<T> src4; // If the number of lists to be merged is less than 4, the tensor can be empty.
};

elementCountList

Input

Lengths of the four source lists (or numbers of the 8-byte structures). It is an array of the uint16_t type with a length of 4. Theoretically, the value range of each element is [0, 4095], but the value cannot exceed the storage space of the UB.

sortedNum

Output

Number of sorted elements in each list when merging is stopped in exhaustion mode (that is, when isExhaustedSuspension is true).

validBit

Input

Number of valid lists. The values are as follows:
  • 0b11: The first two lists are valid.
  • 0b111: The first three lists are valid.
  • 0b1111: All the four lists are valid.

repeatTimes

Input

Number of iteration repeats. The total length of the four lists is skipped for the source and destination operands in each iteration. Value range: repeatTimes ∈ [1,255]

The repeatTimes parameter takes effect only when the following conditions are met:
  • srcLocal contains four lists and validBit is 15.
  • The lengths of the four source lists are the same.
  • The four source lists are stored consecutively.
  • isExhaustedSuspension is false.
Table 3 MrgSortSrcList parameters

Parameter

Input/Output

Meaning

src1

Input

Source operand, which stores the first sorted list.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The source operand must have the same data type as the destination operand.

src2

Input

Source operand, which stores the second sorted list.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The source operand must have the same data type as the destination operand.

src3

Input

Source operand, which stores the third sorted list.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The source operand must have the same data type as the destination operand.

src4

Input

Source operand, which stores the fourth sorted list.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The source operand must have the same data type as the destination operand.

Returns

None

Availability

Constraints

  • When score[i] is the same as score[j], if i is greater than j, score[j] is selected first. That is, the index sequence is the same as the input sequence.
  • Data within each iteration is sorted, but data among different iterations is not sorted.
  • For details about the alignment requirements of the operand address offset, see General Restrictions.

Example