Sort

Function Usage

Sorts data in descending order by score. The sorted data is stored in one of the following layout modes:

  • Layout mode 1:
    Up to 32 scores are sorted in each iteration. The sorted scores and their corresponding indexes are stored in dstLocal as (score, index) pairs. Whether the scores are of the half or float type, each (score, index) pair in dstLocal always occupies 8 bytes. For example:
    • When the score type is float and the index type is uint32, the index is stored in the upper 4 bytes and the score in the lower 4 bytes of each 8-byte pair.

    • When the score type is half and the index type is uint32, the index is stored in the upper 4 bytes and the score in the lower 2 bytes of each 8-byte pair. The 2 bytes in between are reserved.

  • Layout mode 2: Region Proposal
    The input and output data are Region Proposals. 16 Region Proposals are sorted in each iteration. Each Region Proposal consumes eight consecutive elements of the half or float data type. Its format is as follows:
    [x1, y1, x2, y2, score, label, reserved_0, reserved_1]
    

    When a Region Proposal is of the half data type, it occupies 16 bytes. Byte[15:12] is invalid and Byte[11:0] contains six half elements. To be specific, Byte[11:10] is defined as the label, Byte[9:8] is defined as the score, Byte[7:6] is defined as y2, Byte[5:4] is defined as x2, Byte[3:2] is defined as y1, and Byte[1:0] is defined as x1.

    The table in the following figure contains 16 Region Proposals.

    When a Region Proposal is of the float data type, it occupies 32 bytes. Byte[31:24] is invalid and Byte[23:0] contains six float elements. To be specific, Byte[23:20] is defined as the label, Byte[19:16] is defined as the score, Byte[15:12] is defined as y2, Byte[11:8] is defined as x2, Byte[7:4] is defined as y1, and Byte[3:0] is defined as x1.

    The table in the following figure contains 16 Region Proposals.
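The two layouts above can be modeled on the host with plain byte buffers, which makes the byte positions easy to check. The sketch below is a host-side C++ illustration, not Ascend C code. The helper names (PackFloatPair, PackHalfPair, PairIndex, FieldOffset, ProposalBytes) are hypothetical, half scores are passed as raw 16-bit patterns, and "lower"/"upper" bytes are assumed to mean lower/higher byte addresses within each element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical host-side helpers that model the documented layouts.
// They are not part of the Ascend C API.
static_assert(sizeof(float) == 4, "float is assumed to be 4 bytes");

// Layout mode 1, float score: bytes [3:0] = score, bytes [7:4] = uint32 index.
void PackFloatPair(float score, uint32_t index, unsigned char out[8]) {
    std::memcpy(out, &score, 4);
    std::memcpy(out + 4, &index, 4);
}

// Layout mode 1, half score (raw 16-bit pattern): bytes [1:0] = score,
// bytes [3:2] = reserved (zeroed here), bytes [7:4] = uint32 index.
void PackHalfPair(uint16_t scoreBits, uint32_t index, unsigned char out[8]) {
    std::memcpy(out, &scoreBits, 2);
    std::memset(out + 2, 0, 2);
    std::memcpy(out + 4, &index, 4);
}

// Reads the uint32 index back from the upper 4 bytes of either pair type.
uint32_t PairIndex(const unsigned char pair[8]) {
    uint32_t index;
    std::memcpy(&index, pair + 4, 4);
    return index;
}

// Layout mode 2: a Region Proposal is 8 consecutive elements of type T.
enum ProposalField { kX1 = 0, kY1, kX2, kY2, kScore, kLabel, kReserved0, kReserved1 };

// Byte offset of a field inside one Region Proposal for element size elemSize.
constexpr std::size_t FieldOffset(ProposalField field, std::size_t elemSize) {
    return static_cast<std::size_t>(field) * elemSize;
}

// Total size of one Region Proposal in bytes (16 for half, 32 for float).
constexpr std::size_t ProposalBytes(std::size_t elemSize) {
    return 8 * elemSize;
}
```

For example, FieldOffset(kScore, 2) is 8 for half, matching Byte[9:8], and FieldOffset(kScore, 4) is 16 for float, matching Byte[19:16].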

Prototype

template <typename T, bool isFullSort>
__aicore__ inline void Sort(const LocalTensor<T> &dstLocal, const LocalTensor<T> &concatLocal, const LocalTensor<uint32_t> &indexLocal, LocalTensor<T> &tmpLocal, const int32_t repeatTimes)

Parameters

Table 1 Parameters in the template

Parameter

Description

T

Data type of the operand: half or float.

isFullSort

Whether to enable full sorting mode. If true, all inputs are sorted together in descending order. If false (non-full sorting mode), only the data within each iteration is sorted, and data across different iterations is not ordered; see Constraints.

Table 2 Parameters

Parameter

Input/Output

Description

dstLocal

Output

Destination operand.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The start address of the LocalTensor must be 32-byte aligned.

concatLocal

Input

Source operand.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The start address of the LocalTensor must be 32-byte aligned.

The source operand must have the same data type as the destination operand.

indexLocal

Input

Source operand that supplies the indexes stored alongside the sorted scores.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The start address of the LocalTensor must be 32-byte aligned.

Its data type is fixed to uint32_t.

tmpLocal

Input

Temporary buffer that holds intermediate results of the internal computation. It is allocated by the developer, and its data type must be the same as that of the source operand.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The start address of the LocalTensor must be 32-byte aligned.

repeatTimes

Input

Number of iterations. Each iteration sorts up to 32 scores in layout mode 1 or 16 Region Proposals in layout mode 2. The value is of the int32_t type.
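Because each iteration has a fixed size in both layout modes, the destination buffer size follows directly from repeatTimes. The following host-side C++ sketch computes it, assuming every iteration is fully populated (32 scores in layout mode 1, 16 Region Proposals in layout mode 2). The function and constant names are hypothetical, and the sketch does not cover the size of tmpLocal, which this section does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sizing helpers derived from the layout descriptions;
// they are not part of the Ascend C API.
constexpr std::size_t kScoresPerRepeat = 32;     // layout mode 1
constexpr std::size_t kProposalsPerRepeat = 16;  // layout mode 2
constexpr std::size_t kAlignBytes = 32;          // required start-address alignment

// Layout mode 1: every (score, index) pair occupies 8 bytes in dstLocal.
constexpr std::size_t DstBytesMode1(int32_t repeatTimes) {
    return static_cast<std::size_t>(repeatTimes) * kScoresPerRepeat * 8;
}

// The same size as a count of T elements (8 / sizeof(T) elements per pair).
constexpr std::size_t DstElemsMode1(int32_t repeatTimes, std::size_t elemSize) {
    return DstBytesMode1(repeatTimes) / elemSize;
}

// Layout mode 2: every Region Proposal occupies 8 elements of T.
constexpr std::size_t DstBytesMode2(int32_t repeatTimes, std::size_t elemSize) {
    return static_cast<std::size_t>(repeatTimes) * kProposalsPerRepeat * 8 * elemSize;
}

// True if a buffer start address satisfies the 32-byte alignment rule.
inline bool IsAligned32(std::uintptr_t addr) {
    return addr % kAlignBytes == 0;
}
```

For example, with repeatTimes = 1 and T = float, layout mode 1 needs 256 bytes, which is 64 float elements.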

Returns

None

Availability

Constraints

  • When score[i] equals score[j] and i > j, score[j] is output first. That is, elements with equal scores keep their input order.
  • In non-full sorting mode, data within each iteration is sorted, but data across different iterations is not sorted.
  • For details about the alignment requirements of the operand address offset, see General Restrictions.
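The ordering rules above (descending by score, equal scores kept in input order, and per-iteration sorting when isFullSort is false) can be expressed as a small host-side reference model, for instance to check device results in a unit test. This is an illustrative C++ sketch for layout mode 1 only, not the Ascend C implementation. ScoreIndex and ReferenceSort are hypothetical names, and std::stable_sort is used because it preserves the input order of equal elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference model of Sort's ordering rules for
// layout mode 1. It does not use or reproduce the Ascend C implementation.
struct ScoreIndex {
    float score;
    uint32_t index;
};

// Sorts pairs in descending score order. std::stable_sort keeps equal
// scores in input order, matching the tie-breaking rule in Constraints.
// If isFullSort is false, each block of groupSize elements (32 per
// iteration in layout mode 1) is sorted independently.
std::vector<ScoreIndex> ReferenceSort(const std::vector<float>& scores,
                                      const std::vector<uint32_t>& indexes,
                                      bool isFullSort,
                                      std::size_t groupSize = 32) {
    assert(scores.size() == indexes.size());
    std::vector<ScoreIndex> pairs;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        pairs.push_back({scores[i], indexes[i]});
    }
    auto byScoreDesc = [](const ScoreIndex& a, const ScoreIndex& b) {
        return a.score > b.score;
    };
    if (isFullSort) {
        std::stable_sort(pairs.begin(), pairs.end(), byScoreDesc);
    } else {
        for (std::size_t start = 0; start < pairs.size(); start += groupSize) {
            std::size_t end = std::min(start + groupSize, pairs.size());
            std::stable_sort(pairs.begin() + static_cast<std::ptrdiff_t>(start),
                             pairs.begin() + static_cast<std::ptrdiff_t>(end),
                             byScoreDesc);
        }
    }
    return pairs;
}
```

With scores 1, 3, 2, 2 and indexes 0, 1, 2, 3, non-full sorting with a group size of 2 (chosen only for illustration) yields indexes 1, 0, 2, 3, whereas full sorting yields 1, 2, 3, 0; in both cases the two equal scores keep their input order.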

Example