Sort
Function Usage
Sorts data in descending order by value. The sorted data is saved in one of the following layout modes:
- Layout mode 1: A maximum of 32 scores can be sorted in one iteration. The sorted scores and their corresponding indexes are stored in dstLocal in the (score, index) structure. Regardless of whether the scores are of the half or float type, the (score, index) structure in dstLocal always occupies 8 bytes. See the following examples:
- When the score type is float and the index type is uint32, the indexes are stored in the upper 4 bytes and the scores are stored in the lower 4 bytes of the computation result.
- When the score type is half and the index type is uint32, the indexes are stored in the upper 4 bytes and the scores are stored in the lower 2 bytes of the computation result. The middle 2 bytes are reserved.
- Layout mode 2: Region Proposal. The input and output data are Region Proposals. 16 Region Proposals are sorted in each iteration. Each Region Proposal consists of eight consecutive elements of the half or float data type, in the following format:
[x1, y1, x2, y2, score, label, reserved_0, reserved_1]
When a Region Proposal is of the half data type, it occupies 16 bytes. Byte[15:12] is invalid and Byte[11:0] contains six half elements. To be specific, Byte[11:10] is defined as the label, Byte[9:8] is defined as the score, Byte[7:6] is defined as y2, Byte[5:4] is defined as x2, Byte[3:2] is defined as y1, and Byte[1:0] is defined as x1.
The table in the following figure contains 16 Region Proposals.

When a Region Proposal is of the float data type, it occupies 32 bytes. Byte[31:24] is invalid and Byte[23:0] contains six float elements. To be specific, Byte[23:20] is defined as the label, Byte[19:16] is defined as the score, Byte[15:12] is defined as y2, Byte[11:8] is defined as x2, Byte[7:4] is defined as y1, and Byte[3:0] is defined as x1.
The table in the following figure contains 16 Region Proposals.
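The Region Proposal format above can be expressed as element offsets. The following is a minimal host-side sketch, not part of the Ascend C API: the names `ProposalField`, `ProposalBytes`, `FieldOffset`, and `ProposalScore` are hypothetical, and `std::uint16_t` stands in for the half type because standard C++ has no portable half type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Element order inside one Region Proposal, as listed in the format above.
enum ProposalField : std::size_t {
    kX1 = 0, kY1, kX2, kY2, kScore, kLabel, kReserved0, kReserved1
};

constexpr std::size_t kElemsPerProposal = 8;        // eight consecutive elements
constexpr std::size_t kProposalsPerIteration = 16;  // sorted in each iteration

// Byte size of one proposal for element type T.
template <typename T>
constexpr std::size_t ProposalBytes() {
    return kElemsPerProposal * sizeof(T);
}

// uint16_t is a stand-in for half (2 bytes per element).
static_assert(ProposalBytes<std::uint16_t>() == 16, "half proposal is 16 bytes");
static_assert(ProposalBytes<float>() == 32, "float proposal is 32 bytes");

// Flat element offset of a field in the given proposal of a packed buffer.
constexpr std::size_t FieldOffset(std::size_t proposal, ProposalField field) {
    return proposal * kElemsPerProposal + static_cast<std::size_t>(field);
}

// Reads the score of the given proposal from a packed host buffer.
template <typename T>
T ProposalScore(const T* data, std::size_t proposal) {
    assert(data != nullptr);
    return data[FieldOffset(proposal, kScore)];
}
```

For a float buffer, `FieldOffset(1, kScore)` is element 12, which corresponds to Byte[19:16] of the second proposal.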

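The 8-byte (score, index) structure of layout mode 1 can be modelled on the host in a similar way. This is a sketch for inspecting results copied back from dstLocal, under stated assumptions: the names `ScoreIndexFloat`, `ScoreIndexHalf`, and `DecodeFloatPair` are hypothetical, and "lower bytes" is assumed to mean lower addresses (a little-endian view).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// One (score, index) pair with a float score.
// Assumption: "lower bytes" means lower addresses.
struct ScoreIndexFloat {
    float score;          // bytes [3:0]
    std::uint32_t index;  // bytes [7:4]
};
static_assert(sizeof(ScoreIndexFloat) == 8, "pair must occupy 8 bytes");

// One pair with a half score. The raw 16-bit pattern is kept because
// standard C++ has no portable half type.
struct ScoreIndexHalf {
    std::uint16_t scoreBits;  // bytes [1:0], binary16 bit pattern
    std::uint16_t reserved;   // bytes [3:2], reserved
    std::uint32_t index;      // bytes [7:4]
};
static_assert(sizeof(ScoreIndexHalf) == 8, "pair must occupy 8 bytes");

// Reads the i-th float pair from a raw byte buffer copied out of dstLocal.
inline ScoreIndexFloat DecodeFloatPair(const std::uint8_t* bytes, std::size_t i) {
    assert(bytes != nullptr);
    ScoreIndexFloat pair;
    std::memcpy(&pair, bytes + i * sizeof(ScoreIndexFloat), sizeof(pair));
    return pair;
}
```

Using `std::memcpy` rather than a pointer cast avoids alignment and aliasing problems when the byte buffer comes from an arbitrary host allocation.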
Prototype
template <typename T, bool isFullSort>
__aicore__ inline void Sort(const LocalTensor<T> &dstLocal, const LocalTensor<T> &concatLocal, const LocalTensor<uint32_t> &indexLocal, LocalTensor<T> &tmpLocal, const int32_t repeatTimes)
Parameters
Table 1 Template parameters

| Parameter | Description |
|---|---|
| T | Data type of the operand. |
| isFullSort | Whether to enable the full sorting mode. In full sorting mode, all inputs are sorted in descending order. For details about non-full sorting mode, see the description of repeatTimes in Table 2. |

Table 2 Function parameters

| Parameter | Input/Output | Description |
|---|---|---|
| dstLocal | Output | Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
| concatLocal | Input | Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. The source operand must have the same data type as the destination operand. |
| indexLocal | Input | Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. This source operand is fixed at the uint32_t data type. |
| tmpLocal | Input | Temporary space. This parameter is used to store intermediate variables during complex internal API computation and is provided by developers. The data type must be the same as that of the source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
| repeatTimes | Input | Number of iteration repeats. The value is of the int32_t type. |
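
The 32-byte alignment requirement and the 8-byte pair size of layout mode 1 can be checked with simple arithmetic. The helpers below are a hypothetical host-side sketch (`IsAligned32`, `AssertAligned32`, and `DstElementCount` are not Ascend C functions); they cover only the size of dstLocal in layout mode 1 and say nothing about the size of tmpLocal, which this page does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kAlignBytes = 32;  // required start-address alignment
constexpr std::size_t kPairBytes = 8;    // one (score, index) pair, layout mode 1

// True if addr satisfies the 32-byte start-address requirement.
inline bool IsAligned32(const void* addr) {
    return reinterpret_cast<std::uintptr_t>(addr) % kAlignBytes == 0;
}

// Debug-time guard a caller might place before using a buffer as an operand.
inline void AssertAligned32(const void* addr) {
    assert(IsAligned32(addr));
}

// Number of T elements needed to hold `count` sorted pairs in layout mode 1.
// Each pair takes 8 bytes regardless of the score type.
template <typename T>
constexpr std::size_t DstElementCount(std::size_t count) {
    static_assert(kPairBytes % sizeof(T) == 0, "T must divide the pair size");
    return count * (kPairBytes / sizeof(T));
}
```

For example, 32 float scores need 64 float elements in the destination, and 32 half scores (modelled here with a 2-byte type) need 128 elements.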
Returns
None
Availability
Constraints
- When score[i] equals score[j] and i > j, score[j] is selected first. That is, elements with equal scores keep their input order (the sort is stable).
- In non-full sorting mode, data within each iteration is sorted, but data across different iterations is not sorted.
- For details about the alignment requirements of the operand address offset, see General Restrictions.
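
The first two constraints can be illustrated with a host-side reference model. This is a sketch for reasoning about expected results, not the device implementation: `SortReference` and `ScoreIndex` are hypothetical names, and the model assumes that each iteration covers one consecutive block of 32 scores, as in layout mode 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using ScoreIndex = std::pair<float, std::uint32_t>;

// Assumption: one iteration handles one consecutive block of 32 scores.
constexpr std::size_t kScoresPerIteration = 32;

// Reference model: stable descending sort, either over the whole input
// (full sorting mode) or independently within each 32-score block.
inline std::vector<ScoreIndex> SortReference(const std::vector<float>& scores,
                                             const std::vector<std::uint32_t>& indexes,
                                             bool isFullSort) {
    assert(scores.size() == indexes.size());
    std::vector<ScoreIndex> out;
    out.reserve(scores.size());
    for (std::size_t i = 0; i < scores.size(); ++i) {
        out.emplace_back(scores[i], indexes[i]);
    }

    // Compare scores only, so stable_sort keeps equal scores in input order.
    auto byScoreDesc = [](const ScoreIndex& a, const ScoreIndex& b) {
        return a.first > b.first;
    };

    if (isFullSort) {
        std::stable_sort(out.begin(), out.end(), byScoreDesc);
        return out;
    }
    for (std::size_t begin = 0; begin < out.size(); begin += kScoresPerIteration) {
        const std::size_t end = std::min(begin + kScoresPerIteration, out.size());
        std::stable_sort(out.begin() + static_cast<std::ptrdiff_t>(begin),
                         out.begin() + static_cast<std::ptrdiff_t>(end),
                         byScoreDesc);
    }
    return out;
}
```

With 64 inputs in non-full mode, the model returns two independently sorted blocks of 32, so the largest score overall may appear at position 32 rather than position 0.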

