Sort
Applicability
| Product | Supported |
|---|---|
|  | √ |
|  | √ |
|  | x |
|  | √ |
|  | x |
|  | x |
Function
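Sorts data in descending order of score. Depending on the product, the sorted data is stored in one of the following layouts:
- For the Atlas A3 training products/Atlas A3 inference products, layout mode 1 is used.
- For the Atlas A2 training products/Atlas A2 inference products, layout mode 1 is used.
- For the AI Core of Atlas inference products, layout mode 2 is used.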
- Layout mode 1: A maximum of 32 scores are sorted in each iteration. The sorted scores and their corresponding indexes are stored in dst as (score, index) structs. Whether the score is of type half or float, each (score, index) struct in dst occupies 8 bytes. For example:
- When the score type is float and the index type is uint32, in the computation result, the indexes are stored in the upper 4 bytes and the scores are stored in the lower 4 bytes.
- When the score type is half and the index type is uint32, in the computation result, the indexes are stored in the upper 4 bytes, the scores are stored in the lower 2 bytes, and the middle 2 bytes are reserved.
- Layout mode 2 (Region Proposal): The input and output data are Region Proposals, and 16 Region Proposals are sorted in each iteration. Each Region Proposal consists of eight contiguous elements of type half or float, in the following format:
[x1, y1, x2, y2, score, label, reserved_0, reserved_1]
When the elements are of type half, a Region Proposal occupies 16 bytes. Byte[15:12] is invalid, and Byte[11:0] holds six half elements: Byte[11:10] is the label, Byte[9:8] the score, Byte[7:6] y2, Byte[5:4] x2, Byte[3:2] y1, and Byte[1:0] x1.
The table in the following figure contains 16 Region Proposals.

When the elements are of type float, a Region Proposal occupies 32 bytes. Byte[31:24] is invalid, and Byte[23:0] holds six float elements: Byte[23:20] is the label, Byte[19:16] the score, Byte[15:12] y2, Byte[11:8] x2, Byte[7:4] y1, and Byte[3:0] x1.
The table in the following figure contains 16 Region Proposals.

Prototype
```cpp
template <typename T, bool isFullSort>
__aicore__ inline void Sort(const LocalTensor<T>& dst, const LocalTensor<T>& concat,
                            const LocalTensor<uint32_t>& index, LocalTensor<T>& tmp,
                            const int32_t repeatTime)
```
Parameters
Table 1 Template parameters

| Parameter | Definition |
|---|---|
| T | Data type of the operands. Supported types: half and float. |
| isFullSort | Whether to enable full sorting mode. In full sorting mode, all inputs are sorted in descending order. In non-full sorting mode, data is sorted only within each iteration; for the number of elements per iteration, see the description of repeatTime in Table 2. |
Table 2 Parameters

| Parameter | Input/Output | Description |
|---|---|---|
| dst | Output | Destination operand. The shape is [2n]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
| concat | Input | Source operand holding the scores described in Function. The shape is [n]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. It must have the same data type as dst. |
| index | Input | Source operand holding the indexes that correspond to the scores. The shape is [n]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. The data type is fixed to uint32_t. |
| tmp | Input | Temporary buffer for intermediate results computed inside the API. The developer must allocate it; for how to obtain the required size, see GetSortTmpSize. It must have the same data type as concat. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
| repeatTime | Input | Number of iteration repeats, of type int32_t. Each iteration sorts up to 32 scores in layout mode 1 or 16 Region Proposals in layout mode 2. |
Returns
None
Constraints
- If score[i] equals score[j] and i > j, score[j] is placed first. In other words, elements with equal scores keep their input order.
- In non-full sorting mode, data within each iteration is sorted, but data across different iterations is not sorted.
- For details about the operand address alignment requirements, see General Address Alignment Restrictions.
Example
For a complete operator sample project, see the sort sample.
- Sorting 128 elements of type half
Atlas A2 training products/Atlas A2 inference products, Atlas A3 training products/Atlas A3 inference products:
```cpp
// T, isFullSort, the LocalTensor buffers, and m_concatRepeatTimes are defined
// elsewhere in the sample kernel (not shown here).
uint32_t m_elementCount = 128;
// Layout mode 1: 32 elements per Sort iteration; the sample uses the same count for Extract.
uint32_t m_sortRepeatTimes = m_elementCount / 32;
uint32_t m_extractRepeatTimes = m_elementCount / 32;
AscendC::Concat(concatLocal, valueLocal, concatTmpLocal, m_concatRepeatTimes);
AscendC::Sort<T, isFullSort>(sortedLocal, concatLocal, indexLocal, sortTmpLocal, m_sortRepeatTimes);
// Split the sorted result back into values and indexes.
AscendC::Extract(dstValueLocal, dstIndexLocal, sortedLocal, m_extractRepeatTimes);
```
Result example:
```
Input (srcValueGm), 128 half values:
[31 30 29 ... 2 1 0 63 62 61 ... 34 33 32 95 94 93 ... 66 65 64 127 126 125 ... 98 97 96]
Input (srcIndexGm):
[31 30 29 ... 2 1 0 63 62 61 ... 34 33 32 95 94 93 ... 66 65 64 127 126 125 ... 98 97 96]
Output (dstValueGm):
[127 126 125 ... 2 1 0]
Output (dstIndexGm):
[127 126 125 ... 2 1 0]
```
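- Sorting 64 elements of type half
AI Core of Atlas inference products:
```cpp
// T, isFullSort, the LocalTensor buffers, and m_concatRepeatTimes are defined
// elsewhere in the sample kernel (not shown here).
uint32_t m_elementCount = 64;
// Layout mode 2: 16 Region Proposals per Sort iteration; the sample uses the same count for Extract.
uint32_t m_sortRepeatTimes = m_elementCount / 16;
uint32_t m_extractRepeatTimes = m_elementCount / 16;
AscendC::Concat(concatLocal, valueLocal, concatTmpLocal, m_concatRepeatTimes);
AscendC::Sort<T, isFullSort>(sortedLocal, concatLocal, indexLocal, sortTmpLocal, m_sortRepeatTimes);
// Split the sorted result back into values and indexes.
AscendC::Extract(dstValueLocal, dstIndexLocal, sortedLocal, m_extractRepeatTimes);
```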
Result example:
```
Input (srcValueGm), 64 half values:
[15 14 13 ... 2 1 0 31 30 29 ... 18 17 16 47 46 45 ... 34 33 32 63 62 61 ... 50 49 48]
Input (srcIndexGm):
[15 14 13 ... 2 1 0 31 30 29 ... 18 17 16 47 46 45 ... 34 33 32 63 62 61 ... 50 49 48]
Output (dstValueGm):
[63 62 61 ... 2 1 0]
Output (dstIndexGm):
[63 62 61 ... 2 1 0]
```

