Sort

Applicability

| Product | Supported |
| --- | --- |
| Atlas A3 training products/Atlas A3 inference products | √ |
| Atlas A2 training products/Atlas A2 inference products | √ |
| Atlas 200I/500 A2 inference products | x |
| Atlas inference product's AI Core | √ |
| Atlas inference product's Vector Core | x |
| Atlas training products | x |

Function

Sorts data in descending order by value. The layout of the sorted output depends on the product:

For the Atlas A3 training products/Atlas A3 inference products, layout mode 1 is used.

For the Atlas A2 training products/Atlas A2 inference products, layout mode 1 is used.

For the Atlas inference product's AI Core, layout mode 2 is used.

  • Layout mode 1:
    A maximum of 32 scores can be sorted in one iteration. The sorted scores and their corresponding indexes are stored in dst as (score, index) structures. Regardless of whether the score is of the half or float type, each (score, index) structure in dst always occupies 8 bytes. For example:
    • When the score type is float and the index type is uint32_t, the index is stored in the upper 4 bytes of each structure and the score in the lower 4 bytes.

    • When the score type is half and the index type is uint32_t, the index is stored in the upper 4 bytes of each structure, the score in the lower 2 bytes, and the middle 2 bytes are reserved.

  • Layout mode 2: region proposal
    The input and output data are Region Proposals. 16 Region Proposals are sorted in each iteration. Each Region Proposal consumes eight contiguous elements of the half or float data type. Its format is as follows:
    [x1, y1, x2, y2, score, label, reserved_0, reserved_1]
    

    When a Region Proposal is of the half data type, it occupies 16 bytes. Byte[15:12] is invalid and Byte[11:0] contains six half elements. To be specific, Byte[11:10] is defined as the label, Byte[9:8] is defined as the score, Byte[7:6] is defined as y2, Byte[5:4] is defined as x2, Byte[3:2] is defined as y1, and Byte[1:0] is defined as x1.

    The table in the following figure contains 16 Region Proposals.

    When a Region Proposal is of the float data type, it occupies 32 bytes. Byte[31:24] is invalid and Byte[23:0] contains six float elements. To be specific, Byte[23:20] is defined as the label, Byte[19:16] is defined as the score, Byte[15:12] is defined as y2, Byte[11:8] is defined as x2, Byte[7:4] is defined as y1, and Byte[3:0] is defined as x1.

    The table in the following figure contains 16 Region Proposals.
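The two layouts above can be pictured with a small host-side sketch. This is an illustration only, not part of the Ascend C API: the helper names (PackFloatEntry, PackHalfEntry, EntryIndex, FloatEntryScore, HalfEntryScoreBits) and the RegionProposal struct are hypothetical. It assumes that "upper" and "lower" bytes correspond to the high and low bits of one 8-byte word, that Byte[0] is the lowest address, and that a half is carried as its raw 16-bit pattern (uint16_t). The reserved bytes are simply left at zero here; the source does not say what the hardware writes there.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical host-side helpers; not part of the Ascend C API.

// Layout mode 1, float score: index in bits [63:32], score bits in [31:0].
inline uint64_t PackFloatEntry(float score, uint32_t index) {
    uint32_t scoreBits = 0;
    std::memcpy(&scoreBits, &score, sizeof(scoreBits));  // bit-exact copy of the float
    return (static_cast<uint64_t>(index) << 32) | scoreBits;
}

// Layout mode 1, half score (raw 16-bit pattern): index in bits [63:32],
// bits [31:16] reserved (zero in this sketch), score bits in [15:0].
inline uint64_t PackHalfEntry(uint16_t halfBits, uint32_t index) {
    return (static_cast<uint64_t>(index) << 32) | halfBits;
}

inline uint32_t EntryIndex(uint64_t entry) {
    return static_cast<uint32_t>(entry >> 32);
}

inline float FloatEntryScore(uint64_t entry) {
    const uint32_t scoreBits = static_cast<uint32_t>(entry & 0xFFFFFFFFu);
    float score = 0.0f;
    std::memcpy(&score, &scoreBits, sizeof(score));
    return score;
}

inline uint16_t HalfEntryScoreBits(uint64_t entry) {
    return static_cast<uint16_t>(entry & 0xFFFFu);
}

// Layout mode 2: eight contiguous elements per Region Proposal.
// T is float, or uint16_t as a stand-in for the raw bits of a half.
template <typename T>
struct RegionProposal {
    T x1, y1, x2, y2, score, label, reserved_0, reserved_1;
};

// The sizes and offsets below restate the byte ranges given in the text.
static_assert(sizeof(RegionProposal<uint16_t>) == 16, "half proposal occupies 16 bytes");
static_assert(sizeof(RegionProposal<float>) == 32, "float proposal occupies 32 bytes");
static_assert(offsetof(RegionProposal<uint16_t>, score) == 8, "half score is Byte[9:8]");
static_assert(offsetof(RegionProposal<uint16_t>, label) == 10, "half label is Byte[11:10]");
static_assert(offsetof(RegionProposal<float>, score) == 16, "float score is Byte[19:16]");
static_assert(offsetof(RegionProposal<float>, label) == 20, "float label is Byte[23:20]");
```

The static_assert lines only check that the struct matches the byte ranges quoted above; they make no claim about how the device reads or writes the reserved fields.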

Prototype

template <typename T, bool isFullSort>
__aicore__ inline void Sort(const LocalTensor<T>& dst, const LocalTensor<T>& concat, const LocalTensor<uint32_t>& index, LocalTensor<T>& tmp, const int32_t repeatTime)

Parameters

Table 1 Template parameters

| Parameter | Definition |
| --- | --- |
| T | Data type of the operands. The supported data types are half and float on the Atlas A3 training products/Atlas A3 inference products, the Atlas A2 training products/Atlas A2 inference products, and the Atlas inference product's AI Core. |
| isFullSort | Whether to enable full sorting mode. In full sorting mode, all inputs are sorted in descending order. For non-full sorting mode, see the description of repeatTime in Table 2. |

Table 2 Parameters

| Parameter | Input/Output | Description |
| --- | --- | --- |
| dst | Output | Destination operand. The shape is [2n]. |
| concat | Input | Source operand, that is, the score in the Function description. The shape is [n]. It must have the same data type as the destination operand. |
| index | Input | Source operand. The shape is [n]. The data type is fixed at uint32_t. |
| tmp | Input | Temporary space used to store intermediate variables during computation inside the API. The developer must allocate this space; to obtain the required size, see GetSortTmpSize. The data type must be the same as that of the source operand. |
| repeatTime | Input | Number of iterations. The value is of the int32_t type. The amount of data processed per iteration and the value range depend on the product, as listed after this table. |

The dst, concat, index, and tmp operands are all of the LocalTensor type, support the TPosition VECIN, VECCALC, or VECOUT, and must have a 32-byte-aligned start address.

Per-iteration behavior of repeatTime:

  • Atlas A3 training products/Atlas A3 inference products: 32 elements are sorted in each iteration. Each subsequent iteration advances by 32 elements in concat and in index, and by 32 x 8 bytes in dst. Value range: repeatTime ∈ [0, 255]
  • Atlas A2 training products/Atlas A2 inference products: 32 elements are sorted in each iteration. Each subsequent iteration advances by 32 elements in concat and in index, and by 32 x 8 bytes in dst. Value range: repeatTime ∈ [0, 255]
  • Atlas inference product's AI Core: 16 Region Proposals are sorted in each iteration. Each subsequent iteration advances by 16 Region Proposals in concat and in dst. Value range: repeatTime ∈ [0, 255]
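The stride arithmetic for layout mode 1 (32 scores per iteration, 8 bytes per output entry) can be written out as a short host-side sketch. The names IterationOffsets, OffsetsForIteration, and RepeatTimeFor are hypothetical and not part of the Ascend C API. Returning -1 for unsupported counts is this sketch's own convention, and treating the element count as a multiple of 32 is an assumption taken from the example, which divides the count by 32.

```cpp
#include <cstdint>

// Hypothetical host-side helpers; not part of the Ascend C API.
constexpr uint32_t kScoresPerIteration = 32;  // layout mode 1: scores sorted per iteration
constexpr uint32_t kBytesPerEntry = 8;        // one (score, index) structure in dst
constexpr int32_t kMaxRepeatTime = 255;       // upper bound of repeatTime

struct IterationOffsets {
    uint32_t srcElementOffset;  // offset into concat and into index, in elements
    uint32_t dstByteOffset;     // offset into dst, in bytes
};

// Start offsets of iteration `iter` (0-based).
constexpr IterationOffsets OffsetsForIteration(uint32_t iter) {
    return IterationOffsets{iter * kScoresPerIteration,
                            iter * kScoresPerIteration * kBytesPerEntry};
}

// repeatTime for elementCount scores, or -1 (this sketch's own convention) when
// elementCount is not a multiple of 32 or the result would exceed 255.
constexpr int32_t RepeatTimeFor(uint32_t elementCount) {
    return (elementCount % kScoresPerIteration != 0 ||
            elementCount / kScoresPerIteration > static_cast<uint32_t>(kMaxRepeatTime))
               ? -1
               : static_cast<int32_t>(elementCount / kScoresPerIteration);
}
```

For instance, the fourth iteration (iter = 3) starts at element 96 of concat and index, and at byte 768 of dst.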

Returns

None

Constraints

  • When score[i] equals score[j] and i > j, score[j] is placed first. That is, entries with equal scores keep their input order.
  • In non-full sorting mode, data within each iteration is sorted, but data across different iterations is not sorted.
  • For details about the operand address alignment requirements, see General Address Alignment Restrictions.
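The ordering constraints above can be checked against a host-side reference model. The sketch below is only an illustration under stated assumptions: Entry and SortReference are hypothetical names, scores are modeled as float, the block size stands in for the number of elements per iteration, and std::stable_sort is used to reproduce the tie rule. It does not describe how the device implements the sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference model; not the device implementation.
struct Entry {
    float score;
    uint32_t index;
};

// fullSort == true:  all entries are ordered by descending score.
// fullSort == false: each block of blockSize entries is ordered independently,
//                    mirroring "sorted within an iteration, not across iterations".
inline std::vector<Entry> SortReference(const std::vector<float>& scores,
                                        const std::vector<uint32_t>& indexes,
                                        bool fullSort,
                                        std::size_t blockSize = 32) {
    assert(blockSize > 0);
    const std::size_t n = std::min(scores.size(), indexes.size());
    std::vector<Entry> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(Entry{scores[i], indexes[i]});
    }
    // Strict "greater than" plus a stable sort keeps equal scores in input order.
    auto byScoreDesc = [](const Entry& a, const Entry& b) { return a.score > b.score; };
    if (fullSort) {
        std::stable_sort(out.begin(), out.end(), byScoreDesc);
    } else {
        for (std::size_t begin = 0; begin < n; begin += blockSize) {
            const std::size_t end = std::min(begin + blockSize, n);
            std::stable_sort(out.begin() + static_cast<std::ptrdiff_t>(begin),
                             out.begin() + static_cast<std::ptrdiff_t>(end),
                             byScoreDesc);
        }
    }
    return out;
}
```

With scores {1, 3, 3, 2} and indexes {0, 1, 2, 3}, the full sort yields the index order 1, 2, 3, 0: the two entries with score 3 stay in input order. With a block size of 2 and fullSort disabled, each pair is ordered on its own, giving 1, 0, 2, 3.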

Example

A complete operator sample project is available as the sort sample.

  • Processing 128 pieces of half-type data

    This example applies to:

    Atlas A2 training products/Atlas A2 inference products

    Atlas A3 training products/Atlas A3 inference products

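    // Number of scores to sort.
    uint32_t m_elementCount = 128;
    // Sort handles 32 elements per iteration on these products; this example uses the same count for Extract.
    uint32_t m_sortRepeatTimes = m_elementCount / 32;
    uint32_t m_extractRepeatTimes = m_elementCount / 32;
    // m_concatRepeatTimes is not defined in this excerpt; set it as described for Concat.
    AscendC::Concat(concatLocal, valueLocal, concatTmpLocal, m_concatRepeatTimes);
    AscendC::Sort<T, isFullSort>(sortedLocal, concatLocal, indexLocal, sortTmpLocal, m_sortRepeatTimes);
    AscendC::Extract(dstValueLocal, dstIndexLocal, sortedLocal, m_extractRepeatTimes);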
    
    Result example:
    Input data (srcValueGm): 128 pieces of half-type data
    [31 30 29 ... 2 1 0
     63 62 61 ... 34 33 32
     95 94 93 ... 66 65 64
     127 126 125 ... 98 97 96]
    Input (srcIndexGm):
    [31 30 29 ... 2 1 0
     63 62 61 ... 34 33 32
     95 94 93 ... 66 65 64
     127 126 125 ... 98 97 96]
    Output (dstValueGm):
    [127 126 125 ... 2 1 0]
    Output (dstIndexGm):
    [127 126 125 ... 2 1 0]
    
  • Processing 64 pieces of half-type data

    This example applies to:

    Atlas inference product's AI Core

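    // Number of scores to sort.
    uint32_t m_elementCount = 64;
    // Sort handles 16 Region Proposals per iteration on this product; this example uses the same count for Extract.
    uint32_t m_sortRepeatTimes = m_elementCount / 16;
    uint32_t m_extractRepeatTimes = m_elementCount / 16;
    // m_concatRepeatTimes is not defined in this excerpt; set it as described for Concat.
    AscendC::Concat(concatLocal, valueLocal, concatTmpLocal, m_concatRepeatTimes);
    AscendC::Sort<T, isFullSort>(sortedLocal, concatLocal, indexLocal, sortTmpLocal, m_sortRepeatTimes);
    AscendC::Extract(dstValueLocal, dstIndexLocal, sortedLocal, m_extractRepeatTimes);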
    
    Result example:
    Input data (srcValueGm): 64 pieces of half-type data
    [15 14 13 ... 2 1 0
     31 30 29 ... 18 17 16
     47 46 45 ... 34 33 32
     63 62 61 ... 50 49 48]
    Input (srcIndexGm):
    [15 14 13 ... 2 1 0
     31 30 29 ... 18 17 16
     47 46 45 ... 34 33 32
     63 62 61 ... 50 49 48]
    Output (dstValueGm):
    [63 62 61 ... 2 1 0]
    Output (dstIndexGm):
    [63 62 61 ... 2 1 0]