MrgSort4

Function Usage

Merges at most four sorted Region Proposal lists into one. The results are sorted in descending order of the score fields.
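
The merge behavior described above can be pictured with a small host-side reference model. This is only a sketch in plain C++, not the device implementation: the Proposal struct, its id field, and the MergeSortedDescending name are hypothetical, and the tie-break (the lower-numbered list wins on equal scores) is an assumption made for illustration. The ifExhaustedSuspension behavior is not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for one Region Proposal; only the score matters here.
struct Proposal {
    float score;
    int id;  // illustrative tag so that results can be traced back to their source
};

// Merge up to four lists, each already sorted by descending score, into one
// descending list. Empty lists are simply skipped.
// Assumption: on equal scores the lower-numbered list is taken first.
inline std::vector<Proposal> MergeSortedDescending(
    const std::array<std::vector<Proposal>, 4>& lists)
{
    std::array<std::size_t, 4> cursor{};  // next unread index in each list
    std::vector<Proposal> merged;
    for (;;) {
        int best = -1;
        for (int k = 0; k < 4; ++k) {
            if (cursor[k] >= lists[k].size()) {
                continue;  // list k is exhausted or empty
            }
            // Strict '>' keeps the earlier list when scores are equal.
            if (best < 0 || lists[k][cursor[k]].score > lists[best][cursor[best]].score) {
                best = k;
            }
        }
        if (best < 0) {
            break;  // every list is exhausted
        }
        merged.push_back(lists[best][cursor[best]++]);
    }
    return merged;
}
```

A model like this can be useful for generating expected results when checking kernel output on the host.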

Prototype

template <typename T>
__aicore__ inline void MrgSort4(const LocalTensor<T>& dstLocal, const MrgSortSrcList<T>& srcLocal, const MrgSort4Info& params)

Parameters

Table 1 Parameters

Parameter

Input/Output

Meaning

dstLocal

Output

Destination operand, which stores sorted Region Proposals.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The start address of the LocalTensor must be 16-byte aligned (for data of the half type) or 32-byte aligned (for data of the float type).

For the Atlas Training Series Product, the supported data type is half.

srcLocal

Input

Source operand of the MrgSortSrcList structure type, which holds the sorted Region Proposal lists to be merged (up to four). For details, see Table 2.

For the Atlas Training Series Product, the supported data type is half.

params

Input

Parameter required for sorting, which is of the MrgSort4Info structure type. For details, see Table 3.

Table 2 MrgSortSrcList parameters

Parameter

Input/Output

Meaning

src1

Input

Source operand, which stores the first sorted Region Proposal list.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The start address of the LocalTensor must be 16-byte aligned (for data of the half type) or 32-byte aligned (for data of the float type).

The source operand must have the same data type as the destination operand.

For the Atlas Training Series Product, the supported data type is half.

src2

Input

Source operand, which stores the second sorted Region Proposal list.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The start address of the LocalTensor must be 16-byte aligned (for data of the half type) or 32-byte aligned (for data of the float type).

The source operand must have the same data type as the destination operand.

For the Atlas Training Series Product, the supported data type is half.

src3

Input

Source operand, which stores the third sorted Region Proposal list.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The start address of the LocalTensor must be 16-byte aligned (for data of the half type) or 32-byte aligned (for data of the float type).

The source operand must have the same data type as the destination operand.

For the Atlas Training Series Product, the supported data type is half.

src4

Input

Source operand, which stores the fourth sorted Region Proposal list.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The start address of the LocalTensor must be 16-byte aligned (for data of the half type) or 32-byte aligned (for data of the float type).

The source operand must have the same data type as the destination operand.

For the Atlas Training Series Product, the supported data type is half.

Table 3 MrgSort4Info parameters

Parameter

Input/Output

Meaning

elementLengths

Input

Lengths of the four source Region Proposal lists (that is, the number of Region Proposals in each list), given as a uint16_t array of length 4. Each element can theoretically be in the range [0, 4095], but the lists must still fit within the available Unified Buffer (UB) space.

ifExhaustedSuspension

Input

A bool specifying whether to stop the instruction when a list is exhausted. The default value is false.

validBit

Input

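Valid lists, encoded as a bitmask whose low bits correspond to src1 through src4. The supported values are as follows:
  • 3 (binary 0011): The first two lists are valid.
  • 7 (binary 0111): The first three lists are valid.
  • 15 (binary 1111): All four lists are valid.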

repeatTimes

Input

Number of iterations. In each iteration, the source and destination operands advance by the total length of the four lists. Value range: [1, 255].

The repeatTimes parameter takes effect only when all of the following conditions are met:
  • The lengths of the four source Region Proposal lists are the same.
  • The four source Region Proposal lists are stored consecutively.
  • ifExhaustedSuspension = false
  • validBit = 15
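
The validBit values and the repeatTimes conditions in Table 3 can be restated as a pair of host-side checks. The sketch below is plain C++ for illustration only: MergeParamsModel, ValidListCount, and RepeatTimesTakesEffect are hypothetical names, not part of the Ascend C API, and whether the lists are stored consecutively has to be supplied by the caller because it cannot be derived from the parameter fields alone.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side mirror of the MrgSort4Info fields listed in Table 3.
struct MergeParamsModel {
    uint16_t elementLengths[4];
    bool ifExhaustedSuspension;
    uint16_t validBit;
    uint16_t repeatTimes;
};

// Number of valid lists for a documented validBit value: 3 -> 2, 7 -> 3, 15 -> 4.
// Any other value returns 0; this sketch treats it as unsupported.
inline int ValidListCount(uint16_t validBit)
{
    switch (validBit) {
        case 3:  return 2;
        case 7:  return 3;
        case 15: return 4;
        default: return 0;
    }
}

// True when every condition listed for repeatTimes is met.
// listsContiguous is supplied by the caller (assumption of this sketch).
inline bool RepeatTimesTakesEffect(const MergeParamsModel& p, bool listsContiguous)
{
    const bool sameLength = p.elementLengths[0] == p.elementLengths[1] &&
                            p.elementLengths[1] == p.elementLengths[2] &&
                            p.elementLengths[2] == p.elementLengths[3];
    return sameLength && listsContiguous && !p.ifExhaustedSuspension && p.validBit == 15;
}
```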

Availability

Atlas Training Series Product

Constraints

  • If proposal[i] and proposal[j] have the same score and i is greater than j, proposal[j] is selected first.
  • For details about the alignment requirements of the operand address offset, see General Restrictions.
  • The destination operand address must not overlap the source operand address.
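
The start-address alignment stated in the parameter tables and the no-overlap constraint above can be checked on the host before launching a kernel. The following is a minimal sketch in plain C++, assuming addresses are expressed as byte offsets within the buffer; RequiredAlignment, IsStartAligned, ByteRange, and Overlaps are hypothetical helper names, and the additional offset rules in General Restrictions are not covered.

```cpp
#include <cassert>
#include <cstddef>

// Required start-address alignment in bytes, keyed by element size:
// 2-byte half -> 16, 4-byte float -> 32 (values taken from the parameter tables).
inline std::size_t RequiredAlignment(std::size_t elemSize)
{
    assert(elemSize == 2 || elemSize == 4);
    return elemSize == 2 ? 16 : 32;
}

// True when a byte address satisfies the alignment for the given element size.
inline bool IsStartAligned(std::size_t byteAddr, std::size_t elemSize)
{
    return byteAddr % RequiredAlignment(elemSize) == 0;
}

// Half-open byte range [begin, begin + size).
struct ByteRange {
    std::size_t begin;
    std::size_t size;
};

// True when two ranges share at least one byte; empty ranges never overlap.
inline bool Overlaps(const ByteRange& a, const ByteRange& b)
{
    if (a.size == 0 || b.size == 0) {
        return false;
    }
    return a.begin < b.begin + b.size && b.begin < a.begin + a.size;
}
```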

Example

  • API usage example
    // vconcatWorkLocal holds four Region Proposal lists that have already been constructed and sorted. Each list contains 16 Region Proposals.
    AscendC::MrgSortSrcList<half> srcList(vconcatWorkLocal[0], vconcatWorkLocal[1], vconcatWorkLocal[2], vconcatWorkLocal[3]);
    uint16_t elementLengths[4] = {16, 16, 16, 16};
    AscendC::MrgSort4Info srcInfo(elementLengths, false, 15, 1);
    AscendC::MrgSort4(dstLocal, srcList, srcInfo);
    
  • Complete example
    #include "kernel_operator.h"
    
    class KernelVecProposal {
    public:
        __aicore__ inline KernelVecProposal() {}
        __aicore__ inline void Init(__gm__ uint8_t* src, __gm__ uint8_t* dstGm)
        {
            srcGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ half*>(src), srcDataSize);
            dstGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ half*>(dstGm), dstDataSize);
    
            pipe.InitBuffer(inQueueSrc, 1, srcDataSize * sizeof(half));
            pipe.InitBuffer(workQueue, 1, dstDataSize * sizeof(half));
            pipe.InitBuffer(outQueueDst, 1, dstDataSize * sizeof(half));
        }
        __aicore__ inline void Process()
        {
            CopyIn();
            Compute();
            CopyOut();
        }
    
    private:
        __aicore__ inline void CopyIn()
        {
            AscendC::LocalTensor<half> srcLocal = inQueueSrc.AllocTensor<half>();
            AscendC::DataCopy(srcLocal, srcGlobal, srcDataSize);
            inQueueSrc.EnQue(srcLocal);
        }
        __aicore__ inline void Compute()
        {
            AscendC::LocalTensor<half> srcLocal = inQueueSrc.DeQue<half>();
            AscendC::LocalTensor<half> vconcatWorkLocal = workQueue.AllocTensor<half>();
            AscendC::LocalTensor<half> dstLocal = outQueueDst.AllocTensor<half>();
    
            // Construct four Region Proposal lists and then sort and merge them.
            AscendC::ProposalConcat(vconcatWorkLocal[0], srcLocal[0], repeat, mode);
            AscendC::RpSort16(vconcatWorkLocal[0], vconcatWorkLocal[0], repeat);
    
            AscendC::ProposalConcat(vconcatWorkLocal[workDataSize], srcLocal[singleDataSize], repeat, mode);
            AscendC::RpSort16(vconcatWorkLocal[workDataSize], vconcatWorkLocal[workDataSize], repeat);
    
            AscendC::ProposalConcat(vconcatWorkLocal[workDataSize * 2], srcLocal[singleDataSize * 2], repeat, mode);
            AscendC::RpSort16(vconcatWorkLocal[workDataSize * 2], vconcatWorkLocal[workDataSize * 2], repeat);
    
            AscendC::ProposalConcat(vconcatWorkLocal[workDataSize * 3], srcLocal[singleDataSize * 3], repeat, mode);
            AscendC::RpSort16(vconcatWorkLocal[workDataSize * 3], vconcatWorkLocal[workDataSize * 3], repeat);
    
            AscendC::MrgSortSrcList<half> srcList(vconcatWorkLocal[0], vconcatWorkLocal[workDataSize],
                vconcatWorkLocal[workDataSize * 2], vconcatWorkLocal[workDataSize * 3]);
            uint16_t elementLengths[4] = {singleDataSize, singleDataSize, singleDataSize, singleDataSize};
            AscendC::MrgSort4Info srcInfo(elementLengths, false, 15, 1);
            AscendC::MrgSort4(dstLocal, srcList, srcInfo);
    
            outQueueDst.EnQue<half>(dstLocal);
            inQueueSrc.FreeTensor(srcLocal);
            workQueue.FreeTensor(vconcatWorkLocal);
        }
        __aicore__ inline void CopyOut()
        {
            AscendC::LocalTensor<half> dstLocal = outQueueDst.DeQue<half>();
            AscendC::DataCopy(dstGlobal, dstLocal, dstDataSize);
            outQueueDst.FreeTensor(dstLocal);
        }
    
    private:
        AscendC::TPipe pipe;
        AscendC::TQue<AscendC::QuePosition::VECIN, 1> inQueueSrc;
        AscendC::TQue<AscendC::QuePosition::VECIN, 1> workQueue;
        AscendC::TQue<AscendC::QuePosition::VECOUT, 1> outQueueDst;
        AscendC::GlobalTensor<half> srcGlobal, dstGlobal;
    
        int srcDataSize = 64;
        uint16_t singleDataSize = srcDataSize / 4;
        int dstDataSize = 512;
        int workDataSize = dstDataSize / 4;
        int repeat = srcDataSize / 4 / 16;
        int mode = 4;
    };
    
    extern "C" __global__ __aicore__ void vec_proposal_kernel(__gm__ uint8_t* src, __gm__ uint8_t* dstGm)
    {
        KernelVecProposal op;
        op.Init(src, dstGm);
        op.Process();
    }
    
    Result example:
    Input (src_gm):
    [-38.1    82.7   -40.75  -54.62   21.67  -58.53   25.94  -79.5   -61.44
      26.7   -27.45   48.78   86.75  -18.1   -58.8    62.38   46.38  -78.94
     -87.7   -13.81  -13.25   46.94  -47.8   -50.44   34.16   20.3    80.1
     -94.1    52.4   -42.75   83.4    80.44  -66.8   -82.7   -91.44  -95.6
      66.2   -30.97  -36.53   61.66   24.92  -45.1    38.97  -34.62  -69.8
      59.1    34.22   11.695 -33.47   52.1    -4.832  46.88   56.78   71.4
      13.29  -35.78   52.44  -46.03   83.8    83.56   71.3    -9.086 -65.06
      46.25 ]
    Output (dst_gm):
    [  0.      0.      0.      0.     86.75    0.      0.      0.      0.
       0.      0.      0.     83.8     0.      0.      0.      0.      0.
       0.      0.     83.56    0.      0.      0.      0.      0.      0.
       0.     83.4     0.      0.      0.      0.      0.      0.      0.
      82.7     0.      0.      0.      0.      0.      0.      0.     80.44
       0.      0.      0.      0.      0.      0.      0.     80.1     0.
       0.      0.      0.      0.      0.      0.     71.4     0.      0.
       0.      0.      0.      0.      0.     71.3     0.      0.      0.
       0.      0.      0.      0.     66.2     0.      0.      0.      0.
       0.      0.      0.     62.38    0.      0.      0.      0.      0.
       0.      0.     61.66    0.      0.      0.      0.      0.      0.
       0.     59.1     0.      0.      0.      0.      0.      0.      0.
      56.78    0.      0.      0.      0.      0.      0.      0.     52.44
       0.      0.      0.      0.      0.      0.      0.     52.4     0.
       0.      0.      0.      0.      0.      0.     52.1     0.      0.
       0.      0.      0.      0.      0.     48.78    0.      0.      0.
       0.      0.      0.      0.     46.94    0.      0.      0.      0.
       0.      0.      0.     46.88    0.      0.      0.      0.      0.
       0.      0.     46.38    0.      0.      0.      0.      0.      0.
       0.     46.25    0.      0.      0.      0.      0.      0.      0.
      38.97    0.      0.      0.      0.      0.      0.      0.     34.22
       0.      0.      0.      0.      0.      0.      0.     34.16    0.
       0.      0.      0.      0.      0.      0.     26.7     0.      0.
       0.      0.      0.      0.      0.     25.94    0.      0.      0.
       0.      0.      0.      0.     24.92    0.      0.      0.      0.
       0.      0.      0.     21.67    0.      0.      0.      0.      0.
       0.      0.     20.3     0.      0.      0.      0.      0.      0.
       0.     13.29    0.      0.      0.      0.      0.      0.      0.
      11.695   0.      0.      0.      0.      0.      0.      0.     -4.832
       0.      0.      0.      0.      0.      0.      0.     -9.086   0.
       0.      0.      0.      0.      0.      0.    -13.25    0.      0.
       0.      0.      0.      0.      0.    -13.81    0.      0.      0.
       0.      0.      0.      0.    -18.1     0.      0.      0.      0.
       0.      0.      0.    -27.45    0.      0.      0.      0.      0.
       0.      0.    -30.97    0.      0.      0.      0.      0.      0.
       0.    -33.47    0.      0.      0.      0.      0.      0.      0.
     -34.62    0.      0.      0.      0.      0.      0.      0.    -35.78
       0.      0.      0.      0.      0.      0.      0.    -36.53    0.
       0.      0.      0.      0.      0.      0.    -38.1     0.      0.
       0.      0.      0.      0.      0.    -40.75    0.      0.      0.
       0.      0.      0.      0.    -42.75    0.      0.      0.      0.
       0.      0.      0.    -45.1     0.      0.      0.      0.      0.
       0.      0.    -46.03    0.      0.      0.      0.      0.      0.
       0.    -47.8     0.      0.      0.      0.      0.      0.      0.
     -50.44    0.      0.      0.      0.      0.      0.      0.    -54.62
       0.      0.      0.      0.      0.      0.      0.    -58.53    0.
       0.      0.      0.      0.      0.      0.    -58.8     0.      0.
       0.      0.      0.      0.      0.    -61.44    0.      0.      0.
       0.      0.      0.      0.    -65.06    0.      0.      0.      0.
       0.      0.      0.    -66.8     0.      0.      0.      0.      0.
       0.      0.    -69.8     0.      0.      0.      0.      0.      0.
       0.    -78.94    0.      0.      0.      0.      0.      0.      0.
     -79.5     0.      0.      0.      0.      0.      0.      0.    -82.7
       0.      0.      0.      0.      0.      0.      0.    -87.7     0.
       0.      0.      0.      0.      0.      0.    -91.44    0.      0.
       0.      0.      0.      0.      0.    -94.1     0.      0.      0.
       0.      0.      0.      0.    -95.6     0.      0.      0.   ]
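
To compare the output above with the input, it can help to pull the scores back out of the flat destination buffer. The sketch below is plain host-side C++ and rests on two assumptions read off the example rather than stated in the text: each Region Proposal occupies 8 consecutive elements, and the score sits at index 4 of each group (matching mode = 4 in the ProposalConcat calls). ExtractScores and IsDescending are hypothetical helper names, and float is used in place of half for host convenience.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kElemsPerProposal = 8;  // assumption: inferred from the output layout
constexpr std::size_t kScoreIndex = 4;        // assumption: matches mode = 4 in the example

// Collect the score element of every complete 8-element proposal in a flat buffer.
inline std::vector<float> ExtractScores(const std::vector<float>& flat)
{
    std::vector<float> scores;
    for (std::size_t base = 0; base + kElemsPerProposal <= flat.size();
         base += kElemsPerProposal) {
        scores.push_back(flat[base + kScoreIndex]);
    }
    return scores;
}

// True when no element is larger than the one before it.
inline bool IsDescending(const std::vector<float>& values)
{
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (values[i] > values[i - 1]) {
            return false;
        }
    }
    return true;
}
```

Applied to the 512-element output shown above, ExtractScores would yield the 64 scores in order, and IsDescending gives a quick sanity check on the merge result.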