ProposalExtract

Function Usage

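Extracts the element at the same position (field) of each Region Proposal and arranges the extracted elements consecutively. Each Region Proposal consists of 8 consecutive elements: x1, y1, x2, y2, score, label, and two reserved elements. modeNumber selects which field is extracted. In each iteration, one element is taken from each of 16 Region Proposals, and the 16 extracted elements are stored contiguously in the destination. This API performs the inverse operation of ProposalConcat.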
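To make the data movement concrete, here is a minimal host-side reference model of the behavior described above. It is not the device implementation. It assumes the 8-element Region Proposal layout and uses float as a stand-in for half. ProposalExtractRef and the constants are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Assumption: each Region Proposal occupies 8 consecutive elements
// [x1, y1, x2, y2, score, label, reserved, reserved]; float stands in for half.
constexpr std::size_t kElemsPerProposal = 8;
constexpr std::size_t kProposalsPerRepeat = 16;

// Hypothetical host-side reference model of ProposalExtract (not the device API).
std::vector<float> ProposalExtractRef(const std::vector<float>& src,
                                      int repeatTimes, int modeNumber)
{
    if (repeatTimes < 0 || repeatTimes > 255 || modeNumber < 0 || modeNumber > 5) {
        throw std::invalid_argument("repeatTimes or modeNumber out of range");
    }
    const std::size_t proposals =
        static_cast<std::size_t>(repeatTimes) * kProposalsPerRepeat;
    if (src.size() < proposals * kElemsPerProposal) {
        throw std::out_of_range("src holds too few Region Proposals");
    }
    std::vector<float> dst(proposals);
    // Region Proposal p contributes its modeNumber-th element to dst[p].
    for (std::size_t p = 0; p < proposals; ++p) {
        dst[p] = src[p * kElemsPerProposal + static_cast<std::size_t>(modeNumber)];
    }
    return dst;
}
```

Under these assumptions, extracting the score field (modeNumber = 4) with repeatTimes = 2 reads elements 4, 12, 20, ..., 252 of the source and writes them to dst[0] through dst[31].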

Prototype

template <typename T>
__aicore__ inline void ProposalExtract(const LocalTensor<T>& dstLocal, const LocalTensor<T>& srcLocal, const int32_t repeatTimes, const int32_t modeNumber)

Parameters

Table 1 Parameters

Parameter

Input/Output

Meaning

dstLocal

Output

Destination operand.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The start address of the LocalTensor must be 32-byte aligned.

For the Atlas Training Series Product, the supported data type is half.

srcLocal

Input

Source operand.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The start address of the LocalTensor must be 32-byte aligned.

The source operand must have the same data type as the destination operand.

For the Atlas Training Series Product, the supported data type is half.

repeatTimes

Input

Number of iterations, of type int32_t. In each iteration, one element is extracted from each of 16 Region Proposals and the results are stored contiguously. The next iteration processes the next 16 Region Proposals. Value range: repeatTimes ∈ [0, 255].

modeNumber

Input

Field to extract from each Region Proposal, of type int32_t. Value range: modeNumber ∈ [0, 5]. The supported values are:
  • 0: extracts from x1.
  • 1: extracts from y1.
  • 2: extracts from x2.
  • 3: extracts from y2.
  • 4: extracts from score.
  • 5: extracts from label.
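The modeNumber values line up with the field order inside a Region Proposal. The sketch below assumes the 8-element layout, with fields stored in table order, and shows how a field maps to an element index in srcLocal. The enum ProposalField and the helpers FieldIndex and ModeNumber are hypothetical names. They are not part of Ascend C.

```cpp
#include <cstddef>

// Hypothetical names; the numeric values follow the modeNumber list above.
enum class ProposalField : int { X1 = 0, Y1 = 1, X2 = 2, Y2 = 3, Score = 4, Label = 5 };

// Assumption: 8 elements per Region Proposal, fields stored in table order.
constexpr std::size_t kElemsPerProposal = 8;

// Element index in srcLocal of field f of Region Proposal p.
constexpr std::size_t FieldIndex(std::size_t p, ProposalField f)
{
    return p * kElemsPerProposal + static_cast<std::size_t>(f);
}

// modeNumber value to pass to ProposalExtract for field f.
constexpr int ModeNumber(ProposalField f)
{
    return static_cast<int>(f);
}
```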

Returns

None

Availability

Atlas Training Series Product

Constraints

  • srcLocal must hold at least repeatTimes × 16 Region Proposals (repeatTimes × 16 × 8 elements). Otherwise, an out-of-bounds tensor access occurs.
  • dstLocal must hold at least repeatTimes × 16 elements. Otherwise, an out-of-bounds tensor access occurs.
  • For details about the alignment requirements of the operand address offset, see General Restrictions.
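The size constraints can be checked before calling the API. Below is a small sketch that assumes 8 elements per Region Proposal. It computes the minimum srcLocal and dstLocal element counts for a given repeatTimes. It also computes the largest repeatTimes that given buffer capacities allow, capped at the documented maximum of 255. RequiredSizes and MaxRepeatTimes are hypothetical helpers.

```cpp
#include <algorithm>
#include <cstddef>

constexpr std::size_t kElemsPerProposal = 8;    // assumption: 8 elements per Region Proposal
constexpr std::size_t kProposalsPerRepeat = 16; // 16 Region Proposals per iteration

struct ExtractSizes {
    std::size_t srcElems;  // minimum number of elements in srcLocal
    std::size_t dstElems;  // minimum number of elements in dstLocal
};

// Minimum buffer sizes (in elements) for a given repeatTimes.
constexpr ExtractSizes RequiredSizes(int repeatTimes)
{
    return { static_cast<std::size_t>(repeatTimes) * kProposalsPerRepeat * kElemsPerProposal,
             static_cast<std::size_t>(repeatTimes) * kProposalsPerRepeat };
}

// Largest repeatTimes supported by the given capacities, capped at 255.
constexpr int MaxRepeatTimes(std::size_t srcElems, std::size_t dstElems)
{
    const std::size_t bySrc = srcElems / (kProposalsPerRepeat * kElemsPerProposal);
    const std::size_t byDst = dstElems / kProposalsPerRepeat;
    return static_cast<int>(std::min({bySrc, byDst, std::size_t{255}}));
}
```

For half data, multiply the element counts by 2 to get byte sizes. For example, RequiredSizes(2) gives 256 source elements (512 bytes) and 32 destination elements (64 bytes), and both are multiples of 32 bytes.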

Example

  • API usage example
    // repeatTimes = 2, modeNumber = 4. Extract elements from the score fields in 32 Region Proposals and arrange them into 32 consecutive elements.
    AscendC::ProposalExtract(dstLocal, srcLocal, 2, 4);
    
  • Complete example
    #include "kernel_operator.h"
    
    class KernelVecProposal {
    public:
        __aicore__ inline KernelVecProposal() {}
        __aicore__ inline void Init(__gm__ uint8_t* src, __gm__ uint8_t* dstGm)
        {
            srcGlobal.SetGlobalBuffer((__gm__ half*)src);
            dstGlobal.SetGlobalBuffer((__gm__ half*)dstGm);
    
            pipe.InitBuffer(inQueueSrc, 1, srcDataSize * sizeof(half));
            pipe.InitBuffer(outQueueDst, 1, dstDataSize * sizeof(half));
        }
        __aicore__ inline void Process()
        {
            CopyIn();
            Compute();
            CopyOut();
        }
    
    private:
        __aicore__ inline void CopyIn()
        {
            AscendC::LocalTensor<half> srcLocal = inQueueSrc.AllocTensor<half>();
            AscendC::DataCopy(srcLocal, srcGlobal, srcDataSize);
            inQueueSrc.EnQue(srcLocal);
        }
        __aicore__ inline void Compute()
        {
            AscendC::LocalTensor<half> srcLocal = inQueueSrc.DeQue<half>();
            AscendC::LocalTensor<half> dstLocal = outQueueDst.AllocTensor<half>();
    
            AscendC::ProposalExtract(dstLocal, srcLocal, repeat, mode);
    
            outQueueDst.EnQue<half>(dstLocal);
            inQueueSrc.FreeTensor(srcLocal);
        }
        __aicore__ inline void CopyOut()
        {
            AscendC::LocalTensor<half> dstLocal = outQueueDst.DeQue<half>();
            AscendC::DataCopy(dstGlobal, dstLocal, dstDataSize);
            outQueueDst.FreeTensor(dstLocal);
        }
    
    private:
        AscendC::TPipe pipe;
        AscendC::TQue<AscendC::QuePosition::VECIN, 1> inQueueSrc;
        AscendC::TQue<AscendC::QuePosition::VECOUT, 1> outQueueDst;
        AscendC::GlobalTensor<half> srcGlobal, dstGlobal;
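        int srcDataSize = 256;                // 32 Region Proposals x 8 half elements each
        int dstDataSize = 32;                 // one extracted score per Region Proposal
        int repeat = srcDataSize / (8 * 16);  // 16 Region Proposals per iteration, so 2 repeats
        int mode = 4;                         // extract the score field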
    };
    
    extern "C" __global__ __aicore__ void vec_proposal_kernel(__gm__ uint8_t* src, __gm__ uint8_t* dstGm)
    {
        KernelVecProposal op;
        op.Init(src, dstGm);
        op.Process();
    }
    
    Result example:
    Input (src_gm):
    [  0.      0.      0.      0.     33.3     0.      0.      0.      0.
       0.      0.      0.     67.56    0.      0.      0.      0.      0.
       0.      0.     68.5     0.      0.      0.      0.      0.      0.
       0.    -11.914   0.      0.      0.      0.      0.      0.      0.
      25.19    0.      0.      0.      0.      0.      0.      0.    -72.8
       0.      0.      0.      0.      0.      0.      0.     11.79    0.
       0.      0.      0.      0.      0.      0.    -49.47    0.      0.
       0.      0.      0.      0.      0.     49.44    0.      0.      0.
       0.      0.      0.      0.     84.4     0.      0.      0.      0.
       0.      0.      0.    -14.36    0.      0.      0.      0.      0.
       0.      0.     45.97    0.      0.      0.      0.      0.      0.
       0.     52.47    0.      0.      0.      0.      0.      0.      0.
      -5.387   0.      0.      0.      0.      0.      0.      0.    -13.12
       0.      0.      0.      0.      0.      0.      0.    -88.9     0.
       0.      0.      0.      0.      0.      0.     54.      0.      0.
       0.      0.      0.      0.      0.    -51.62    0.      0.      0.
       0.      0.      0.      0.    -20.67    0.      0.      0.      0.
       0.      0.      0.     59.56    0.      0.      0.      0.      0.
       0.      0.     35.72    0.      0.      0.      0.      0.      0.
       0.     -6.12    0.      0.      0.      0.      0.      0.      0.
     -39.4     0.      0.      0.      0.      0.      0.      0.    -11.46
       0.      0.      0.      0.      0.      0.      0.     -7.066   0.
       0.      0.      0.      0.      0.      0.     30.23    0.      0.
       0.      0.      0.      0.      0.    -11.18    0.      0.      0.
       0.      0.      0.      0.    -35.84    0.      0.      0.      0.
       0.      0.      0.    -40.88    0.      0.      0.      0.      0.
       0.      0.     60.9     0.      0.      0.      0.      0.      0.
       0.    -73.3     0.      0.      0.      0.      0.      0.      0.
      38.47    0.      0.      0.   ]
    Output (dst_gm):
    [ 33.3    67.56   68.5   -11.914  25.19  -72.8    11.79  -49.47   49.44
      84.4   -14.36   45.97   52.47   -5.387 -13.12  -88.9    54.    -51.62
     -20.67   59.56   35.72   -6.12  -39.4   -11.46   -7.066  30.23  -11.18
     -35.84  -40.88   60.9   -73.3    38.47 ]
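The printed result can be cross-checked on the host. In src_gm, the non-zero scores appear at every 8th element starting at index 4. This is consistent with an 8-element Region Proposal with score at offset 4, and that layout is assumed here. The repeat count then follows from srcDataSize: 256 / (8 × 16) = 2, which matches repeatTimes = 2 in the API usage example. The sketch below builds the expected dst_gm from src_gm, with float standing in for half. ExpectedDst and the constants are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Parameters of the complete example; 8 elements per Region Proposal is assumed.
constexpr int kSrcDataSize = 256;
constexpr int kElemsPerProposal = 8;
constexpr int kProposalsPerRepeat = 16;
constexpr int kRepeat = kSrcDataSize / (kElemsPerProposal * kProposalsPerRepeat); // 2
constexpr int kMode = 4;  // score field

// Hypothetical golden model: expected dst_gm for a given src_gm.
std::vector<float> ExpectedDst(const std::vector<float>& src)
{
    std::vector<float> dst;
    dst.reserve(static_cast<std::size_t>(kRepeat * kProposalsPerRepeat));
    for (int p = 0; p < kRepeat * kProposalsPerRepeat; ++p) {
        // at() throws std::out_of_range if src is shorter than required.
        dst.push_back(src.at(static_cast<std::size_t>(p * kElemsPerProposal + kMode)));
    }
    return dst;
}
```

Feeding the printed src_gm values into ExpectedDst should reproduce the printed dst_gm. For example, src_gm[4] = 33.3 becomes dst_gm[0], and src_gm[252] = 38.47 becomes dst_gm[31].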