ProposalExtract
Function Usage
Extracts the element at the same position (selected by modeNumber) from each Region Proposal in a group and stores the extracted elements consecutively. In each iteration, 16 elements are extracted from 16 Region Proposals. This API performs the inverse operation of ProposalConcat.
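The following host-side C++ reference model shows how elements move. It is a sketch that rests on two assumptions taken from the result example rather than from the API definition: each Region Proposal occupies 8 consecutive elements, and modeNumber is the offset of the extracted element within a proposal. `ProposalExtractRef` and `ProposalConcatRef` are hypothetical names, not Ascend C APIs. The concat model captures only the inverse data movement described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference model, not part of Ascend C.
// Assumptions: 8 elements per Region Proposal; modeNumber = offset inside a proposal.
constexpr std::size_t kElemsPerProposal = 8;     // assumed, matches the result example
constexpr std::size_t kProposalsPerRepeat = 16;  // 16 proposals per iteration

template <typename T>
std::vector<T> ProposalExtractRef(const std::vector<T>& src, int repeatTimes, int modeNumber)
{
    const std::size_t count = static_cast<std::size_t>(repeatTimes) * kProposalsPerRepeat;
    assert(modeNumber >= 0 && static_cast<std::size_t>(modeNumber) < kElemsPerProposal);
    assert(src.size() >= count * kElemsPerProposal);  // mirrors the srcLocal constraint
    std::vector<T> dst(count);
    for (int r = 0; r < repeatTimes; ++r) {                      // one pass per iteration
        for (std::size_t j = 0; j < kProposalsPerRepeat; ++j) {  // 16 proposals per pass
            const std::size_t p = static_cast<std::size_t>(r) * kProposalsPerRepeat + j;
            // Element modeNumber of proposal p is stored at position p of dst.
            dst[p] = src[p * kElemsPerProposal + static_cast<std::size_t>(modeNumber)];
        }
    }
    return dst;
}

// Inverse direction (models the ProposalConcat data movement):
// element i of src is written to slot modeNumber of proposal i; other slots are untouched.
template <typename T>
void ProposalConcatRef(std::vector<T>& dst, const std::vector<T>& src, int repeatTimes, int modeNumber)
{
    const std::size_t count = static_cast<std::size_t>(repeatTimes) * kProposalsPerRepeat;
    assert(src.size() >= count && dst.size() >= count * kElemsPerProposal);
    for (std::size_t i = 0; i < count; ++i) {
        dst[i * kElemsPerProposal + static_cast<std::size_t>(modeNumber)] = src[i];
    }
}
```

Extracting a field and then concatenating it back into a zero-filled buffer reproduces the original proposals when every other slot is zero.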
Prototype
```cpp
template <typename T>
__aicore__ inline void ProposalExtract(const LocalTensor<T>& dstLocal, const LocalTensor<T>& srcLocal, const int32_t repeatTimes, const int32_t modeNumber)
```
Parameters
| Parameter | Input/Output | Meaning |
|---|---|---|
| dstLocal | Output | Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. For the |
| srcLocal | Input | Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. The source operand must have the same data type as the destination operand. For the |
| repeatTimes | Input | Number of iterations. Data type: int32_t. In each iteration, 16 elements are extracted from 16 Region Proposals and stored consecutively; the next iteration processes the next group of 16 Region Proposals. Value range: repeatTimes ∈ [0, 255]. |
| modeNumber | Input | Extraction position, that is, which element of each Region Proposal is extracted (for example, 4 selects the score field). Data type: int32_t. Value range: modeNumber ∈ [0, 5]. Only the following configurations are supported: |
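The value ranges in the table can be checked on the host before a kernel is launched. The helper below is hypothetical (`IsValidProposalExtractArgs` is not an Ascend C API). It checks only the documented ranges and does not encode which modeNumber configurations a particular product supports.

```cpp
#include <cstdint>

// Hypothetical host-side helper, not an Ascend C API.
// Checks only the documented ranges: repeatTimes in [0, 255], modeNumber in [0, 5].
constexpr bool IsValidProposalExtractArgs(std::int32_t repeatTimes, std::int32_t modeNumber)
{
    return repeatTimes >= 0 && repeatTimes <= 255 &&
           modeNumber >= 0 && modeNumber <= 5;
}
```

The arguments (2, 4) from the API usage example pass this check, while a repeatTimes of 256 or a modeNumber of 6 does not.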
Returns
None
Availability
Constraints
- Ensure that srcLocal holds at least repeatTimes × 16 Region Proposals. Otherwise, an out-of-bounds tensor access occurs.
- Ensure that dstLocal holds at least repeatTimes × 16 elements. Otherwise, an out-of-bounds tensor access occurs.
- For details about the alignment requirements of operand address offsets, see General Restrictions.
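The size constraints above can be computed ahead of time. The sketch below assumes that one Region Proposal occupies 8 elements, as in the result example. `MinSrcElements`, `MinDstElements`, and `Is32ByteAligned` are hypothetical helper names, and the alignment check covers only the 32-byte start-address rule from the parameter table, not the full rules in General Restrictions.

```cpp
#include <cstddef>

// Hypothetical helpers for sizing ProposalExtract buffers (not Ascend C APIs).
constexpr std::size_t kElemsPerProposal = 8;     // assumed, matches the result example
constexpr std::size_t kProposalsPerRepeat = 16;  // 16 Region Proposals per iteration

// Minimum number of elements srcLocal must hold: repeatTimes x 16 proposals.
constexpr std::size_t MinSrcElements(std::size_t repeatTimes)
{
    return repeatTimes * kProposalsPerRepeat * kElemsPerProposal;
}

// Minimum number of elements dstLocal must hold: one per extracted proposal.
constexpr std::size_t MinDstElements(std::size_t repeatTimes)
{
    return repeatTimes * kProposalsPerRepeat;
}

// True if a byte offset satisfies the 32-byte start-address alignment.
constexpr bool Is32ByteAligned(std::size_t byteOffset)
{
    return byteOffset % 32 == 0;
}
```

With srcDataSize = 256 half values, as in the complete example, the largest repeatTimes that fits is 256 / 128 = 2, and dstLocal then needs 32 half elements (64 bytes).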
Example
- API usage example
```cpp
// repeatTimes = 2, modeNumber = 4: extract the score field from 32 Region Proposals
// and store the results as 32 consecutive elements.
AscendC::ProposalExtract(dstLocal, srcLocal, 2, 4);
```
- Complete example
```cpp
#include "kernel_operator.h"

class KernelVecProposal {
public:
    __aicore__ inline KernelVecProposal() {}
    __aicore__ inline void Init(__gm__ uint8_t* src, __gm__ uint8_t* dstGm)
    {
        srcGlobal.SetGlobalBuffer((__gm__ half*)src);
        dstGlobal.SetGlobalBuffer((__gm__ half*)dstGm);
        pipe.InitBuffer(inQueueSrc, 1, srcDataSize * sizeof(half));
        pipe.InitBuffer(outQueueDst, 1, dstDataSize * sizeof(half));
    }
    __aicore__ inline void Process()
    {
        CopyIn();
        Compute();
        CopyOut();
    }

private:
    __aicore__ inline void CopyIn()
    {
        AscendC::LocalTensor<half> srcLocal = inQueueSrc.AllocTensor<half>();
        AscendC::DataCopy(srcLocal, srcGlobal, srcDataSize);
        inQueueSrc.EnQue(srcLocal);
    }
    __aicore__ inline void Compute()
    {
        AscendC::LocalTensor<half> srcLocal = inQueueSrc.DeQue<half>();
        AscendC::LocalTensor<half> dstLocal = outQueueDst.AllocTensor<half>();
        // Extract element `mode` from each Region Proposal into consecutive elements.
        AscendC::ProposalExtract(dstLocal, srcLocal, repeat, mode);
        outQueueDst.EnQue<half>(dstLocal);
        inQueueSrc.FreeTensor(srcLocal);
    }
    __aicore__ inline void CopyOut()
    {
        AscendC::LocalTensor<half> dstLocal = outQueueDst.DeQue<half>();
        AscendC::DataCopy(dstGlobal, dstLocal, dstDataSize);
        outQueueDst.FreeTensor(dstLocal);
    }

private:
    AscendC::TPipe pipe;
    AscendC::TQue<AscendC::QuePosition::VECIN, 1> inQueueSrc;
    AscendC::TQue<AscendC::QuePosition::VECOUT, 1> outQueueDst;
    AscendC::GlobalTensor<half> srcGlobal, dstGlobal;
    int srcDataSize = 256;          // 32 Region Proposals x 8 elements
    int dstDataSize = 32;           // one extracted element per Region Proposal
    int repeat = srcDataSize / 128; // 16 proposals (128 elements) per iteration, so 2
    int mode = 4;                   // extract the score field
};

extern "C" __global__ __aicore__ void vec_proposal_kernel(__gm__ uint8_t* src, __gm__ uint8_t* dstGm)
{
    KernelVecProposal op;
    op.Init(src, dstGm);
    op.Process();
}
```
Result example:
Input (src_gm): [ 0. 0. 0. 0. 33.3 0. 0. 0. 0. 0. 0. 0. 67.56 0. 0. 0. 0. 0. 0. 0. 68.5 0. 0. 0. 0. 0. 0. 0. -11.914 0. 0. 0. 0. 0. 0. 0. 25.19 0. 0. 0. 0. 0. 0. 0. -72.8 0. 0. 0. 0. 0. 0. 0. 11.79 0. 0. 0. 0. 0. 0. 0. -49.47 0. 0. 0. 0. 0. 0. 0. 49.44 0. 0. 0. 0. 0. 0. 0. 84.4 0. 0. 0. 0. 0. 0. 0. -14.36 0. 0. 0. 0. 0. 0. 0. 45.97 0. 0. 0. 0. 0. 0. 0. 52.47 0. 0. 0. 0. 0. 0. 0. -5.387 0. 0. 0. 0. 0. 0. 0. -13.12 0. 0. 0. 0. 0. 0. 0. -88.9 0. 0. 0. 0. 0. 0. 0. 54. 0. 0. 0. 0. 0. 0. 0. -51.62 0. 0. 0. 0. 0. 0. 0. -20.67 0. 0. 0. 0. 0. 0. 0. 59.56 0. 0. 0. 0. 0. 0. 0. 35.72 0. 0. 0. 0. 0. 0. 0. -6.12 0. 0. 0. 0. 0. 0. 0. -39.4 0. 0. 0. 0. 0. 0. 0. -11.46 0. 0. 0. 0. 0. 0. 0. -7.066 0. 0. 0. 0. 0. 0. 0. 30.23 0. 0. 0. 0. 0. 0. 0. -11.18 0. 0. 0. 0. 0. 0. 0. -35.84 0. 0. 0. 0. 0. 0. 0. -40.88 0. 0. 0. 0. 0. 0. 0. 60.9 0. 0. 0. 0. 0. 0. 0. -73.3 0. 0. 0. 0. 0. 0. 0. 38.47 0. 0. 0. ]
Output (dst_gm): [ 33.3 67.56 68.5 -11.914 25.19 -72.8 11.79 -49.47 49.44 84.4 -14.36 45.97 52.47 -5.387 -13.12 -88.9 54. -51.62 -20.67 59.56 35.72 -6.12 -39.4 -11.46 -7.066 30.23 -11.18 -35.84 -40.88 60.9 -73.3 38.47 ]
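The result example can be reproduced on the host. The sketch below rebuilds the 256-element input from the 32 output scores, leaving every other slot at 0 as in the printed input. It then extracts offset 4 with repeatTimes = 256 / 128 = 2. The helper names are hypothetical, the 8-element proposal layout is an assumption read off the example, and float stands in for half so that the code runs without Ascend headers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side check of the result example; float stands in for half.
// Assumption: 8 elements per Region Proposal, score stored at offset 4.
std::vector<float> BuildExampleInput(const std::vector<float>& scores)
{
    std::vector<float> src(scores.size() * 8, 0.0f);  // every non-score slot is 0
    for (std::size_t p = 0; p < scores.size(); ++p) {
        src[p * 8 + 4] = scores[p];
    }
    return src;
}

// Same data movement as ProposalExtract: element modeNumber of proposal i goes to dst[i].
std::vector<float> ExtractField(const std::vector<float>& src, int repeatTimes, int modeNumber)
{
    const std::size_t count = static_cast<std::size_t>(repeatTimes) * 16;
    assert(src.size() >= count * 8);
    std::vector<float> dst(count);
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = src[i * 8 + static_cast<std::size_t>(modeNumber)];
    }
    return dst;
}
```

Because the values are copied without arithmetic, the extracted vector compares exactly equal to the list of scores.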