ProposalConcat
Function Usage
Inserts consecutive elements into the corresponding positions in Region Proposals. In each iteration, 16 consecutive elements are inserted into the corresponding positions in 16 Region Proposals.
Region Proposal description:
Each Region Proposal holds eight elements in the following order. Only the half and float data types are currently supported.
[x1, y1, x2, y2, score, label, reserved_0, reserved_1]
When a Region Proposal is of the half data type, it occupies 16 bytes. Byte[15:12] is invalid and Byte[11:0] contains six half elements. To be specific, Byte[11:10] is defined as the label, Byte[9:8] is defined as the score, Byte[7:6] is defined as y2, Byte[5:4] is defined as x2, Byte[3:2] is defined as y1, and Byte[1:0] is defined as x1.
The table in the following figure contains 16 Region Proposals.

When a Region Proposal is of the float data type, it occupies 32 bytes. Byte[31:24] is invalid and Byte[23:0] contains six float elements. To be specific, Byte[23:20] is defined as the label, Byte[19:16] is defined as the score, Byte[15:12] is defined as y2, Byte[11:8] is defined as x2, Byte[7:4] is defined as y1, and Byte[3:0] is defined as x1.
The table in the following figure contains 16 Region Proposals.
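
The byte ranges above follow directly from the element order: field k starts at byte k * sizeof(T) within its Region Proposal. The following host-side sketch illustrates that arithmetic. It is not part of the Ascend C API; the names ProposalField, ProposalBytes, FieldByteOffset, and ReadField are hypothetical, and uint16_t is assumed as a 2-byte stand-in for half on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Field order inside one Region Proposal, as listed in the layout above.
// (Hypothetical enum, introduced only for this illustration.)
enum class ProposalField : int { X1 = 0, Y1 = 1, X2 = 2, Y2 = 3, Score = 4, Label = 5 };

// Every Region Proposal has 8 element slots: 6 used fields + 2 reserved.
constexpr std::size_t kSlotsPerProposal = 8;

// Size in bytes of one Region Proposal whose elements have type T.
template <typename T>
constexpr std::size_t ProposalBytes() {
    return kSlotsPerProposal * sizeof(T);
}

// Byte offset of a field measured from the start of its Region Proposal.
template <typename T>
constexpr std::size_t FieldByteOffset(ProposalField field) {
    return static_cast<std::size_t>(field) * sizeof(T);
}

// Reads one field out of the raw bytes of a single Region Proposal.
template <typename T>
T ReadField(const unsigned char* proposal, ProposalField field) {
    assert(proposal != nullptr);
    T value;
    std::memcpy(&value, proposal + FieldByteOffset<T>(field), sizeof(T));
    return value;
}
```

With a 2-byte element type, ProposalBytes gives 16 and the score field starts at byte 8, matching Byte[9:8]. With float, it gives 32 and the label starts at byte 20, matching Byte[23:20].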

Prototype
template <typename T>
__aicore__ inline void ProposalConcat(const LocalTensor<T>& dstLocal, const LocalTensor<T>& srcLocal, const int32_t repeatTimes, const int32_t modeNumber)
Parameters
| Parameter | Input/Output | Meaning |
|---|---|---|
| dstLocal | Output | Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. For the |
| srcLocal | Input | Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. The source operand must have the same data type as the destination operand. For the |
| repeatTimes | Input | Number of iteration repeats. The value is of the int32_t type. In each iteration, 16 elements are inserted into 16 Region Proposals. The next group of 16 elements is inserted into the next group of 16 Region Proposals in the next iteration. Value range: repeatTimes ∈ [0, 255] |
| modeNumber | Input | Insertion position. Value range: modeNumber ∈ [0, 5]; data type: int32_t. Only the following configurations are supported: |
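
To make the roles of repeatTimes and modeNumber concrete, here is a minimal host-side reference model of the insertion, written in plain C++ on std::vector rather than LocalTensor. It is a sketch, not the device implementation. The function name ProposalConcatRef is hypothetical, and the mapping of modeNumber to a slot index (0 = x1 through 5 = label) is an assumption inferred from the element order and from the usage example, where modeNumber = 4 targets the score field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kProposalsPerRepeat = 16;  // proposals handled per iteration
constexpr std::size_t kSlotsPerProposal = 8;     // 6 fields + 2 reserved slots

// Hypothetical host-side model: copies src[i] into slot modeNumber of the
// i-th Region Proposal in dst, for repeatTimes * 16 proposals.
// Assumption: modeNumber is the slot index in [x1, y1, x2, y2, score, label].
template <typename T>
void ProposalConcatRef(std::vector<T>& dst, const std::vector<T>& src,
                       int32_t repeatTimes, int32_t modeNumber) {
    assert(repeatTimes >= 0 && repeatTimes <= 255);
    assert(modeNumber >= 0 && modeNumber <= 5);

    const std::size_t count =
        static_cast<std::size_t>(repeatTimes) * kProposalsPerRepeat;
    assert(src.size() >= count);                      // enough source elements
    assert(dst.size() >= count * kSlotsPerProposal);  // enough Region Proposals

    for (std::size_t i = 0; i < count; ++i) {
        // Only the selected slot is written; other slots keep their contents.
        dst[i * kSlotsPerProposal + static_cast<std::size_t>(modeNumber)] = src[i];
    }
}
```

Calling ProposalConcatRef(dst, src, 2, 4) with 32 source elements and a 256-element destination writes src[i] to dst[i * 8 + 4] and leaves every other slot untouched.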
Returns
None
Availability
Constraints
- Ensure that the number of Region Proposals stored in dstLocal is greater than or equal to the required number. Otherwise, a tensor access violation occurs.
- Ensure that the number of elements stored in srcLocal is greater than or equal to the required number. Otherwise, a tensor access violation occurs.
- For details about the alignment requirements of the operand address offset, see General Restrictions.
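
The "required number" in the first two constraints follows from repeatTimes: each iteration consumes 16 source elements and fills 16 Region Proposals of 8 elements each. The sketch below shows one way to check buffer sizes on the host before launching a kernel. The names ProposalConcatSizes, RequiredSizes, and FitsProposalConcat are hypothetical helpers, not part of the Ascend C API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Element counts needed by one ProposalConcat call (hypothetical helper type).
struct ProposalConcatSizes {
    std::size_t srcElements;  // elements read from srcLocal
    std::size_t dstElements;  // elements spanned in dstLocal (8 per proposal)
};

// 16 elements per iteration on the source side,
// 16 Region Proposals * 8 slots per iteration on the destination side.
inline ProposalConcatSizes RequiredSizes(int32_t repeatTimes) {
    assert(repeatTimes >= 0 && repeatTimes <= 255);
    const std::size_t proposals = static_cast<std::size_t>(repeatTimes) * 16;
    return ProposalConcatSizes{proposals, proposals * 8};
}

// Returns true when both buffers are large enough for the given repeatTimes.
inline bool FitsProposalConcat(std::size_t srcElements, std::size_t dstElements,
                               int32_t repeatTimes) {
    if (repeatTimes < 0 || repeatTimes > 255) {
        return false;  // outside the documented value range
    }
    const ProposalConcatSizes need = RequiredSizes(repeatTimes);
    return srcElements >= need.srcElements && dstElements >= need.dstElements;
}
```

For repeatTimes = 2 this gives 32 source elements and 256 destination elements, the same sizes used for srcDataSize and dstDataSize in the complete example.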
Example
- API usage example
// repeatTimes = 2, modeNumber = 4. Insert 32 elements into the score fields in 32 Region Proposals.
AscendC::ProposalConcat(dstLocal, srcLocal, 2, 4);
- Complete example
#include "kernel_operator.h"

class KernelVecProposal {
public:
    __aicore__ inline KernelVecProposal() {}
    __aicore__ inline void Init(__gm__ uint8_t* src, __gm__ uint8_t* dstGm)
    {
        srcGlobal.SetGlobalBuffer((__gm__ half*)src);
        dstGlobal.SetGlobalBuffer((__gm__ half*)dstGm);
        pipe.InitBuffer(inQueueSrc, 1, srcDataSize * sizeof(half));
        pipe.InitBuffer(outQueueDst, 1, dstDataSize * sizeof(half));
    }
    __aicore__ inline void Process()
    {
        CopyIn();
        Compute();
        CopyOut();
    }

private:
    __aicore__ inline void CopyIn()
    {
        AscendC::LocalTensor<half> srcLocal = inQueueSrc.AllocTensor<half>();
        AscendC::DataCopy(srcLocal, srcGlobal, srcDataSize);
        inQueueSrc.EnQue(srcLocal);
    }
    __aicore__ inline void Compute()
    {
        AscendC::LocalTensor<half> srcLocal = inQueueSrc.DeQue<half>();
        AscendC::LocalTensor<half> dstLocal = outQueueDst.AllocTensor<half>();
        // This section only demonstrates the usage of the Concat command.
        // Note that the non-score data in dstLocal may be random values.
        AscendC::ProposalConcat(dstLocal, srcLocal, repeat, mode);
        outQueueDst.EnQue<half>(dstLocal);
        inQueueSrc.FreeTensor(srcLocal);
    }
    __aicore__ inline void CopyOut()
    {
        AscendC::LocalTensor<half> dstLocal = outQueueDst.DeQue<half>();
        AscendC::DataCopy(dstGlobal, dstLocal, dstDataSize);
        outQueueDst.FreeTensor(dstLocal);
    }

private:
    AscendC::TPipe pipe;
    AscendC::TQue<AscendC::QuePosition::VECIN, 1> inQueueSrc;
    AscendC::TQue<AscendC::QuePosition::VECOUT, 1> outQueueDst;
    AscendC::GlobalTensor<half> srcGlobal, dstGlobal;
    int srcDataSize = 32;
    int dstDataSize = 256;
    int repeat = srcDataSize / 16;
    int mode = 4;
};

extern "C" __global__ __aicore__ void vec_proposal_kernel(__gm__ uint8_t* src, __gm__ uint8_t* dstGm)
{
    KernelVecProposal op;
    op.Init(src, dstGm);
    op.Process();
}
Result example:
Input (src_gm):
[ 33.3 67.56 68.5 -11.914 25.19 -72.8 11.79 -49.47
  49.44 84.4 -14.36 45.97 52.47 -5.387 -13.12 -88.9
  54. -51.62 -20.67 59.56 35.72 -6.12 -39.4 -11.46
  -7.066 30.23 -11.18 -35.84 -40.88 60.9 -73.3 38.47 ]
Output (dst_gm):
[ 0. 0. 0. 0. 33.3 0. 0. 0.
  0. 0. 0. 0. 67.56 0. 0. 0.
  0. 0. 0. 0. 68.5 0. 0. 0.
  0. 0. 0. 0. -11.914 0. 0. 0.
  0. 0. 0. 0. 25.19 0. 0. 0.
  0. 0. 0. 0. -72.8 0. 0. 0.
  0. 0. 0. 0. 11.79 0. 0. 0.
  0. 0. 0. 0. -49.47 0. 0. 0.
  0. 0. 0. 0. 49.44 0. 0. 0.
  0. 0. 0. 0. 84.4 0. 0. 0.
  0. 0. 0. 0. -14.36 0. 0. 0.
  0. 0. 0. 0. 45.97 0. 0. 0.
  0. 0. 0. 0. 52.47 0. 0. 0.
  0. 0. 0. 0. -5.387 0. 0. 0.
  0. 0. 0. 0. -13.12 0. 0. 0.
  0. 0. 0. 0. -88.9 0. 0. 0.
  0. 0. 0. 0. 54. 0. 0. 0.
  0. 0. 0. 0. -51.62 0. 0. 0.
  0. 0. 0. 0. -20.67 0. 0. 0.
  0. 0. 0. 0. 59.56 0. 0. 0.
  0. 0. 0. 0. 35.72 0. 0. 0.
  0. 0. 0. 0. -6.12 0. 0. 0.
  0. 0. 0. 0. -39.4 0. 0. 0.
  0. 0. 0. 0. -11.46 0. 0. 0.
  0. 0. 0. 0. -7.066 0. 0. 0.
  0. 0. 0. 0. 30.23 0. 0. 0.
  0. 0. 0. 0. -11.18 0. 0. 0.
  0. 0. 0. 0. -35.84 0. 0. 0.
  0. 0. 0. 0. -40.88 0. 0. 0.
  0. 0. 0. 0. 60.9 0. 0. 0.
  0. 0. 0. 0. -73.3 0. 0. 0.
  0. 0. 0. 0. 38.47 0. 0. 0. ]
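
Because the non-score slots of dstLocal may hold arbitrary values, a host-side check of the output should compare only the slot that was written. The sketch below gathers one slot from each Region Proposal in a copied-back buffer so that it can be compared with the input. GatherField is a hypothetical helper for illustration, and float is assumed as the host representation of the values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Collects slot modeNumber from the first `count` Region Proposals.
// Each Region Proposal spans 8 consecutive elements of `proposals`.
template <typename T>
std::vector<T> GatherField(const std::vector<T>& proposals, std::size_t count,
                           int modeNumber) {
    assert(modeNumber >= 0 && modeNumber <= 5);
    assert(proposals.size() >= count * 8);

    std::vector<T> field(count);
    for (std::size_t i = 0; i < count; ++i) {
        field[i] = proposals[i * 8 + static_cast<std::size_t>(modeNumber)];
    }
    return field;
}
```

Applied to the output above with count = 32 and modeNumber = 4, the gathered values reproduce the 32 input elements in order.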