InitBufPool
Function Usage
Initializes the TBufPool buffer resource pool. This API applies to the scenario where you want to manually specify UB/L1 buffer resource reuse when buffer resources are limited. After initialization, a child resource pool is allocated from the overall buffer resources. The child resource pool TBufPool provides the following resource management modes:
- The reloading API of TPipe::InitBufPool is reused with other TBufPool child resource pools.
- TBufPool:: InitBufPool continues to divide child resource pools.
- TBufPool::InitBuffer allocates buffers.
For details about the TBufPool and resource allocation diagram, see TBufPool.
Prototype
1 2 3 4 | template <class T> __aicore__ inline bool InitBufPool(T& bufPool, uint32_t len) template <class T, class U> __aicore__ inline bool InitBufPool(T& bufPool, uint32_t len, U& shareBuf) |
Parameters
Table 1 Parameters in InitBufPool(T& bufPool, uint32_t len) prototype definition
Parameter |
Input/Output |
Meaning |
|---|---|---|
bufPool |
Input |
New resource pool. The type is TBufPool. |
len |
Input |
Length of the new resource pool, in bytes. If the length is not 32-byte aligned, it is automatically padded to 32-byte aligned. |
Table 2 Parameters in InitBufPool(T& bufPool, uint32_t len, U& shareBuf) prototype definition
Parameter |
Input/Output |
Meaning |
|---|---|---|
bufPool |
Input |
New resource pool. The type is TBufPool. |
len |
Input |
Length of the new resource pool, in bytes. If the length is not 32-byte aligned, it is automatically padded to 32-byte aligned. |
shareBuf |
Input |
Reused resource pool of the TBufPool type. The newly divided resource pool shares the start address and length with the reused resource pool. |
Availability
Precautions
- The hardware attributes of the new resource pool must be the same as those of the reused resource pool. The two resource pools share the start address and length.
- The entered length must be less than or equal to the length of the reused resource pool.
- For details about other restrictions, see TBufPool.
Returns
None
Example
Due to the limited size of the physical memory, you can specify memory reuse to solve the problem of insufficient resources in the scenario where there is no data dependency during computing or the scenario where data dependency is in serial mode and no memory corruption occurs. In this example, Tpipe::InitBufPool initializes the child resource pool tbufPool1 and specifies the start address and length of tbufPool1 reused by tbufPool2. The subsequent computation of tbufPool1 and tbufPool2 are performed in serial mode, preventing data corruption and implementing memory reuse and automatic synchronization.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 | #include "kernel_operator.h" class ResetApi { public: __aicore__ inline ResetApi() {} __aicore__ inline void Init(__gm__ uint8_t* src0Gm, __gm__ uint8_t* src1Gm, __gm__ uint8_t* dstGm) { src0Global.SetGlobalBuffer((__gm__ half*)src0Gm); src1Global.SetGlobalBuffer((__gm__ half*)src1Gm); dstGlobal.SetGlobalBuffer((__gm__ half*)dstGm); pipe.InitBufPool(tbufPool1, 196608); pipe.InitBufPool(tbufPool2, 196608, tbufPool1); } __aicore__ inline void Process() { tbufPool1.InitBuffer(queSrc0, 1, 65536); tbufPool1.InitBuffer(queSrc1, 1, 65536); tbufPool1.InitBuffer(queDst0, 1, 65536); CopyIn(); Compute(); CopyOut(); tbufPool1.Reset(); tbufPool2.InitBuffer(queSrc2, 1, 65536); tbufPool2.InitBuffer(queSrc3, 1, 65536); tbufPool2.InitBuffer(queDst1, 1, 65536); CopyIn1(); Compute1(); CopyOut1(); tbufPool2.Reset(); } private: __aicore__ inline void CopyIn() { AscendC::LocalTensor<half> src0Local = queSrc0.AllocTensor<half>(); AscendC::LocalTensor<half> src1Local = queSrc1.AllocTensor<half>(); AscendC::DataCopy(src0Local, src0Global, 512); AscendC::DataCopy(src1Local, src1Global, 512); queSrc0.EnQue(src0Local); queSrc1.EnQue(src1Local); } __aicore__ inline void Compute() { AscendC::LocalTensor<half> src0Local = queSrc0.DeQue<half>(); AscendC::LocalTensor<half> src1Local = queSrc1.DeQue<half>(); AscendC::LocalTensor<half> dstLocal = queDst0.AllocTensor<half>(); AscendC::Add(dstLocal, src0Local, src1Local, 512); queDst0.EnQue<half>(dstLocal); queSrc0.FreeTensor(src0Local); queSrc1.FreeTensor(src1Local); } __aicore__ inline void CopyOut() { AscendC::LocalTensor<half> dstLocal = queDst0.DeQue<half>(); AscendC::DataCopy(dstGlobal, dstLocal, 512); queDst0.FreeTensor(dstLocal); } __aicore__ inline void CopyIn1() { AscendC::LocalTensor<half> src0Local = queSrc2.AllocTensor<half>(); AscendC::LocalTensor<half> src1Local = queSrc3.AllocTensor<half>(); AscendC::DataCopy(src0Local, src0Global, 512); AscendC::DataCopy(src1Local, src1Global, 512); queSrc2.EnQue(src0Local); queSrc3.EnQue(src1Local); } __aicore__ inline void Compute1() { AscendC::LocalTensor<half> src0Local = queSrc2.DeQue<half>(); AscendC::LocalTensor<half> src1Local = queSrc3.DeQue<half>(); AscendC::LocalTensor<half> dstLocal = queDst1.AllocTensor<half>(); AscendC::Add(dstLocal, src0Local, src1Local, 512); queDst1.EnQue<half>(dstLocal); queSrc2.FreeTensor(src0Local); queSrc3.FreeTensor(src1Local); } __aicore__ inline void CopyOut1() { AscendC::LocalTensor<half> dstLocal = queDst1.DeQue<half>(); AscendC::DataCopy(dstGlobal, dstLocal, 512); queDst1.FreeTensor(dstLocal); } private: AscendC::TPipe pipe; AscendC::TBufPool<AscendC::TPosition::VECCALC> tbufPool1, tbufPool2; AscendC::TQue<AscendC::QuePosition::VECIN, 1> queSrc0, queSrc1, queSrc2, queSrc3; AscendC::TQue<AscendC::QuePosition::VECOUT, 1> queDst0, queDst1; AscendC::GlobalTensor<half> src0Global, src1Global, dstGlobal; }; extern "C" __global__ __aicore__ void tbufpool_kernel(__gm__ uint8_t* src0Gm, __gm__ uint8_t* src1Gm, __gm__ uint8_t* dstGm) { ResetApi op; op.Init(src0Gm, src1Gm, dstGm); op.Process(); } |