GetBatchTensorC
Function Usage
Obtains a matrix C slice. This API works together with the asynchronous IterateNBatch API: after IterateNBatch is called to perform the iterative computation, each call to GetBatchTensorC retrieves one matrix slice of size std::max(batchA, batchB) × singleCoreM × singleCoreN.
Prototype
```cpp
template <bool sync = true>
__aicore__ inline GlobalTensor<DstT> GetBatchTensorC(uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)
```

```cpp
template <bool sync = true>
__aicore__ inline void GetBatchTensorC(const LocalTensor<DstT>& c, uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)
```
Parameters
| Parameter | Description |
|---|---|
| sync | Whether to run in synchronous mode. `true` (default): synchronous mode; `false`: asynchronous mode. |
| Parameter | Input/Output | Description |
|---|---|---|
| batchA | Input | Number of batches of the left matrix. |
| batchB | Input | Number of batches of the right matrix. |
| enSequentialWrite | Input | Whether the output data is stored contiguously. The default value is false (non-contiguous write mode). |
| c | Input | Address of matrix C in local memory, used to store the matrix slices. |
Returns
GlobalTensor<DstT>: the computed matrix slice.
Availability
Precautions
None
Example
```cpp
// Calculate the number of loops required for multi-batch computation.
int for_extent = tiling.ALayoutInfoB * tiling.ALayoutInfoN * g_lay / tiling.BatchNum;
mm1.SetTensorA(gm_a[0], isTransposeAIn);
mm1.SetTensorB(gm_b[0], isTransposeBIn);
if (tiling.isBias) {
    mm1.SetBias(gm_bias[0]);
}
// Multi-batch Matmul computation
mm1.template IterateNBatch<false>(for_extent, batchA, batchB, false);
// ...other computation
for (int i = 0; i < for_extent; ++i) {
    // Retrieve one slice per iteration in asynchronous mode.
    mm1.GetBatchTensorC<false>(ubCmatrix, batchA, batchB);
    // ...other computation
}
```