GetBatchC
Function Description
This API provides the same functionality as GetBatchTensorC; GetBatchTensorC is recommended.
A single call to GetBatchC obtains one slice of matrix C. This API can be used together with the asynchronous IterateNBatch API: after IterateNBatch performs the iterative computation, GetBatchC obtains a matrix slice of size std::max(batchA, batchB) × singleCoreM × singleCoreN.
Prototype
```cpp
template <bool sync = true>
__aicore__ inline GlobalTensor<DstT> GetBatchC(uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)
```

```cpp
template <bool sync = true>
__aicore__ inline void GetBatchC(const LocalTensor<DstT>& c, uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)
```
Parameters
| Parameter | Description |
|---|---|
| sync | Whether the API runs synchronously. true (default): synchronous mode; false: asynchronous mode. |
| Parameter | Input/Output | Description |
|---|---|---|
| batchA | Input | Number of batches of the left matrix. |
| batchB | Input | Number of batches of the right matrix. |
| enSequentialWrite | Input | Whether the output data is stored continuously. The default value is false (discontinuous write mode). |
| c | Input | Address of matrix C in the local memory, used to store the computed matrix slices. |
Returns
GlobalTensor<DstT>: the computed matrix slice (first prototype only; the second prototype returns void and writes the result into c).
Availability
Precautions
None
Example
```cpp
// Calculate the number of loops required for multi-batch computation.
int for_extent = tiling.ALayoutInfoB * tiling.ALayoutInfoN * g_lay / tiling.BatchNum;
mm1.SetTensorA(gm_a[0], isTransposeAIn);
mm1.SetTensorB(gm_b[0], isTransposeBIn);
if (tiling.isBias) {
    mm1.SetBias(gm_bias[0]);
}
// Multi-batch Matmul computation
mm1.template IterateNBatch<false>(for_extent, batchA, batchB, false);
// ...other compute
for (int i = 0; i < for_extent; ++i) {
    // Fetch one batch group of matrix C per iteration (asynchronous mode).
    mm1.GetBatchC<false>(ubCmatrix);
    // ...other compute
}
```