GetBatchC
Applicability
| Product | Supported |
|---|---|
| | √ |
| | √ |
| | x |
| | x |
| | x |
| | x |
Function
This API has the same function as GetBatchTensorC. GetBatchTensorC is recommended.
Each call to GetBatchC obtains one slice of matrix C. This API can be used together with the asynchronous IterateNBatch API: after IterateNBatch has been called for iterative computation, GetBatchC obtains a matrix slice of size std::max(batchA, batchB) × singleCoreM × singleCoreN.
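The slice size stated above can be checked with a small host-side calculation. The following is only a sketch: `BatchSliceElements` is a hypothetical helper that is not part of the Matmul API, and it assumes that the documented size std::max(batchA, batchB) × singleCoreM × singleCoreN is an element count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the Matmul API): number of elements in the
// slice obtained by one GetBatchC call, assuming the documented size
// std::max(batchA, batchB) x singleCoreM x singleCoreN is an element count.
inline std::uint64_t BatchSliceElements(std::uint32_t batchA, std::uint32_t batchB,
                                        std::uint32_t singleCoreM, std::uint32_t singleCoreN)
{
    // Assumption of this sketch: a valid tiling never has an empty single-core block.
    assert(singleCoreM > 0 && singleCoreN > 0);
    // Widen to 64 bits before multiplying so the product cannot wrap in 32 bits.
    const std::uint64_t batches = std::max(batchA, batchB);
    return batches * singleCoreM * singleCoreN;
}
```

For instance, with batchA = 2, batchB = 4, singleCoreM = 16, and singleCoreN = 32, the helper returns 4 × 16 × 32 = 2048 elements, which may be useful when sizing the destination buffer.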
Prototype
    template <bool sync = true>
    __aicore__ inline GlobalTensor<DstT> GetBatchC(uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)

    template <bool sync = true>
    __aicore__ inline void GetBatchC(const LocalTensor<DstT>& c, uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)
Parameters
| Parameter | Description |
|---|---|
| sync | Setting it to true (default) enables the synchronous mode. Setting it to false enables the asynchronous mode. |

| Parameter | Input/Output | Description |
|---|---|---|
| batchA | Input | Number of batches of the left matrix. |
| batchB | Input | Number of batches of the right matrix. |
| enSequentialWrite | Input | Whether the output data is stored continuously. The default value is false (discontinuous write mode). |
| c | Input | Matrix C, which is used to store matrix slices. The type is LocalTensor. |
Returns
GlobalTensor<DstT>: the computed matrix slices. The prototype that takes a LocalTensor returns void and stores the matrix slices in c.
Restrictions
This API is not supported when enableMixDualMaster (dual-master mode) is set to true.
Example
    // Calculate the number of loops required for multi-batch computation.
    int g_lay = tiling.ALayoutInfoG > tiling.BLayoutInfoG ? tiling.ALayoutInfoG : tiling.BLayoutInfoG;
    int for_exent = tiling.ALayoutInfoB * tiling.ALayoutInfoN * g_lay / tiling.BatchNum;
    mm1.SetTensorA(gm_a[0], isTransposeAIn);
    mm1.SetTensorB(gm_b[0], isTransposeBIn);
    if (tiling.isBias) {
        mm1.SetBias(gm_bias[0]);
    }
    // Execute multi-batch Matmul computation.
    mm1.template IterateNBatch<false>(for_exent, batchA, batchB, false);
    // ...other compute
    for (int i = 0; i < for_exent; ++i) {
        mm1.template GetBatchC<false>(ubCmatrix, batchA, batchB);
        // ...other compute
    }
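The loop-count arithmetic at the top of the example can be tried on the host in isolation. The sketch below is an illustration only: `BatchLoopTilingSketch` is a hypothetical stand-in that mirrors just the tiling fields the example reads, and `BatchLoopCount` is a hypothetical helper name. Neither is part of the Matmul API.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the tiling data: only the fields read by the
// example's loop-count calculation are mirrored here.
struct BatchLoopTilingSketch {
    int ALayoutInfoB;
    int ALayoutInfoN;
    int ALayoutInfoG;
    int BLayoutInfoG;
    int BatchNum;
};

// Hypothetical helper: reproduces the example's computation of the number of
// IterateNBatch/GetBatchC iterations.
inline int BatchLoopCount(const BatchLoopTilingSketch& tiling)
{
    // Guard against division by zero; the original example does not check this.
    assert(tiling.BatchNum > 0);
    // Take the larger G dimension of the two layouts, as the example does.
    const int gLay = std::max(tiling.ALayoutInfoG, tiling.BLayoutInfoG);
    // Integer division, as in the example: any remainder is discarded.
    return tiling.ALayoutInfoB * tiling.ALayoutInfoN * gLay / tiling.BatchNum;
}
```

With ALayoutInfoB = 2, ALayoutInfoN = 4, ALayoutInfoG = 1, BLayoutInfoG = 3, and BatchNum = 6, the helper returns 2 × 4 × 3 / 6 = 4, so the GetBatchC loop in the example would run four times. Because the division truncates, the sketch assumes that the total batch count is a multiple of BatchNum.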