GetBatchTensorC
Function Usage
Obtains a matrix C slice. This API works together with the asynchronous IterateNBatch API: after IterateNBatch is called to perform the iterative computation, each call to GetBatchTensorC retrieves one matrix slice of size std::max(batchA, batchB) × singleCoreM × singleCoreN.
Prototype
```cpp
template <bool sync = true>
__aicore__ inline GlobalTensor<DstT> GetBatchTensorC(uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)
```

```cpp
template <bool sync = true>
__aicore__ inline void GetBatchTensorC(const LocalTensor<DstT>& c, uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)
```
Parameters
| Parameter | Description |
|---|---|
| sync | Whether to run in synchronous mode. `true` (default): synchronous mode; `false`: asynchronous mode. |
| Parameter | Input/Output | Description |
|---|---|---|
| batchA | Input | Number of batches of the left matrix. |
| batchB | Input | Number of batches of the right matrix. |
| enSequentialWrite | Input | Whether the output data is stored contiguously. The default value is false (non-contiguous write mode). |
| c | Input | Address of matrix C in local memory, used to store the matrix slices. |
Returns
GlobalTensor<DstT>: the computed matrix slice.
Availability
Precautions
None
Example
```cpp
// Calculate the number of loops required for multi-batch computation.
int for_extent = tiling.ALayoutInfoB * tiling.ALayoutInfoN * g_lay / tiling.BatchNum;
mm1.SetTensorA(gm_a[0], isTransposeAIn);
mm1.SetTensorB(gm_b[0], isTransposeBIn);
if (tiling.isBias) {
    mm1.SetBias(gm_bias[0]);
}
// Multi-batch Matmul computation
mm1.template IterateNBatch<false>(for_extent, batchA, batchB, false);
// ...other computation
for (int i = 0; i < for_extent; ++i) {
    // Retrieve one slice per iteration in asynchronous mode.
    mm1.GetBatchTensorC<false>(ubCmatrix, batchA, batchB);
    // ...other computation
}
```