GetBatchC

Applicability

Product	Supported
Atlas A3 training products/Atlas A3 inference products	√
Atlas A2 training products/Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	x
Atlas inference product's AI Core	x
Atlas inference product's Vector Core	x
Atlas training products	x

Function

This API provides the same functionality as GetBatchTensorC; GetBatchTensorC is recommended.

Each call to GetBatchC obtains one slice of matrix C. This API is used together with the asynchronous IterateNBatch API: after IterateNBatch performs the iterative computation, each GetBatchC call obtains a matrix slice of size std::max(batchA, batchB) × singleCoreM × singleCoreN.
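The slice size above determines how large the destination buffer (for example, the LocalTensor passed as c) must be. The following plain C++ sketch only illustrates that arithmetic; it does not call any Ascend C API, and the helper names BatchSliceElements and BatchSliceBytes are hypothetical. The element size of DstT is passed in explicitly as an assumption, since it depends on the configured output type.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of elements in the matrix C slice obtained by
// one GetBatchC call, i.e. std::max(batchA, batchB) x singleCoreM x singleCoreN.
std::size_t BatchSliceElements(uint32_t batchA, uint32_t batchB,
                               uint32_t singleCoreM, uint32_t singleCoreN) {
    return static_cast<std::size_t>(std::max(batchA, batchB)) * singleCoreM * singleCoreN;
}

// Hypothetical helper: bytes needed to hold one slice, given sizeof(DstT).
std::size_t BatchSliceBytes(uint32_t batchA, uint32_t batchB,
                            uint32_t singleCoreM, uint32_t singleCoreN,
                            std::size_t dstTypeSize) {
    return BatchSliceElements(batchA, batchB, singleCoreM, singleCoreN) * dstTypeSize;
}
```

For example, with batchA = 4, batchB = 2, singleCoreM = 32, singleCoreN = 64 and a float output type, BatchSliceElements returns 8192 and BatchSliceBytes returns 32768.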

Prototype

template <bool sync = true>
__aicore__ inline GlobalTensor<DstT> GetBatchC(uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)

template <bool sync = true>
__aicore__ inline void GetBatchC(const LocalTensor<DstT>& c, uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)

Parameters

**Table 1** Template parameters
Parameter	Description
sync	Computation mode. true (default): synchronous mode. false: asynchronous mode.

**Table 2** API parameters
Parameter	Input/Output	Description
batchA	Input	Number of batches of the left matrix.
batchB	Input	Number of batches of the right matrix.
enSequentialWrite	Input	Whether the output data is written contiguously (sequential write mode). The default value is false (non-sequential write mode).
c	Input	LocalTensor that stores the obtained matrix C slice. Used only by the second prototype.

Returns

GlobalTensor<DstT> (first prototype): the computed matrix C slice. The second prototype returns void and writes the slice into c.

Restrictions

This API is not supported when enableMixDualMaster (dual-master mode) is set to true.

Example

// Calculate the number of loops required for multi-batch computation.
int g_lay = tiling.ALayoutInfoG > tiling.BLayoutInfoG ? tiling.ALayoutInfoG : tiling.BLayoutInfoG;
int for_exent = tiling.ALayoutInfoB * tiling.ALayoutInfoN * g_lay / tiling.BatchNum;
mm1.SetTensorA(gm_a[0], isTransposeAIn);
mm1.SetTensorB(gm_b[0], isTransposeBIn);
if (tiling.isBias) {
    mm1.SetBias(gm_bias[0]);
}
// Execute multi-batch Matmul computation in asynchronous mode (sync = false).
mm1.template IterateNBatch<false>(for_exent, batchA, batchB, false);
// ...other compute
// Obtain one matrix C slice per call into the LocalTensor ubCmatrix.
for (int i = 0; i < for_exent; ++i) {
    mm1.template GetBatchC<false>(ubCmatrix, batchA, batchB);
    // ...other compute
}
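The loop bound for_exent in the example can be isolated as plain C++ to check the arithmetic on the host. In this sketch, BatchTilingFields is a hypothetical stand-in that only mirrors the tiling field names read by the example; it is not the real tiling structure, and BatchLoopCount is likewise a hypothetical helper. Because the example uses integer division, the sketch assumes the product of the layout dimensions is divisible by BatchNum.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in mirroring the tiling fields used in the example.
struct BatchTilingFields {
    int32_t ALayoutInfoB;
    int32_t ALayoutInfoN;
    int32_t ALayoutInfoG;
    int32_t BLayoutInfoG;
    int32_t BatchNum;
};

// Hypothetical helper: computes the same value as for_exent in the example,
// i.e. the IterateNBatch loop count and the number of GetBatchC calls.
int32_t BatchLoopCount(const BatchTilingFields& t) {
    // Same as the ternary in the example: take the larger G value.
    int32_t gLay = std::max(t.ALayoutInfoG, t.BLayoutInfoG);
    return t.ALayoutInfoB * t.ALayoutInfoN * gLay / t.BatchNum;
}
```

For example, with ALayoutInfoB = 2, ALayoutInfoN = 4, ALayoutInfoG = 3, BLayoutInfoG = 1 and BatchNum = 4, BatchLoopCount returns 6, so GetBatchC would be called 6 times.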
