GetBatchTensorC

Applicability

Product	Supported
Atlas A3 training products/Atlas A3 inference products	√
Atlas A2 training products/Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	x
Atlas inference product's AI Core	x
Atlas inference product's Vector Core	x
Atlas training products	x

Function

Obtains one matrix C slice per call. This API works with the asynchronous IterateNBatch API: after IterateNBatch has been called to start iterative computation, each call to GetBatchTensorC returns a matrix slice of size std::max(batchA, batchB) × singleCoreM × singleCoreN.
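As a quick illustration of the slice size stated above, the following host-side sketch computes the number of elements in one slice. BatchSliceElems is a hypothetical helper written for this illustration only; it is not part of the Matmul API, and it assumes singleCoreM and singleCoreN are taken from the tiling data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical host-side helper (not part of the Matmul API).
// Returns the element count of the slice obtained by one GetBatchTensorC
// call: std::max(batchA, batchB) * singleCoreM * singleCoreN.
inline uint64_t BatchSliceElems(uint32_t batchA, uint32_t batchB,
                                uint32_t singleCoreM, uint32_t singleCoreN)
{
    // Assumption for this sketch: both batch counts are at least 1.
    assert(batchA > 0 && batchB > 0);
    // Widen to 64 bits before multiplying to avoid 32-bit overflow.
    const uint64_t batches = std::max(batchA, batchB);
    return batches * singleCoreM * singleCoreN;
}
```

For example, with batchA = 2, batchB = 4, singleCoreM = 16, and singleCoreN = 32, the slice holds 4 × 16 × 32 = 2048 elements.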

Prototype

template <bool sync = true>
__aicore__ inline GlobalTensor<DstT> GetBatchTensorC(uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)

template <bool sync = true>
__aicore__ inline void GetBatchTensorC(const LocalTensor<DstT>& c, uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)

Parameters

**Table 1** Template parameters
Parameter	Description
sync	Only the asynchronous mode is supported. Although the default value in the prototype is true, this parameter must be explicitly set to false.

**Table 2** API parameters
Parameter	Input/Output	Description
batchA	Input	Number of batches of the left matrix.
batchB	Input	Number of batches of the right matrix.
enSequentialWrite	Input	This parameter is reserved and can be ignored.
c	Input	Address of matrix C in the local memory, which is used to store matrix slices.

Returns

GlobalTensor<DstT>: the computed matrix C slice (first prototype). The second prototype returns no value and stores the slice in c.

Restrictions

This API is not supported when enableMixDualMaster (dual-master mode) is set to true.
When matrix C slices are output to the local memory and the size in the N direction for single-core computation (singleCoreN) is not 32-byte aligned, matrix C supports only the ND_ALIGN CubeFormat. In this case, the data along the singleCoreN direction of each output slice is automatically padded to a multiple of 32 bytes.
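The padding rule above can be sketched as follows, for example to size the local-memory buffer that receives a slice. PaddedSingleCoreN is a hypothetical helper, not an API of the library. It assumes the element size in bytes divides 32 evenly, which holds for the common 1-, 2-, and 4-byte data types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the Matmul API).
// Returns the row length, in elements, after the singleCoreN direction is
// padded up to a multiple of 32 bytes.
inline uint32_t PaddedSingleCoreN(uint32_t singleCoreN, uint32_t elemBytes)
{
    constexpr uint32_t kAlignBytes = 32;
    // Assumption for this sketch: the element size divides 32 bytes evenly.
    assert(elemBytes > 0 && kAlignBytes % elemBytes == 0);
    const uint32_t elemsPerBlock = kAlignBytes / elemBytes;
    // Round singleCoreN up to the next multiple of elemsPerBlock.
    return (singleCoreN + elemsPerBlock - 1) / elemsPerBlock * elemsPerBlock;
}
```

For a 2-byte data type, singleCoreN = 17 occupies 34 bytes per row, so under this sketch each row is padded to 64 bytes, that is, 32 elements. A row of 16 elements is already 32-byte aligned and stays unchanged.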

Example

// Calculate the number of loops required for multi-batch computation.
int for_extent = tiling.ALayoutInfoB * tiling.ALayoutInfoN * g_lay / tiling.BatchNum;
mm1.SetTensorA(gm_a[0], isTransposeAIn);
mm1.SetTensorB(gm_b[0], isTransposeBIn);
if (tiling.isBias) {
    mm1.SetBias(gm_bias[0]);
}
// Execute multi-batch Matmul computation.
mm1.template IterateNBatch<false>(for_extent, batchA, batchB, false);
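// ... other compute
for (int i = 0; i < for_extent; ++i) {
    // Obtain one matrix C slice per loop; batchA and batchB are required by the prototype.
    mm1.template GetBatchTensorC<false>(ubCmatrix, batchA, batchB);
    // ... other compute
}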
