GetBatchC

Function Description

This API provides the same functionality as GetBatchTensorC; GetBatchTensorC is recommended instead.

Each call to GetBatchC obtains one slice of matrix C. This API can be used together with the asynchronous IterateNBatch API: after IterateNBatch performs the iterative computation, GetBatchC retrieves a matrix slice of size std::max(batchA, batchB) × singleCoreM × singleCoreN.

Prototype

template <bool sync = true>
__aicore__ inline GlobalTensor<DstT> GetBatchC(uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)

template <bool sync = true>
__aicore__ inline void GetBatchC(const LocalTensor<DstT>& c, uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)

Parameters

Table 1 Parameters in the template

Parameter   Description
sync        Synchronization mode. true (default): synchronous mode; false: asynchronous mode.

Table 2 API parameters

Parameter          Input/Output   Description
batchA             Input          Number of batches of the left matrix.
batchB             Input          Number of batches of the right matrix.
enSequentialWrite  Input          Whether the output data is stored contiguously. The default value is false (non-contiguous write mode).
c                  Input          Address of matrix C in local memory, used to store the matrix slices.

Returns

GlobalTensor<DstT>: the computed matrix C slice. The overload that takes a LocalTensor<DstT> parameter c returns void.

Availability

Precautions

None

Example

// Calculate the number of loops required for the multi-batch computation.
int for_extent = tiling.ALayoutInfoB * tiling.ALayoutInfoN * g_lay / tiling.BatchNum;
mm1.SetTensorA(gm_a[0], isTransposeAIn);
mm1.SetTensorB(gm_b[0], isTransposeBIn);
if (tiling.isBias) {
    mm1.SetBias(gm_bias[0]);
}
// Multi-batch Matmul computation
mm1.template IterateNBatch<false>(for_extent, batchA, batchB, false);
// ... other computation
for (int i = 0; i < for_extent; ++i) {
    mm1.GetBatchC<false>(ubCmatrix, batchA, batchB);
    // ... other computation
}
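For the overload that returns a GlobalTensor, the tail of the example would look roughly as follows. This is a sketch only: mm1, for_extent, batchA, and batchB are the names from the example above, the consuming code is elided, and whether this pattern fits a given kernel depends on its synchronization design:

```cpp
// Synchronous mode (sync = true by default): each call blocks until one
// std::max(batchA, batchB) x singleCoreM x singleCoreN slice of matrix C is ready.
for (int i = 0; i < for_extent; ++i) {
    GlobalTensor<DstT> cSlice = mm1.GetBatchC(batchA, batchB);
    // ... consume cSlice
}
```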