GetBatchC

Applicability

| Product | Supported |
| --- | --- |
| Atlas A3 training products/Atlas A3 inference products | √ |
| Atlas A2 training products/Atlas A2 inference products | √ |
| Atlas 200I/500 A2 inference products | x |
| AI Core of Atlas inference products | x |
| Vector Core of Atlas inference products | x |
| Atlas training products | x |

Function

This API provides the same function as GetBatchTensorC. GetBatchTensorC is the recommended API.

Each call to GetBatchC obtains one slice of matrix C. This API is used together with the asynchronous IterateNBatch API: after IterateNBatch is called to start the iterative computation, call GetBatchC to obtain a matrix slice of size std::max(batchA, batchB) × singleCoreM × singleCoreN.
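The slice size stated above can be sketched on the host side as follows. This is only an illustration of the arithmetic, not part of the Matmul API: `BatchSliceElems` is a hypothetical helper name, and the 64-bit return type is an assumption made here to avoid overflow when the three factors are multiplied.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not an Ascend C API): number of elements in the
// matrix C slice obtained by one GetBatchC call, that is
// std::max(batchA, batchB) * singleCoreM * singleCoreN.
inline uint64_t BatchSliceElems(uint32_t batchA, uint32_t batchB,
                                uint32_t singleCoreM, uint32_t singleCoreN)
{
    // Widen before multiplying so the product cannot wrap in 32 bits.
    const uint64_t batch = std::max(batchA, batchB);
    return batch * singleCoreM * singleCoreN;
}
```

For instance, with batchA = 3, batchB = 1, singleCoreM = 32, and singleCoreN = 64, the helper evaluates 3 × 32 × 64. A buffer that receives the slice would need at least this many DstT elements.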

Prototype

template <bool sync = true>
__aicore__ inline GlobalTensor<DstT> GetBatchC(uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)

template <bool sync = true>
__aicore__ inline void GetBatchC(const LocalTensor<DstT>& c, uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)

Parameters

Table 1 Template parameters

| Parameter | Description |
| --- | --- |
| sync | Setting it to true (default) enables the synchronous mode. Setting it to false enables the asynchronous mode. |

Table 2 API parameters

| Parameter | Input/Output | Description |
| --- | --- | --- |
| batchA | Input | Number of batches of the left matrix. |
| batchB | Input | Number of batches of the right matrix. |
| enSequentialWrite | Input | Whether the output data is stored contiguously. The default value is false (non-contiguous write mode). |
| c | Input | Matrix C, which is used to store matrix slices. The type is LocalTensor. |

Returns

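The first prototype returns a GlobalTensor<DstT> that holds the computed matrix slices. The second prototype returns no value and stores the matrix slices in c.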

Restrictions

This API is not supported when enableMixDualMaster (dual-master mode) is set to true.

Example

// Calculate the number of loops required for multi-batch computation.
int g_lay = tiling.ALayoutInfoG > tiling.BLayoutInfoG ? tiling.ALayoutInfoG : tiling.BLayoutInfoG;
int for_extent = tiling.ALayoutInfoB * tiling.ALayoutInfoN * g_lay / tiling.BatchNum;
mm1.SetTensorA(gm_a[0], isTransposeAIn);
mm1.SetTensorB(gm_b[0], isTransposeBIn);
if (tiling.isBias) {
    mm1.SetBias(gm_bias[0]);
}
// Execute multi-batch Matmul computation.
mm1.template IterateNBatch<false>(for_extent, batchA, batchB, false);
// ...other compute
// Obtain one matrix C slice per loop in asynchronous mode (sync = false).
for (int i = 0; i < for_extent; ++i) {
    mm1.template GetBatchC<false>(ubCmatrix, batchA, batchB);
    // ...other compute
}
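The loop count computed at the top of the example can be checked in isolation with a small host-side sketch. `BatchTilingSketch` and `BatchLoopCount` are hypothetical names introduced here for illustration; the struct only mirrors the tiling fields that the example reads and is not the real tiling structure.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the tiling fields read by the example.
// It is not the real tiling structure.
struct BatchTilingSketch {
    int32_t ALayoutInfoB;
    int32_t ALayoutInfoN;
    int32_t ALayoutInfoG;
    int32_t BLayoutInfoG;
    int32_t BatchNum;
};

// Number of GetBatchC calls needed after IterateNBatch, following the
// example: B * N * max(G of A, G of B) / BatchNum (integer division).
inline int32_t BatchLoopCount(const BatchTilingSketch& t)
{
    const int32_t gLay = std::max(t.ALayoutInfoG, t.BLayoutInfoG);
    return t.ALayoutInfoB * t.ALayoutInfoN * gLay / t.BatchNum;
}
```

With ALayoutInfoB = 2, ALayoutInfoN = 4, ALayoutInfoG = 1, BLayoutInfoG = 3, and BatchNum = 6, the sketch evaluates 2 × 4 × 3 / 6. The division is integer division, as in the example, so the sketch assumes that BatchNum is nonzero and divides the product evenly.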