GetBatchTensorC

Applicability

Product

Supported

Atlas A3 training products/Atlas A3 inference products

Atlas A2 training products/Atlas A2 inference products

Atlas 200I/500 A2 inference products

x

Atlas inference product's AI Core

x

Atlas inference product's Vector Core

x

Atlas training products

x

Function

Obtains one slice of matrix C per call. This API works with the asynchronous IterateNBatch API: after IterateNBatch is called to start iterative computation, each call to GetBatchTensorC obtains a matrix slice of size std::max(batchA, batchB) × singleCoreM × singleCoreN.
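As a rough illustration of the slice size stated above, the following host-side sketch computes the number of elements in one slice. The helper name BatchSliceElements is hypothetical and is not part of the Matmul API; it only restates the std::max(batchA, batchB) × singleCoreM × singleCoreN formula.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the Matmul API): number of elements in one
// matrix C slice, i.e. std::max(batchA, batchB) x singleCoreM x singleCoreN.
inline uint64_t BatchSliceElements(uint32_t batchA, uint32_t batchB,
                                   uint32_t singleCoreM, uint32_t singleCoreN)
{
    assert(batchA > 0 && batchB > 0);
    // Widen to 64 bits before multiplying to avoid 32-bit overflow.
    const uint64_t batch = std::max(batchA, batchB);
    return batch * singleCoreM * singleCoreN;
}
```

For example, with batchA = 4, batchB = 1, singleCoreM = 32, and singleCoreN = 64, one slice holds 4 × 32 × 64 = 8192 elements. A value like this could be used to check that the destination tensor is large enough.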

Prototype

template <bool sync = true>
__aicore__ inline GlobalTensor<DstT> GetBatchTensorC(uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)

template <bool sync = true>
__aicore__ inline void GetBatchTensorC(const LocalTensor<DstT>& c, uint32_t batchA, uint32_t batchB, bool enSequentialWrite = false)

Parameters

Table 1 Parameters in the template

Parameter

Description

sync

Only the asynchronous mode is supported, so this parameter must be set to false. Because the default value in the prototype is true, pass false explicitly.

Table 2 API parameters

Parameter

Input/Output

Description

batchA

Input

Number of batches of the left matrix.

batchB

Input

Number of batches of the right matrix.

enSequentialWrite

Input

This parameter is reserved and can be ignored.

c

Input

Address of matrix C in the local memory, which is used to store matrix slices.

Returns

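GlobalTensor<DstT>: the computed matrix slice. This applies to the prototype without the parameter c. The prototype that takes c returns no value and stores the slice in c.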

Restrictions

  • This API is not supported when enableMixDualMaster (dual-master mode) is set to true.
  • When matrix C slices are output to the local memory and the size in the N direction for single-core computation (singleCoreN) is not 32-byte aligned, matrix C supports only the ND_ALIGN CubeFormat. In this case, the output data along the singleCoreN direction is automatically padded to 32-byte alignment.
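
The padding in the second restriction can be sketched on the host side as follows. This is only an illustration under an assumption: that "padded to 32 bytes" means each row along singleCoreN is rounded up to a multiple of 32 bytes. The helper name AlignedSingleCoreN and the constant kBlockBytes are hypothetical and are not part of the Matmul API.

```cpp
#include <cassert>
#include <cstdint>

// Assumed alignment unit from the restriction above (32 bytes).
constexpr uint32_t kBlockBytes = 32;

// Hypothetical helper (not part of the Matmul API): singleCoreN rounded up so
// that one row of elemBytes-sized elements occupies a multiple of 32 bytes.
inline uint32_t AlignedSingleCoreN(uint32_t singleCoreN, uint32_t elemBytes)
{
    assert(elemBytes > 0 && kBlockBytes % elemBytes == 0);
    const uint32_t elemsPerBlock = kBlockBytes / elemBytes;
    // Round up to the next multiple of elemsPerBlock.
    return (singleCoreN + elemsPerBlock - 1) / elemsPerBlock * elemsPerBlock;
}
```

Under this assumption, a 2-byte element type with singleCoreN = 30 gives a padded row of 32 elements, so a local buffer for the slice would be sized using the padded width rather than singleCoreN itself.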

Examples

// Calculate the number of loops required for multi-batch computation.
int for_extent = tiling.ALayoutInfoB * tiling.ALayoutInfoN * g_lay / tiling.BatchNum;
mm1.SetTensorA(gm_a[0], isTransposeAIn);
mm1.SetTensorB(gm_b[0], isTransposeBIn);
if (tiling.isBias) {
    mm1.SetBias(gm_bias[0]);
}
// Multi-batch Matmul computation
mm1.template IterateNBatch<false>(for_extent, batchA, batchB, false);
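// ... other compute
for (int i = 0; i < for_extent; ++i) {
    // Obtain one matrix C slice per iteration into the local tensor ubCmatrix.
    mm1.template GetBatchTensorC<false>(ubCmatrix, batchA, batchB);
    // ... other compute
}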