TSCM input matrix multiplication

Overview

TSCM is the logical memory that corresponds to the L1 Buffer space. For details about the L1 Buffer, see Storage Unit. Managing the TSCM yourself lets you use hardware resources more efficiently. For example, you can cache a single copy of data in the TSCM and configure it as the A matrix, B matrix, or bias matrix of the Matmul operation in different scenarios, which reuses memory and improves computing efficiency. In the TSCM input scenario, you manage the entire TSCM memory space: Matmul uses the TSCM memory address that you pass in directly and does not transfer that data from the global memory to the TSCM.

Use Case

Use this scenario when you need to customize both the data transfer to the TSCM and the management of the TSCM, for example, to transfer discontinuous data or to preprocess data while it is being transferred. Managing the TSCM yourself also lets you flexibly schedule the MTE2 pipeline to implement global DoubleBuffer across Matmul objects. For details about MTE2, see Transfer Unit.
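As an illustration of what a discontinuous transfer involves, the following host-side C++ sketch gathers a tile from a larger row-major matrix into a contiguous buffer. It is not Ascend C code and does not use any Ascend C API: the function name GatherTile and all of its parameters are hypothetical, and on the device the equivalent step would be the user-written copy from the GM to the TSCM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side illustration only (hypothetical helper, not an Ascend C API).
// Gathers a rows x cols tile from a row-major source whose consecutive rows
// are srcStride elements apart, and returns it as one contiguous buffer.
template <typename T>
std::vector<T> GatherTile(const std::vector<T>& src,
                          std::size_t rowOffset, std::size_t colOffset,
                          std::size_t rows, std::size_t cols,
                          std::size_t srcStride)
{
    // The tile must lie inside one source row horizontally...
    assert(colOffset + cols <= srcStride);
    // ...and inside the source buffer vertically.
    assert((rowOffset + rows) * srcStride <= src.size());

    std::vector<T> tile;
    tile.reserve(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        // Each tile row is contiguous in the source, but successive tile
        // rows are separated by srcStride elements.
        const T* rowBegin = src.data() + (rowOffset + r) * srcStride + colOffset;
        tile.insert(tile.end(), rowBegin, rowBegin + cols);
    }
    return tile;
}
```

For a 3 x 4 source holding the values 0 to 11, GatherTile(src, 1, 1, 2, 2, 4) returns the four elements 5, 6, 9, 10 in one contiguous buffer, which is the layout that a fully loaded TSCM matrix would need.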

Restrictions

A matrix that is set as the TSCM input must be fully loaded into the TSCM. That is, all of the matrix data must be transferred to the TSCM and reside there at the same time.
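One way to reason about this restriction is to check, before choosing the TSCM input scenario, that the whole matrix fits in the available space. The sketch below is plain C++ rather than Ascend C. The helper FitsInTscm is hypothetical, and the 512 KiB capacity is only an assumed value for illustration, because the real L1 Buffer size depends on the hardware.

```cpp
#include <cstddef>

// Assumed capacity for illustration only; the actual L1 Buffer size is
// hardware dependent and must be taken from the platform documentation.
constexpr std::size_t kAssumedL1Bytes = 512 * 1024;

// Hypothetical helper: returns true when a rows x cols matrix with
// elemSize-byte elements fits entirely within capacityBytes.
constexpr bool FitsInTscm(std::size_t rows, std::size_t cols,
                          std::size_t elemSize,
                          std::size_t capacityBytes = kAssumedL1Bytes)
{
    // An empty matrix trivially fits.
    if (rows == 0 || cols == 0 || elemSize == 0) {
        return true;
    }
    // Divide instead of multiplying so that very large shapes cannot
    // overflow std::size_t. This is equivalent to
    // rows * cols * elemSize <= capacityBytes.
    return rows <= capacityBytes / elemSize / cols;
}
```

With the assumed capacity, a 256 x 512 matrix of 2-byte elements passes the check, while a 1024 x 512 matrix of the same type does not and would therefore not qualify as a TSCM input. For matrix A, the byte count being checked is the same M * Ka * sizeof(A_T) that the sample passes to InitBuffer.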

Examples

For complete operator samples, see Matmul operator sample with custom GM data source for TSCM input and BatchMatmul operator sample with custom TSCM input.

TQue<TPosition::A1, 1> scm; // The logical position of the queue is A1, and the queue depth is 1.
pipe->InitBuffer(scm, 1, tiling.M * tiling.Ka * sizeof(A_T)); 
// The TPosition of A_TYPE is TSCM, and the TPosition of B_TYPE is GM.
Matmul<A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE> mm1;
REGIST_MATMUL_OBJ(pipe, GetSysWorkSpacePtr(), mm1); // pipe is already a pointer, as in pipe->InitBuffer above.
mm1.Init(&tiling);
// Customize the transfer from the GM to the TSCM for matrix A.
auto scmTensor = scm.AllocTensor<A_T>();
DataCopy(scmTensor, gm_a, tiling.M * tiling.Ka);
scm.EnQue(scmTensor);
LocalTensor<A_T> scmLocal = scm.DeQue<A_T>();
// Set matrix A as the TSCM input and matrix B as the GM input.
mm1.SetTensorA(scmLocal);
mm1.SetTensorB(gm_b);
mm1.SetBias(gm_bias);
mm1.IterateAll(gm_c);
scm.FreeTensor(scmLocal);