MatmulCallBackFunc

Applicability

Product | Supported

Atlas A3 training products/Atlas A3 inference products | √

Atlas A2 training products/Atlas A2 inference products | √

Atlas 200I/500 A2 inference products | x

Atlas inference product's AI Core | x

Atlas inference product's Vector Core | x

Atlas training products | x

Function

MatmulCallBackFunc lets you customize how Matmul moves (loads and stores) matrices A, B, and C, for example to load data non-contiguously or to store data fragments with a different stride between them. Implement one or more custom data movement functions as needed, then pass their function pointers through MatmulCallBackFunc when you define the Matmul object. The function pointers you pass replace the default data movement functions in the Matmul process.
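
The replacement mechanism can be pictured with a small host-side C++ sketch. It does not use the Ascend C headers: `CopyFn`, `CallbackSlots`, `MoveA`, `DefaultCopyA`, and `CustomCopyA` are hypothetical names chosen for illustration, and the real callback signatures are the ones listed in Table 1. The sketch only shows the general idea of a compile-time function pointer slot that falls back to a default when it is left null.

```cpp
#include <cassert>

// Hypothetical, simplified copy signature used only in this sketch.
using CopyFn = int (*)(int row, int col);

// Hypothetical stand-in for a callback container: one compile-time slot
// that is null unless the user supplies a function.
template <CopyFn CopyA = nullptr>
struct CallbackSlots {
    static constexpr CopyFn copyA = CopyA;
};

// Default movement: returns 0 so the caller can tell which path ran.
inline int DefaultCopyA(int /*row*/, int /*col*/) { return 0; }

// Custom movement: returns a value derived from the block index.
inline int CustomCopyA(int row, int col) { return 100 + row * 10 + col; }

// Calls the custom function if the slot is set, otherwise the default one.
template <typename Slots>
int MoveA(int row, int col) {
    assert(row >= 0 && col >= 0);
    return Slots::copyA != nullptr ? Slots::copyA(row, col)
                                   : DefaultCopyA(row, col);
}
```

With these definitions, `MoveA<CallbackSlots<>>(1, 2)` takes the default path, while `MoveA<CallbackSlots<CustomCopyA>>(1, 2)` calls the custom function.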

MatmulCallBackFunc takes three callback function pointers in a fixed order: the function that copies matrix C from CO1 to GM, the function that copies matrix A from GM to A1, and the function that copies matrix B from GM to B1. A position whose custom data movement function is not used must be set to null. For the definition and parameters of each callback function, see Table 1.

Each callback function implements the movement policy for a single base block (matrix A: baseM × baseK; matrix B: baseK × baseN; matrix C: baseM × baseN); it cannot manage the entire memory space. The default Matmul movement function likewise moves a single base block of fixed size on a single core. During a complete Matmul computation, the movement function is called multiple times to move the consecutively arranged base blocks one by one. The following figure uses matrix A as an example.

Figure 1 Moving base blocks to matrix A
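
The block-by-block calling pattern described above can be sketched on the host side as follows. This is an illustration only, not the Matmul implementation: `BlockCall` and `EnumerateBlocksA` are hypothetical names, the traversal order (M outer, K inner) is an assumption, and so is the idea that a tail block is reported through a smaller useM or useK.

```cpp
#include <cassert>
#include <vector>

// One call to a movement function: block index plus actual block size.
struct BlockCall {
    int row;   // block index in the M direction, starting from 0
    int col;   // block index in the K direction, starting from 0
    int useM;  // rows in this block (assumed smaller for a tail block)
    int useK;  // columns in this block (assumed smaller for a tail block)
};

// Lists the calls needed to cover an M x K matrix with baseM x baseK blocks.
// Hypothetical helper; the M-outer, K-inner order is an assumption.
inline std::vector<BlockCall> EnumerateBlocksA(int M, int K, int baseM, int baseK) {
    assert(M > 0 && K > 0 && baseM > 0 && baseK > 0);
    std::vector<BlockCall> calls;
    const int mBlocks = (M + baseM - 1) / baseM;  // ceiling division
    const int kBlocks = (K + baseK - 1) / baseK;
    for (int row = 0; row < mBlocks; ++row) {
        for (int col = 0; col < kBlocks; ++col) {
            const int useM = (row == mBlocks - 1) ? M - row * baseM : baseM;
            const int useK = (col == kBlocks - 1) ? K - col * baseK : baseK;
            calls.push_back({row, col, useM, useK});
        }
    }
    return calls;
}
```

For example, a 5 × 4 matrix with 2 × 2 base blocks yields six calls under these assumptions, and the blocks in the last block row have useM equal to 1.
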
Table 1 APIs and parameters of MatmulCallBackFunc

Function | API | Parameter Description

Moves the Matmul computation result from CO1 to GM. You can customize parameters such as the number of data segments to move.

void DataCopyOut(const __gm__ void *gm, const LocalTensor<int8_t> &co1Local, const void *dataCopyOutParams, const uint64_t tilingPtr, const uint64_t dataPtr)

gm: output GM address

co1Local: computation result on CO1

dataCopyOutParams: pointer to a DataCopyOutParams structure. Matmul defines the structure as follows, for reference:

struct DataCopyOutParams {
    uint16_t cBurstNum;         // Number of moved data segments
    uint16_t burstLen;          // Length of the continuously moved data segments
    uint16_t srcStride;         // Interval between adjacent consecutive data segments of the source tensor
    uint32_t dstStride;         // Interval between adjacent consecutive data segments of the destination tensor
    uint16_t oriNSize;          // Size of the source tensor in the N direction during NZ-to-ND conversion
    bool enUnitFlag;            // Whether to enable UnitFlag
    uint64_t quantScalar;       // Quantized scalar value in the quantization scenario
    uint64_t cbufWorkspaceAddr; // Quantized tensor address in the quantization scenario
};

tilingPtr: address of the tiling parameter set by SetUserDefInfo

dataPtr: computation data address set by SetSelfDefineData
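
To make the roles of cBurstNum, burstLen, srcStride, and dstStride concrete, here is a host-side sketch of a segmented copy. It is not the Ascend C data movement instruction: `BurstParams` and `StridedCopy` are hypothetical names, all lengths are counted in elements, and each stride is taken to be the gap between the end of one segment and the start of the next. The units and stride semantics of the real DataCopyOutParams fields may differ, so treat both as assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical subset of the copy parameters, counted in elements here.
struct BurstParams {
    uint16_t cBurstNum;  // number of segments to move
    uint16_t burstLen;   // length of each contiguous segment
    uint16_t srcStride;  // assumed gap between segments in the source
    uint32_t dstStride;  // assumed gap between segments in the destination
};

// Copies cBurstNum segments of burstLen elements, skipping the given gaps.
inline void StridedCopy(const std::vector<int32_t>& src,
                        std::vector<int32_t>& dst,
                        const BurstParams& p) {
    std::size_t srcPos = 0;
    std::size_t dstPos = 0;
    for (uint16_t i = 0; i < p.cBurstNum; ++i) {
        assert(srcPos + p.burstLen <= src.size());
        assert(dstPos + p.burstLen <= dst.size());
        for (uint16_t j = 0; j < p.burstLen; ++j) {
            dst[dstPos + j] = src[srcPos + j];
        }
        srcPos += static_cast<std::size_t>(p.burstLen) + p.srcStride;
        dstPos += static_cast<std::size_t>(p.burstLen) + p.dstStride;
    }
}
```

Under these assumptions, copying two segments of three elements with srcStride 0 and dstStride 1 reads the source contiguously and leaves a one-element gap after each segment in the destination.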

Moves the left matrix from GM to L1. You can customize the initial address, block position, and block size of the left matrix.

void CopyA1(const LocalTensor<int8_t> &aMatrix, const __gm__ void *gm, int row, int col, int useM, int useK, const uint64_t tilingPtr, const uint64_t dataPtr)

aMatrix: address of the target L1 Buffer

gm: GM initial address of the left matrix

row and col: indexes of the blocks in the M and K directions, that is, the sequence numbers of the blocks in the M and K directions. The sequence numbers start from 0.

useM and useK: sizes of the blocks in the M and K directions. The unit is the number of elements. You can calculate the address offset of the upper left corner of a block in the left matrix based on row, col, useM, and useK.

tilingPtr: address of the tiling parameter set by SetUserDefInfo

dataPtr: computation data address set by SetSelfDefineData
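
The offset calculation mentioned for CopyA1 can be sketched as plain arithmetic. The sketch assumes a row-major (ND) M × K left matrix and assumes that the nominal block sizes baseM and baseK and the full K dimension are available to the callback, for example from the tiling data behind tilingPtr. For non-tail blocks, useM and useK would be equal to baseM and baseK. `BlockOffsetA` is a hypothetical helper, not part of the Matmul API.

```cpp
#include <cassert>
#include <cstdint>

// Element offset of the upper left corner of block (row, col) in a
// row-major M x K matrix split into baseM x baseK blocks.
// Hypothetical helper; the ND layout is an assumption.
inline uint64_t BlockOffsetA(int row, int col, int baseM, int baseK, int K) {
    assert(row >= 0 && col >= 0 && baseM > 0 && baseK > 0 && K > 0);
    // Skip row * baseM full matrix rows, then col * baseK elements in the row.
    return static_cast<uint64_t>(row) * baseM * K +
           static_cast<uint64_t>(col) * baseK;
}
```

Multiplying the result by the element size gives the byte offset to add to the GM initial address.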

Moves the right matrix from GM to L1. You can customize the initial address, block position, and block size of the right matrix.

void CopyB1(const LocalTensor<int8_t> &bMatrix, const __gm__ void *gm, int row, int col, int useK, int useN, const uint64_t tilingPtr, const uint64_t dataPtr)

bMatrix: address of the target L1 Buffer

gm: GM initial address of the right matrix

row and col: indexes of the transfer block in the K and N directions, that is, the sequence numbers of the transfer block in the K and N directions. The sequence numbers start from 0.

useK and useN: sizes of the block to be moved in the K and N directions. The unit is the number of elements. You can calculate the address offset of the upper left corner of a block in the right matrix based on row, col, useK, and useN.

tilingPtr: address of the tiling parameter set by SetUserDefInfo

dataPtr: computation data address set by SetSelfDefineData
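
As an illustration of a non-contiguous load for the right matrix, the following host-side sketch gathers one useK × useN block from a row-major K × N matrix into a contiguous buffer. `GatherBlockB` is a hypothetical helper, the ND source layout and the contiguous row-major destination are assumptions, and the data layout that the L1 Buffer actually expects is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gathers block (row, col) of size useK x useN from a row-major K x N matrix.
// baseK and baseN are the nominal block sizes used to locate the block.
// Hypothetical helper; source and destination layouts are assumptions.
inline std::vector<float> GatherBlockB(const std::vector<float>& b, int N,
                                       int row, int col, int baseK, int baseN,
                                       int useK, int useN) {
    assert(N > 0 && row >= 0 && col >= 0);
    assert(baseK > 0 && baseN > 0 && useK > 0 && useN > 0);
    // Offset of the upper left corner of the block, in elements.
    const std::size_t start = static_cast<std::size_t>(row) * baseK * N +
                              static_cast<std::size_t>(col) * baseN;
    std::vector<float> block;
    block.reserve(static_cast<std::size_t>(useK) * useN);
    for (int k = 0; k < useK; ++k) {
        // Consecutive block rows are N elements apart in the source.
        const std::size_t lineStart = start + static_cast<std::size_t>(k) * N;
        assert(lineStart + useN <= b.size());
        for (int n = 0; n < useN; ++n) {
            block.push_back(b[lineStart + n]);
        }
    }
    return block;
}
```

Each block row is contiguous in the source, but consecutive block rows are separated by N elements, which is the kind of access pattern a custom copy function has to handle.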

Restrictions

None

Example

For the complete example, see the Matmul callback sample.
// User-defined callback functions (declarations only; implement the ones you need)
void DataCopyOut(const __gm__ void *gm, const LocalTensor<int8_t> &co1Local, const void *dataCopyOutParams, const uint64_t tilingPtr, const uint64_t dataPtr);
void CopyA1(const LocalTensor<int8_t> &aMatrix, const __gm__ void *gm, int row, int col, int useM, int useK, const uint64_t tilingPtr, const uint64_t dataPtr);
void CopyB1(const LocalTensor<int8_t> &bMatrix, const __gm__ void *gm, int row, int col, int useK, int useN, const uint64_t tilingPtr, const uint64_t dataPtr);

AscendC::Matmul<aType, bType, cType, biasType, CFG_NORM, MatmulCallBackFunc<DataCopyOut, CopyA1, CopyB1>> mm;
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);
uint64_t tilingPtr = reinterpret_cast<uint64_t>(tiling);
mm.SetUserDefInfo(tilingPtr);
GlobalTensor<SrcT> dataGM; // GM tensor that holds the computation data required by the callback functions.
uint64_t dataGMPtr = reinterpret_cast<uint64_t>(dataGM.address_);
mm.SetSelfDefineData(dataGMPtr);
mm.SetTensorA(gmA);
mm.SetTensorB(gmB);
mm.IterateAll();