MatmulCallBackFunc
Applicability
| Product | Supported |
|---|---|
| | √ |
| | √ |
| | x |
| | x |
| | x |
| | x |
Function
MatmulCallBackFunc lets you customize how data is moved (loaded and stored) for Matmul matrices A, B, and C, for example to load non-contiguous data or to store data fragments with different strides. Implement one or more custom data movement functions as needed, then pass their function pointers through MatmulCallBackFunc when you define the Matmul object. The passed function pointers replace the default data movement functions in the Matmul process.
MatmulCallBackFunc takes three callback function pointers in a fixed order: the function that copies matrix C from CO1 to GM, the function that copies matrix A from GM to A1, and the function that copies matrix B from GM to B1. Because the positions are fixed, set a position to null if you do not provide a custom data movement function for it. For the definition and parameters of each callback function, see Table 1.
Each callback function implements the movement policy of a single base block (matrix A: baseM × baseK; matrix B: baseK × baseN; matrix C: baseM × baseN). It cannot manage the entire memory space.
The default Matmul movement function likewise moves a single, fixed-size base block on a single core. Over a complete Matmul computation, the movement function is called multiple times to move the consecutively arranged base blocks one by one. The following figure shows the process of moving the base blocks of matrix A as an example.
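
The fixed-position rule can be pictured with a small host-side model. This is only a sketch in plain C++, not Ascend C code: `MoveFn`, `CallbackSlotsModel`, `MyCopyA`, and `RunModel` are hypothetical names made up for illustration, and the real callbacks have the signatures listed in Table 1. It shows one plausible way a template with three function-pointer slots can fall back to a default path when a slot is null.

```cpp
#include <cassert>
#include <string>

// Hypothetical host-side stand-in for a data movement callback.
// This is NOT the Ascend C signature; it only records which path ran.
using MoveFn = void (*)(std::string &log);

// Models three fixed callback slots in the documented order:
// slot 0 = C (CO1 -> GM), slot 1 = A (GM -> A1), slot 2 = B (GM -> B1).
template <MoveFn CopyC = nullptr, MoveFn CopyA = nullptr, MoveFn CopyB = nullptr>
struct CallbackSlotsModel {
    static void MoveC(std::string &log) { Dispatch(CopyC, "default-C;", log); }
    static void MoveA(std::string &log) { Dispatch(CopyA, "default-A;", log); }
    static void MoveB(std::string &log) { Dispatch(CopyB, "default-B;", log); }

private:
    // A null slot means "no custom function": use the default path instead.
    static void Dispatch(MoveFn fn, const char *fallback, std::string &log) {
        if (fn != nullptr) {
            fn(log);
        } else {
            log += fallback;
        }
    }
};

// Custom movement for matrix A only (illustrative).
inline void MyCopyA(std::string &log) { log += "custom-A;"; }

// Only the A slot is customized, so the C and B slots are passed as nullptr.
inline std::string RunModel() {
    using Slots = CallbackSlotsModel<nullptr, MyCopyA, nullptr>;
    std::string log;
    Slots::MoveA(log);
    Slots::MoveB(log);
    Slots::MoveC(log);
    return log;
}
```

In this model, `RunModel()` takes the custom path for A and the default path for B and C. The null placeholders matter because the slot position, not the function name, determines which matrix a callback serves.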

| Function | API | Parameter Description |
|---|---|---|
| Moves the Matmul computation result from CO1 to GM, with customizable parameters such as the number of data segments to move. | `void DataCopyOut(const __gm__ void *gm, const LocalTensor<int8_t> &co1Local, const void *dataCopyOutParams, const uint64_t tilingPtr, const uint64_t dataPtr)` | `gm`: output GM address<br>`co1Local`: computation result on CO1<br>`dataCopyOutParams`: pointer to the DataCopyOutParams structure defined by Matmul, provided for reference:<br>`struct DataCopyOutParams {`<br>`uint16_t cBurstNum; // Number of moved data segments`<br>`uint16_t burstLen; // Length of each continuously moved data segment`<br>`uint16_t srcStride; // Interval between adjacent consecutive data segments of the source tensor`<br>`uint32_t dstStride; // Interval between adjacent consecutive data segments of the destination tensor`<br>`uint16_t oriNSize; // Size of the source tensor in the N direction during NZ-to-ND conversion`<br>`bool enUnitFlag; // Whether to enable UnitFlag`<br>`uint64_t quantScalar; // Quantized scalar value in the quantization scenario`<br>`uint64_t cbufWorkspaceAddr; // Quantized tensor address in the quantization scenario`<br>`};`<br>`tilingPtr`: address of the tiling parameter set by SetUserDefInfo<br>`dataPtr`: computation data address set by SetSelfDefineData |
| Moves the left matrix from GM to L1, with a customizable initial address, block position, and block size. | `void CopyA1(const LocalTensor<int8_t> &aMatrix, const __gm__ void *gm, int row, int col, int useM, int useK, const uint64_t tilingPtr, const uint64_t dataPtr)` | `aMatrix`: address of the target L1 Buffer<br>`gm`: GM initial address of the left matrix<br>`row` and `col`: indexes (sequence numbers) of the block in the M and K directions, starting from 0<br>`useM` and `useK`: sizes of the block in the M and K directions, in number of elements. You can calculate the address offset of the upper left corner of a block in the left matrix based on row, col, useM, and useK.<br>`tilingPtr`: address of the tiling parameter set by SetUserDefInfo<br>`dataPtr`: computation data address set by SetSelfDefineData |
| Moves the right matrix from GM to L1, with a customizable initial address, block position, and block size. | `void CopyB1(const LocalTensor<int8_t> &bMatrix, const __gm__ void *gm, int row, int col, int useK, int useN, const uint64_t tilingPtr, const uint64_t dataPtr)` | `bMatrix`: address of the target L1 Buffer<br>`gm`: GM initial address of the right matrix<br>`row` and `col`: indexes (sequence numbers) of the block in the K and N directions, starting from 0<br>`useK` and `useN`: sizes of the block in the K and N directions, in number of elements. You can calculate the address offset of the upper left corner of a block in the right matrix based on row, col, useK, and useN.<br>`tilingPtr`: address of the tiling parameter set by SetUserDefInfo<br>`dataPtr`: computation data address set by SetSelfDefineData |
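
The offset calculation mentioned for CopyA1 and CopyB1 can be sketched on the host in plain C++. This is an illustration under stated assumptions, not the Matmul implementation: the matrix is assumed to be stored row-major (ND format) with `ld` elements per row, and the helper names `BlockOffset` and `GatherBlock` are hypothetical. The nominal block sizes passed to `BlockOffset` are assumed to equal useM and useK (or useK and useN) for full-size blocks. For a smaller tail block, the nominal sizes would have to come from elsewhere, for example from the tiling data reached through tilingPtr.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element offset of the upper-left corner of block (row, col) in a
// row-major matrix. blockRows/blockCols are the nominal block sizes
// (assumed equal to useM/useK for full blocks); ld is the number of
// elements in one matrix row.
inline std::size_t BlockOffset(int row, int col, int blockRows, int blockCols,
                               std::size_t ld) {
    assert(row >= 0 && col >= 0 && blockRows > 0 && blockCols > 0);
    return static_cast<std::size_t>(row) * blockRows * ld +
           static_cast<std::size_t>(col) * blockCols;
}

// Copies a useRows x useCols block that starts at `offset` into a
// contiguous buffer, skipping (ld - useCols) elements between rows.
// This mimics a non-contiguous load in which each block row is one segment.
inline std::vector<int> GatherBlock(const std::vector<int> &src, std::size_t ld,
                                    std::size_t offset, int useRows, int useCols) {
    std::vector<int> dst;
    dst.reserve(static_cast<std::size_t>(useRows) * useCols);
    for (int r = 0; r < useRows; ++r) {
        for (int c = 0; c < useCols; ++c) {
            dst.push_back(src[offset + static_cast<std::size_t>(r) * ld + c]);
        }
    }
    return dst;
}
```

For example, in a 4 × 6 matrix split into 2 × 3 blocks, block (row = 1, col = 1) starts at element offset 1 × 2 × 6 + 1 × 3 = 15, and its two rows are 6 elements apart in memory.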
Restrictions
None
Example
```cpp
// User-defined callback functions (declarations; implement them as needed).
void DataCopyOut(const __gm__ void *gm, const LocalTensor<int8_t> &co1Local, const void *dataCopyOutParams, const uint64_t tilingPtr, const uint64_t dataPtr);
void CopyA1(const LocalTensor<int8_t> &aMatrix, const __gm__ void *gm, int row, int col, int useM, int useK, const uint64_t tilingPtr, const uint64_t dataPtr);
void CopyB1(const LocalTensor<int8_t> &bMatrix, const __gm__ void *gm, int row, int col, int useK, int useN, const uint64_t tilingPtr, const uint64_t dataPtr);

// Pass the callbacks in the fixed order: C (CO1 to GM), A (GM to A1), B (GM to B1).
AscendC::Matmul<aType, bType, cType, biasType, CFG_NORM, MatmulCallBackFunc<DataCopyOut, CopyA1, CopyB1>> mm;
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);
// The value set here is delivered to each callback as tilingPtr.
uint64_t tilingPtr = reinterpret_cast<uint64_t>(tiling);
mm.SetUserDefInfo(tilingPtr);
GlobalTensor<SrcT> dataGM; // GM tensor holding the computation data required by the callback functions
// The value set here is delivered to each callback as dataPtr.
uint64_t dataGMPtr = reinterpret_cast<uint64_t>(dataGM.address_);
mm.SetSelfDefineData(dataGMPtr);
mm.SetTensorA(gmA);
mm.SetTensorB(gmB);
mm.IterateAll();
```
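
The example passes addresses to the callbacks as `uint64_t` values. A callback therefore has to cast `tilingPtr` or `dataPtr` back to a typed pointer before reading from it. The following host-side sketch shows that round trip in plain C++. `DemoTiling`, `PackPointer`, and `ReadBaseM` are hypothetical names used only for illustration; the real tiling structure and its fields are defined by Matmul, and the sketch assumes a platform where pointers fit in 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tiling-like structure; NOT the real Matmul tiling layout.
struct DemoTiling {
    int baseM;
    int baseK;
};

// Caller side: convert a typed pointer to an integer, in the same way the
// example converts an address before calling SetUserDefInfo.
inline uint64_t PackPointer(const DemoTiling *tiling) {
    return reinterpret_cast<uint64_t>(tiling);
}

// Callback side: recover the typed pointer from the integer argument.
// The pointed-to object must still be alive when the callback runs.
inline int ReadBaseM(uint64_t tilingPtr) {
    const DemoTiling *tiling = reinterpret_cast<const DemoTiling *>(tilingPtr);
    return tiling->baseM;
}
```

Passing the address as an integer keeps the callback signature independent of the user's data types. In exchange, the caller and the callback must agree on the pointed-to type by convention, since the compiler cannot check it.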