Multi-core non-aligned splitting
Overview
In the multi-core scenario, the matrix is split across cores. If M, N, or K is not exactly divisible by singleCoreM, singleCoreN, or singleCoreK respectively, tail blocks occur; this is the multi-core non-aligned scenario. The following figure shows the matrix blocks in the last row and last column of matrices A and B.

In this case, the R matrix block in matrix C is still obtained by accumulating A1*B1 + A2*B2 + A3*B3 + A4*B4. When processing tail blocks such as A1*B1, A2*B2, A3*B3, and A4*B4, you need to set the tail block size on the kernel side. Call the SetTail API to reset singleCoreM, singleCoreN, and singleCoreK for the current calculation without changing the original tiling. During tail block processing, data is then transferred and computed based on the configured values, that is, tailM, tailN, and tailK.
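The accumulation described above can be checked with a plain host-side C++ model. This is a sketch, not Ascend C code: Mat, AccumulateBlock, BlockedMatmul, and NaiveMatmul are hypothetical names chosen for illustration. Each block's size is clamped with std::min to the remaining rows, columns, or K elements, which plays the role of the tailM, tailN, and tailK values that SetTail configures on the device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model (not Ascend C): row-major matrices,
// A is M x K, B is K x N, C is M x N.
using Mat = std::vector<float>;

// Multiply a (tm x tk) block of A by a (tk x tn) block of B and accumulate
// the product into the matching (tm x tn) block of C.
void AccumulateBlock(const Mat& a, const Mat& b, Mat& c, int K, int N,
                     int m0, int n0, int k0, int tm, int tn, int tk) {
    // Clamped sizes are always at least 1 because BlockedMatmul only
    // starts a block strictly inside the matrix.
    assert(tm > 0 && tn > 0 && tk > 0);
    for (int i = 0; i < tm; ++i) {
        for (int j = 0; j < tn; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < tk; ++k) {
                sum += a[(m0 + i) * K + (k0 + k)] * b[(k0 + k) * N + (n0 + j)];
            }
            c[(m0 + i) * N + (n0 + j)] += sum;
        }
    }
}

// Split C into singleCoreM x singleCoreN blocks and K into singleCoreK chunks.
// Blocks in the last row/column and the last K chunk are clamped to the
// remaining size, i.e. tailM, tailN, and tailK.
Mat BlockedMatmul(const Mat& a, const Mat& b, int M, int N, int K,
                  int singleCoreM, int singleCoreN, int singleCoreK) {
    Mat c(static_cast<std::size_t>(M) * N, 0.0f);
    for (int m0 = 0; m0 < M; m0 += singleCoreM) {
        int tailM = std::min(singleCoreM, M - m0);
        for (int n0 = 0; n0 < N; n0 += singleCoreN) {
            int tailN = std::min(singleCoreN, N - n0);
            // C block = A1*B1 + A2*B2 + ... over all K chunks.
            for (int k0 = 0; k0 < K; k0 += singleCoreK) {
                int tailK = std::min(singleCoreK, K - k0);
                AccumulateBlock(a, b, c, K, N, m0, n0, k0, tailM, tailN, tailK);
            }
        }
    }
    return c;
}

// Reference result: a plain triple loop over the whole matrices.
Mat NaiveMatmul(const Mat& a, const Mat& b, int M, int N, int K) {
    Mat c(static_cast<std::size_t>(M) * N, 0.0f);
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            for (int k = 0; k < K; ++k) {
                c[i * N + j] += a[i * K + k] * b[k * N + j];
            }
        }
    }
    return c;
}
```

For example, with M = 3, N = 2, K = 3 and block sizes 2, 1, 2, the last row block has tailM = 1 and the last K chunk has tailK = 1, yet BlockedMatmul returns the same C as NaiveMatmul.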
Use Case
Multi-core Matmul calculations in which tail blocks exist.
Restrictions
SetTail, which is used to process tail blocks, must be called before Iterate or IterateAll.
Examples
For the complete example of the Matmul multi-core non-aligned scenario, see the Matmul multi-core non-aligned split operator sample. The key code in this scenario is as follows:
```cpp
// Process the tail block.
int tailM = tiling.M - mCoreIndex * tiling.singleCoreM;
tailM = tailM < tiling.singleCoreM ? tailM : tiling.singleCoreM;
int tailN = tiling.N - nCoreIndex * tiling.singleCoreN;
tailN = tailN < tiling.singleCoreN ? tailN : tiling.singleCoreN;
// When tailM < singleCoreM or tailN < singleCoreN, the tail block needs to be processed.
// In this case, call the SetTail interface to set the tail block.
if (tailM < tiling.singleCoreM || tailN < tiling.singleCoreN) {
    matmulObj.SetTail(tailM, tailN);
}
```
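To see which cores take the SetTail branch, the following standalone C++ sketch replays the same tail arithmetic on the host. TilingInfo, TailInfo, ComputeTail, and AllCoreTails are hypothetical names. The sketch assumes each core is identified by an (mCoreIndex, nCoreIndex) pair covering a ceil(M/singleCoreM) x ceil(N/singleCoreN) grid; how a real kernel derives these indices from its block index is not shown here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the tiling fields the kernel snippet reads.
struct TilingInfo {
    int M, N, singleCoreM, singleCoreN;
};

// Tail sizes computed for one core.
struct TailInfo {
    int mCoreIndex, nCoreIndex, tailM, tailN;
    bool callsSetTail;  // true when the kernel snippet would call SetTail
};

// Same arithmetic as the kernel snippet: remaining rows/columns,
// clamped to singleCoreM/singleCoreN.
TailInfo ComputeTail(const TilingInfo& t, int mCoreIndex, int nCoreIndex) {
    // Assumed precondition for this sketch: the core lies inside the grid,
    // so both tails are positive.
    assert(mCoreIndex >= 0 && mCoreIndex * t.singleCoreM < t.M);
    assert(nCoreIndex >= 0 && nCoreIndex * t.singleCoreN < t.N);
    int tailM = t.M - mCoreIndex * t.singleCoreM;
    tailM = tailM < t.singleCoreM ? tailM : t.singleCoreM;
    int tailN = t.N - nCoreIndex * t.singleCoreN;
    tailN = tailN < t.singleCoreN ? tailN : t.singleCoreN;
    bool isTail = tailM < t.singleCoreM || tailN < t.singleCoreN;
    return {mCoreIndex, nCoreIndex, tailM, tailN, isTail};
}

// Enumerate every core in the ceil(M/singleCoreM) x ceil(N/singleCoreN) grid.
std::vector<TailInfo> AllCoreTails(const TilingInfo& t) {
    int mBlocks = (t.M + t.singleCoreM - 1) / t.singleCoreM;
    int nBlocks = (t.N + t.singleCoreN - 1) / t.singleCoreN;
    std::vector<TailInfo> result;
    for (int m = 0; m < mBlocks; ++m) {
        for (int n = 0; n < nBlocks; ++n) {
            result.push_back(ComputeTail(t, m, n));
        }
    }
    return result;
}
```

For example, with M = 100, N = 60, and singleCoreM = singleCoreN = 32, the grid has 4 x 2 = 8 cores. The cores in the last row (tailM = 4) or the last column (tailN = 28) call SetTail, which is 5 of the 8; the other 3 keep the full 32 x 32 block.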