Multi-Core Aligned Tiling
Overview
To implement multi-core parallelism for higher efficiency, matrix data needs to be tiled and allocated to different cores for processing. There are two main tiling policies: K-axis tiling and non-K-axis tiling.
The policies for tiling only the M and N axes but not the K axis are as follows:
- Matrix A is tiled into multiple tiles of singleCoreM along the M axis. A single core processes singleCoreM x K data.
- Matrix B is tiled into multiple tiles of singleCoreN along the N axis. A single core processes K x singleCoreN data.
- Matrix C: multiplying the singleCoreM x K block of matrix A by the K x singleCoreN block of matrix B yields a block of singleCoreM x singleCoreN, which is the size of the matrix C output on a single core.
As shown in the following figure, eight cores participate in the computation. Matrix A is tiled into four blocks along the M axis, and matrix B is tiled into two blocks along the N axis. A single core processes only one block (for example, the green part in the figure is the data computed on core5). The matrix A block with the size of singleCoreM x K is multiplied by the matrix B block with the size of K x singleCoreN to obtain the matrix C block with the size of singleCoreM x singleCoreN.
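The mapping from a core index to its block of matrix C can be sketched in plain C++ as follows. This is only an illustration: BlockRange, CeilDiv, and GetBlockRange are hypothetical helper names, not part of the Matmul API, and the row-major numbering of cores over the block grid is an assumption (the figure may use a different order).

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper type, for illustration only.
struct BlockRange {
    int64_t mOffset;  // first row of A and C handled by this core
    int64_t nOffset;  // first column of B and C handled by this core
    int64_t mSize;    // rows actually processed (a tail block may be smaller)
    int64_t nSize;    // columns actually processed (a tail block may be smaller)
};

inline int64_t CeilDiv(int64_t a, int64_t b) { return (a + b - 1) / b; }

// Assumption: cores are numbered row-major over the grid of
// (blocks along M) x (blocks along N). The K axis is not tiled.
inline BlockRange GetBlockRange(int64_t coreIdx, int64_t M, int64_t N,
                                int64_t singleCoreM, int64_t singleCoreN)
{
    const int64_t nBlocks = CeilDiv(N, singleCoreN);
    const int64_t mIdx = coreIdx / nBlocks;
    const int64_t nIdx = coreIdx % nBlocks;

    BlockRange r;
    r.mOffset = mIdx * singleCoreM;
    r.nOffset = nIdx * singleCoreN;
    r.mSize = std::min(singleCoreM, M - r.mOffset);
    r.nSize = std::min(singleCoreN, N - r.nOffset);
    return r;
}
```

With M = 1024, N = 512, singleCoreM = 256, and singleCoreN = 256, the grid has four blocks along M and two along N, and under the assumed numbering core 5 handles the block starting at row 512 and column 256.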

The policies for tiling the M, N, and K axes are as follows:
- Matrix A is tiled into multiple tiles of singleCoreM along the M axis and multiple tiles of singleCoreK along the K axis. A single core processes data of the size of singleCoreM x singleCoreK.
- Matrix B is tiled into multiple tiles of singleCoreK along the K axis and into multiple tiles of singleCoreN along the N axis. A single core processes data of the size of singleCoreK x singleCoreN.
- For matrix C, matrix A with the size of singleCoreM x singleCoreK is multiplied by matrix B with the size of singleCoreK x singleCoreN, and accumulation is performed to obtain matrix C blocks with the size of singleCoreM x singleCoreN.
As shown in the following figure, block R of matrix C is obtained by accumulating A1 x B1 + A2 x B2 + A3 x B3, where the products A1 x B1, A2 x B2, and A3 x B3 can be computed in parallel on multiple cores.
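The accumulation over K tiles can be illustrated with a host-side sketch in plain C++. It is not the Matmul API: MatmulAccumulateKRange and MatmulSplitK are hypothetical names, matrices are assumed to be row-major, and the loop over K tiles runs sequentially here, whereas on the device each partial product would be computed by a different core.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Adds A[:, kBegin:kEnd] * B[kBegin:kEnd, :] into c.
// a is M x K, b is K x N, c is M x N, all row-major.
inline void MatmulAccumulateKRange(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   std::vector<float>& c,
                                   std::size_t M, std::size_t N, std::size_t K,
                                   std::size_t kBegin, std::size_t kEnd)
{
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t k = kBegin; k < kEnd; ++k) {
            for (std::size_t j = 0; j < N; ++j) {
                c[i * N + j] += a[i * K + k] * b[k * N + j];
            }
        }
    }
}

// Splits K into tiles of singleCoreK, computes one partial product per tile
// (the work a single core would do), and accumulates the partials into C.
inline std::vector<float> MatmulSplitK(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       std::size_t M, std::size_t N, std::size_t K,
                                       std::size_t singleCoreK)
{
    std::vector<float> c(M * N, 0.0f);
    for (std::size_t kBegin = 0; kBegin < K; kBegin += singleCoreK) {
        const std::size_t kEnd = std::min(kBegin + singleCoreK, K);
        std::vector<float> partial(M * N, 0.0f);  // e.g. A1 x B1
        MatmulAccumulateKRange(a, b, partial, M, N, K, kBegin, kEnd);
        for (std::size_t idx = 0; idx < c.size(); ++idx) {
            c[idx] += partial[idx];  // accumulation step
        }
    }
    return c;
}
```

Calling MatmulSplitK with singleCoreK equal to K degenerates to the case where the K axis is not tiled, which gives a simple reference for checking the split result.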

- CUBE_ONLY mode (only Cube computation), which this section uses as an example: SetDim sets the number of available cores of the current AI processor. The tiling computation then yields the number of cores actually used for the Matmul computation, which is less than or equal to the number of available cores of the AI processor. The user configures SetBlockDim according to the number of cores actually used.
- MIX mode (both Cube and Vector computation): for the rules for setting the number of cores, see Rules for Setting the Number of Cores in the MIX Scenario.
Use Case
Matmul computation across multiple cores.
Restrictions
None
Examples
The following is the key code for this scenario. For the complete multi-core Matmul samples, see the matmul multi-core kernel launch sample (M and N tiling in multi-core mode) and the operator sample for multi-core K splitting (K tiling in multi-core mode).
// Construct a multi-core tiling object.
auto ascendcPlatform = platform_ascendc::PlatformAscendCManager::GetInstance(socVersion);
matmul_tiling::MultiCoreMatmulTiling cubeTiling(*ascendcPlatform);
// For operators that involve only Cube computation, set the number of cores that can participate in matrix multiplication to the number of Cube cores on the current AI processor.
cubeTiling.SetDim(ascendcPlatform->GetCoreNumAic());
cubeTiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
cubeTiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
cubeTiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
cubeTiling.SetBiasType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
cubeTiling.SetOrgShape(M, N, K);
cubeTiling.SetShape(M, N, K);
cubeTiling.EnableBias(isBias);
optiling::TCubeTiling tilingData;
// Obtain tiling parameters.
int ret = cubeTiling.GetTiling(tilingData); // If ret is -1, tiling generation failed.