Matmul Tiling Usage Instructions
Ascend C provides a set of Matmul Tiling APIs for obtaining the tiling parameters required for Matmul kernel computation. You only need to provide information about matrices A, B, and C, such as the position, format, and data type, and call the corresponding APIs to obtain the parameters of the TCubeTiling structure used in Init.
Matmul tiling APIs are classified into Matmul single-core tiling APIs, multi-core tiling APIs, and BatchMatmul tiling APIs, which are used for Matmul single-core computing, multi-core computing, and BatchMatmul computing respectively. The process of obtaining tiling parameters is as follows:
- Create a single-core tiling object, multi-core tiling object, or BatchMatmul tiling object.
- Set the type information of parameters A, B, C, and Bias, as well as the M, N, Ka, and Kb shape information.
- Call the GetTiling API to obtain the tiling information.
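
To make the shape parameters in the second step concrete, the sketch below computes the storage size of each operand from M, N, Ka, and Kb. It assumes that A holds M x Ka elements, B holds Kb x N elements, and C holds M x N elements. `OperandBytes` and `ComputeOperandBytes` are hypothetical helpers written for this illustration and are not part of the Ascend C API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for this sketch; not part of the Ascend C API.
struct OperandBytes {
    std::uint64_t a;  // bytes occupied by matrix A
    std::uint64_t b;  // bytes occupied by matrix B
    std::uint64_t c;  // bytes occupied by matrix C
};

// Assumption: A is M x Ka, B is Kb x N, and C is M x N, counted in elements.
// elemSizeAB and elemSizeC are element sizes in bytes
// (2 for a half-precision type, 4 for a single-precision type).
inline OperandBytes ComputeOperandBytes(std::uint64_t m, std::uint64_t n,
                                        std::uint64_t ka, std::uint64_t kb,
                                        std::uint64_t elemSizeAB,
                                        std::uint64_t elemSizeC) {
    assert(elemSizeAB > 0 && elemSizeC > 0);
    return OperandBytes{m * ka * elemSizeAB,
                        kb * n * elemSizeAB,
                        m * n * elemSizeC};
}
```

For example, `ComputeOperandBytes(1024, 1024, 1024, 1024, 2, 4)` models half-precision inputs with a single-precision output and yields 2 MiB for A, 2 MiB for B, and 4 MiB for C.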
The following provides examples of using Matmul single-core and multi-core tiling APIs and BatchMatmul tiling APIs to obtain tiling parameters:
- Matmul single-core tiling
    auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
    matmul_tiling::MatmulApiTiling tiling(ascendcPlatform);
    // Set the position, format, and data type of matrices A, B, and C, and of the bias.
    tiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
    tiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
    tiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
    tiling.SetBiasType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
    tiling.SetShape(1024, 1024, 1024);    // Set M, N, and K for single-core computing.
    // Set the original M, N, and K. For single-core tiling, these are the same as the values passed to SetShape.
    // If Ka and Kb differ in length, pass both, for example tiling.SetOrgShape(1024, 1024, 1024, 1280).
    tiling.SetOrgShape(1024, 1024, 1024);
    tiling.EnableBias(true);              // Include the bias in the Matmul computation.
    tiling.SetBufferSpace(-1, -1, -1);    // Set the available buffer space. By default, all space of the AI processor is used.
    optiling::TCubeTiling tilingData;
    int64_t ret = tiling.GetTiling(tilingData);    // A return value of -1 indicates that tiling failed.
- Matmul multi-core tiling
    auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
    matmul_tiling::MultiCoreMatmulTiling tiling(ascendcPlatform);
    tiling.SetDim(1);    // Set the number of cores used for computing to 1.
    tiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
    tiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
    tiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
    tiling.SetBiasType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
    tiling.SetShape(1024, 1024, 1024);
    tiling.SetSingleShape(1024, 1024, 1024);
    tiling.SetOrgShape(1024, 1024, 1024);
    tiling.EnableBias(true);
    tiling.SetBufferSpace(-1, -1, -1);    // Set the available buffer space. By default, all space of the AI processor is used.
    optiling::TCubeTiling tilingData;
    int64_t ret = tiling.GetTiling(tilingData);    // A return value of -1 indicates that tiling failed.
- BatchMatmul Tiling
    auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
    matmul_tiling::BatchMatmulTiling bmmTiling(ascendcPlatform);
    bmmTiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
    bmmTiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
    bmmTiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
    bmmTiling.SetBiasType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
    bmmTiling.EnableBias(true);
    bmmTiling.SetShape(64, 48, 32);
    bmmTiling.SetSingleShape(64, 48, 32);
    bmmTiling.SetOrgShape(64, 48, 32);
    // When the layout type is NORMAL, use SetBatchInfoForNormal to set the layout axis information of matrices A, B, and C.
    bmmTiling.SetBatchInfoForNormal(2, 2, 64, 48, 32);
    // When the layout type is BSNGD, SBNGD, or BNGS1S2, use SetALayout, SetBLayout, and SetCLayout
    // to set the layout axis information of matrices A, B, and C.
    // bmmTiling.SetALayout(3, 64, 2, 2, 32);
    // bmmTiling.SetBLayout(3, 32, 2, 2, 48);
    // bmmTiling.SetCLayout(3, 64, 2, 2, 48);
    bmmTiling.SetBatchNum(2);
    bmmTiling.SetBufferSpace(-1, -1, -1);    // Set the available buffer space. By default, all space of the AI processor is used.
    optiling::TCubeTiling tilingData;
    int64_t ret = bmmTiling.GetTiling(tilingData);    // A return value of -1 indicates that tiling failed.
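
As a reference for what the BatchMatmul configuration above describes, the following host-side sketch multiplies each batch of A by the matching batch of B. It assumes row-major storage with the batch axis outermost (A as [batch, M, K], B as [batch, K, N], C as [batch, M, N]) and omits the bias. `BatchMatmulReference` is a hypothetical name, and the code is a plain CPU loop for checking results, not how the Ascend C kernel computes them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for a batched matrix multiplication.
// Assumption: all buffers are row-major with the batch axis outermost.
inline std::vector<float> BatchMatmulReference(const std::vector<float>& a,
                                               const std::vector<float>& b,
                                               std::size_t batch, std::size_t m,
                                               std::size_t n, std::size_t k) {
    assert(a.size() == batch * m * k);
    assert(b.size() == batch * k * n);
    std::vector<float> c(batch * m * n, 0.0f);
    for (std::size_t bi = 0; bi < batch; ++bi) {
        for (std::size_t i = 0; i < m; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                float acc = 0.0f;
                // Dot product of row i of A[bi] with column j of B[bi].
                for (std::size_t p = 0; p < k; ++p) {
                    acc += a[(bi * m + i) * k + p] * b[(bi * k + p) * n + j];
                }
                c[(bi * m + i) * n + j] = acc;
            }
        }
    }
    return c;
}
```

With the shapes from the example above, the call would take batch = 2, m = 64, n = 48, and k = 32, so A holds 2 x 64 x 32 elements and C holds 2 x 64 x 48 elements.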
The API list is as follows:
| API | Function |
|---|---|
| SetAType | Sets the position, data format, data type, and transpose status of matrix A. |
| SetBType | Sets the position, data format, data type, and transpose status of matrix B. |
| SetCType | Sets the position, data format, and data type of matrix C. |
| SetDequantType | Sets the dequantization mode. |
| SetBiasType | Sets the position, data format, and data type of the bias. |
| SetShape | Sets the shapes singleM, singleN, and singleK of a single Matmul computation. The unit is the number of elements. |
| SetOrgShape | Sets the original complete shapes M, N, Ka, and Kb during Matmul computation. The unit is the number of elements. |
| SetALayout | Sets the layout axis information of matrix A. |
| SetBLayout | Sets the layout axis information of matrix B. |
| SetCLayout | Sets the layout axis information of matrix C. |
| SetBatchInfoForNormal | Sets the M, N, and K axes and the batch sizes of matrix A and matrix B. |
| SetBatchNum | Sets the maximum number of batches for multi-batch computation. |
| EnableBias | Sets whether the bias is used in computation. |
| SetBias | Sets whether the bias is used in computation. You are advised to use the EnableBias API. |
| SetFixSplit | Sets the fixed baseM, baseN, and baseK. The unit is the number of elements. |
| SetBufferSpace | Sets the size of the available L1/L0C/UB space during Matmul computation. The unit is byte. |
| SetTraverse | Sets the traversal mode, that is, M axis first or N axis first. |
| SetMadType | Sets whether to enable the HF32 mode. Not supported in the current version. |
| SetSplitRange | Sets the maximum and minimum values of baseM, baseN, and baseK. |
| SetMatmulConfigParams | Customizes the MatmulConfig parameters. |
| SetDoubleBuffer | Determines whether to enable double buffer for A, B, C, and bias, and whether to enable ND2NZ or NZ2ND conversion. This API is reserved and not supported in the current version. |
| GetBaseM | Obtains the baseM value. |
| GetBaseN | Obtains the baseN value. |
| GetBaseK | Obtains the baseK value. |
| GetTiling | Obtains tiling parameters. |
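
The relationship between the single-computation shape (singleM, singleN, singleK) and the base block (baseM, baseN, baseK) can be sketched as a ceiling division along each axis. The helpers below are hypothetical illustrations, not Ascend C APIs, and the base block sizes in the usage note are made-up values, not ones returned by GetTiling.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: integer division rounded up.
inline std::int64_t CeilDivBase(std::int64_t value, std::int64_t divisor) {
    assert(value >= 0 && divisor > 0);
    return (value + divisor - 1) / divisor;
}

// Hypothetical result type for this sketch.
struct BaseBlockCounts {
    std::int64_t mBlocks;  // base blocks needed along the M axis
    std::int64_t nBlocks;  // base blocks needed along the N axis
    std::int64_t kSteps;   // accumulation steps needed along the K axis
};

// Counts how many base blocks cover one single-computation shape.
// A partial block at the edge of an axis still counts as one block.
inline BaseBlockCounts CountBaseBlocks(std::int64_t singleM, std::int64_t singleN,
                                       std::int64_t singleK, std::int64_t baseM,
                                       std::int64_t baseN, std::int64_t baseK) {
    return BaseBlockCounts{CeilDivBase(singleM, baseM),
                           CeilDivBase(singleN, baseN),
                           CeilDivBase(singleK, baseK)};
}
```

For singleM = singleN = singleK = 1024 with an illustrative base block of 128 x 256 x 64, this gives 8 blocks along M, 4 blocks along N, and 16 steps along K.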

| API | Function |
|---|---|
| SetDim | Sets the number of cores that can participate in multi-core Matmul computation. |
| SetSingleRange | Sets the maximum and minimum values of singleCoreM, singleCoreN, and singleCoreK. The unit is the number of elements. |
| SetSingleShape | Sets the shapes singleCoreM, singleCoreN, and singleCoreK of the Matmul single-core computation. The unit is the number of elements. |
| GetSingleShape | Obtains the computed singleCoreM, singleCoreN, and singleCoreK. |
| SetAlignSplit | Sets the singleCoreM, singleCoreN, and singleCoreK alignment values during multi-core tiling. |
| GetCoreNum | Obtains the blockDim used after multi-core tiling. |
| SetSplitK | Enables K-axis splitting in multi-core tiling. The EnableMultiCoreSplitK API is recommended. |
| EnableMultiCoreSplitK | Enables K-axis splitting in multi-core tiling. |
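
One simplified way to reason about SetSingleShape and GetCoreNum is to count the singleCoreM x singleCoreN blocks needed to cover the M x N output. The sketch below does that under the assumptions that the K axis is not split and that each block is assigned to one core. `CeilDiv` and `CountOutputBlocks` are hypothetical helpers, and the split actually chosen by the tiling APIs may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: integer division rounded up.
inline std::int64_t CeilDiv(std::int64_t value, std::int64_t divisor) {
    assert(value >= 0 && divisor > 0);
    return (value + divisor - 1) / divisor;
}

// Simplified model, not the library's algorithm: the M x N output is cut
// into singleCoreM x singleCoreN blocks and K is left unsplit, so the
// number of blocks is the product of the block counts along M and N.
inline std::int64_t CountOutputBlocks(std::int64_t m, std::int64_t n,
                                      std::int64_t singleCoreM,
                                      std::int64_t singleCoreN) {
    return CeilDiv(m, singleCoreM) * CeilDiv(n, singleCoreN);
}
```

With M = N = 1024 and singleCoreM = singleCoreN = 1024, the count is 1, which matches the SetDim(1) setting in the multi-core example above. Halving both single-core sizes to 512 gives 4 blocks.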

| API | Function |
|---|---|
| GetCoreNum | Obtains the blockDim used after multi-core tiling. |

Header File to Be Included
- Matmul single-core tiling
    #include "lib/matmul/matmul_tiling.h"
- Matmul multi-core tiling
    #include "lib/matmul/bmm_tiling.h"
- BatchMatmul Tiling
    #include "lib/matmul/bmm_tiling.h"