Matmul Tiling Usage Instructions

Ascend C provides a set of Matmul tiling APIs for obtaining the tiling parameters required by Matmul kernel computation. You only need to provide information about matrices A, B, and C (such as position, format, and data type) and call the corresponding APIs to obtain the parameters of the TCubeTiling structure used in Init.

Matmul tiling APIs are classified into Matmul single-core tiling APIs, multi-core tiling APIs, and BatchMatmul tiling APIs, which are used for Matmul single-core computing, multi-core computing, and BatchMatmul computing respectively. The process of obtaining tiling parameters is as follows:

  1. Create a single-core tiling object, multi-core tiling object, or BatchMatmul tiling object.
  2. Set the type information of matrices A, B, and C and of the bias, as well as the shape information M, N, Ka, and Kb.
  3. Call the GetTiling API to obtain the tiling information.

The following examples show how to obtain tiling parameters with the Matmul single-core, Matmul multi-core, and BatchMatmul tiling APIs:

  • Matmul single-core tiling
    auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
    matmul_tiling::MatmulApiTiling tiling(ascendcPlatform); 
    // Set the position, format, and data type of the matrices A, B, and C, and the bias.
    tiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
    tiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);   
    tiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);   
    tiling.SetBiasType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);   
    tiling.SetShape(1024, 1024, 1024); // Set the values of M, N, and K for single-core computing.
    // Set the original M, N, and K. For single-core tiling, these are the same as the values passed to SetShape.
    // If Ka and Kb differ in length, pass both, for example tiling.SetOrgShape(1024, 1024, 1024, 1280).
    tiling.SetOrgShape(1024, 1024, 1024);
    tiling.EnableBias(true); // Set the Matmul computing to include the bias.
    tiling.SetBufferSpace(-1, -1, -1); // Set the space that can be used. By default, all space of the AI processor is used.
    optiling::TCubeTiling tilingData;   
    int64_t ret = tiling.GetTiling(tilingData);    // A return value of -1 indicates that obtaining the tiling failed.
    
  • Matmul multi-core tiling
    auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
    matmul_tiling::MultiCoreMatmulTiling tiling(ascendcPlatform); 
    tiling.SetDim(1); // Set the number of cores for computing to 1.
    tiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);   
    tiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);   
    tiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);   
    tiling.SetBiasType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);   
    tiling.SetShape(1024, 1024, 1024);   
    tiling.SetSingleShape(1024, 1024, 1024);
    tiling.SetOrgShape(1024, 1024, 1024); 
    tiling.EnableBias(true);   
    tiling.SetBufferSpace(-1, -1, -1); // Set the space that can be used. By default, all space of the AI processor is used.
    optiling::TCubeTiling tilingData;   
    int64_t ret = tiling.GetTiling(tilingData);    // A return value of -1 indicates that obtaining the tiling failed.
    
  • BatchMatmul tiling
    auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
    matmul_tiling::BatchMatmulTiling bmmTiling(ascendcPlatform); 
      
    bmmTiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);   
    bmmTiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);   
    bmmTiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);   
    bmmTiling.SetBiasType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);   
    bmmTiling.EnableBias(true);
    bmmTiling.SetShape(64, 48, 32);   
    bmmTiling.SetSingleShape(64, 48, 32);
    bmmTiling.SetOrgShape(64, 48, 32); 
    // When the layout type is NORMAL, use SetBatchInfoForNormal to set the layout axis information of matrices A, B, and C.
    bmmTiling.SetBatchInfoForNormal(2, 2, 64, 48, 32);
    // When the layout type is BSNGD, SBNGD, or BNGS1S2, use SetALayout, SetBLayout, and SetCLayout to set the layout axis information of matrices A, B, and C.
    // bmmTiling.SetALayout(3, 64, 2, 2, 32);
    // bmmTiling.SetBLayout(3, 32, 2, 2, 48);
    // bmmTiling.SetCLayout(3, 64, 2, 2, 48);
    bmmTiling.SetBatchNum(2);
    bmmTiling.SetBufferSpace(-1, -1, -1);  // Set the space that can be used. By default, all space of the AI processor is used.
    optiling::TCubeTiling tilingData;
    int64_t ret = bmmTiling.GetTiling(tilingData);    // A return value of -1 indicates that obtaining the tiling failed.
    

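To make the batch values in the BatchMatmul example concrete, the following sketch computes how many elements and bytes matrices A, B, and C occupy. It is illustrative arithmetic only and does not call the tiling library. The BatchInfo struct and the helper functions are hypothetical names, the argument order (batchA, batchB, M, N, K) is an assumption inferred from the example values, and the output batch count is assumed to be the larger of batchA and batchB.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the five values passed to SetBatchInfoForNormal
// in the example above. The order (batchA, batchB, M, N, K) is an assumption.
struct BatchInfo {
    int64_t batchA;
    int64_t batchB;
    int64_t m;
    int64_t n;
    int64_t k;
};

// Total elements of A: batchA matrices of M x K.
constexpr int64_t ElementsA(const BatchInfo& info) {
    return info.batchA * info.m * info.k;
}

// Total elements of B: batchB matrices of K x N.
constexpr int64_t ElementsB(const BatchInfo& info) {
    return info.batchB * info.k * info.n;
}

// Total elements of C, assuming one M x N result per batch and
// max(batchA, batchB) output batches.
constexpr int64_t ElementsC(const BatchInfo& info) {
    return (info.batchA > info.batchB ? info.batchA : info.batchB) * info.m * info.n;
}

// Bytes occupied by `elements` values of `elemSize` bytes each.
constexpr int64_t Bytes(int64_t elements, int64_t elemSize) {
    assert(elements >= 0 && elemSize > 0);
    return elements * elemSize;
}

// Values from the example: 2 batches each, M = 64, N = 48, K = 32.
constexpr BatchInfo kExample{2, 2, 64, 48, 32};

// A and B are DT_FLOAT16 (2 bytes); C is DT_FLOAT (4 bytes).
static_assert(Bytes(ElementsA(kExample), 2) == 8192, "A: 2 x 64 x 32 half values");
static_assert(Bytes(ElementsB(kExample), 2) == 6144, "B: 2 x 32 x 48 half values");
static_assert(Bytes(ElementsC(kExample), 4) == 24576, "C: 2 x 64 x 48 float values");
```

The static_assert lines double as a worked example: with the values above, A holds 4096 elements, B holds 3072, and C holds 6144.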
The API list is as follows:

Table 1 List of APIs shared by MatmulApiTiling, MultiCoreMatmulTiling, and BatchMatmulTiling

API                      Function
SetAType                 Sets the position, data format, data type, and transpose status of matrix A.
SetBType                 Sets the position, data format, data type, and transpose status of matrix B.
SetCType                 Sets the position, data format, and data type of matrix C.
SetDequantType           Sets the dequantization mode.
SetBiasType              Sets the position, data format, and data type of the bias.
SetShape                 Sets the shapes singleM, singleN, and singleK of a single Matmul computation. The unit is the number of elements.
SetOrgShape              Sets the original complete shapes M, N, Ka, and Kb during Matmul computation. The unit is the number of elements.
SetALayout               Sets the layout axis information of matrix A.
SetBLayout               Sets the layout axis information of matrix B.
SetCLayout               Sets the layout axis information of matrix C.
SetBatchInfoForNormal    Sets the M, N, and K axes and the batch sizes of matrix A and matrix B.
SetBatchNum              Sets the maximum number of batches for multi-batch computation.
EnableBias               Sets whether the bias is used in computation.
SetBias                  Sets whether the bias is used in computation. You are advised to use the EnableBias API.
SetFixSplit              Sets the fixed baseM, baseN, and baseK. The unit is the number of elements.
SetBufferSpace           Sets the size of the available L1/L0C/UB space during Matmul computation. The unit is byte.
SetTraverse              Sets the traversal mode, that is, M axis first or N axis first.
SetMadType               Sets whether to enable the HF32 mode. Not supported in the current version.
SetSplitRange            Sets the maximum and minimum values of baseM, baseN, and baseK.
SetMatmulConfigParams    Customizes the MatmulConfig parameters.
SetDoubleBuffer          Determines whether to enable double buffer for A, B, C, and bias, and whether to enable ND2NZ or NZ2ND conversion. This API is reserved and not supported in the current version.
GetBaseM                 Obtains the baseM value.
GetBaseN                 Obtains the baseN value.
GetBaseK                 Obtains the baseK value.
GetTiling                Obtains tiling parameters.
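The relationship between the single-computation shape (SetShape) and the base block (SetFixSplit, GetBaseM, GetBaseN, GetBaseK) can be illustrated with plain arithmetic. The sketch below assumes that a computation of singleM x singleN x singleK is covered by base blocks of baseM x baseN x baseK, with partial blocks rounded up. It does not call the tiling library, the BaseBlock struct and helpers are hypothetical names, and the base-block values are illustrative rather than library output. How the library actually accounts for buffer space (for example, double buffering) is not covered here.

```cpp
#include <cassert>
#include <cstdint>

// Ceiling division for a non-negative value and a positive tile size.
constexpr int64_t CeilDiv(int64_t value, int64_t tile) {
    assert(value >= 0 && tile > 0);
    return (value + tile - 1) / tile;
}

// Hypothetical holder for the values that GetBaseM, GetBaseN, and GetBaseK return.
struct BaseBlock {
    int64_t baseM;
    int64_t baseN;
    int64_t baseK;
};

// Number of base-block steps needed to cover a singleM x singleN x singleK
// computation (assumed iteration model, partial blocks rounded up).
constexpr int64_t BaseBlockSteps(int64_t singleM, int64_t singleN, int64_t singleK,
                                 const BaseBlock& b) {
    return CeilDiv(singleM, b.baseM) * CeilDiv(singleN, b.baseN) * CeilDiv(singleK, b.baseK);
}

// Bytes of one base block of A (baseM x baseK).
constexpr int64_t BytesA(const BaseBlock& b, int64_t elemSize) {
    return b.baseM * b.baseK * elemSize;
}

// Bytes of one base block of B (baseK x baseN).
constexpr int64_t BytesB(const BaseBlock& b, int64_t elemSize) {
    return b.baseK * b.baseN * elemSize;
}

// Bytes of one base block of C (baseM x baseN).
constexpr int64_t BytesC(const BaseBlock& b, int64_t elemSize) {
    return b.baseM * b.baseN * elemSize;
}

// Illustrative base block, not a value produced by GetTiling.
constexpr BaseBlock kBlock{128, 256, 64};

static_assert(BaseBlockSteps(1024, 1024, 1024, kBlock) == 512, "8 x 4 x 16 steps");
static_assert(BytesA(kBlock, 2) == 16384, "128 x 64 half values");
static_assert(BytesB(kBlock, 2) == 32768, "64 x 256 half values");
static_assert(BytesC(kBlock, 4) == 131072, "128 x 256 float values");
```

Byte counts like these are the kind of quantity to compare against the sizes passed to SetBufferSpace when choosing a fixed split.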

Table 2 Other MultiCoreMatmulTiling APIs

API                      Function
SetDim                   Sets the number of cores that can participate in multi-core Matmul computation.
SetSingleRange           Sets the maximum and minimum values of singleCoreM, singleCoreN, and singleCoreK. The unit is the number of elements.
SetSingleShape           Sets the shapes singleCoreM, singleCoreN, and singleCoreK of the Matmul single-core computation. The unit is the number of elements.
GetSingleShape           Obtains the computed singleCoreM, singleCoreN, and singleCoreK.
SetAlignSplit            Sets the singleCoreM, singleCoreN, and singleCoreK alignment values during multi-core tiling.
GetCoreNum               Obtains the blockDim used after multi-core tiling.
SetSplitK                Enables K-axis splitting in multi-core tiling. The EnableMultiCoreSplitK API is recommended.
EnableMultiCoreSplitK    Enables K-axis splitting in multi-core tiling.
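As a rough illustration of how the per-core shape (SetSingleShape) relates to the number of cores (SetDim), the sketch below counts how many singleCoreM x singleCoreN blocks are needed to cover an M x N result. It assumes that the K axis is not split and that each block is handled by one core. The helper names are hypothetical, the code does not call the tiling library, and the actual core assignment chosen by the library may differ.

```cpp
#include <cassert>
#include <cstdint>

// Ceiling division for a non-negative value and a positive tile size.
constexpr int64_t CeilDiv(int64_t value, int64_t tile) {
    assert(value >= 0 && tile > 0);
    return (value + tile - 1) / tile;
}

// Hypothetical helper: number of single-core blocks needed to cover an
// M x N result when each core computes singleCoreM x singleCoreN elements
// and the K axis is not split.
constexpr int64_t BlocksNeeded(int64_t m, int64_t n,
                               int64_t singleCoreM, int64_t singleCoreN) {
    return CeilDiv(m, singleCoreM) * CeilDiv(n, singleCoreN);
}

// True if all blocks can run at once on `dim` cores, one block per core.
constexpr bool FitsInOneRound(int64_t blocks, int64_t dim) {
    assert(dim > 0);
    return blocks <= dim;
}

// Same shapes as the multi-core example: one 1024 x 1024 block on one core.
static_assert(BlocksNeeded(1024, 1024, 1024, 1024) == 1, "one block");
static_assert(FitsInOneRound(BlocksNeeded(1024, 1024, 1024, 1024), 1), "fits on 1 core");

// Smaller per-core shape: 2 blocks along M times 4 blocks along N.
static_assert(BlocksNeeded(1024, 1024, 512, 256) == 8, "2 x 4 blocks");
static_assert(!FitsInOneRound(BlocksNeeded(1024, 1024, 512, 256), 4), "needs more than 4 cores");
```

In the multi-core example above, SetDim(1) is paired with a per-core shape equal to the full shape, which matches the single-block case in this sketch.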

Table 3 Other BatchMatmulTiling APIs

API                      Function
GetCoreNum               Obtains the blockDim used after multi-core tiling.

Header File to Be Included

  • Matmul single-core tiling
    #include "lib/matmul/matmul_tiling.h"
    
  • Matmul multi-core tiling
    #include "lib/matmul/bmm_tiling.h"
    
  • BatchMatmul tiling
    #include "lib/matmul/bmm_tiling.h"