Instructions for Use

Ascend C provides a group of Matmul tiling APIs for obtaining the tiling parameters required for Matmul kernel computation. You only need to provide the information about matrices A, B, and C and call the corresponding APIs to obtain the parameters of the TCubeTiling structure passed to Init.

Matmul tiling APIs are classified into Matmul single-core tiling APIs, multi-core tiling APIs, and BatchMatmul tiling APIs. The process of obtaining tiling parameters is as follows:

  1. Create a single-core tiling object, multi-core tiling object, or BatchMatmul tiling object.
  2. Set the type information of matrices A, B, and C and the bias, as well as the M, N, Ka, and Kb shape information.
  3. Call the GetTiling API to obtain the tiling information.

The following provides examples of using the Matmul single-core, multi-core, and BatchMatmul tiling APIs to obtain tiling parameters:

  • Matmul single-core tiling
    auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
    matmul_tiling::MatmulApiTiling tiling(ascendcPlatform); 
    tiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);   
    tiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);   
    tiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);   
    tiling.SetBiasType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);   
    tiling.SetShape(1024, 1024, 1024);   
    tiling.SetOrgShape(1024, 1024, 1024); // Ka and Kb can have different lengths, for example, tiling.SetOrgShape(1024, 1024, 1024, 1280).
    tiling.SetBias(true);   
    tiling.SetBufferSpace(-1, -1, -1); // Set the space that can be used. By default, all space of the AI processor is used.
    optiling::TCubeTiling tilingData;   
    int ret = tiling.GetTiling(tilingData); // A return value of -1 indicates that tiling fails.
    
  • Multi-core tiling
    auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
    matmul_tiling::MultiCoreMatmulTiling tiling(ascendcPlatform); 
    tiling.SetDim(1);   
    tiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);   
    tiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);   
    tiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);   
    tiling.SetBiasType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);   
    tiling.SetShape(1024, 1024, 1024);   
    tiling.SetSingleShape(1024, 1024, 1024);
    tiling.SetOrgShape(1024, 1024, 1024); 
    tiling.SetBias(true);   
    tiling.SetBufferSpace(-1, -1, -1); // Set the space that can be used. By default, all space of the AI processor is used.
    optiling::TCubeTiling tilingData;   
    int ret = tiling.GetTiling(tilingData); // A return value of -1 indicates that tiling fails.
    
  • BatchMatmul tiling
    auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
    matmul_tiling::BatchMatmulTiling bmmTiling(ascendcPlatform); 
    bmmTiling.SetDim(1);   
    bmmTiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);   
    bmmTiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);   
    bmmTiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);   
    bmmTiling.SetBiasType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);   
    bmmTiling.SetBias(true);
    bmmTiling.SetShape(1024, 1024, 1024);   
    bmmTiling.SetSingleShape(1024, 1024, 1024);
    bmmTiling.SetOrgShape(1024, 1024, 1024);  
    bmmTiling.SetBufferSpace(-1, -1, -1); // Set the space that can be used. By default, all space of the AI processor is used.
    optiling::TCubeTiling tilingData;
    int ret = bmmTiling.GetTiling(tilingData); // A return value of -1 indicates that tiling fails.
    

The API list is as follows:

Table 1 List of APIs shared by MatmulApiTiling, MultiCoreMatmulTiling, and BatchMatmulTiling

| API | Function |
| --- | --- |
| SetAType | Sets the position, data format, data type, and transpose status of matrix A. |
| SetBType | Sets the position, data format, data type, and transpose status of matrix B. |
| SetCType | Sets the position, data format, and data type of matrix C. |
| SetDequantType | Sets the dequantization mode. |
| SetBiasType | Sets the position, data format, and data type of the bias. |
| SetShape | Sets the shapes singleM, singleN, and singleK of a single Matmul computation. The unit is the number of elements. |
| SetOrgShape | Sets the original complete shapes M, N, Ka, and Kb during Matmul computation. The unit is the number of elements. |
| SetALayout | Sets the layout axis information of matrix A. |
| SetBLayout | Sets the layout axis information of matrix B. |
| SetCLayout | Sets the layout axis information of matrix C. |
| SetBatchInfoForNormal | Sets the M, N, and K axes and the batch sizes of matrix A and matrix B. |
| SetBatchNum | Sets the maximum number of batches for multi-batch computation. |
| EnableBias | Sets whether the bias is used in computation. |
| SetBias | Sets whether the bias is used in computation. |
| SetFixSplit | Sets the fixed baseM, baseN, and baseK. The unit is the number of elements. |
| SetBufferSpace | Sets the size of the available L1/L0C/UB space during Matmul computation. The unit is bytes. |
| SetTraverse | Sets the traversal mode: M axis first or N axis first. |
| SetMadType | Sets whether to enable the HF32 mode. Not supported in the current version. |
| SetSplitRange | Sets the maximum and minimum values of baseM, baseN, and baseK. |
| SetMatmulConfigParams | Customizes the MatmulConfig parameters. |
| SetDoubleBuffer | Determines whether to enable double buffering for A, B, C, and the bias, and whether to enable ND2NZ or NZ2ND conversion. This API is reserved and not supported in the current version. |
| GetBaseM | Obtains the baseM value. |
| GetBaseN | Obtains the baseN value. |
| GetBaseK | Obtains the baseK value. |
| GetTiling | Obtains the tiling parameters. |
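The optional splitting and query APIs in Table 1 plug into the basic flow shown in the examples above. The following is a minimal sketch, not a confirmed implementation: the SetFixSplit argument order (baseM, baseN, baseK), the example block sizes, and the GetBaseM return type are assumptions not stated in this document.

```cpp
// Hedged sketch: single-core tiling with a fixed base block, building on the
// single-core example above. Argument order and return types are assumptions.
auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
matmul_tiling::MatmulApiTiling tiling(ascendcPlatform);
tiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
tiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
tiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
tiling.SetShape(1024, 1024, 1024);
tiling.SetOrgShape(1024, 1024, 1024);
tiling.SetFixSplit(128, 256, 64);   // fix baseM/baseN/baseK (example values, in elements)
optiling::TCubeTiling tilingData;
if (tiling.GetTiling(tilingData) != -1) {
    auto baseM = tiling.GetBaseM(); // query the base block actually chosen
}
```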

Table 2 Other MultiCoreMatmulTiling APIs

| API | Function |
| --- | --- |
| SetDim | Sets the number of cores that can participate in multi-core Matmul computation. |
| SetSingleRange | Sets the maximum and minimum values of singleCoreM, singleCoreN, and singleCoreK. |
| SetSingleShape | Sets the shapes singleCoreM, singleCoreN, and singleCoreK of the Matmul single-core computation. The unit is the number of elements. |
| GetSingleShape | Obtains the computed singleCoreM, singleCoreN, and singleCoreK. |
| SetAlignSplit | Sets the alignment values of singleCoreM, singleCoreN, and singleCoreK during multi-core tiling. |
| GetCoreNum | Obtains the blockDim used after multi-core tiling. |
| SetSplitK | Sets K-axis splitting in multi-core tiling. The EnableMultiCoreSplitK API is recommended instead. |
| EnableMultiCoreSplitK | Enables K-axis splitting in multi-core tiling. |
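The multi-core-specific APIs above extend the multi-core example at the top of this page. A minimal sketch follows; the bool argument of EnableMultiCoreSplitK and the GetCoreNum return type are assumptions not confirmed by this document.

```cpp
// Hedged sketch: multi-core tiling with K-axis splitting enabled.
auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
matmul_tiling::MultiCoreMatmulTiling tiling(ascendcPlatform);
tiling.SetDim(8);                    // up to 8 cores may participate (example value)
tiling.EnableMultiCoreSplitK(true);  // also split the K axis across cores (bool flag is an assumption)
tiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
tiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
tiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
tiling.SetShape(1024, 1024, 1024);
tiling.SetOrgShape(1024, 1024, 1024);
optiling::TCubeTiling tilingData;
if (tiling.GetTiling(tilingData) != -1) {
    auto usedCores = tiling.GetCoreNum(); // blockDim actually used after tiling
}
```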

Table 3 Other BatchMatmulTiling APIs

| API | Function |
| --- | --- |
| GetCoreNum | Obtains the blockDim used after multi-core tiling. |