Operator Implementation

Workflow

The previous section describes the data tiling solution and data flows of Matmul. Ascend C provides a group of Matmul high-level APIs that encapsulate the common algorithm logic for tiling, data movement, and compute, helping you quickly implement Matmul. You call APIs on the host to automatically obtain the tiling parameters, pass them to the kernel during initialization, and then complete the matrix multiplication through a few simple APIs. For details about the complete example, see here.

Procedure for the host to automatically obtain tiling parameters:

  1. Create a tiling object.
    auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
    matmul_tiling::MatmulApiTiling cubeTiling(ascendcPlatform); 
    

    When creating an object, you need to pass the hardware platform information, which can be obtained by calling GetPlatformInfo.

  2. Set the data types and formats of A, B, and bias.
    cubeTiling.SetAType(AscendC::TPosition::GM, CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
    cubeTiling.SetBType(AscendC::TPosition::GM, CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
    cubeTiling.SetCType(AscendC::TPosition::GM, CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
    cubeTiling.SetBiasType(AscendC::TPosition::GM, CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
    
  3. Set the matrix shape.
    cubeTiling.SetShape(M, N, K);
    cubeTiling.SetOrgShape(M, N, K);
    
  4. Set the size of the available space.
    cubeTiling.SetBufferSpace(-1, -1, -1);
    
  5. Set other parameters as required, for example, whether bias participates in the computation.
    cubeTiling.SetBias(true);
    
  6. Obtain tiling parameters.
    MatmulCustomTilingData tiling;
    if (cubeTiling.GetTiling(tiling.cubeTilingData) == -1) {
        return ge::GRAPH_FAILED;
    }
    
  7. Perform other operations such as serialization and saving of tiling parameters.

Procedure for using the Matmul APIs in the kernel:

  1. Create a Matmul object.

    The following is an example of creating a Matmul object:

    • In the CUBE_ONLY (with only Cube computation) scenario, you need to set the ASCENDC_CUBE_ONLY code macro. This section uses the CUBE_ONLY mode as an example.
    • By default, the MIX mode (including Cube computation and Vector computation) is used. In this scenario, the ASCENDC_CUBE_ONLY code macro cannot be set. For more information, see Fusion Operator Programming.
    // In CUBE_ONLY, set this code macro before #include "lib/matmul_intf.h".
    #define ASCENDC_CUBE_ONLY 
    #include "lib/matmul_intf.h"
    typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> aType; 
    typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> bType; 
    typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> cType; 
    typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> biasType; 
    matmul::Matmul<aType, bType, cType, biasType> mm; 
    

    During object creation, input the type information of parameters A, B, C, and Bias. The type information is defined by MatmulType, including the logical location of memory, data format, and data type.

  2. Perform initialization.
    REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling); // Initialization
    

    The system workspace is required for internal implementation of Matmul high-level APIs. You need to:

    • Set the total workspace size (including the user workspace and system workspace) when implementing tiling on the host. The workspace is allocated and managed by the framework. The size of the system workspace can be obtained by calling GetLibApiWorkSpaceSize.
      size_t userWorkspaceSize = 0;
      size_t systemWorkspaceSize = static_cast<size_t>(ascendcPlatform.GetLibApiWorkSpaceSize());
      size_t *currentWorkspace = context->GetWorkspaceSizes(1);
      currentWorkspace[0] = userWorkspaceSize + systemWorkspaceSize;
      
    • If the operator project is neither a custom operator project nor a kernel launch operator project compiled with the -DHAVE_WORKSPACE macro, the kernel must set the system workspace by calling SetSysWorkspace before Matmul initialization.
      // The workspace must be set when Matmul is used.
      SetSysWorkspace(workspace);
      if (GetSysWorkSpacePtr() == nullptr) {
          return;
      }
      
  3. Set the left matrix A, right matrix B, and bias.
    mm.SetTensorA(gm_a);    // Set the left matrix A.
    mm.SetTensorB(gm_b);    // Set the right matrix B.
    mm.SetBias(gm_bias);    // Set the bias.
    
  4. Execute the matrix multiplication.
    • Call Iterate to complete a single iterative computation, and use a while loop to compute the full data on a single core. The Iterate method allows for flexible control over the number of iterations required to compute the desired amount of data.
      while (mm.Iterate()) {   
          mm.GetTensorC(gm_c); 
      }
      
    • Call IterateAll to compute all the data on a single core in one call. IterateAll does not require an explicit loop and is simpler to use.
      mm.IterateAll(gm_c);
      
  5. End the matrix multiplication.
    mm.End();
    

Setting Shape Information

Shape information can be set during host tiling for tiling compute. Some shape information can also be modified when the kernel is running for scenarios such as tail block setting and Matmul reuse (multiple Matmul computations reuse one Matmul object). This section describes the shape concepts involved and provides guidance on how to set the tiling information on the host and kernel.

  • orgShape: M, N, K
  • singleCoreShape: singleCoreM, singleCoreN, singleCoreK
  • singleShape: singleM, singleN, singleK
  • baseShape: baseM, baseN, baseK

In Data Tiling, we have learned the concepts of orgShape (M, N, and K), singleCoreShape (singleCoreM, singleCoreN, and singleCoreK), and baseShape (baseM, baseN, and baseK), as shown in the following figure.

In addition, during single-core Matmul tiling, the shape that actually participates in the computation can be a part of the original shape. singleM, singleN, and singleK express the shape that actually participates in the Matmul computation, as shown in the following figure. In single-core scenarios, singleM, singleN, and singleK are passed through singleCoreM, singleCoreN, and singleCoreK.

  • Kernel runtime settings
    • SetTail and SetSingleShape are used to modify singleCoreM, singleCoreN, and singleCoreK during runtime. SetTail is used to process the tail block. SetSingleShape is used to modify the shapes in the Matmul reuse scenarios (one Matmul object is used by multiple Matmul computations).
    • SetOrgShape is used to modify M, N, and K during runtime. You can also use it to reset shapes in the Matmul reuse scenarios.
  • Single-core tiling settings
    • SetOrgShape (required) is used to set M, N, and K.
    • SetShape (optional) is used to set singleM, singleN, or singleK, which is equivalent to setting singleCoreM, singleCoreN, and singleCoreK.
    • SetFixSplit (optional) is used to set baseM, baseN, and baseK.
  • Multi-core tiling settings
    • SetOrgShape (required) is used to set M, N, and K.
    • SetShape (optional) is used to set singleM, singleN, and singleK.
    • SetFixSplit (optional) is used to set baseM, baseN, and baseK.
    • SetSingleShape (optional) is used to set singleCoreM, singleCoreN, and singleCoreK.
    • SetSingleRange (optional) is used to set the range of singleCoreM, singleCoreN, and singleCoreK.

Setting the Format

During Matmul object creation, input the type information of parameters A, B, C, and Bias. The type information is defined by MatmulType, including the logical location of memory, data format, and data type. The following is an example:

typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> aType; 
typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> bType; 
typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> cType; 
typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> biasType; 
matmul::Matmul<aType, bType, cType, biasType> mm; 

Data formats include CubeFormat::ND, CubeFormat::NZ, and CubeFormat::ND_ALIGN. For details about the ND and NZ formats, see Data Format.

ND_ALIGN pads the Matmul result matrix according to certain padding rules. The ND-to-ND_ALIGN conversion is shown in the following figure. Assume the matrix data type is uint32_t, the result matrix is output to the UB, and the N direction of the original matrix is not 32-byte aligned; ND_ALIGN pads the N direction with 0s to align it to 32 bytes.