Operator Implementation

Procedure

The previous section describes the data tiling solution and data flows of Matmul. Ascend C provides a set of Matmul high-level APIs that encapsulate the common algorithm logic for tiling, data movement, and compute, helping you implement Matmul quickly. On the host, you call APIs to obtain the tiling parameters automatically. These parameters are passed to the kernel during initialization, after which the matrix multiplication can be completed with a few simple API calls. For details, see the complete example.

Figure 1 Matrix programming process

Procedure for the host to automatically obtain tiling parameters:

  1. Create a tiling object.
    auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
    matmul_tiling::MatmulApiTiling cubeTiling(ascendcPlatform); 
    

    Pass the hardware platform information to create a PlatformAscendC object, and then create a Tiling object. The hardware platform information can be obtained through GetPlatformInfo.

  2. Set the logical memory locations, formats, and data types of A, B, C, and bias.
    cubeTiling.SetAType(AscendC::TPosition::GM, CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
    cubeTiling.SetBType(AscendC::TPosition::GM, CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
    cubeTiling.SetCType(AscendC::TPosition::GM, CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
    cubeTiling.SetBiasType(AscendC::TPosition::GM, CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
    
  3. Set the matrix shape.
    cubeTiling.SetShape(M, N, K);
    cubeTiling.SetOrgShape(M, N, K); // Set the original complete shapes M, N, and K.
    
  4. Set the size of the available space.
    Set the size of the L1 Buffer, L0C Buffer, and Unified Buffer space available for Matmul computation. The value -1 means that the size of the corresponding buffer on the AI processor is used.
    cubeTiling.SetBufferSpace(-1, -1, -1);
    
  5. Set other parameters as required, for example, bias that will participate in the compute.
    cubeTiling.EnableBias(true);
    
  6. Obtain tiling parameters.
    MatmulCustomTilingData tiling;
    if (cubeTiling.GetTiling(tiling.cubeTilingData) == -1){ 
        return ge::GRAPH_FAILED;  
    }
    
  7. Perform other operations such as serialization and saving of tiling parameters.
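
Step 7 can be pictured with a minimal plain C++ sketch of serializing tiling parameters into a byte buffer and reading them back. It does not use the Ascend C framework: DemoTiling, SaveTiling, and LoadTiling are hypothetical names introduced only for illustration, and the real layout of MatmulCustomTilingData is defined by the operator project.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for tiling data. The real tiling structure is
// defined by the operator project and is not reproduced here.
struct DemoTiling {
    uint32_t M, N, K;              // original shape
    uint32_t baseM, baseN, baseK;  // base block shape
};

// Copy the struct into a caller-provided byte buffer.
// Returns the number of bytes written, or 0 if the buffer is too small.
inline std::size_t SaveTiling(const DemoTiling& t, uint8_t* buf, std::size_t capacity) {
    if (buf == nullptr || capacity < sizeof(DemoTiling)) {
        return 0;
    }
    std::memcpy(buf, &t, sizeof(DemoTiling));
    return sizeof(DemoTiling);
}

// Counterpart on the consuming side: rebuild the struct from raw bytes.
inline bool LoadTiling(const uint8_t* buf, std::size_t size, DemoTiling& out) {
    if (buf == nullptr || size < sizeof(DemoTiling)) {
        return false;
    }
    std::memcpy(&out, buf, sizeof(DemoTiling));
    return true;
}
```

The capacity check mirrors the usual reason for an explicit save step: the destination buffer is owned by the caller, so the writer has to confirm that the data fits before copying.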

Procedure for using the Matmul APIs on the kernel:

  1. Create a Matmul object.

    The following is an example of creating a Matmul object:

    • In the Cube-only mode (only matrix computation), you are advised to define the ASCENDC_CUBE_ONLY macro in the code to avoid extra performance overhead. This section uses the Cube-only mode as an example.
    • The default mode is MIX (including matrix computation and vector computation). In this scenario, the ASCENDC_CUBE_ONLY macro is not defined. If the ASCENDC_CUBE_ONLY macro is used in the program, the ASCEND_IS_AIC and ASCEND_IS_AIV macros must be used to isolate the Cube and Vector computations. For details, see Fused Operator Programming.
    // In Cube-only mode, define this macro before #include "lib/matmul_intf.h".
    #define ASCENDC_CUBE_ONLY 
    #include "lib/matmul_intf.h"
    typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> aType; 
    typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> bType; 
    typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> cType; 
    typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> biasType; 
    AscendC::Matmul<aType, bType, cType, biasType> mm; 
    

    When creating the object, specify the type information of A, B, C, and bias. Each type is defined through MatmulType, which carries the logical memory location, data format, and data type.

  2. Perform initialization.
    REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling); // Initialization
    

    The Matmul high-level APIs require a system workspace for their internal implementation (obtained through the GetSysWorkSpacePtr API in this step). You must reserve this system workspace as follows:

    • Set the total workspace size (user workspace plus system workspace) when implementing tiling on the host. The workspace is allocated and managed by the framework. The size of the system workspace can be obtained by calling GetLibApiWorkSpaceSize.
      size_t userWorkspaceSize = 0;
      size_t systemWorkspaceSize = static_cast<size_t>(ascendcPlatform.GetLibApiWorkSpaceSize());
      size_t *currentWorkspace = context->GetWorkspaceSizes(1);
      currentWorkspace[0] = userWorkspaceSize + systemWorkspaceSize;
      
    • If the operator project is neither a custom operator project nor a Kernel debugging project with the HAVE_WORKSPACE compilation macro, the framework does not set the workspace automatically. In this case, set the system workspace by calling SetSysWorkspace before the Matmul initialization on the kernel side.
      // The system workspace must be set before Matmul is used.
      SetSysWorkspace(workspace);
      if (GetSysWorkSpacePtr() == nullptr) {
          return;
      }
      
  3. Set the left matrix A, right matrix B, and bias.
    mm.SetTensorA(gm_a);    // Set the left matrix A.
    mm.SetTensorB(gm_b);    // Set the right matrix B.
    mm.SetBias(gm_bias);    // Set the bias.
    
  4. Execute the matrix multiplication.
    • Call Iterate to complete a single iterative computation, and use a while loop to compute the full data on a single core. The Iterate method allows for flexible control over the number of iterations required to compute the desired amount of data.
      while (mm.Iterate()) {   
          mm.GetTensorC(gm_c); 
      }
      
    • Call IterateAll to compute all data on a single core. IterateAll does not require an explicit loop and is simpler to use.
      mm.IterateAll(gm_c);
      
  5. End the matrix multiplication.
    mm.End();
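
The relationship between Iterate, GetTensorC, and IterateAll can be illustrated with a plain C++ model that runs on any host compiler. This is a sketch of the calling pattern only, not the Ascend C implementation: MatmulModel is a hypothetical class, the block size (baseM x baseN) is passed in directly instead of coming from tiling data, all data is float in row-major order, and the bias is assumed to be a vector of length N added to every row.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Plain C++ model of block-wise C = A * B + bias (hypothetical, for illustration).
class MatmulModel {
public:
    MatmulModel(int m, int n, int k, int baseM, int baseN)
        : m_(m), n_(n), k_(k), baseM_(baseM), baseN_(baseN) {}

    void SetTensorA(const std::vector<float>& a) { a_ = &a; }        // M x K
    void SetTensorB(const std::vector<float>& b) { b_ = &b; }        // K x N
    void SetBias(const std::vector<float>& bias) { bias_ = &bias; }  // length N

    // Compute the next baseM x baseN output block into an internal buffer.
    // Returns false once every block has been produced.
    bool Iterate() {
        const int blocksM = (m_ + baseM_ - 1) / baseM_;
        const int blocksN = (n_ + baseN_ - 1) / baseN_;
        if (next_ >= blocksM * blocksN) {
            return false;
        }
        row0_ = (next_ / blocksN) * baseM_;
        col0_ = (next_ % blocksN) * baseN_;
        rows_ = std::min(baseM_, m_ - row0_);  // edge blocks may be smaller
        cols_ = std::min(baseN_, n_ - col0_);
        block_.assign(static_cast<std::size_t>(rows_) * cols_, 0.0f);
        for (int i = 0; i < rows_; ++i) {
            for (int j = 0; j < cols_; ++j) {
                float acc = bias_ ? (*bias_)[col0_ + j] : 0.0f;
                for (int p = 0; p < k_; ++p) {
                    acc += (*a_)[(row0_ + i) * k_ + p] * (*b_)[p * n_ + col0_ + j];
                }
                block_[i * cols_ + j] = acc;
            }
        }
        ++next_;
        return true;
    }

    // Copy the block produced by the last Iterate() into the full M x N result.
    void GetTensorC(std::vector<float>& c) const {
        for (int i = 0; i < rows_; ++i) {
            for (int j = 0; j < cols_; ++j) {
                c[(row0_ + i) * n_ + col0_ + j] = block_[i * cols_ + j];
            }
        }
    }

    // Same result as looping over Iterate() and GetTensorC().
    void IterateAll(std::vector<float>& c) {
        while (Iterate()) {
            GetTensorC(c);
        }
    }

    // Reset the model's iteration state so the object can be reused.
    void End() { next_ = 0; }

private:
    int m_, n_, k_, baseM_, baseN_;
    int next_ = 0, row0_ = 0, col0_ = 0, rows_ = 0, cols_ = 0;
    std::vector<float> block_;
    const std::vector<float>* a_ = nullptr;
    const std::vector<float>* b_ = nullptr;
    const std::vector<float>* bias_ = nullptr;
};
```

In this model, a 3 x 5 output with 2 x 2 blocks takes six Iterate calls, and the loop form leaves room to process each block between calls, whereas IterateAll hides the loop entirely.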
    

Setting Shape Information

Shape information can be set during tiling implementation on the host for tiling compute. Some shape information can also be modified when the kernel is running for scenarios such as tail block setting and Matmul reuse (multiple Matmul computations reuse one Matmul object). This section describes the shape concepts involved and provides guidance on how to set the tiling information on the host and kernel.

  • orgShape: M, N, K
  • singleCoreShape: singleCoreM, singleCoreN, singleCoreK
  • singleShape: singleM, singleN, singleK
  • baseShape: baseM, baseN, baseK

In Data Tiling, we have learned the concepts of orgShape (M, N, and K), singleCoreShape (singleCoreM, singleCoreN, and singleCoreK), and baseShape (baseM, baseN, and baseK), as shown in the following figure.

In addition, during single-core Matmul tiling, the shape that actually participates in Matmul computation can be a part of the original shape. singleM, singleN, and singleK express the shape that actually participates in Matmul computation, as shown in the following figure. In single-core scenarios, singleM, singleN, and singleK are passed through singleCoreM, singleCoreN, and singleCoreK.

  • Kernel runtime settings
    • SetTail and SetSingleShape are used to modify singleCoreM, singleCoreN, and singleCoreK during runtime. SetTail is used to process the tail block. SetSingleShape is used to modify the shapes in the Matmul reuse scenarios (one Matmul object is used by multiple Matmul computations).
    • SetOrgShape is used to modify M, N, and K during runtime. You can also use it to reset shapes in the Matmul reuse scenarios.
  • Single-core tiling settings
    • SetOrgShape (required) is used to set M, N, and K.
    • SetShape (optional) is used to set singleM, singleN, and singleK, which is equivalent to setting singleCoreM, singleCoreN, and singleCoreK.
    • SetFixSplit (optional) is used to set baseM, baseN, and baseK.
  • Multi-core tiling settings
    • SetOrgShape (required) is used to set M, N, and K.
    • SetShape (optional) is used to set singleM, singleN, and singleK.
    • SetFixSplit (optional) is used to set baseM, baseN, and baseK.
    • SetSingleShape (optional) is used to set singleCoreM, singleCoreN, and singleCoreK.
    • SetSingleRange (optional) is used to set the range of singleCoreM, singleCoreN, and singleCoreK.
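
The arithmetic behind tail blocks can be sketched in plain C++. The helper names below (Split, SplitDim, BlockExtent) are hypothetical and are not part of Ascend C. The sketch only shows how one dimension, for example M, divides into blocks of a fixed size such as singleCoreM, and how large the last block is when the division is uneven. That last extent is the kind of value a kernel would need when handling the tail block at runtime.

```cpp
#include <cassert>

// Result of dividing one dimension into fixed-size blocks.
struct Split {
    int total;       // number of blocks, including a smaller tail block if any
    int fullBlocks;  // number of blocks that have the full block size
    int tail;        // size of the tail block, or 0 if the split is even
};

inline Split SplitDim(int extent, int block) {
    assert(extent > 0 && block > 0);
    Split s;
    s.total = (extent + block - 1) / block;  // ceiling division
    s.fullBlocks = extent / block;
    s.tail = extent % block;
    return s;
}

// Extent actually covered by block number idx (0-based).
inline int BlockExtent(int extent, int block, int idx) {
    assert(idx >= 0 && idx * block < extent);
    const int remaining = extent - idx * block;
    return remaining < block ? remaining : block;
}
```

For example, with M = 1000 and a block size of 256, the split yields four blocks: three full blocks of 256 rows and a tail block of 232 rows. The same helper applies one level down, from singleCoreM to baseM.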

Setting the Format

When creating a Matmul object, specify the type information of A, B, C, and bias. Each type is defined through MatmulType, which carries the logical memory location, data format, and data type. The following is an example:

typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> aType; 
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> bType; 
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> cType; 
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> biasType; 
AscendC::Matmul<aType, bType, cType, biasType> mm; 

The data formats include CubeFormat::ND, CubeFormat::NZ, and CubeFormat::ND_ALIGN. For details about the ND and NZ formats, see Data Format. For details about the ND_ALIGN format, see Data Layout Formats.
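
To make the difference between ND and NZ concrete, the following plain C++ sketch rearranges a row-major (ND) matrix into an NZ-style layout. It rests on assumptions that this section does not state: the matrix is split into 16 x 16 blocks (the size commonly used for half-precision data), blocks are stored column by column, elements inside a block are stored row by row, and ragged edges are padded with zeros. NdToNz is a hypothetical helper, not an Ascend C API. Refer to the linked format descriptions for the authoritative definition.

```cpp
#include <cstddef>
#include <vector>

// Assumed block (fractal) size: 16 x 16.
constexpr int kM0 = 16;
constexpr int kN0 = 16;

// Rearrange a row-major M x N matrix into the assumed layout
// (N1, M1, M0, N0): block columns outermost, then block rows,
// then rows and columns inside each block. Padding is zero-filled.
template <typename T>
std::vector<T> NdToNz(const std::vector<T>& nd, int m, int n) {
    const int m1 = (m + kM0 - 1) / kM0;  // number of block rows
    const int n1 = (n + kN0 - 1) / kN0;  // number of block columns
    std::vector<T> nz(static_cast<std::size_t>(m1) * n1 * kM0 * kN0, T{});
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            const std::size_t blockCol = static_cast<std::size_t>(j / kN0);
            const std::size_t blockRow = static_cast<std::size_t>(i / kM0);
            const std::size_t dst =
                ((blockCol * m1 + blockRow) * kM0 + i % kM0) * kN0 + j % kN0;
            nz[dst] = nd[static_cast<std::size_t>(i) * n + j];
        }
    }
    return nz;
}
```

Under these assumptions, in a 32 x 32 matrix the element at row 16, column 0 lands at offset 256 (the second block in the first block column), while the element at row 0, column 16 lands at offset 512 (the first block of the second block column).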