N-direction alignment of the matrix multiplication output

Overview

This feature aligns the matrix multiplication output in the N direction. That is, matrix C is output in ND_ALIGN format. The common matrix data formats in Matmul are ND and NZ. For details, see Data Formats. ND_ALIGN is an additional matrix data format, generally used when the N direction of the matrix multiplication is not 32-byte aligned. After the format of matrix C is set to ND_ALIGN, each row of matrix C is padded to a 32-byte boundary in the N direction before it is output. For details, see ND_ALIGN.

The following example illustrates ND_ALIGN output, using a Matmul with M = 16, K = 16, N = 14, and matrix A and matrix B of data type half. When matrix C is configured in ND format and output to global memory with the original N, which is not 32-byte aligned, the result is as shown in Figure 1. When matrix C is configured in ND format and output with 32-byte alignment in the N direction, the result is as shown in Figure 2: the last two columns of each aligned row are filled with actual data from the next row. When matrix C is configured in ND_ALIGN format, the Matmul API instead fills the last two columns in the N direction with invalid data, so that each row is 32-byte aligned in the N direction, as shown in Figure 3.

Figure 1 Matrix C in ND format, not 32-byte aligned in the N direction
Figure 2 Matrix C in ND format, 32-byte aligned in the N direction
Figure 3 Matrix C in ND_ALIGN format, 32-byte aligned in the N direction
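
The padding rule described above can be sketched in plain C++. This is a minimal illustration, not part of the Matmul API: the names AlignedCols and ElementOffset are hypothetical, and the sketch assumes that the alignment applies to the byte length of one row of matrix C and that the element size divides 32.

```cpp
#include <cstddef>

// Alignment granularity described in this section.
constexpr std::size_t kAlignBytes = 32;

// Hypothetical helper: number of elements in one row of matrix C after the
// row is padded to a multiple of 32 bytes. Assumes elemBytes divides 32.
constexpr std::size_t AlignedCols(std::size_t n, std::size_t elemBytes)
{
    const std::size_t rowBytes = n * elemBytes;
    const std::size_t paddedBytes =
        (rowBytes + kAlignBytes - 1) / kAlignBytes * kAlignBytes;
    return paddedBytes / elemBytes;
}

// Hypothetical helper: linear offset (in elements) of C[row][col] in a
// row-major buffer whose rows are rowStride elements apart.
constexpr std::size_t ElementOffset(std::size_t row, std::size_t col,
                                    std::size_t rowStride)
{
    return row * rowStride + col;
}

// N = 14 with 2-byte elements: 28 bytes per row, padded to 32 bytes.
static_assert(AlignedCols(14, 2) == 16, "two padding columns expected");
// N = 14 with 4-byte elements: 56 bytes per row, padded to 64 bytes.
static_assert(AlignedCols(14, 4) == 16, "two padding columns expected");
```

With N = 14, the element C[1][0] sits at offset 14 in an unpadded buffer but at offset 16 once each row is padded to 16 columns. Offsets 14 and 15 of each padded row are the two extra columns discussed above.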

Use Case

Use this feature when N is not 32-byte aligned in the Matmul computation but the output matrix C must be 32-byte aligned in the N direction.

Restrictions

If matrix C is output in ND_ALIGN format, the buffer allocated for matrix C must be sized with N rounded up to a 32-byte boundary, not with the original N.
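
As a rough sizing aid, the restriction can be expressed as a small calculation. The function name NdAlignBufferBytes is hypothetical and not part of the Matmul API. The sketch assumes that each of the M rows is padded independently to a multiple of 32 bytes.

```cpp
#include <cstddef>

// Hypothetical helper: bytes to allocate for an M x N matrix C whose rows
// are each padded to a multiple of 32 bytes (assumption: per-row padding).
constexpr std::size_t NdAlignBufferBytes(std::size_t m, std::size_t n,
                                         std::size_t elemBytes)
{
    const std::size_t paddedRowBytes = (n * elemBytes + 31) / 32 * 32;
    return m * paddedRowBytes;
}
```

For M = 16, N = 14, and a 4-byte element type such as float, this gives 16 x 64 = 1024 bytes, compared with 16 x 14 x 4 = 896 bytes for the unpadded matrix.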

Examples

For a complete operator example, see the matmul_nd_align operator sample.

  • Tiling Implementation

    Call the SetCType API to set the data format of matrix C to CubeFormat::ND_ALIGN. The implementation of other tiling functions is the same as that in the basic scenario.

    auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
    matmul_tiling::MatmulApiTiling tiling(ascendcPlatform); 
    tiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16); 
    tiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);  
    // Set matrix C, with the buffer location being GM and the data format being ND_ALIGN.
    tiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND_ALIGN, matmul_tiling::DataType::DT_FLOAT);
    tiling.SetBiasType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
    ... // Other implementation content
    optiling::TCubeTiling tilingData;   
    int ret = tiling.GetTiling(tilingData);
    
  • Kernel Implementation

    Compared with the basic scenario, ND_ALIGN output requires the data format of the template parameter cType to be set to CubeFormat::ND_ALIGN when the Matmul object is created.

    #include "lib/matmul_intf.h"
    
    typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> aType; 
    typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> bType; 
    // Set the data format of the template parameter cType to ND_ALIGN.
    typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND_ALIGN, float> cType; 
    typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> biasType; 
    AscendC::Matmul<aType, bType, cType, biasType> mm;
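
Code that consumes a matrix C written in ND_ALIGN format has to skip the padding columns of each row. The following host-side sketch shows one way to copy the valid data into a dense M x N buffer. StripRowPadding is a hypothetical helper, not part of the Matmul API, and it assumes that the padded result has already been copied into a std::vector on the host.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side helper: copy the first n (valid) columns of each of
// the m rows out of a padded row-major buffer with paddedCols columns per row.
// Preconditions: n <= paddedCols and padded.size() >= m * paddedCols.
template <typename T>
std::vector<T> StripRowPadding(const std::vector<T>& padded, std::size_t m,
                               std::size_t n, std::size_t paddedCols)
{
    std::vector<T> dense;
    dense.reserve(m * n);
    for (std::size_t row = 0; row < m; ++row) {
        const auto rowBegin =
            padded.begin() + static_cast<std::ptrdiff_t>(row * paddedCols);
        // Keep only the valid columns; the rest of the row is padding.
        dense.insert(dense.end(), rowBegin,
                     rowBegin + static_cast<std::ptrdiff_t>(n));
    }
    return dense;
}
```

For the example in this section, the call would use m = 16, n = 14, and paddedCols = 16, which yields a dense vector of 224 elements.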