N-Direction Alignment of Matrix Multiplication Outputs

Overview

N-direction alignment of matrix multiplication outputs means that the result of a matrix multiplication, matrix C, is output in ND_ALIGN format. The common matrix data formats in Matmul are ND and NZ; for details, see Data Formats. ND_ALIGN is an additional matrix data format, typically used when the N dimension of a matrix multiplication is not 32-byte aligned. When matrix C is configured in ND_ALIGN format, each row of C is padded in the N direction to a multiple of 32 bytes on output. For details, see ND_ALIGN.

The following example illustrates ND_ALIGN output using a Matmul with M = 16, K = 16, and N = 14, where matrix A and matrix B are of the half data type. When matrix C is configured in ND format and output to the global memory, its rows are not 32-byte aligned in the N direction, as shown in Figure 1. Figure 2 shows the ND output laid out with 32-byte alignment in the N direction: because ND stores rows back to back without gaps, the last two columns are filled with actual data from the next row. When matrix C is configured in ND_ALIGN format, the Matmul API instead fills the last two columns in the N direction with invalid padding data, which ensures that every row is 32-byte aligned in the N direction on output, as shown in Figure 3.

Figure 1 Non-32-byte alignment of matrix C in ND format in the N direction

Figure 2 32-byte alignment of matrix C in ND format in the N direction

Figure 3 32-byte alignment of matrix C in ND_ALIGN format in the N direction

Application Scenarios

This feature applies when N in a Matmul computation is not 32-byte aligned but the output matrix C must be 32-byte aligned in the N direction.

Restrictions

If matrix C is set to ND_ALIGN format, the buffer allocated for matrix C must be sized using the 32-byte-aligned N dimension rather than the original N.

Example

For a complete operator example, see the matmul_nd_align operator sample.

Tiling implementation

Call the SetCType API to set the data format of matrix C to CubeFormat::ND_ALIGN. Other tiling implementation details are the same as those in basic scenarios.

auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
matmul_tiling::MatmulApiTiling tiling(ascendcPlatform);
tiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
tiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
// Matrix C: buffer location GM, data format ND_ALIGN, data type float.
tiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND_ALIGN, matmul_tiling::DataType::DT_FLOAT);
tiling.SetBiasType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
... // Other implementation details.
optiling::TCubeTiling tilingData;
// Check ret before using tilingData.
int ret = tiling.GetTiling(tilingData);

Kernel implementation

Unlike in basic scenarios, the ND_ALIGN output function requires the data format of the template parameter cType to be set to CubeFormat::ND_ALIGN when the Matmul object is created.

#include "lib/matmul_intf.h"

typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> aType;
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> bType;
// Set the data format of the template parameter cType to ND_ALIGN.
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND_ALIGN, float> cType;
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> biasType;
AscendC::Matmul<aType, bType, cType, biasType> mm;
