N-direction alignment of the matrix multiplication output
Overview
With N-direction alignment, matrix C is output in ND_ALIGN format. The common matrix data formats in Matmul are ND and NZ; for details, see Data Formats. ND_ALIGN is an additional matrix data format, generally used when the N direction of the matrix multiplication is not 32-byte aligned. When the format of matrix C is set to ND_ALIGN, matrix C is output with each row padded in the N direction according to the 32-byte alignment rule. For details, see ND_ALIGN.
The following example illustrates ND_ALIGN output using a Matmul with M=16, K=16, and N=14, where matrices A and B are of type half. When matrix C is configured in ND format and output to global memory, the data is written with the original N size, which is not 32-byte aligned, as shown in Figure 1. When this ND-format output is laid out with 32-byte alignment in the N direction, as shown in Figure 2, the last two columns in the N direction are filled with actual data from the next row. When matrix C is configured in ND_ALIGN format, the Matmul API instead fills the last two columns in the N direction with invalid data so that the N direction is aligned to 32 bytes on output, as shown in Figure 3.
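The layout difference described above can be sketched with a few lines of plain host-side C++. This is an illustration only and not part of the Matmul API: the helper names `AlignedCols`, `OffsetNd`, and `OffsetNdAlign` are hypothetical, and the sketch assumes that the element size divides 32 bytes evenly.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of elements per row once the N direction is
// padded up to a multiple of 32 bytes.
inline std::size_t AlignedCols(std::size_t n, std::size_t elemSize) {
    assert(elemSize > 0 && 32 % elemSize == 0);  // assumption of this sketch
    const std::size_t elemsPerBlock = 32 / elemSize;
    return (n + elemsPerBlock - 1) / elemsPerBlock * elemsPerBlock;
}

// Element offset of C[row][col] when rows are packed back to back (ND).
inline std::size_t OffsetNd(std::size_t row, std::size_t col, std::size_t n) {
    return row * n + col;
}

// Element offset of C[row][col] when every row is padded (ND_ALIGN).
inline std::size_t OffsetNdAlign(std::size_t row, std::size_t col,
                                 std::size_t n, std::size_t elemSize) {
    return row * AlignedCols(n, elemSize) + col;
}
```

For N=14 and 2-byte elements, `AlignedCols(14, 2)` is 16, so the first element of row 1 sits at element offset 14 in ND but at offset 16 in ND_ALIGN. The two elements in between are the padding columns.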
Use Case
The N direction of the Matmul computation is not 32-byte aligned, but the output matrix C must be 32-byte aligned in the N direction.
Restrictions
If matrix C is output in ND_ALIGN format, the buffer allocated for matrix C must be sized for N rounded up to a multiple of 32 bytes.
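As a rough guide to sizing that buffer, the sketch below computes the byte count for an M x N matrix C whose rows are each padded to a multiple of 32 bytes. The function name `NdAlignBufferBytes` is hypothetical and the code is plain host-side C++, not an Ascend C API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes needed for an M x N matrix C in ND_ALIGN format,
// assuming each row is padded up to a multiple of 32 bytes.
inline std::size_t NdAlignBufferBytes(std::size_t m, std::size_t n,
                                      std::size_t elemSize) {
    assert(elemSize > 0);
    const std::size_t rowBytes = n * elemSize;                     // unpadded row
    const std::size_t paddedRowBytes = (rowBytes + 31) / 32 * 32;  // round up to 32 B
    return m * paddedRowBytes;
}
```

With M=16, N=14, and float output, an unpadded row is 56 bytes and a padded row is 64 bytes, so the buffer needs 1024 bytes instead of the 896 bytes that a packed ND layout would occupy.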
Examples
For a complete operator example, see the matmul_nd_align operator sample.
- Tiling Implementation
Call the SetCType API to set the data format of matrix C to CubeFormat::ND_ALIGN. The implementation of other tiling functions is the same as that in the basic scenario.
    auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
    matmul_tiling::MatmulApiTiling tiling(ascendcPlatform);
    tiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
    tiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16);
    // Set matrix C, with the buffer location being GM and the data format being ND_ALIGN.
    tiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND_ALIGN, matmul_tiling::DataType::DT_FLOAT);
    tiling.SetBiasType(AscendC::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
    ... // Other implementation content
    optiling::TCubeTiling tilingData;
    int ret = tiling.GetTiling(tilingData);
- Kernel Implementation
Compared with the basic scenario, ND_ALIGN output requires that the data format of the template parameter cType be set to CubeFormat::ND_ALIGN when the Matmul object is created.
    #include "lib/matmul_intf.h"

    typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> aType;
    typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> bType;
    // Set the data format of the template parameter cType to ND_ALIGN.
    typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND_ALIGN, float> cType;
    typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> biasType;
    AscendC::Matmul<aType, bType, cType, biasType> mm;
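A consumer of the ND_ALIGN result may want the valid M x N region without the padding columns, for example when comparing against a packed reference result on the host. The following is a minimal sketch under that assumption. `StripNdAlignPadding` is a hypothetical helper written in standard C++, not an Ascend C function, and `paddedN` is assumed to be the padded row length in elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper: copy the valid M x N region out of a buffer
// in ND_ALIGN layout, where consecutive rows start `paddedN` elements apart.
template <typename T>
std::vector<T> StripNdAlignPadding(const std::vector<T>& padded, std::size_t m,
                                   std::size_t n, std::size_t paddedN) {
    assert(paddedN >= n && padded.size() >= m * paddedN);
    std::vector<T> compact;
    compact.reserve(m * n);
    for (std::size_t row = 0; row < m; ++row) {
        const T* rowStart = padded.data() + row * paddedN;
        // Keep the first n elements of the row and skip the padding columns.
        compact.insert(compact.end(), rowStart, rowStart + n);
    }
    return compact;
}
```

The padding columns hold invalid data, so the helper never reads them; it copies only the first `n` elements of each row.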


