MatmulPolicy

Applicability

Product	MatmulPolicy	TrianUpperMatmulPolicy/TrianLowerMatmulPolicy	NBuffer33MatmulPolicy
Atlas A3 training products / Atlas A3 inference products	√	√	√
Atlas A2 training products / Atlas A2 inference products	√	√	√
Atlas 200I/500 A2 inference products	√	x	x
Atlas inference products' AI Core	√	x	x
Atlas inference products' Vector Core	x	x	x
Atlas training products	x	x	x

Function

The template parameter MatmulPolicy is used to define the policy of the Matmul extensible module. Currently, the following Matmul built-in template policies are supported:

MatmulPolicy (default template policy)
Enables the default implementation policy of the Matmul API.
TrianUpperMatmulPolicy (upper triangular template policy)
The computation result of a matrix multiplication instruction is a matrix block of baseM × baseN. The matrix block is called a base block. If the base block in the Matmul result matrix C is located in the lower triangular position, the base block is ignored during data computation and data move-out in Matmul. The final matrix C is an upper triangular matrix. The following figure shows the upper triangular template policy. The matrix shape sizes are as follows: M = N = 512, K = 256, baseM = baseN = baseK = 32.

Figure 1 Upper triangular template policy
TrianLowerMatmulPolicy (lower triangular template policy)
The computation result of a matrix multiplication instruction is a matrix block of baseM × baseN. The matrix block is called a base block. If the base block in the Matmul result matrix C is located in the upper triangular position, the base block is ignored during data computation and data move-out in Matmul. The final matrix C is a lower triangular matrix. The following figure shows the lower triangular template policy. The matrix shape sizes are as follows: M = N = 512, K = 256, baseM = baseN = baseK = 32.

Figure 2 Lower triangular template policy

NBuffer33MatmulPolicy (NBuffer33 template policy)
The computation result of a matrix multiplication instruction is a matrix block of baseM × baseN, called a base block. The part of matrix A computed by a single core is divided into 3 × 3 base blocks, all of which are loaded into L1 Buffer and kept there. In each step, these 3 × 3 base blocks of matrix A are multiplied by 3 × 1 base blocks of matrix B. Meanwhile, DoubleBuffer loads the 3 × 1 base blocks of matrix B required for the next step in parallel, until the matrix multiplication in the singleCoreN direction is complete. The following figure shows the NBuffer33 template policy, where singleCoreM, singleCoreN, and singleCoreK indicate the shape sizes of matrices A and B in a single core.

Figure 3 NBuffer33 template policy

Restrictions

TrianUpperMatmulPolicy supports only the Norm template and MDL template.
TrianLowerMatmulPolicy supports only the Norm template and MDL template.
NBuffer33MatmulPolicy:
- Currently, only the MDL template is supported.
- The logical memory positions of matrices A and B support only TPosition::GM.
- MIX mode (cube computation plus vector computation) is not supported. Only CUBE_ONLY mode (cube computation only) is supported.
- Only the IterateAll API can be used to obtain the computation result matrix C of Matmul.
- The values of stepM, stepKa, and stepKb must be less than or equal to 3, and stepKa = stepKb = ceil(singleCoreK/baseK) must hold.
- The combined size of the base blocks of matrix A (fully loaded) and the base blocks of matrix B (loaded) must not exceed the size of L1 Buffer.
- Before calling GetTiling to generate tiling parameters, you must call SetMatmulConfigParams to set scheduleTypeIn to ScheduleType::N_BUFFER_33 to enable the tiling generation logic of the NBuffer33 template policy.

Example

The default template policy MatmulPolicy is the default value of the template parameter and does not need to be specified explicitly. The following describes how to use TrianUpperMatmulPolicy, TrianLowerMatmulPolicy, and NBuffer33MatmulPolicy.

Example of using TrianUpperMatmulPolicy

For details about the complete operator sample, see operator sample for using TrianUpperMatmulPolicy and TrianLowerMatmulPolicy.

#include "lib/matmul_intf.h"

typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> aType; 
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> bType;
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> cType; 
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> biasType;
// Pass TrianUpperMatmulPolicy as the policy template argument when defining Matmul.
AscendC::Matmul<aType, bType, cType, biasType, CFG_NORM, MatmulCallBackFunc<nullptr, nullptr, nullptr>, AscendC::Impl::Detail::TrianUpperMatmulPolicy> mm; 

// Perform the regular Matmul computation and output the result in upper triangular format.
TPipe pipe;
TCubeTiling tiling;
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);
mm.Init(&tiling);
mm.SetTensorA(gmA, isTransposeA);
mm.SetTensorB(gmB, isTransposeB);
if (tiling.isBias) {
    mm.SetBias(gmBias);
}
mm.IterateAll(gmC);

Example of using TrianLowerMatmulPolicy

For details about the complete operator sample, see operator sample for using TrianUpperMatmulPolicy and TrianLowerMatmulPolicy.

#include "lib/matmul_intf.h"

typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> aType; 
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> bType;
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> cType; 
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> biasType;
// Pass TrianLowerMatmulPolicy as the policy template argument when defining Matmul.
AscendC::Matmul<aType, bType, cType, biasType, CFG_NORM, MatmulCallBackFunc<nullptr, nullptr, nullptr>, AscendC::Impl::Detail::TrianLowerMatmulPolicy> mm; 

// Perform the regular Matmul computation and output the result in lower triangular format.
TPipe pipe;
TCubeTiling tiling;
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);
mm.Init(&tiling);
mm.SetTensorA(gmA, isTransposeA);
mm.SetTensorB(gmB, isTransposeB);
if (tiling.isBias) {
    mm.SetBias(gmBias);
}
mm.IterateAll(gmC);

Example of using NBuffer33MatmulPolicy

For details about the complete operator sample, see sample for enabling the NBuffer33 template policy.

#include "lib/matmul_intf.h"

typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> aType; 
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> bType;
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> cType; 
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> biasType;
// Pass NBuffer33MatmulPolicy as the policy template argument when defining Matmul.
// CFG_MDL is used here because NBuffer33MatmulPolicy supports only the MDL template.
AscendC::Matmul<aType, bType, cType, biasType, CFG_MDL, MatmulCallBackFunc<nullptr, nullptr, nullptr>, AscendC::Impl::Detail::NBuffer33MatmulPolicy> mm;

// Perform the regular Matmul computation using the NBuffer33 loading policy.
TPipe pipe;
TCubeTiling tiling;
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);
mm.Init(&tiling);
mm.SetTensorA(gmA, isTransposeA);
mm.SetTensorB(gmB, isTransposeB);
if (tiling.isBias) {
    mm.SetBias(gmBias);
}
mm.IterateAll(gmC);
