MatmulPolicy
Applicability
| Product | MatmulPolicy | TrianUpperMatmulPolicy/TrianLowerMatmulPolicy | NBuffer33MatmulPolicy |
|---|---|---|---|
| | √ | √ | √ |
| | √ | √ | √ |
| | √ | x | x |
| | √ | x | x |
| | x | x | x |
| | x | x | x |
Function
- MatmulPolicy (default template policy)
Enables the default implementation policy of the Matmul API.
- TrianUpperMatmulPolicy (upper triangular template policy)
A single matrix multiplication instruction produces a baseM × baseN block of the result, which is called a base block. Under this policy, Matmul skips both computation and data move-out for every base block that lies in the lower triangular part of the result matrix C, so the final matrix C is an upper triangular matrix. The following figure shows the upper triangular template policy for M = N = 512, K = 256, and baseM = baseN = baseK = 32.
Figure 1 Upper triangular template policy
- TrianLowerMatmulPolicy (lower triangular template policy)
A single matrix multiplication instruction produces a baseM × baseN block of the result, which is called a base block. Under this policy, Matmul skips both computation and data move-out for every base block that lies in the upper triangular part of the result matrix C, so the final matrix C is a lower triangular matrix. The following figure shows the lower triangular template policy for M = N = 512, K = 256, and baseM = baseN = baseK = 32.
Figure 2 Lower triangular template policy
- NBuffer33MatmulPolicy (NBuffer33 template policy)
A single matrix multiplication instruction produces a baseM × baseN block of the result, which is called a base block. Under this policy, the part of matrix A computed by a single core is divided into 3 × 3 base blocks, and all of these base blocks are loaded into L1 Buffer and kept there. Each computation step multiplies the 3 × 3 base blocks of matrix A by 3 × 1 base blocks of matrix B. In parallel, DoubleBuffer loads the 3 × 1 base blocks of matrix B required for the next step, and this continues until the computation in the singleCoreN direction is complete. The following figure shows the NBuffer33 template policy. In the figure, singleCoreM, singleCoreN, and singleCoreK indicate the shape sizes of matrices A and B in a single core.
Figure 3 NBuffer33 template policy
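The block-skipping rule of the two triangular policies described above can be sketched in plain host-side C++. This is only an illustration, not the Ascend C implementation: the names Triangle, CeilDiv, IsBlockComputed, and CountComputedBlocks are hypothetical, and the sketch assumes that base blocks on the diagonal are computed under both policies, which the text above does not state explicitly.

```cpp
#include <cassert>
#include <cstdint>

// Which triangle of the result matrix C is kept (hypothetical helper type).
enum class Triangle { Upper, Lower };

// Ceiling division for unsigned integers; b must be non-zero.
inline std::uint32_t CeilDiv(std::uint32_t a, std::uint32_t b) {
    assert(b != 0);
    return (a + b - 1) / b;
}

// Returns true when the base block at (blockRow, blockCol) would be computed
// and moved out. Assumption: diagonal blocks are kept under both policies.
inline bool IsBlockComputed(Triangle tri, std::uint32_t blockRow, std::uint32_t blockCol) {
    return tri == Triangle::Upper ? blockCol >= blockRow : blockCol <= blockRow;
}

// Counts the base blocks that are computed for an M x N result matrix
// that is split into baseM x baseN base blocks.
inline std::uint32_t CountComputedBlocks(Triangle tri, std::uint32_t M, std::uint32_t N,
                                         std::uint32_t baseM, std::uint32_t baseN) {
    const std::uint32_t rows = CeilDiv(M, baseM);
    const std::uint32_t cols = CeilDiv(N, baseN);
    std::uint32_t count = 0;
    for (std::uint32_t r = 0; r < rows; ++r) {
        for (std::uint32_t c = 0; c < cols; ++c) {
            if (IsBlockComputed(tri, r, c)) {
                ++count;
            }
        }
    }
    return count;
}
```

For the shapes used in Figures 1 and 2 (M = N = 512, baseM = baseN = 32), matrix C consists of 16 × 16 = 256 base blocks. Under the diagonal assumption, CountComputedBlocks reports 136 computed blocks for either policy, so 120 base blocks are skipped.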
Restrictions
- TrianUpperMatmulPolicy supports only the Norm template and MDL template.
- TrianLowerMatmulPolicy supports only the Norm template and MDL template.
- NBuffer33MatmulPolicy:
  - Currently, only the MDL template is supported.
  - The logical memory positions of matrices A and B support only TPosition::GM.
  - The MIX mode (cube computation plus vector computation) is not supported. Only the CUBE_ONLY mode (cube computation only) is supported.
  - Only the IterateAll API can be used to obtain the computation result matrix C of Matmul.
  - The values of stepM, stepKa, and stepKb must be less than or equal to 3, and the following condition must be met: stepKa = stepKb = ceil(singleCoreK/baseK).
  - The sum of the size of the base blocks of matrix A (fully loaded) and the size of the base blocks of matrix B (loaded) must not exceed the size of L1 Buffer.
  - Before calling GetTiling to generate tiling parameters, you must call SetMatmulConfigParams to set scheduleTypeIn to ScheduleType::N_BUFFER_33, which enables the tiling generation logic of the NBuffer33 template policy.
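The step and L1 Buffer restrictions of NBuffer33MatmulPolicy can be checked on the host before tiling is generated. The following is a minimal sketch under stated assumptions, not part of the Ascend C API: NBuffer33Params, DivUp, StepsAreValid, EstimateL1Bytes, and FitsInL1 are hypothetical names, the L1 Buffer size is passed in by the caller because it depends on the product, and the byte estimate assumes that matrix A occupies stepM × stepKa base blocks and that matrix B occupies stepKb × 1 base blocks per buffer, with two buffers for DoubleBuffer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the tiling values named in the restrictions.
struct NBuffer33Params {
    std::uint32_t singleCoreK;
    std::uint32_t baseM;
    std::uint32_t baseN;
    std::uint32_t baseK;
    std::uint32_t stepM;
    std::uint32_t stepKa;
    std::uint32_t stepKb;
};

// Ceiling division for unsigned integers; b must be non-zero.
inline std::uint32_t DivUp(std::uint32_t a, std::uint32_t b) {
    assert(b != 0);
    return (a + b - 1) / b;
}

// Checks stepM, stepKa, stepKb <= 3 and stepKa == stepKb == ceil(singleCoreK / baseK).
inline bool StepsAreValid(const NBuffer33Params& p) {
    const std::uint32_t kBlocks = DivUp(p.singleCoreK, p.baseK);
    return p.stepM <= 3 && p.stepKa <= 3 && p.stepKb <= 3 &&
           p.stepKa == kBlocks && p.stepKb == kBlocks;
}

// Estimates the L1 bytes needed. Assumptions: A holds stepM x stepKa base blocks,
// B holds stepKb x 1 base blocks per buffer, and B uses bBufferCount buffers.
inline std::uint64_t EstimateL1Bytes(const NBuffer33Params& p, std::uint32_t elemBytes,
                                     std::uint32_t bBufferCount = 2) {
    const std::uint64_t aBytes = static_cast<std::uint64_t>(p.stepM) * p.stepKa *
                                 p.baseM * p.baseK * elemBytes;
    const std::uint64_t bBytes = static_cast<std::uint64_t>(p.stepKb) *
                                 p.baseK * p.baseN * elemBytes * bBufferCount;
    return aBytes + bBytes;
}

// Returns true when the step restrictions hold and the estimate fits in l1Bytes.
inline bool FitsInL1(const NBuffer33Params& p, std::uint32_t elemBytes,
                     std::uint64_t l1Bytes, std::uint32_t bBufferCount = 2) {
    return StepsAreValid(p) && EstimateL1Bytes(p, elemBytes, bBufferCount) <= l1Bytes;
}
```

As an illustration, with singleCoreK = 192, baseM = baseN = 128, baseK = 64, stepM = stepKa = stepKb = 3, and half inputs (2 bytes per element), the estimate is 147456 bytes for matrix A plus 98304 bytes for the two buffers of matrix B, 245760 bytes in total. Whether this fits must be compared against the actual L1 Buffer size of the target product.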
Example
MatmulPolicy is the default value of the policy template parameter, so it does not need to be specified explicitly. The following examples show how to use TrianUpperMatmulPolicy, TrianLowerMatmulPolicy, and NBuffer33MatmulPolicy.
- Example of using TrianUpperMatmulPolicy
For details about the complete operator sample, see operator sample for using TrianUpperMatmulPolicy and TrianLowerMatmulPolicy.
```cpp
#include "lib/matmul_intf.h"

typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> aType;
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> bType;
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> cType;
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> biasType;

// Pass TrianUpperMatmulPolicy as the policy template argument when defining Matmul.
AscendC::Matmul<aType, bType, cType, biasType, CFG_NORM, MatmulCallBackFunc<nullptr, nullptr, nullptr>,
    AscendC::Impl::Detail::TrianUpperMatmulPolicy> mm;

// Perform the regular Matmul computation. The result is output in upper triangular format.
TPipe pipe;
TCubeTiling tiling;
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);
mm.Init(&tiling);
mm.SetTensorA(gmA, isTransposeA);
mm.SetTensorB(gmB, isTransposeB);
if (tiling.isBias) {
    mm.SetBias(gmBias);
}
mm.IterateAll(gmC);
```
- Example of using TrianLowerMatmulPolicy
For details about the complete operator sample, see operator sample for using TrianUpperMatmulPolicy and TrianLowerMatmulPolicy.
```cpp
#include "lib/matmul_intf.h"

typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> aType;
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> bType;
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> cType;
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> biasType;

// Pass TrianLowerMatmulPolicy as the policy template argument when defining Matmul.
AscendC::Matmul<aType, bType, cType, biasType, CFG_NORM, MatmulCallBackFunc<nullptr, nullptr, nullptr>,
    AscendC::Impl::Detail::TrianLowerMatmulPolicy> mm;

// Perform the regular Matmul computation. The result is output in lower triangular format.
TPipe pipe;
TCubeTiling tiling;
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);
mm.Init(&tiling);
mm.SetTensorA(gmA, isTransposeA);
mm.SetTensorB(gmB, isTransposeB);
if (tiling.isBias) {
    mm.SetBias(gmBias);
}
mm.IterateAll(gmC);
```
- Example of using NBuffer33MatmulPolicy
For details about the complete operator sample, see sample for enabling the NBuffer33 template policy.
```cpp
#include "lib/matmul_intf.h"

typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> aType;
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> bType;
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> cType;
typedef AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> biasType;

// Pass NBuffer33MatmulPolicy as the policy template argument when defining Matmul.
// CFG_MDL is used because this policy supports only the MDL template (see Restrictions).
AscendC::Matmul<aType, bType, cType, biasType, CFG_MDL, MatmulCallBackFunc<nullptr, nullptr, nullptr>,
    AscendC::Impl::Detail::NBuffer33MatmulPolicy> mm;

// Perform the regular Matmul computation. Only IterateAll can be used to obtain matrix C.
TPipe pipe;
TCubeTiling tiling;
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);
mm.Init(&tiling);
mm.SetTensorA(gmA, isTransposeA);
mm.SetTensorB(gmB, isTransposeB);
if (tiling.isBias) {
    mm.SetBias(gmBias);
}
mm.IterateAll(gmC);
```