Matmul Template Parameters
Function Usage
The following parameters need to be passed to create a Matmul object:
- Parameter types of matrices A, B, and C, and the bias. The type information is defined by MatmulType, including the logical memory location, data format, data type, whether to transpose, data layout, and whether to enable L1 reuse.
- (Optional) MatmulConfig information, which is used to configure Matmul template information and related parameters. If it is not set, the Norm template is enabled by default.
- (Optional) MatmulCallBack custom function information, used to configure user-defined functions that copy the left and right matrices from GM to L1 and copy the computation result from CO1 to GM.
- (Optional) MatmulPolicy information. This parameter is reserved.
Principles
The computation process is as follows:
1. Migrate data from GM to A1: DataCopy moves one stepM * baseM * stepKa * baseK matrix block a1 from matrix A at a time until matrix A is completely migrated. Then migrate data from GM to B1: DataCopy moves one stepKb * baseK * stepN * baseN matrix block b1 from matrix B at a time until matrix B is completely migrated.
2. Migrate data from A1 to A2: LoadData moves one baseM * baseK matrix block a0 from a1 at a time. Then migrate data from B1 to B2 with transposition: LoadData moves one baseK * baseN matrix block from b1 at a time and transposes it into a baseN * baseK matrix block b0.
3. Perform the matrix multiplication: each completed a0 × b0 computation produces one baseM * baseN matrix block co1.
4. Migrate matrix block co1 to co2: DataCopy moves one baseM * baseN matrix block co1 into the singleCoreM * singleCoreN matrix block co2 at a time.
5. Repeat steps 2 to 4 to complete the computation of a1 × b1.
6. Migrate matrix block co2 to matrix C: DataCopy moves one singleCoreM * singleCoreN matrix block co2 into matrix C at a time.
7. Repeat steps 1 to 6 to complete the full computation A × B = C.
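The blocking scheme above can be sketched on the host side with plain arrays. This is NOT Ascend C code and does not model the hardware pipeline; it only mirrors the loop structure, with the buffer names (a0/b0 block multiply producing co1, accumulation into co2) matching the steps above. The matrix sizes and base-block sizes are assumed values for illustration.

```cpp
#include <array>
#include <cassert>

// Assumed single-core shape and base-block sizes (illustration only).
constexpr int M = 4, K = 4, N = 4;        // singleCoreM, singleCoreK, singleCoreN
constexpr int baseM = 2, baseK = 2, baseN = 2;

using Mat = std::array<std::array<float, 4>, 4>;

// Blocked matmul mirroring the documented steps: each (m, n, k) iteration
// multiplies one baseM x baseK block a0 by one baseK x baseN block b0,
// producing a baseM x baseN partial block co1, which is accumulated into
// the singleCoreM x singleCoreN result co2.
Mat blockedMatmul(const Mat& A, const Mat& B) {
    Mat co2{};
    for (int m = 0; m < M; m += baseM)
        for (int n = 0; n < N; n += baseN)
            for (int k = 0; k < K; k += baseK) {
                float co1[baseM][baseN] = {};          // partial result of a0 x b0
                for (int i = 0; i < baseM; ++i)
                    for (int j = 0; j < baseN; ++j)
                        for (int p = 0; p < baseK; ++p)
                            co1[i][j] += A[m + i][k + p] * B[k + p][n + j];
                for (int i = 0; i < baseM; ++i)        // co1 -> co2 accumulation
                    for (int j = 0; j < baseN; ++j)
                        co2[m + i][n + j] += co1[i][j];
            }
    return co2;
}
```

Because every base block of A meets every base block of B exactly once along the K dimension, the result equals an unblocked matrix multiplication.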
Prototype
```cpp
template <class A_TYPE, class B_TYPE, class C_TYPE, class BIAS_TYPE = C_TYPE,
          const MatmulConfig& MM_CFG = CFG_NORM,
          class MM_CB = MatmulCallBackFunc<nullptr, nullptr, nullptr>,
          MATMUL_POLICY_VARIADIC_TEMPLATE_OF(MATMUL_POLICY)>
using Matmul = matmul::MatmulImpl<A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE, MM_CFG, MM_CB, MATMUL_POLICY...>;
```
- A_TYPE, B_TYPE, C_TYPE, and BIAS_TYPE are defined by MatmulType; BIAS_TYPE defaults to C_TYPE.
- (Optional) MM_CFG: Matmul template information. For details, see MatmulConfig.
- (Optional) MM_CB: custom copy callback functions. For details, see MatmulCallBackFunc.
- (Optional) MATMUL_POLICY_VARIADIC_TEMPLATE_OF(MATMUL_POLICY): this parameter is reserved and does not need to be configured.
```cpp
#define MATMUL_POLICY_VARIADIC_TEMPLATE_OF(NAME) \
    template <const auto& = MM_CFG, typename ...> class ...NAME
```
Parameters

| Parameter | Description |
|---|---|
| POSITION | Logical position of the data in memory. |
| CubeFormat | Data format of the matrix. |
| TYPE | Data type. Note: the data types of matrix A and matrix B must be the same. For details about supported data type combinations, see Table 2. |
| ISTRANS | Whether to enable the matrix transpose function. The default value is false, indicating that the transpose function is disabled. |
| LAYOUT | Data layout format. NONE (default): BatchMatmul is not used; any other value indicates that BatchMatmul is used. NORMAL: BMNK data layout mode. BSNGD: data layout after reshaping the original BSH shape; see the data layout description in IterateBatch. SBNGD: data layout after reshaping the original SBH shape; see the data layout description in IterateBatch. BNGS1S2: matrix multiplication output of the first two layouts; S1S2 data is stored contiguously, and one S1S2 element is the computed data of one batch. |
| IBSHARE | Whether to enable IBShare, which reuses the same matrix A or matrix B data on the L1 buffer. When IBShare is enabled for both matrix A and matrix B, both are reused on the L1 buffer at the same time; in this case only the Norm template is supported (for parameter usage in this scenario, see the matmulABshare sample), and additional conditions must be met. Except in the scenario where matrices A and B are reused at the same time, this parameter is used together with the IBShare template; the reused matrix must be fully loaded on the L1 buffer. For details about parameter settings, see Table 2. |
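The table's parameters combine into a single compile-time type descriptor. The sketch below shows the idea with plain C++; the enum and field names are assumptions for illustration (the real MatmulType and its enums come from the Ascend C headers), but the defaults mirror the table: ISTRANS defaults to false, LAYOUT to NONE, and IBSHARE to false.

```cpp
#include <cassert>
#include <type_traits>

// Assumed enum names, for illustration only.
enum class TPosition { GM, A1, B1, CO1 };
enum class CubeFormat { ND, NZ };
enum class LayoutMode { NONE, NORMAL, BSNGD, SBNGD, BNGS1S2 };

// Hypothetical trait bundle analogous to MatmulType: one template
// parameter per row of the table, with the documented defaults.
template <TPosition POSITION, CubeFormat FORMAT, typename TYPE,
          bool ISTRANS = false,
          LayoutMode LAYOUT = LayoutMode::NONE,
          bool IBSHARE = false>
struct MatmulTypeSketch {
    static constexpr TPosition pos = POSITION;
    static constexpr CubeFormat format = FORMAT;
    using T = TYPE;
    static constexpr bool isTrans = ISTRANS;
    static constexpr LayoutMode layout = LAYOUT;
    static constexpr bool ibShare = IBSHARE;
};

using aType = MatmulTypeSketch<TPosition::GM, CubeFormat::ND, float>;
using bType = MatmulTypeSketch<TPosition::GM, CubeFormat::ND, float>;

// The TYPE row requires matrix A and matrix B to share one data type.
static_assert(std::is_same<aType::T, bType::T>::value,
              "matrix A and matrix B must use the same data type");
```

Bundling the traits this way lets the Matmul template select code paths (e.g. transpose handling, batch layouts) entirely at compile time.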
Returns
None
Availability
Precautions
None
Example
```cpp
// User-defined callback functions
void DataCopyOut(const __gm__ void *gm, const LocalTensor<int8_t> &co1Local,
                 const void *dataCopyOutParams, const uint64_t tilingPtr, const uint64_t dataPtr);
void CopyA1(const AscendC::LocalTensor<int8_t> &aMatrix, const __gm__ void *gm,
            int row, int col, int useM, int useK,
            const uint64_t tilingPtr, const uint64_t dataPtr);
void CopyB1(const AscendC::LocalTensor<int8_t> &bMatrix, const __gm__ void *gm,
            int row, int col, int useK, int useN,
            const uint64_t tilingPtr, const uint64_t dataPtr);

typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> aType;
typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> bType;
typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> cType;
typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> biasType;

// Matmul object using the MDL template
matmul::Matmul<aType, bType, cType, biasType, CFG_MDL> mm1;

// Matmul object using a custom MatmulConfig
matmul::MatmulConfig mmConfig{false, true, false, 128, 128, 64};
mmConfig.enUnitFlag = false;
matmul::Matmul<aType, bType, cType, biasType, mmConfig> mm2;

// Matmul object with user-defined copy callback functions
matmul::Matmul<aType, bType, cType, biasType, CFG_NORM,
               matmul::MatmulCallBackFunc<DataCopyOut, CopyA1, CopyB1>> mm3;
```