Matmul Template Parameters

Function Usage

The following parameters need to be passed to create a Matmul object:

  • Parameter types of matrices A, B, and C, and the bias. The type information is defined by MatmulType, including the logical memory location, data format, data type, whether to transpose, data layout, and whether to enable L1 reuse.
  • (Optional) MatmulConfig information, which is used to configure Matmul template information and related parameters. If it is not set, the Norm template is enabled by default.
  • (Optional) MatmulCallBack custom function information, which is used to configure functions for copying the left and right matrices from GM to L1 and for copying the computation result from CO1 to GM.
  • (Optional) MatmulPolicy information. This parameter is reserved.

Principles

Take the input matrix A (GM, ND, half), matrix B (GM, ND, half), and output matrix C (GM, ND, float), with bias not supported, as an example. (GM, ND, half) indicates that data is stored on the GM, the data format is ND, and the data type is half. The following figure shows the internal algorithm of the high-level Matmul APIs.
Figure 1 Matmul algorithm

The computation process is as follows:

  1. Migrate data from GM to A1: DataCopy migrates a (stepM * baseM) * (stepKa * baseK) matrix block a1 from matrix A each time until matrix A is completely migrated. Then, migrate data from GM to B1: DataCopy migrates a (stepKb * baseK) * (stepN * baseN) matrix block b1 from matrix B each time until matrix B is completely migrated.
  2. Migrate data from A1 to A2: LoadData migrates a baseM * baseK matrix block a0 from a1 each time. Migration from B1 to B2 also transposes the data: LoadData moves a baseK * baseN matrix block from b1 each time and transposes it into a baseN * baseK matrix block b0.
  3. Perform matrix multiplication: Each time computation of one matrix block a0 × b0 is completed, a matrix block co1 of baseM * baseN is obtained.
  4. Migrate data from matrix block co1 to co2: DataCopy migrates one baseM * baseN matrix block co1 to matrix block co2 of singleCoreM * singleCoreN each time.
  5. Repeat steps 2 to 4 to compute matrix block a1 × b1.
  6. Migrate data from matrix block co2 to matrix block C: DataCopy migrates one matrix block co2 of singleCoreM * singleCoreN to matrix block C each time.
  7. Repeat steps 1 to 6 to complete the computation: matrix A × B = C.
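The loop structure of steps 1 to 7 can be sketched on the host in plain C++. This is an illustrative sketch only: baseM, baseN, and baseK stand in for the tiling parameters, the step*-sized staging through A1/B1 is folded into the loops, and the real Matmul API performs these iterations on the AI Core with L1/L0/CO1 buffer staging rather than host memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side sketch of the blocked computation described in steps 1-7.
// A is M x K, B is K x N, both row-major ND; returns C = A x B (M x N).
std::vector<float> BlockedMatmul(const std::vector<float>& A,
                                 const std::vector<float>& B,
                                 std::size_t M, std::size_t K, std::size_t N,
                                 std::size_t baseM, std::size_t baseK,
                                 std::size_t baseN) {
    std::vector<float> C(M * N, 0.0f);
    // Outer loops pick one baseM x baseN output block (the co1 analogue);
    // the k0 loop accumulates partial a0 x b0 products over baseK slices.
    for (std::size_t m0 = 0; m0 < M; m0 += baseM) {
        for (std::size_t n0 = 0; n0 < N; n0 += baseN) {
            for (std::size_t k0 = 0; k0 < K; k0 += baseK) {
                const std::size_t mEnd = std::min(m0 + baseM, M);
                const std::size_t nEnd = std::min(n0 + baseN, N);
                const std::size_t kEnd = std::min(k0 + baseK, K);
                for (std::size_t m = m0; m < mEnd; ++m) {
                    for (std::size_t n = n0; n < nEnd; ++n) {
                        float acc = 0.0f;
                        for (std::size_t k = k0; k < kEnd; ++k) {
                            acc += A[m * K + k] * B[k * N + n];
                        }
                        // Accumulate the block result into C (co2/C analogue).
                        C[m * N + n] += acc;
                    }
                }
            }
        }
    }
    return C;
}
```

The block boundaries change only the iteration order, not the result, which is why the API can choose baseM/baseN/baseK purely for buffer-size and pipeline reasons.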

Prototype

template <class A_TYPE, class B_TYPE, class C_TYPE, class BIAS_TYPE = C_TYPE, const MatmulConfig& MM_CFG = CFG_NORM, class MM_CB = MatmulCallBackFunc<nullptr, nullptr, nullptr>, MATMUL_POLICY_VARIADIC_TEMPLATE_OF(MATMUL_POLICY)>
using Matmul = matmul::MatmulImpl<A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE, MM_CFG, MM_CB, MATMUL_POLICY...>;
  • A_TYPE, B_TYPE, and C_TYPE are defined by MatmulType.
  • (Optional) MatmulConfig: Matmul template information. For details, see MatmulConfig.
  • (Optional) MATMUL_POLICY_VARIADIC_TEMPLATE_OF(MATMUL_POLICY): reserved parameter that does not need to be configured.
    #define MATMUL_POLICY_VARIADIC_TEMPLATE_OF(NAME)      \
    template <const auto& = MM_CFG, typename ...> class ...NAME
    

Parameters

Table 1 MatmulType parameters

Parameter

Description

POSITION

Logical position of memory

CubeFormat

Data format of the matrix.

TYPE

Data type of the matrix elements.

Note: The data types of matrix A and matrix B must be the same. For details about data type combinations, see Table 2.

ISTRANS

Whether to enable the matrix transpose function.

  • true indicates that the matrix transpose function is enabled. If the function is enabled, isTransposeA and isTransposeB in SetTensorA and SetTensorB are used to set whether to transpose matrix A and matrix B, respectively. If matrix A and matrix B are transposed, Matmul considers that the shape of matrix A is [K, M] and that of matrix B is [N, K].
  • false indicates that the matrix transpose function is disabled. If the function is disabled, SetTensorA and SetTensorB cannot be used to set whether to transpose matrix A and matrix B, respectively. In this case, Matmul considers that the shape of matrix A is [M, K] and that of matrix B is [K, N].

The default value is false, indicating that the transpose function is disabled.
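The shape rule above can be illustrated with a small helper. This function is hypothetical and not part of the API; it only shows that an enabled transpose flag means the stored tensor holds the logically transposed matrix, so the logical [rows, cols] are the stored dimensions swapped.

```cpp
#include <utility>

// Illustrative only (not an API function): map the stored shape of a tensor
// to its logical shape under the transpose flag. For matrix A the logical
// shape is [M, K]; when isTranspose is true, the data is stored as [K, M].
std::pair<int, int> LogicalShape(int storedRows, int storedCols,
                                 bool isTranspose) {
    return isTranspose ? std::make_pair(storedCols, storedRows)
                       : std::make_pair(storedRows, storedCols);
}
```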

LAYOUT

Data layout format.

NONE (default): BatchMatmul is not used. Other options indicate that BatchMatmul is used.

NORMAL: BMNK data layout mode.

BSNGD: data layout after reshaping is performed on the original BSH shape. For details, see the description of data layout in IterateBatch.

SBNGD: data layout after reshaping is performed on the original SBH shape. For details, see the description of data layout in IterateBatch.

BNGS1S2: matrix multiplication output of the preceding two data layouts. S1S2 data is stored contiguously, and each S1 * S2 block is the computation result of one batch.

IBSHARE

Whether to enable IBShare. IBShare can reuse the same matrix A or matrix B data on L1 Buffer. When IBShare is enabled for both matrix A and matrix B, matrix A and matrix B on L1 Buffer are reused at the same time. In this case, only the Norm template is supported. (For details about how to use parameters in this scenario, see matmulABshare sample.)

Note that the following conditions must be met when IBShare is enabled for both matrix A and matrix B:

  • IBShare must also be enabled for matrix A and matrix B of other Matmul objects in the same operator.
  • To obtain the matrix calculation result, only the IterateAll API can be called to output the result to the GlobalTensor. That is, the calculation result is stored in the address of the global memory. Do not call other APIs such as GetTensorC.

This parameter is used together with the IBShare template except in the scenario where matrices A and B are reused at the same time. To use the IBShare template, the reused matrix must be fully loaded on the L1 Buffer. For details about parameter settings, see Table 2.
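Assuming MatmulType's template parameters follow the order of Table 1 (POSITION, CubeFormat, TYPE, ISTRANS, LAYOUT, IBSHARE), a declaration enabling IBShare on both inputs might look like the following configuration fragment. The parameter order and the LayoutMode enum are assumptions inferred from this table, not confirmed signatures.

```cpp
// Sketch only: the trailing bool is assumed to enable IBShare, and
// LayoutMode::NONE keeps BatchMatmul disabled (see Table 1).
typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half,
                           false, LayoutMode::NONE, true> aTypeShared;
typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half,
                           false, LayoutMode::NONE, true> bTypeShared;
```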

Returns

None

Availability

Precautions

None

Example

// User-defined callback function
void DataCopyOut(const __gm__ void *gm, const LocalTensor<int8_t> &co1Local, const void *dataCopyOutParams, const uint64_t tilingPtr, const uint64_t dataPtr);
void CopyA1(const AscendC::LocalTensor<int8_t> &aMatrix, const __gm__ void *gm, int row, int col, int useM, int useK, const uint64_t tilingPtr, const uint64_t dataPtr);
void CopyB1(const AscendC::LocalTensor<int8_t> &bMatrix, const __gm__ void *gm, int row, int col, int useK, int useN, const uint64_t tilingPtr, const uint64_t dataPtr);

typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> aType;
typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> bType;
typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> cType;
typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> biasType;
// Matmul object using the predefined MDL template
matmul::Matmul<aType, bType, cType, biasType, CFG_MDL> mm1;
// Matmul object using a custom MatmulConfig
matmul::MatmulConfig mmConfig{false, true, false, 128, 128, 64};
mmConfig.enUnitFlag = false;
matmul::Matmul<aType, bType, cType, biasType, mmConfig> mm2;
// Matmul object using the Norm template with user-defined copy callbacks
matmul::Matmul<aType, bType, cType, biasType, CFG_NORM, matmul::MatmulCallBackFunc<DataCopyOut, CopyA1, CopyB1>> mm3;