Instructions for Use

Ascend C provides a set of high-level Matmul APIs that let users quickly implement matrix multiplication.

Matmul formula: C = A × B + Bias.

  • A and B are the source operands. A is a left matrix with shape [M, K], and B is a right matrix with shape [K, N].
  • C is the destination operand, which is a matrix that stores the matrix multiplication result. Its shape is [M, N].
  • Bias is the matrix multiplication bias, whose shape is [1, N]. The bias is added to each row of the A × B result matrix.
Figure 1 Matmul matrix multiplication
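
As a sanity check on the formula, the computation can be written as a minimal scalar reference in plain C++ (independent of the Ascend C APIs; row-major layout assumed, names illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference implementation of C = A x B + Bias.
// A: M x K (row-major), B: K x N (row-major), Bias: 1 x N, C: M x N.
std::vector<float> MatmulRef(const std::vector<float>& A,
                             const std::vector<float>& B,
                             const std::vector<float>& bias,
                             std::size_t M, std::size_t K, std::size_t N) {
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            float acc = bias[n];  // the same [1, N] bias row is added to every row of A x B
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[m * K + k] * B[k * N + n];
            }
            C[m * N + n] = acc;
        }
    }
    return C;
}
```

The triple loop mirrors the shapes above: the outer two loops walk the [M, N] output, and the inner loop reduces along the shared K axis.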

In addition to the basic functions of Matmul, the following feature scenarios are available. Read the ones that match your actual application scenario.

  • Matmul static tiling

    Matmul static tiling means obtaining constant Matmul tiling parameters at compile time and compiling the operator with them, which reduces scalar computation and improves the overall performance of the operator. Specifically, obtain a custom Matmul template by specifying the single-core shape (singleCoreM/singleCoreN/singleCoreK) and base shape (basicM/basicN/basicK) parameters, or only the base shape parameters, in the MatmulConfig API; then call the GetMatmulApiTiling API to obtain the constant Matmul tiling parameters. For details, see the following description.
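
The benefit of constant tiling can be sketched in plain C++: when the single-core and base shapes are compile-time parameters, all per-tile loop bounds fold into constants, so no scalar arithmetic is spent deriving them at run time. The StaticTiling struct below is a hypothetical stand-in for what MatmulConfig and GetMatmulApiTiling provide, not the Ascend C API itself:

```cpp
// Hypothetical tiling descriptor: all shapes are known at compile time,
// playing the role of constant tiling obtained via GetMatmulApiTiling.
template <int SingleM, int SingleN, int SingleK, int BaseM, int BaseN, int BaseK>
struct StaticTiling {
    static_assert(SingleM % BaseM == 0 && SingleN % BaseN == 0 && SingleK % BaseK == 0,
                  "base shape must divide the single-core shape in this sketch");
    // Iteration counts along each axis, folded by the compiler into constants.
    static constexpr int iterM = SingleM / BaseM;
    static constexpr int iterN = SingleN / BaseN;
    static constexpr int iterK = SingleK / BaseK;
    static constexpr int totalIters = iterM * iterN;  // one output tile per (m, n) pair
};

// Example instantiation: singleCore shape 128 x 256 x 64, base shape 32 x 64 x 32.
using Tiling = StaticTiling<128, 256, 64, 32, 64, 32>;
```

With a dynamic tiling struct these divisions would execute as scalar instructions on the device; here the compiler emits them as literals.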

The M-axis direction mentioned below is the vertical direction of matrix A, the K-axis direction is the horizontal direction of matrix A or the vertical direction of matrix B, and the N-axis direction is the horizontal direction of matrix B.

Specific steps of implementing Matmul matrix multiplication are as follows:

  1. Create a Matmul object.
  2. Perform the initialization operation.
  3. Set the left matrix A, right matrix B, and bias.
  4. Execute the matrix multiplication operation.
  5. End the matrix multiplication operation.
  1. Create a Matmul object.

    The following is an example of creating a Matmul object:

    • In the CUBE_ONLY (matrix computation only) scenario, the ASCENDC_CUBE_ONLY macro must be defined.
    • By default, the MIX mode (both matrix computation and vector computation) is used. In this scenario, ASCENDC_CUBE_ONLY must not be defined.
    // In the CUBE_ONLY scenario, define this macro before including lib/matmul_intf.h.
    // #define ASCENDC_CUBE_ONLY
    #include "lib/matmul_intf.h"

    typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> aType;
    typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> bType;
    typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> cType;
    typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> biasType;
    matmul::Matmul<aType, bType, cType, biasType> mm;
    

    During object creation, pass in the parameter types of matrix A, matrix B, matrix C, and the bias. The type information is defined by MatmulType and includes the logical memory position, data format, and data type.

    template <AscendC::TPosition POSITION, CubeFormat FORMAT, typename TYPE, bool ISTRANS = false, LayoutMode LAYOUT = LayoutMode::NONE, bool IBSHARE = false> struct MatmulType {
        constexpr static AscendC::TPosition pos = POSITION;
        constexpr static CubeFormat format = FORMAT;
        using T = TYPE;
        constexpr static bool isTrans = ISTRANS;
        constexpr static LayoutMode layout = LAYOUT;
        constexpr static bool ibShare = IBSHARE;
    };
    
    Table 1 MatmulType parameters

    POSITION: logical position of memory.

    FORMAT: data format (a CubeFormat value).

    TYPE: data type. Note that the data types of matrix A and matrix B must be the same. For details about the supported data type combinations, see Table 2.

    ISTRANS: whether to enable the matrix transpose function.

    • true: the transpose function is enabled. isTransposeA and isTransposeB in SetTensorA and SetTensorB then set whether matrix A and matrix B are transposed, respectively. For a transposed matrix, Matmul treats the shape of matrix A as [K, M] and that of matrix B as [N, K].
    • false: the transpose function is disabled. SetTensorA and SetTensorB cannot be used to set transposition, and Matmul treats the shape of matrix A as [M, K] and that of matrix B as [K, N].

    The default value is false (transpose disabled).

    LAYOUT: data layout format.

    • NONE (default): BatchMatmul is not used. All other options indicate that BatchMatmul is used.
    • NORMAL: BMNK data layout mode.
    • BSNGD: data layout after reshaping the original BSH shape. For details, see the data layout description in IterateBatch.
    • SBNGD: data layout after reshaping the original SBH shape. For details, see the data layout description in IterateBatch.
    • BNGS1S2: matrix multiplication output of the preceding two layouts. S1S2 data is stored contiguously, and each S1S2 block is the computation result of one batch.

    IBSHARE: whether to enable IBShare, which reuses the same matrix A or matrix B data on the L1 Buffer. When IBShare is enabled for both matrix A and matrix B, both matrices are reused on the L1 Buffer at the same time; in this case, only the Norm template is supported. (For details about how to use the parameters in this scenario, see the matmulABshare sample.)

    When IBShare is enabled for both matrix A and matrix B, the following conditions must be met:

    • IBShare must also be enabled for matrix A and matrix B of the other Matmul objects in the same operator.
    • Only the IterateAll API may be called to obtain the matrix computation result, outputting it to a GlobalTensor, that is, storing it at a global memory address. Do not call other APIs such as GetTensorC.

    Except in the scenario where matrices A and B are reused at the same time, this parameter is used together with the IBShare template. To use the IBShare template, the reused matrix must be fully loaded into the L1 Buffer. For details about the parameter settings, see Table 2.
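
The ISTRANS semantics, where a transposed left matrix is physically stored as [K, M], can be illustrated with a small plain C++ helper. ReadA and its isTransposeA flag are hypothetical stand-ins mirroring the role of the SetTensorA flag, not the Ascend C API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns element A[m][k] of the logical M x K left matrix.
// If transposed, the buffer physically holds A^T with shape [K, M].
inline float ReadA(const std::vector<float>& buf, std::size_t m, std::size_t k,
                   std::size_t M, std::size_t K, bool isTransposeA) {
    return isTransposeA ? buf[k * M + m]   // stored as [K, M]
                        : buf[m * K + k];  // stored as [M, K]
}
```

Either storage layout yields the same logical matrix; the flag only changes how indices map to the buffer.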

    Table 2 Combinations of Matmul input and output data types

    Matrix A        Matrix B        Bias            Matrix C
    float           float           float/half      float
    half            half            float           float
    half            half            half            float
    int8_t          int8_t          int32_t         int32_t/half
    int4b_t         int4b_t         int32_t         int32_t/half
    bfloat16_t      bfloat16_t      float           float
    bfloat16_t      bfloat16_t      half            float
    half            half            float           int8_t
    bfloat16_t      bfloat16_t      float           int8_t
    int8_t          int8_t          int32_t         int8_t
    half            half            float           half
    half            half            half            half
    bfloat16_t      bfloat16_t      float           bfloat16_t
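
A pattern visible in Table 2 is that low-precision inputs are accumulated into a wider type. A plain C++ illustration for the int8_t → int32_t rows (DotInt8 is an illustrative helper, not an Ascend C API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Accumulating int8_t products into int32_t, as in the int8_t -> int32_t rows
// of Table 2: a single product can already reach 127 * 127 = 16129, far beyond
// the int8_t range, so the accumulator must be a wider type.
int32_t DotInt8(const std::vector<int8_t>& a, const std::vector<int8_t>& b) {
    int32_t acc = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    }
    return acc;
}
```

The same reasoning applies to the half and bfloat16_t rows, whose bias and output types are float.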

  2. Perform the initialization operation.
    REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);    // Initialize the Matmul object with the pipe, system workspace, and tiling data.
    
  3. Set the left matrix A, right matrix B, and bias.
    mm.SetTensorA(gm_a);    // Set the left matrix A.
    mm.SetTensorB(gm_b);    // Set the right matrix B.
    mm.SetBias(gm_bias);    // Set the bias.
    
  4. Execute the matrix multiplication operation.
    • Call Iterate to complete a single iteration of computation, and use a while loop to compute all the data on a single core. Iterate allows flexible control over the number of iterations, that is, over how much data is computed.
      while (mm.Iterate()) {    // compute one base tile per iteration
          mm.GetTensorC(gm_c);  // copy the current tile result to matrix C
      }
      
    • Call IterateAll to compute all data on a single core. The IterateAll method does not require cyclic iterations and is relatively simple to use.
      mm.IterateAll(gm_c);
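
As a mental model for both APIs, each Iterate call produces one basicM × basicN tile of the single-core output, so the loop body runs roughly ceil(singleCoreM/basicM) × ceil(singleCoreN/basicN) times; IterateAll runs the same kind of loop internally. A plain C++ sketch with a hypothetical TileIterator (not the Ascend C implementation):

```cpp
#include <cassert>

// Hypothetical iterator: yields one output tile per call, like mm.Iterate().
struct TileIterator {
    int singleM, singleN, baseM, baseN;
    int cur;
    // Total number of base tiles covering the singleM x singleN output,
    // rounding up when the base shape does not divide the single-core shape.
    int Total() const {
        return ((singleM + baseM - 1) / baseM) * ((singleN + baseN - 1) / baseN);
    }
    bool Next() { return cur++ < Total(); }  // true while tiles remain
};
```

For example, with singleCoreM = 128, singleCoreN = 256, basicM = 32, and basicN = 64, the while loop body executes 16 times.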
      
  5. End the matrix multiplication operation.
    mm.End();