Mmad

Product Support

Product	Supported (Prototype Without Bias Input)	Supported (Prototype With Bias Input)
Atlas A3 training products/Atlas A3 inference products	√	√
Atlas A2 training products/Atlas A2 inference products	√	√
Atlas 200I/500 A2 inference products	√	√
Atlas inference product's AI Core	√	x
Atlas inference product's Vector Core	x	x
Atlas training products	√	x

Function

Performs the matrix multiplication and addition (C += A * B) operation. The matrices A, B, and C are data in A2, B2, and CO1, respectively.

The data formats of matrices A, B, and C are ZZ, ZN, and NZ, respectively.
In the following figure, each square represents a fractal matrix. The black line in the Z shape represents the data arrangement sequence, which starts in the upper left corner and ends in the lower right corner.

Matrix A: The row-major order is used in each fractal matrix and between fractal matrices. This is called ZZ format. The fractal shape is 16 x (32B/sizeof(AType)), and the size is 512 bytes.

Matrix B: The column-major order is used in each fractal matrix, while the row-major order is used between fractal matrices. This is called ZN format. The fractal shape is (32B/sizeof(BType)) x 16, and the size is 512 bytes.

Matrix C: The row-major order is used in each fractal matrix, while the column-major order is used between fractal matrices. This is called NZ format. The fractal shape is 16 x 16, and the size is 256 elements.
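The fractal shapes above follow from the element size. The following is a minimal sketch in standard C++ that checks the arithmetic; ElemsPer32B and InputFractalBytes are hypothetical helper names used only for illustration and are not part of the Ascend C API. Sub-byte types such as int4b_t are not covered by this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an Ascend C API): number of elements of size
// elemBytes that fit in 32 bytes, i.e. the "32B/sizeof(Type)" term.
constexpr std::size_t ElemsPer32B(std::size_t elemBytes) {
    return 32 / elemBytes;
}

// Bytes occupied by one fractal of matrix A or B:
// 16 x (32B/sizeof(Type)) elements, each elemBytes wide.
constexpr std::size_t InputFractalBytes(std::size_t elemBytes) {
    return 16 * ElemsPer32B(elemBytes) * elemBytes;
}

// 1-byte types (int8_t, uint8_t): A fractal is 16 x 32.
static_assert(ElemsPer32B(sizeof(std::int8_t)) == 32, "int8_t: 32 per 32B");
// 2-byte types (half, bfloat16_t): A fractal is 16 x 16.
static_assert(ElemsPer32B(2) == 16, "2-byte types: 16 per 32B");
// 4-byte types (float): A fractal is 16 x 8.
static_assert(ElemsPer32B(sizeof(float)) == 8, "float: 8 per 32B");
// In every case the fractal occupies 512 bytes.
static_assert(InputFractalBytes(1) == 512, "512-byte fractal");
static_assert(InputFractalBytes(2) == 512, "512-byte fractal");
static_assert(InputFractalBytes(4) == 512, "512-byte fractal");
```

Because the byte size of an input fractal is constant, only the fractal width (for A) or height (for B) changes with the data type.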

The following is a simple example. It assumes a fractal matrix of size 2 x 2 (not a real fractal size; used only for illustration) and 4 x 4 matrices A, B, and C, whose elements are numbered 0 to 15.

0	1	2	3
4	5	6	7
8	9	10	11
12	13	14	15

Arrangement order of matrix A: 0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15.

Arrangement order of matrix B: 0, 4, 1, 5, 2, 6, 3, 7, 8, 12, 9, 13, 10, 14, 11, 15.

Arrangement order of matrix C: 0, 1, 4, 5, 8, 9, 12, 13, 2, 3, 6, 7, 10, 11, 14, 15.
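The three arrangement orders above can be reproduced on the host with a few lines of standard C++. This is only an illustrative sketch: FractalOrder, OrderA, OrderB, and OrderC are hypothetical names, not Ascend C APIs, and the code assumes that the matrix dimensions are exact multiples of the fractal dimensions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an Ascend C API). Elements of a rows x cols matrix
// are numbered in row-major order. The function returns those numbers in the
// order in which they are stored when the matrix is split into fr x fc
// fractals. outerRowMajor selects the order between fractals, innerRowMajor
// the order inside each fractal.
std::vector<int> FractalOrder(int rows, int cols, int fr, int fc,
                              bool outerRowMajor, bool innerRowMajor) {
    std::vector<int> order;
    const int blockRows = rows / fr;  // fractals along the height
    const int blockCols = cols / fc;  // fractals along the width
    for (int b = 0; b < blockRows * blockCols; ++b) {
        const int br = outerRowMajor ? b / blockCols : b % blockRows;
        const int bc = outerRowMajor ? b % blockCols : b / blockRows;
        for (int e = 0; e < fr * fc; ++e) {
            const int r = innerRowMajor ? e / fc : e % fr;
            const int c = innerRowMajor ? e % fc : e / fr;
            order.push_back((br * fr + r) * cols + (bc * fc + c));
        }
    }
    return order;
}

// Matrix A: row-major between fractals, row-major inside each fractal.
std::vector<int> OrderA() { return FractalOrder(4, 4, 2, 2, true, true); }
// Matrix B: row-major between fractals, column-major inside each fractal.
std::vector<int> OrderB() { return FractalOrder(4, 4, 2, 2, true, false); }
// Matrix C: column-major between fractals, row-major inside each fractal.
std::vector<int> OrderC() { return FractalOrder(4, 4, 2, 2, false, true); }
```

Calling OrderA(), OrderB(), and OrderC() yields the sequences listed above for matrices A, B, and C. With real fractal sizes, the same index arithmetic applies with fr and fc set to the fractal shape of the corresponding matrix.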

Prototype

Without bias input

template <typename T, typename U, typename S>
__aicore__ inline void Mmad(const LocalTensor<T>& dst, const LocalTensor<U>& fm, const LocalTensor<S>& filter, const MmadParams& mmadParams)

With bias input

template <typename T, typename U, typename S, typename V>
__aicore__ inline void Mmad(const LocalTensor<T>& dst, const LocalTensor<U>& fm, const LocalTensor<S>& filter, const LocalTensor<V>& bias, const MmadParams& mmadParams)

Parameters

**Table 1** Template parameters
Parameter	Description
T	Data type of the destination operand.
U	Data type of the left matrix.
S	Data type of the right matrix.
V	Data type of the bias matrix.

**Table 2** Parameters
Parameter	Input/Output	Meaning
dst	Output	Destination operand; result matrix. Type: LocalTensor. Supported TPosition: CO1. The start address of LocalTensor must be 256-element-aligned.
fm	Input	Source operand; left matrix a. Type: LocalTensor. Supported TPosition: A2. The start address of LocalTensor must be 512-byte aligned.
filter	Input	Source operand; right matrix b. Type: LocalTensor. Supported TPosition: B2. The start address of LocalTensor must be 512-byte aligned.
bias	Input	Source operand; bias matrix. Type: LocalTensor. Supported TPosition: C2 and CO1. The start address of LocalTensor must be 128-byte aligned.
mmadParams	Input	Matrix multiplication parameter. For details about the definition of this parameter, see ${INSTALL_DIR}/include/ascendc/basic_api/interface/kernel_struct_mm.h. Replace *${INSTALL_DIR}* with the file storage path after the CANN software is installed. For details about the parameters in MmadParams, see Table 3.

**Table 3** Parameters in the MmadParams structure
Parameter	Meaning
m	Height of the left matrix. Value range: m ∈ [0, 4095]. The default value is 0.
n	Width of the right matrix. Value range: n ∈ [0, 4095]. The default value is 0.
k	Width of the left matrix and height of the right matrix. Value range: k ∈ [0, 4095]. The default value is 0.
cmatrixInitVal	Whether the initial value of matrix C is 0. The default value is true. true: The initial value of matrix C is 0. false: The initial value of matrix C is specified by cmatrixSource.
cmatrixSource	Whether the initial value of matrix C comes from C2 (the hardware buffer for storing the bias). The default value is false. false: The initial value comes from CO1. true: The initial value comes from C2. For Atlas training products, this parameter can only be set to false. For Atlas inference product's AI Core, this parameter can only be set to false. For Atlas A2 training products/Atlas A2 inference products, this parameter can be true or false. For Atlas A3 training products/Atlas A3 inference products, this parameter can be true or false. For Atlas 200I/500 A2 inference products, this parameter can be true or false. Note: This parameter is invalid for the API with bias input. The system determines whether the initial value of matrix C comes from CO1 or C2 based on the position of the bias input.
isBias	This parameter is deprecated. Do not use it in new development. Whether the initial matrix is added to the result. The default value is false. false: matrix multiplication only; the initial matrix is not added (C = A * B). true: matrix multiplication and addition; the initial matrix is added (C += A * B). To add the initial matrix, use the API with bias input, or use the cmatrixInitVal and cmatrixSource parameters to configure the source of the initial value of matrix C. The API with bias input is recommended because it is easier to configure than the cmatrixInitVal and cmatrixSource parameters.
unitFlag	Fine-grained parallelism between Mmad and Fixpipe instructions. After this function is enabled, the computation result is moved out each time the hardware completes a fractal computation. This function is not applicable to scenarios where accumulation is performed in the L0C buffer. The options are as follows: 0: Value reserved. 2: The unitFlag function is enabled. After the hardware executes the instruction, the unitFlag function is not disabled. 3: The unitFlag function is enabled. After the hardware executes the instruction, the unitFlag function is disabled. When this function is enabled, set the unitFlag of the Mmad instruction to 3 for the last fractal and to 2 for other fractals. This parameter is supported only by the following models: Atlas A2 training products/Atlas A2 inference products Atlas A3 training products/Atlas A3 inference products
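fmOffset	Reserved parameter. This parameter is reserved for future functions. You can use the default value.
enSsparse	Reserved parameter. This parameter is reserved for future functions. You can use the default value.
enWinogradA	Reserved parameter. This parameter is reserved for future functions. You can use the default value.
enWinogradB	Reserved parameter. This parameter is reserved for future functions. You can use the default value.
kDirectionAlign	Reserved parameter. This parameter is reserved for future functions. You can use the default value.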
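The effect of m, n, k, and cmatrixInitVal can be illustrated with a host-side reference model. The following sketch is not the Ascend C Mmad API: RefMmadParams and RefMmad are hypothetical names, the matrices are plain row-major arrays without any fractal layout, and the model assumes that the initial value is the existing content of the result buffer (which corresponds to cmatrixSource = false).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference structure; it mirrors only four fields of Table 3.
struct RefMmadParams {
    int m = 0;                   // height of the left matrix
    int n = 0;                   // width of the right matrix
    int k = 0;                   // width of the left / height of the right matrix
    bool cmatrixInitVal = true;  // true: matrix C starts from 0
};

// Hypothetical CPU model of C (+)= A * B on row-major data.
// a is m x k, b is k x n, c is m x n.
void RefMmad(std::vector<float>& c, const std::vector<float>& a,
             const std::vector<float>& b, const RefMmadParams& p) {
    // If any of m, k, and n is 0, nothing is computed.
    if (p.m == 0 || p.n == 0 || p.k == 0) {
        return;
    }
    for (int i = 0; i < p.m; ++i) {
        for (int j = 0; j < p.n; ++j) {
            const std::size_t out = static_cast<std::size_t>(i) * p.n + j;
            // Start from 0, or from the value already held in c.
            float acc = p.cmatrixInitVal ? 0.0f : c[out];
            for (int kk = 0; kk < p.k; ++kk) {
                acc += a[static_cast<std::size_t>(i) * p.k + kk] *
                       b[static_cast<std::size_t>(kk) * p.n + j];
            }
            c[out] = acc;
        }
    }
}
```

With cmatrixInitVal set to true, the previous content of c is overwritten by A * B. With cmatrixInitVal set to false, the product is accumulated onto the previous content, which is how a long k dimension can be split across several calls in this model.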

**Table 4** Supported mixed precision of **dst**, fm, and **filter** (Atlas training products)
Left Matrix fm Type	Right Matrix filter Type	Result Matrix dst Type
uint8_t	uint8_t	uint32_t
int8_t	int8_t	int32_t
uint8_t	int8_t	int32_t
half	half	half NOTE: The mixed precision of this type cannot reach double 1‰, and later processor versions do not support this type conversion. You are advised to use half input and float output. The double 1‰ means that the error between each actual data record and the true value does not exceed 1‰, and the total number of data records whose error exceeds 1‰ does not exceed 1‰ of the total number of data records.
half	half	float

**Table 5** Supported mixed precision of **dst**, fm, and **filter** (Atlas inference product's AI Core)
Left Matrix fm Type	Right Matrix filter Type	Result Matrix dst Type
int8_t	int8_t	int32_t
uint8_t	int8_t	int32_t
uint8_t	uint8_t	int32_t
half	half	half NOTE: The mixed precision of this type cannot reach double 1‰, and later processor versions do not support this type conversion. You are advised to use half input and float output. The double 1‰ means that the error between each actual data record and the true value does not exceed 1‰, and the total number of data records whose error exceeds 1‰ does not exceed 1‰ of the total number of data records.
half	half	float
int4b_t	int4b_t	int32_t

**Table 6** Supported mixed precision of **dst**, fm, and **filter** (Atlas 200I/500 A2 inference products, Atlas A2 training products/Atlas A2 inference products, and Atlas A3 training products/Atlas A3 inference products)
Left Matrix fm Type	Right Matrix filter Type	Result Matrix dst Type
int8_t	int8_t	int32_t
half	half	float
float	float	float
bfloat16_t	bfloat16_t	float
int4b_t	int4b_t	int32_t

**Table 7** Supported mixed precision of **dst**, fm, **filter**, and **bias** (Atlas 200I/500 A2 inference products, Atlas A2 training products/Atlas A2 inference products, and Atlas A3 training products/Atlas A3 inference products)
Left Matrix fm Type	Right Matrix filter Type	bias Type	Result Matrix dst Type
int8_t	int8_t	int32_t	int32_t
half	half	float	float
float	float	float	float
bfloat16_t	bfloat16_t	float	float

Restrictions

dst can only be located in CO1, fm can only be located in A2, and filter can only be located in B2.
If any of M, K, and N is 0, the instruction is not executed.
When M = 1, the GEMV function is enabled by default. In this case, the Mmad API reads data from L0A Buffer in ND format instead of ZZ format. Therefore, the left matrix needs to be directly arranged in ND format.
For details about the operand address alignment requirements, see General Address Alignment Restrictions.
The following uses an example to describe the arrangement of invalid and valid data.
The data type is half. When M = 30, K = 70, and N = 40, there are 2 × 5 matrices with the size of 16 × 16 in A2, 5 × 3 matrices with the size of 16 × 16 in B2, and 2 × 3 matrices with the size of 16 × 16 in CO1. In this scenario, M, K, and N are not multiples of 16. The matrix in the lower right corner of A2 actually has only 14 × 6 pieces of valid data, but also needs to occupy space of a 16 × 16 matrix. The invalid data is ignored during computation. In a 16 × 16 fractal data block, the arrangement of invalid and valid data is as follows.
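The fractal counts and the valid region in the example with M = 30, K = 70, and N = 40 follow from ceiling division by the fractal edge. The sketch below checks those numbers in standard C++; kFractalEdge, CeilDiv, and ValidInLastFractal are hypothetical names for illustration only, and the edge of 16 applies to the half data type used in the example.

```cpp
#include <cassert>

// For half, the fractals in A2, B2, and CO1 are all 16 x 16.
constexpr int kFractalEdge = 16;

// Number of blocks of size y needed to cover x elements.
constexpr int CeilDiv(int x, int y) {
    return (x + y - 1) / y;
}

// Number of valid rows or columns in the last fractal along a dimension
// of length len; the rest of that fractal is padding (invalid data).
constexpr int ValidInLastFractal(int len) {
    return len - (CeilDiv(len, kFractalEdge) - 1) * kFractalEdge;
}

// M = 30, K = 70, N = 40.
static_assert(CeilDiv(30, kFractalEdge) == 2, "M spans 2 fractals");
static_assert(CeilDiv(70, kFractalEdge) == 5, "K spans 5 fractals");
static_assert(CeilDiv(40, kFractalEdge) == 3, "N spans 3 fractals");

// Lower right fractal of A2: 14 valid rows and 6 valid columns.
static_assert(ValidInLastFractal(30) == 14, "valid rows along M");
static_assert(ValidInLastFractal(70) == 6, "valid columns along K");
```

So A2 holds 2 × 5 fractals, B2 holds 5 × 3, and CO1 holds 2 × 3, and the buffers must be sized for the padded dimensions (32, 80, and 48) rather than for M, K, and N themselves.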

Example

For details about the example without matrix multiplication bias, see Mmad sample.

For details about the example with matrix multiplication bias, see sample of Mmad with matrix multiplication bias.
