Mmad

Supported Products

Product | Prototype without bias input | Prototype with bias input
--- | --- | ---
Atlas A3 training products/Atlas A3 inference products | Supported | Supported
Atlas A2 training products/Atlas A2 inference products | Supported | Supported
Atlas 200I/500 A2 inference products | Supported | Supported
Atlas inference product's AI Core | Supported | Unsupported
Atlas inference product's Vector Core | Unsupported | Unsupported
Atlas training products | Supported | Unsupported

Function Usage

Performs the matrix multiplication and addition (C += A * B) operation. The matrices A, B, and C are data in A2, B2, and CO1, respectively.

  • The data formats of matrices A, B, and C are ZZ, ZN, and NZ, respectively.

    In the following figure, each square represents a fractal matrix. The black line in the Z shape represents the data arrangement sequence, which starts in the upper left corner and ends in the lower right corner.

    Matrix A: The row-major order is used in each fractal matrix and between fractal matrices. This is called ZZ format. The fractal shape is 16 x (32B/sizeof(AType)), and the size is 512 bytes.

    Matrix B: The column-major order is used in each fractal matrix, while the row-major order is used between fractal matrices. This is called ZN format. The fractal shape is (32B/sizeof(BType)) x 16, and the size is 512 bytes.

    Matrix C: The row-major order is used in each fractal matrix, while the column-major order is used between fractal matrices. This is called NZ format. The fractal shape is 16 x 16, and the size is 256 elements.

    The following is a simple example. It is assumed that the size of a fractal matrix is 2 x 2 (which does not comply with an actual situation and is merely used as an example), and sizes of the matrices A, B, and C are all 4 x 4.

     0   1   2   3
     4   5   6   7
     8   9  10  11
    12  13  14  15

    Arrangement order of matrix A: 0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15.

    Arrangement order of matrix B: 0, 4, 1, 5, 2, 6, 3, 7, 8, 12, 9, 13, 10, 14, 11, 15.

    Arrangement order of matrix C: 0, 1, 4, 5, 8, 9, 12, 13, 2, 3, 6, 7, 10, 11, 14, 15.
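    The three arrangement orders above can be reproduced with a short host-side sketch. It is illustrative only and is not part of the Ascend C API: the names Layout and StorageOrder are hypothetical, and the element numbers are assumed to be the plain row-major indices shown in the 4 x 4 grid.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper for illustration; not an Ascend C API.
// ZZ: row-major inside each fractal, row-major between fractals.
// ZN: column-major inside each fractal, row-major between fractals.
// NZ: row-major inside each fractal, column-major between fractals.
enum class Layout { ZZ, ZN, NZ };

// Returns the row-major element indices of a rows x cols matrix in the order
// in which they are stored, for fractals of fr x fc elements.
std::vector<int> StorageOrder(Layout layout, int rows, int cols, int fr, int fc) {
    assert(rows % fr == 0 && cols % fc == 0);  // sketch assumes whole fractals only
    const int fRows = rows / fr;               // number of fractal rows
    const int fCols = cols / fc;               // number of fractal columns
    const bool outerRowMajor = (layout != Layout::NZ);
    const bool innerRowMajor = (layout != Layout::ZN);

    std::vector<int> order;
    order.reserve(rows * cols);
    for (int o = 0; o < fRows * fCols; ++o) {
        // Position of the o-th fractal in the fractal grid.
        const int bi = outerRowMajor ? o / fCols : o % fRows;
        const int bj = outerRowMajor ? o % fCols : o / fRows;
        for (int e = 0; e < fr * fc; ++e) {
            // Position of the e-th element inside the fractal.
            const int i = innerRowMajor ? e / fc : e % fr;
            const int j = innerRowMajor ? e % fc : e / fr;
            order.push_back((bi * fr + i) * cols + (bj * fc + j));
        }
    }
    return order;
}
```

    Calling StorageOrder(Layout::ZN, 4, 4, 2, 2) yields the order listed for matrix B, and the ZZ and NZ variants yield the orders listed for matrices A and C.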

Prototype

  • Bias not passed in.
    template <typename T, typename U, typename S>
    __aicore__ inline void Mmad(const LocalTensor<T>& dst, const LocalTensor<U>& fm, const LocalTensor<S>& filter, const MmadParams& mmadParams)
    
  • Bias passed in.
    template <typename T, typename U, typename S, typename V>
    __aicore__ inline void Mmad(const LocalTensor<T>& dst, const LocalTensor<U>& fm, const LocalTensor<S>& filter, const LocalTensor<V>& bias, const MmadParams& mmadParams)
    

Parameters

Table 1 Parameters in the template

Parameter | Description
--- | ---
T | Data type of the destination operand.
U | Data type of the left matrix.
S | Data type of the right matrix.
V | Data type of the bias matrix.

Table 2 Parameters

Parameter | Input/Output | Meaning
--- | --- | ---
dst | Output | Destination operand; result matrix. Type: LocalTensor. Supported TPosition: CO1. The start address of the LocalTensor must be 256-element-aligned.
fm | Input | Source operand; left matrix a. Type: LocalTensor. Supported TPosition: A2. The start address of the LocalTensor must be 512-byte aligned.
filter | Input | Source operand; right matrix b. Type: LocalTensor. Supported TPosition: B2. The start address of the LocalTensor must be 512-byte aligned.
bias | Input | Source operand; bias matrix. Type: LocalTensor. Supported TPosition: C2 and CO1. The start address of the LocalTensor must be 128-byte aligned.
mmadParams | Input | Matrix multiplication parameters. For the definition of this structure, see ${INSTALL_DIR}/include/ascendc/basic_api/interface/kernel_struct_mm.h. Replace ${INSTALL_DIR} with the actual path where the CANN software is installed. For details about the MmadParams parameters, see Table 3.
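
The start-address rules in Table 2 can be expressed as simple host-side checks. The following is a minimal sketch, not part of the Ascend C API: the helper names are hypothetical, byteOffset is assumed to be the tensor's start offset in bytes within its buffer, and "256-element-aligned" is assumed to mean a multiple of 256 * sizeof(T) bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration; not Ascend C APIs.
constexpr bool IsAligned(std::size_t byteOffset, std::size_t alignBytes) {
    return alignBytes != 0 && byteOffset % alignBytes == 0;
}

// dst (CO1): start address aligned to 256 elements of the destination type T
// (assumption: that is 256 * sizeof(T) bytes).
template <typename T>
constexpr bool DstStartOk(std::size_t byteOffset) {
    return IsAligned(byteOffset, 256 * sizeof(T));
}

// fm (A2) and filter (B2): start address aligned to 512 bytes.
constexpr bool FmOrFilterStartOk(std::size_t byteOffset) {
    return IsAligned(byteOffset, 512);
}

// bias (C2 or CO1): start address aligned to 128 bytes.
constexpr bool BiasStartOk(std::size_t byteOffset) {
    return IsAligned(byteOffset, 128);
}
```

Under these assumptions, a float destination tensor that starts 1024 bytes into its buffer passes the check, while one that starts at 512 bytes does not.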

Table 3 Parameters in the MmadParams structure

Parameter

Meaning

m

Height of the left matrix. Value range: m ∈ [0, 4095]. The default value is 0.

n

Width of the right matrix. Value range: n ∈ [0, 4095]. The default value is 0.

k

Width of the left matrix and height of the right matrix. Value range: k ∈ [0, 4095]. The default value is 0.

cmatrixInitVal

Whether the initial value of matrix C is 0. The default value is true.

  • true: The initial value of matrix C is 0.
  • false: The initial value of matrix C is specified by cmatrixSource.

cmatrixSource

Whether the initial value of matrix C comes from C2 (the hardware buffer that stores the bias). The default value is false.

  • false: CO1
  • true: C2

For the Atlas training products, this parameter can only be set to false.

For the Atlas inference product's AI Core, this parameter can only be set to false.

For the Atlas A2 training products/Atlas A2 inference products, this parameter can be true or false.

For the Atlas A3 training products/Atlas A3 inference products, this parameter can be true or false.

Note: This parameter is invalid for the API with bias input. The system determines whether the initial value of matrix C is from CO1 or C2 based on the position of the bias input.

isBias

This parameter is deprecated. Do not use it in new development. To accumulate onto an initial matrix, use the API with bias input, or configure the initial value source of matrix C with the cmatrixInitVal and cmatrixSource parameters. The API with bias input is recommended because it is easier to configure.

Whether to accumulate onto the initial matrix. The default value is false. The options are as follows:

  • false: matrix multiplication. The initial matrix is not added. (C = A * B)
  • true: matrix multiplication and addition. The initial matrix is added. (C += A * B)

unitFlag

Enables fine-grained parallelism between the MMAD and Fixpipe instructions. When this function is enabled, the hardware moves out the computation result as soon as each fractal is computed. This function is not applicable to scenarios where accumulation is performed in the L0C buffer. The options are as follows:

  • 0: reserved value
  • 2: unitFlag is enabled, and it stays enabled after the hardware executes the instruction.
  • 3: unitFlag is enabled, and it is disabled after the hardware executes the instruction.

When this function is enabled, the unitFlag of the MMAD instruction is set to 3 for the last fractal and to 2 for all other fractals.

This parameter is supported only by the following models:

  • Atlas A2 training products/Atlas A2 inference products
  • Atlas A3 training products/Atlas A3 inference products

fmOffset, enSsparse, enWinogradA, enWinogradB, kDirectionAlign

Reserved. These parameters are reserved for future functions. You can use the default values for now.
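
The effect of m, n, k, and cmatrixInitVal can be illustrated with a host-side reference model. This is a sketch under stated assumptions, not the Ascend C API: MmadModelParams and ReferenceMmad are hypothetical names, plain row-major float buffers stand in for the fractal-formatted A2, B2, and CO1 tensors, and cmatrixSource, unitFlag, and the bias variant are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the documented fields of MmadParams, with the
// documented default values. The field types are assumptions.
struct MmadModelParams {
    uint16_t m = 0;              // height of the left matrix
    uint16_t n = 0;              // width of the right matrix
    uint16_t k = 0;              // width of the left matrix, height of the right matrix
    bool cmatrixInitVal = true;  // true: C starts from 0; false: C keeps its current content
};

// a is m x k, b is k x n, c is m x n, all row-major.
// Returns false and leaves c untouched if any dimension is 0 (the instruction
// is not executed) or exceeds the documented range [0, 4095].
bool ReferenceMmad(std::vector<float>& c, const std::vector<float>& a,
                   const std::vector<float>& b, const MmadModelParams& p) {
    if (p.m == 0 || p.n == 0 || p.k == 0) return false;
    if (p.m > 4095 || p.n > 4095 || p.k > 4095) return false;
    assert(a.size() == static_cast<std::size_t>(p.m) * p.k);
    assert(b.size() == static_cast<std::size_t>(p.k) * p.n);
    assert(c.size() == static_cast<std::size_t>(p.m) * p.n);

    for (std::size_t i = 0; i < p.m; ++i) {
        for (std::size_t j = 0; j < p.n; ++j) {
            // cmatrixInitVal selects between C = A * B and C += A * B.
            float acc = p.cmatrixInitVal ? 0.0f : c[i * p.n + j];
            for (std::size_t kk = 0; kk < p.k; ++kk) {
                acc += a[i * p.k + kk] * b[kk * p.n + j];
            }
            c[i * p.n + j] = acc;
        }
    }
    return true;
}
```

With cmatrixInitVal set to true, the previous content of c is overwritten; with false, the product is accumulated onto it, which is how a multiplication along a long K dimension can be split into several calls.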

Table 4 Precision type combinations supported by dst, fm, and filter (Atlas training products)

Left matrix fm type | Right matrix filter type | Result matrix dst type
--- | --- | ---
uint8_t | uint8_t | uint32_t
int8_t | int8_t | int32_t
uint8_t | int8_t | int32_t
half | half | half (see NOTE)
half | half | float

NOTE: The half/half/half combination cannot reach double 1‰ precision, and later processor versions do not support this type conversion. You are advised to use half input and float output. Double 1‰ means that the error between each actual value and the true value does not exceed 1‰, and the number of values whose error exceeds 1‰ does not exceed 1‰ of the total number of values.

Table 5 Precision type combinations supported by dst, fm, and filter (Atlas inference product's AI Core)

Left matrix fm type | Right matrix filter type | Result matrix dst type
--- | --- | ---
int8_t | int8_t | int32_t
uint8_t | int8_t | int32_t
uint8_t | uint8_t | int32_t
half | half | half (see NOTE)
half | half | float
int4b_t | int4b_t | int32_t

NOTE: The half/half/half combination cannot reach double 1‰ precision, and later processor versions do not support this type conversion. You are advised to use half input and float output. Double 1‰ means that the error between each actual value and the true value does not exceed 1‰, and the number of values whose error exceeds 1‰ does not exceed 1‰ of the total number of values.

Table 6 Supported precision type combinations of dst, fm, and filter (Atlas 200I/500 A2 inference products) (Atlas A2 training products/Atlas A2 inference products) (Atlas A3 training products/Atlas A3 inference products)

Left matrix fm type | Right matrix filter type | Result matrix dst type
--- | --- | ---
int8_t | int8_t | int32_t
half | half | float
float | float | float
bfloat16_t | bfloat16_t | float
int4b_t | int4b_t | int32_t

Table 7 Supported precision type combinations of dst, fm, filter, and bias (Atlas 200I/500 A2 inference products) (Atlas A2 training products/Atlas A2 inference products) (Atlas A3 training products/Atlas A3 inference products)

Left matrix fm type | Right matrix filter type | bias type | Result matrix dst type
--- | --- | --- | ---
int8_t | int8_t | int32_t | int32_t
half | half | float | float
float | float | float | float
bfloat16_t | bfloat16_t | float | float

Restrictions

  • dst can only be placed in CO1, fm can only be placed in A2, and filter can only be placed in B2.
  • If any of M, K, and N is 0, the instruction is not executed.
  • When M = 1, the General Matrix-Vector Multiplication (GEMV) function is enabled by default. In this case, the Mmad API reads data from the L0A Buffer in ND format instead of ZZ format. Therefore, the left matrix needs to be directly arranged in ND format.
  • For details about the operand address alignment requirements, see General Address Alignment Restrictions.
  • The following uses an example to describe the arrangement of invalid and valid data.

    The data type is half. When M = 30, K = 70, and N = 40, A2 contains 2 x 5 fractal matrices of 16 x 16 elements, B2 contains 5 x 3 such matrices, and CO1 contains 2 x 3 such matrices. Because M, K, and N are not multiples of 16, the fractals on the edges are only partly filled. For example, the fractal in the lower right corner of A2 holds only 14 x 6 valid elements but still occupies the space of a full 16 x 16 fractal. The invalid data is ignored during computation. In a 16 x 16 fractal data block, the arrangement of invalid and valid data is as follows.
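
    The fractal counts and the amount of valid data in the example above follow from a round-up division. The sketch below is illustrative host-side code, not part of the Ascend C API: FractalGrid, CeilDiv, and SplitIntoFractals are hypothetical names, and the matrix dimensions are assumed to be positive.

```cpp
#include <cassert>

// Hypothetical helpers for illustration; not Ascend C APIs.
struct FractalGrid {
    int rows;             // number of fractal rows
    int cols;             // number of fractal columns
    int validRowsInLast;  // valid element rows in the last fractal row
    int validColsInLast;  // valid element columns in the last fractal column
};

constexpr int CeilDiv(int x, int y) { return (x + y - 1) / y; }

// Splits a rows x cols matrix (rows, cols > 0) into fractals of fr x fc elements.
// Edge fractals are padded to full size; only the "valid" part carries data.
constexpr FractalGrid SplitIntoFractals(int rows, int cols, int fr, int fc) {
    return FractalGrid{CeilDiv(rows, fr), CeilDiv(cols, fc),
                       rows - (CeilDiv(rows, fr) - 1) * fr,
                       cols - (CeilDiv(cols, fc) - 1) * fc};
}
```

    For half data, every fractal is 16 x 16, so SplitIntoFractals(30, 70, 16, 16) describes A2 (2 x 5 fractals, 14 x 6 valid elements in the lower right corner), SplitIntoFractals(70, 40, 16, 16) describes B2 (5 x 3 fractals), and SplitIntoFractals(30, 40, 16, 16) describes CO1 (2 x 3 fractals).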

Example

For an example without matrix multiplication bias, see the Mmad sample.

For an example with matrix multiplication bias, see the sample of Mmad with matrix multiplication bias.