GetMMConfig

Function Usage

Allows you to flexibly customize Matmul template parameters. You can set MatmulConfigMode, MatmulShapeParams, MatmulQuantParams, MatmulBatchParams, and MatmulFuncParams to obtain a custom MatmulConfig.

MatmulConfigMode specifies which MatmulConfig template is obtained and modified. For the available templates, see Table 3. You can modify the parameters of the selected template by passing one or more of the variable parameters MatmulShapeParams, MatmulQuantParams, MatmulBatchParams, and MatmulFuncParams, in any order.

Prototype

template <MatmulConfigMode configMode, typename... ArgTypes>
__aicore__ inline constexpr MatmulConfig GetMMConfig(ArgTypes&&... args)

Parameters

Table 1 Parameters in the template

Parameter

Description

configMode

MatmulConfig template to obtain. For the available values, see Table 3.

ArgTypes

Variadic template parameter types, deduced from the arguments.

Table 2 Parameters

Parameter

Input/Output

Description

args

Input

Variable arguments. Pass one or more of MatmulShapeParams, MatmulQuantParams, MatmulBatchParams, and MatmulFuncParams as needed, in any order.

Table 3 MatmulConfigMode parameters

Parameter

Description

CONFIG_NORM

Uses the Norm template as the base MatmulConfig.

CONFIG_MDL

Uses the MDL template as the base MatmulConfig.

CONFIG_SPECIALMDL

Uses the SpecialMDL template as the base MatmulConfig.

CONFIG_IBSHARE

Uses the IBShare template as the base MatmulConfig.

Table 4 MatmulShapeParams parameters

Parameter

Data Type

Description

singleCoreM

uint32_t

Shape size of a single core on the M axis, in elements.

singleCoreN

uint32_t

Shape size of a single core on the N axis, in elements.

singleCoreK

uint32_t

Shape size of a single core on the K axis, in elements.

basicM

uint32_t

Equivalent to baseM.

basicN

uint32_t

Equivalent to baseN.

basicK

uint32_t

Equivalent to baseK.
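As a minimal sketch (the shape values below are illustrative, not recommendations), the shape parameters in Table 4 can be bundled on their own and passed to GetMMConfig; parameter groups that are not passed keep the defaults of the selected template:

```cpp
// Sketch: customize only the shape parameters of the MDL template.
// Field order follows Table 4: singleCoreM, singleCoreN, singleCoreK, basicM, basicN, basicK.
constexpr static MatmulShapeParams shapeParams = {256, 256, 256, 128, 128, 64};
// Quant, batch, and func parameter groups are not passed, so they keep the MDL defaults.
constexpr static MatmulConfig mmConfig = GetMMConfig<MatmulConfigMode::CONFIG_MDL>(shapeParams);
```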

Table 5 MatmulQuantParams parameters

Parameter

Data Type

Description

isPerTensor

bool

Whether quantization of matrix B is conducted per tensor (true) or per channel (false). Valid only in the scenario where matrix A's input type is half and matrix B's input type is int8.

hasAntiQuantOffset

bool

Whether to use the offset coefficient when matrix B quantization is enabled. Valid only in the scenario where matrix A's input type is half and matrix B's input type is int8.
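A minimal sketch of the quantization parameters in Table 5, assuming the half-A/int8-B scenario described above (the chosen values are illustrative):

```cpp
// Sketch: per-tensor quantization of matrix B with the offset coefficient enabled.
// Field order follows Table 5: isPerTensor, hasAntiQuantOffset.
// Valid only when matrix A's input type is half and matrix B's input type is int8.
constexpr static MatmulQuantParams quantParams = {true, true};
constexpr static MatmulConfig mmConfig = GetMMConfig<MatmulConfigMode::CONFIG_NORM>(quantParams);
```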

Table 6 MatmulBatchParams parameters

Parameter

Data Type

Description

batchLoop

bool

Whether to enable input and output of multiple batches. Values:

  • false (default): disables the multi-batch function.
  • true: enables the multi-batch function.

This parameter is valid only for BatchMatmul.

bmmMode

BatchMode

Relationship between the total amount of multi-batch data of the input A/B matrices of BatchMatmul and the size of the L1 buffer when the layout mode is set to NORMAL. Values:

  • BatchMode::BATCH_LESS_THAN_L1: Total amount of multi-batch data < Size of L1 Buffer.
  • BatchMode::BATCH_LARGE_THAN_L1: Total amount of multi-batch data > Size of L1 Buffer.
  • BatchMode::SINGLE_LARGE_THAN_L1: Total amount of single-batch data > Size of L1 Buffer.

isBiasBatch

bool

Whether the bias size involves batch axes in the BatchMatmul scenario. Values:

  • true (default): The bias size involves batch axes. The bias size is equal to the product of batch size and N.
  • false: The bias size does not involve batch axes. The bias size is N. Bias is reused in the BatchMatmul computation.
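The batch parameters in Table 6 can be sketched as follows; the field order is assumed to follow the order listed in Table 6, and the chosen values are illustrative:

```cpp
// Sketch: enable multi-batch BatchMatmul where all batches of A/B fit in L1,
// with a batch-independent bias of size N that is reused across batches.
// Field order assumed to follow Table 6: batchLoop, bmmMode, isBiasBatch.
constexpr static MatmulBatchParams batchParams = {true, BatchMode::BATCH_LESS_THAN_L1, false};
constexpr static MatmulConfig mmConfig = GetMMConfig<MatmulConfigMode::CONFIG_NORM>(batchParams);
```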

Table 7 MatmulFuncParams parameters

Parameter

Data Type

Description

intrinsicsLimit

bool

Whether to verify the address offsets for chip data-movement instructions. Enabling verification affects performance. Values:

  • false (default): does not verify the address offset for chip instruction movement.
  • true: verifies the address offset for chip instruction movement.

enVecND2NZ

bool

Whether to enable ND2NZ using the vector. Values:

  • false (default): disables ND2NZ using the vector.
  • true: enables ND2NZ using the vector.

To enable this function, you must call SetLocalWorkspace to set the workspace.

doMTE2Preload

uint32_t

Whether to enable preloading in the M/N direction when the MTE2 pipeline gap and the M/N value are large. After this function is enabled, the MTE2 pipeline gap is reduced and the performance is improved. The preloading function is available only in the MDL template. Values:

  • 0 (default): disables the function.
  • 1: enables preloading in the M direction.
  • 2: enables preloading in the N direction.

Note: When preloading in the M/N direction is enabled, ensure that the data is fully loaded in the K direction and double buffering is enabled in the M/N direction.
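A hedged sketch of enabling preloading in the M direction on the MDL template; the field order of MatmulFuncParams is assumed to follow the order listed in Table 7 and should be checked against the actual struct definition:

```cpp
// Sketch: enable MTE2 preloading in the M direction on the MDL template.
// Field order assumed to follow Table 7: intrinsicsLimit, enVecND2NZ, doMTE2Preload.
// Per the note above, this requires full loading in the K direction and
// double buffering in the M direction.
constexpr static MatmulFuncParams funcParams{false, false, 1};
constexpr static MatmulConfig mmConfig = GetMMConfig<MatmulConfigMode::CONFIG_MDL>(funcParams);
```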

enableReuse

bool

Whether dataPtr in the callback function set by SetSelfDefineData directly passes computation data. Values:

  • true: passes computation data. Only a single value is supported.
  • false: passes data address information stored on GM.

enableUBReuse

bool

Whether to enable UB reuse. Values:

  • true: enables UB reuse.
  • false: disables UB reuse.

enableL1CacheUB

bool

Whether to cache UB computing blocks in L1. Values:

  • true: caches UB computing blocks in L1.
  • false: does not cache UB computing blocks in L1.

To cache UB computing blocks in L1, you must call SetMatmulConfigParams in the tiling implementation to configure related information.

iterateOrder

IterateOrder

Iteration order for Matmul to perform matrix computation. The meaning of this parameter is the same as that of iterateOrder in the MatmulConfig structure. This parameter is valid only when scheduleType is set to ScheduleType::OUTER_PRODUCT or 1. Values:

enum class IterateOrder {
    ORDER_M = 0,   // Offset to the M-axis direction and then to the N-axis direction.
    ORDER_N,       // Offset to the N-axis direction and then to the M-axis direction.
    UNDEF,         // Invalid currently.
};

Note: When the MDL template is used, if IterateOrder is set to ORDER_M, the value of stepN in the TCubeTiling structure must be greater than 1. If IterateOrder is set to ORDER_N, the value of stepM in the TCubeTiling structure must be greater than 1.

scheduleType

ScheduleType

Matmul data movement mode. Values:

  • ScheduleType::INNER_PRODUCT or 0 (default): performs MTE1 cyclic movement in the K direction.
  • ScheduleType::OUTER_PRODUCT or 1: performs MTE1 cyclic movement in the M or N direction. After being enabled, this parameter must be used together with IterateOrder. This configuration takes effect only in the Norm template (BatchMatmul scenario) and the MDL template.
    • If the value of IterateOrder is set to ORDER_M, cyclic movement is performed in the N direction, that is, data in matrix B is moved in parallel using MTE1. (The performance may be improved when the value of singleCoreN is greater than that of baseN.)
    • If the value of IterateOrder is set to ORDER_N, cyclic movement is performed in the M direction, that is, data in matrix A is moved in parallel using MTE1. (The performance may be improved when the value of singleCoreM is greater than that of baseM.)
    • The cyclic movement in the M direction and N direction cannot be enabled at the same time.

Note:

  • When the value of singleCoreK is greater than that of baseK, ScheduleType::OUTER_PRODUCT cannot be set. The default mode must be used.
  • This parameter can be set to ScheduleType::OUTER_PRODUCT or 1 only when the MDL template calls IterateAll for computation.
  • This parameter can be set to ScheduleType::OUTER_PRODUCT or 1 only when matrix C is output to GM.

enableDoubleCache

bool

Whether to cache two blocks in L1 after the IBShare template is enabled. Note that the size of the base block must be controlled to prevent the size of the two blocks from exceeding the L1 size limit. Values:

  • false (default): caches one block in L1.
  • true: caches two blocks in L1.

Availability

Precautions

None

Example

// Obtain the MatmulConfig template and set it as the Norm template.
constexpr static MatmulConfigMode configMode = MatmulConfigMode::CONFIG_NORM;
// singleCoreM, singleCoreN, singleCoreK, basicM, basicN, and basicK
constexpr static MatmulShapeParams shapeParams = {128, 128, 128, 64, 64, 64};
// Conduct quantization for matrix B per channel without using the offset coefficient.
constexpr static MatmulQuantParams quantParams = {false, false};
// Disable the multi-batch function.
constexpr static MatmulBatchParams batchParams{false};
// Disable the verification for the address offset for chip instruction movement, and enable ND2NZ using the vector.
constexpr static MatmulFuncParams funcParams{false, true};
constexpr static MatmulConfig mmConfig = GetMMConfig<configMode>(shapeParams, quantParams, batchParams, funcParams);