GetMMConfig

Function Usage

Allows you to flexibly customize Matmul template parameters. You can set MatmulConfigMode, MatmulShapeParams, MatmulQuantParams, MatmulBatchParams, and MatmulFuncParams to obtain a custom MatmulConfig.

MatmulConfigMode specifies which MatmulConfig template is obtained and modified. For the available templates, see Table 3. You can modify the parameters of the selected template by passing one or more of the variable parameters MatmulShapeParams, MatmulQuantParams, MatmulBatchParams, and MatmulFuncParams, in any order.

Prototype

template <MatmulConfigMode configMode, typename... ArgTypes>
__aicore__ inline constexpr MatmulConfig GetMMConfig(ArgTypes&&... args)

Parameters

Table 1 Parameters in the template

Parameter

Description

configMode

MatmulConfig template to obtain. For the available values, see Table 3.

ArgTypes

Variadic template parameter types, deduced from the arguments.

Table 2 Parameters

Parameter

Input/Output

Description

args

Input

Variable arguments. Pass one or more of MatmulShapeParams, MatmulQuantParams, MatmulBatchParams, and MatmulFuncParams as needed, in any order.

Table 3 MatmulConfigMode parameters

Parameter

Description

CONFIG_NORM

Uses the Norm template as the base MatmulConfig.

CONFIG_MDL

Uses the MDL template as the base MatmulConfig.

CONFIG_SPECIALMDL

Uses the SpecialMDL template as the base MatmulConfig.

CONFIG_IBSHARE

Uses the IBShare template as the base MatmulConfig.

Table 4 MatmulShapeParams parameters

Parameter

Data Type

Description

singleCoreM

uint32_t

Shape size of a single core on the M axis, in elements.

singleCoreN

uint32_t

Shape size of a single core on the N axis, in elements.

singleCoreK

uint32_t

Shape size of a single core on the K axis, in elements.

basicM

uint32_t

Equivalent to baseM.

basicN

uint32_t

Equivalent to baseN.

basicK

uint32_t

Equivalent to baseK.
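As a minimal sketch (the shape values below are illustrative, not recommendations), the shape parameters in Table 4 can be bundled on their own and passed to GetMMConfig; parameter groups that are not passed keep the defaults of the selected template:

```cpp
// Sketch: customize only the shape parameters of the MDL template.
// Field order follows Table 4: singleCoreM, singleCoreN, singleCoreK, basicM, basicN, basicK.
constexpr static MatmulShapeParams shapeParams = {256, 256, 256, 128, 128, 64};
// Quant, batch, and func parameter groups are not passed, so they keep the MDL defaults.
constexpr static MatmulConfig mmConfig = GetMMConfig<MatmulConfigMode::CONFIG_MDL>(shapeParams);
```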

Table 5 MatmulQuantParams parameters

Parameter

Data Type

Description

isPerTensor

bool

Whether quantization of matrix B is conducted per tensor (true) or per channel (false). Valid only in the scenario where matrix A's input type is half and matrix B's input type is int8.

hasAntiQuantOffset

bool

Whether to use the offset coefficient when matrix B quantization is enabled. Valid only in the scenario where matrix A's input type is half and matrix B's input type is int8.
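A minimal sketch of the quantization parameters in Table 5, assuming the half-A/int8-B scenario described above (the chosen values are illustrative):

```cpp
// Sketch: per-tensor quantization of matrix B with the offset coefficient enabled.
// Field order follows Table 5: isPerTensor, hasAntiQuantOffset.
// Valid only when matrix A's input type is half and matrix B's input type is int8.
constexpr static MatmulQuantParams quantParams = {true, true};
constexpr static MatmulConfig mmConfig = GetMMConfig<MatmulConfigMode::CONFIG_NORM>(quantParams);
```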

Table 6 MatmulBatchParams parameters

Parameter

Data Type

Description

batchLoop

bool

Whether to enable input and output of multiple batches. Values:

  • false (default): disables the multi-batch function.
  • true: enables the multi-batch function.

This parameter is valid only for BatchMatmul.

bmmMode

BatchMode

Relationship between the total amount of multi-batch data of the input A/B matrices of BatchMatmul and the size of the L1 buffer when the layout mode is set to NORMAL. Values:

  • BatchMode::BATCH_LESS_THAN_L1: Total amount of multi-batch data < Size of L1 Buffer.
  • BatchMode::BATCH_LARGE_THAN_L1: Total amount of multi-batch data > Size of L1 Buffer.
  • BatchMode::SINGLE_LARGE_THAN_L1: Total amount of single-batch data > Size of L1 Buffer.

isBiasBatch

bool

Whether the bias size involves batch axes in the BatchMatmul scenario. Values:

  • true (default): The bias size involves batch axes. The bias size is equal to the product of batch size and N.
  • false: The bias size does not involve batch axes. The bias size is N. Bias is reused in the BatchMatmul computation.
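The batch parameters in Table 6 can be sketched as follows; the field order is assumed to follow the order listed in Table 6, and the chosen values are illustrative:

```cpp
// Sketch: enable multi-batch BatchMatmul where all batches of A/B fit in L1,
// with a batch-independent bias of size N that is reused across batches.
// Field order assumed to follow Table 6: batchLoop, bmmMode, isBiasBatch.
constexpr static MatmulBatchParams batchParams = {true, BatchMode::BATCH_LESS_THAN_L1, false};
constexpr static MatmulConfig mmConfig = GetMMConfig<MatmulConfigMode::CONFIG_NORM>(batchParams);
```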

Table 7 MatmulFuncParams parameters

Parameter

Data Type

Description

intrinsicsLimit

bool

Whether to verify the address offsets for chip data-movement instructions. Enabling verification affects performance. Values:

  • false (default): does not verify the address offset for chip instruction movement.
  • true: verifies the address offset for chip instruction movement.

enVecND2NZ

bool

Whether to enable ND2NZ using the vector. Values:

  • false (default): disables ND2NZ using the vector.
  • true: enables ND2NZ using the vector.

To enable this function, you must call SetLocalWorkspace to set the workspace.

doMTE2Preload

uint32_t

Whether to enable preloading in the M/N direction when the MTE2 pipeline gap and the M/N value are large. After this function is enabled, the MTE2 pipeline gap is reduced and the performance is improved. The preloading function is available only in the MDL template. Values:

  • 0 (default): disables the function.
  • 1: enables preloading in the M direction.
  • 2: enables preloading in the N direction.

Note: When preloading in the M/N direction is enabled, ensure that the data is fully loaded in the K direction and double buffering is enabled in the M/N direction.
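A hedged sketch of enabling preloading in the M direction on the MDL template; the field order of MatmulFuncParams is assumed to follow the order listed in Table 7 and should be checked against the actual struct definition:

```cpp
// Sketch: enable MTE2 preloading in the M direction on the MDL template.
// Field order assumed to follow Table 7: intrinsicsLimit, enVecND2NZ, doMTE2Preload.
// Per the note above, this requires full loading in the K direction and
// double buffering in the M direction.
constexpr static MatmulFuncParams funcParams{false, false, 1};
constexpr static MatmulConfig mmConfig = GetMMConfig<MatmulConfigMode::CONFIG_MDL>(funcParams);
```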

enableReuse

bool

Whether dataPtr in the callback function set by SetSelfDefineData directly passes computation data. Values:

  • true: passes computation data. Only a single value is supported.
  • false: passes data address information stored on GM.

enableUBReuse

bool

Whether to enable UB reuse. Values:

  • true: enables UB reuse.
  • false: disables UB reuse.

enableL1CacheUB

bool

Whether to cache UB computing blocks in L1. Values:

  • true: caches UB computing blocks in L1.
  • false: does not cache UB computing blocks in L1.

To cache UB computing blocks in L1, you must call SetMatmulConfigParams in the tiling implementation to configure related information.

iterateOrder

IterateOrder

Iteration order for Matmul to perform matrix computation. The meaning of this parameter is the same as that of iterateOrder in the MatmulConfig structure. This parameter is valid only when scheduleType is set to ScheduleType::OUTER_PRODUCT or 1. Values:

enum class IterateOrder {
    ORDER_M = 0,   // Offset to the M-axis direction and then to the N-axis direction.
    ORDER_N,       // Offset to the N-axis direction and then to the M-axis direction.
    UNDEF,         // Invalid currently.
};

Note: When the MDL template is used, if IterateOrder is set to ORDER_M, the value of stepN in the TCubeTiling structure must be greater than 1. If IterateOrder is set to ORDER_N, the value of stepM in the TCubeTiling structure must be greater than 1.

scheduleType

ScheduleType

Matmul data movement mode. Values:

  • ScheduleType::INNER_PRODUCT or 0 (default): performs MTE1 cyclic movement in the K direction.
  • ScheduleType::OUTER_PRODUCT or 1: performs MTE1 cyclic movement in the M or N direction. After being enabled, this parameter must be used together with IterateOrder. This configuration takes effect only in the Norm template (BatchMatmul scenario) and the MDL template.
    • If the value of IterateOrder is set to ORDER_M, cyclic movement is performed in the N direction, that is, data in matrix B is moved in parallel using MTE1. (The performance may be improved when the value of singleCoreN is greater than that of baseN.)
    • If the value of IterateOrder is set to ORDER_N, cyclic movement is performed in the M direction, that is, data in matrix A is moved in parallel using MTE1. (The performance may be improved when the value of singleCoreM is greater than that of baseM.)
    • The cyclic movement in the M direction and N direction cannot be enabled at the same time.

Note:

  • When the value of singleCoreK is greater than that of baseK, ScheduleType::OUTER_PRODUCT cannot be set. The default mode must be used.
  • This parameter can be set to ScheduleType::OUTER_PRODUCT or 1 only when the MDL template calls IterateAll for computation.
  • This parameter can be set to ScheduleType::OUTER_PRODUCT or 1 only when matrix C is output to GM.

enableDoubleCache

bool

Whether to cache two blocks in L1 after the IBShare template is enabled. Note that the size of the base block must be controlled to prevent the size of the two blocks from exceeding the L1 size limit. Values:

  • false (default): caches one block in L1.
  • true: caches two blocks in L1.

Availability

Precautions

None

Example

// Obtain the MatmulConfig template and set it as the Norm template.
constexpr static MatmulConfigMode configMode = MatmulConfigMode::CONFIG_NORM;
// singleCoreM, singleCoreN, singleCoreK, basicM, basicN, and basicK
constexpr static MatmulShapeParams shapeParams = {128, 128, 128, 64, 64, 64};
// Conduct quantization for matrix B per channel without using the offset coefficient.
constexpr static MatmulQuantParams quantParams = {false, false};
// Disable the multi-batch function.
constexpr static MatmulBatchParams batchParams{false};
// Disable the verification for the address offset for chip instruction movement, and enable ND2NZ using the vector.
constexpr static MatmulFuncParams funcParams{false, true};
constexpr static MatmulConfig mmConfig = GetMMConfig<configMode>(shapeParams, quantParams, batchParams, funcParams);