GetMMConfig
Function Usage
Allows you to flexibly customize Matmul template parameters. You can set MatmulConfigMode, MatmulShapeParams, MatmulQuantParams, MatmulBatchParams, and MatmulFuncParams to obtain a custom MatmulConfig.
MatmulConfigMode specifies the MatmulConfig template to be obtained and modified. For details about each template, see Table 1. You can modify the parameters of that template by passing one or more of the variable parameters MatmulShapeParams, MatmulQuantParams, MatmulBatchParams, and MatmulFuncParams, in any order.
Prototype
```cpp
template <MatmulConfigMode configMode, typename... ArgTypes>
__aicore__ inline constexpr MatmulConfig GetMMConfig(ArgTypes&&... args)
```
Parameters
| Parameter | Description |
|---|---|
| configMode | MatmulConfig template to be obtained. |
| ArgTypes | Variadic template parameter types. |
| Parameter | Input/Output | Description |
|---|---|---|
| args | Input | Variable arguments; pass one or more of MatmulShapeParams, MatmulQuantParams, MatmulBatchParams, and MatmulFuncParams as needed, in any order. |
MatmulConfigMode

| Value | Description |
|---|---|
| CONFIG_NORM | Sets MatmulConfig to the Norm template by default. |
| CONFIG_MDL | Sets MatmulConfig to the MDL template by default. |
| CONFIG_SPECIALMDL | Sets MatmulConfig to the SpecialMDL template by default. |
| CONFIG_IBSHARE | Sets MatmulConfig to the IBShare template by default. |
MatmulShapeParams

| Parameter | Data Type | Description |
|---|---|---|
| singleCoreM | uint32_t | Shape size of a single core along the M axis, in elements. |
| singleCoreN | uint32_t | Shape size of a single core along the N axis, in elements. |
| singleCoreK | uint32_t | Shape size of a single core along the K axis, in elements. |
| basicM | uint32_t | Equivalent to baseM. |
| basicN | uint32_t | Equivalent to baseN. |
| basicK | uint32_t | Equivalent to baseK. |
MatmulQuantParams

| Parameter | Data Type | Description |
|---|---|---|
| isPerTensor | bool | Whether quantization of matrix B is per tensor (true) or per channel (false) when matrix A's input type is half and matrix B's input type is int8. |
| hasAntiQuantOffset | bool | Whether to use the offset coefficient when matrix B quantization is enabled, in the same half/int8 scenario. |
MatmulBatchParams

| Parameter | Data Type | Description |
|---|---|---|
| batchLoop | bool | Whether to enable multi-batch input and output. Valid only for BatchMatmul. |
| bmmMode | BatchMode | Relationship between the total size of the multi-batch A/B input data of BatchMatmul and the L1 buffer size when the layout mode is set to NORMAL. |
| isBiasBatch | bool | Whether the bias size includes the batch axes in the BatchMatmul scenario. |
MatmulFuncParams

| Parameter | Data Type | Description |
|---|---|---|
| intrinsicsLimit | bool | Whether to verify the address offset for chip instruction movement, which affects performance. |
| enVecND2NZ | bool | Whether to enable ND2NZ using the vector unit. To enable this function, SetLocalWorkspace must be set. |
| doMTE2Preload | uint32_t | Whether to enable preloading in the M/N direction when the MTE2 pipeline gap and the M/N value are large. Enabling it reduces the MTE2 pipeline gap and improves performance. Preloading is available only in the MDL template. Note: when preloading in the M/N direction is enabled, ensure that the data is fully loaded in the K direction and that double buffering is enabled in the M/N direction. |
| enableReuse | bool | Whether dataPtr in the callback function set by SetSelfDefineData passes computation data directly. |
| enableUBReuse | bool | Whether to enable UB reuse. |
| enableL1CacheUB | bool | Whether to cache UB computing blocks in L1. To do so, you must call SetMatmulConfigParams in the tiling implementation to configure the related information. |
| iterateOrder | IterateOrder | Iteration order in which Matmul performs matrix computation; same meaning as iterateOrder in Table 1. Valid only when scheduleType is set to ScheduleType::OUTER_PRODUCT (that is, 1). Note: with the MDL template, if iterateOrder is ORDER_M, stepN in the TCubeTiling structure must be greater than 1; if it is ORDER_N, stepM must be greater than 1. |
| scheduleType | ScheduleType | Matmul data movement mode. |
| enableDoubleCache | bool | Whether to cache two blocks in L1 after the IBShare template is enabled. Keep the basic block size small enough that the two blocks do not exceed the L1 size limit. |
Returns
The customized MatmulConfig.
Availability
Precautions
None
Example
```cpp
// Obtain the MatmulConfig template and set it as the Norm template.
constexpr static MatmulConfigMode configMode = MatmulConfigMode::CONFIG_NORM;
// singleCoreM, singleCoreN, singleCoreK, basicM, basicN, and basicK
constexpr static MatmulShapeParams shapeParams = {128, 128, 128, 64, 64, 64};
// Conduct quantization for matrix B per channel without using the offset coefficient.
constexpr static MatmulQuantParams quantParams = {false, false};
// Disable the multi-batch function.
constexpr static MatmulBatchParams batchParams{false};
// Disable the verification of the address offset for chip instruction movement,
// and enable ND2NZ using the vector unit.
constexpr static MatmulFuncParams funcParams{false, true};
constexpr static MatmulConfig mmConfig =
    GetMMConfig<configMode>(shapeParams, quantParams, batchParams, funcParams);
```