GetMDLConfig

Function Usage

Obtains a user-defined MDL template with the specified configuration parameters.

Prototype

__aicore__ constexpr MatmulConfig GetMDLConfig(const bool intrinsicsLimit = false,
    const bool batchLoop = false, const uint32_t doMTE2Preload = 0,
    const bool isVecND2NZ = false, bool isPerTensor = false,
    bool hasAntiQuantOffset = false, const bool enUnitFlag = false,
    const bool isMsgReuse = true, const bool enableUBReuse = true,
    const bool enableL1CacheUB = false)

Parameters

Each parameter of this API sets the MatmulConfig structure field named in the corresponding description below and has the same meaning as that field.

Table 1 API parameters

Parameter

Input/Output

Description

intrinsicsLimit

Input

Sets the intrinsicsCheck parameter.

Whether to enable cyclic data move-in when the inner axis (last axis) of the left or right matrix on a single core is greater than or equal to 65535. For example, for the left matrix A [M, K], if the single-core inner axis singleCoreK is greater than or equal to 65535 and this parameter is set to true, the API moves the data in cyclically. Values:

  • false (default): When the inner axis of the left or right matrix on a single core is greater than or equal to 65535, data is not moved in cyclically.
  • true: When the inner axis of the left or right matrix on a single core is greater than or equal to 65535, data is moved in cyclically.
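As an illustration (a sketch, not a verified kernel; intrinsicsLimit is the first positional parameter per the prototype above):

constexpr MatmulConfig MM_CFG = GetMDLConfig(true); // intrinsicsLimit = true: cyclic move-in for inner axes >= 65535
Matmul<A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE, MM_CFG> mm;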

batchLoop

Input

Sets the isNBatch parameter.

Whether to enable multi-batch input and output. This parameter is valid only for BatchMatmul. Values:

  • false (default): disables multi-batch input and output.
  • true: enables the multi-batch function.

doMTE2Preload

Input

Sets the doMTE2Preload parameter.

Whether to enable the preloading function in the M/N direction when the MTE2 pipeline gap and the M/N value are large. After this function is enabled, the MTE2 pipeline gap is reduced and performance improves. The preloading function is valid only for the MDL template. Values:

  • 0 (default): disables the function.
  • 1: enables preloading in the M direction.
  • 2: enables preloading in the N direction.

Note: When preloading in the M/N direction is enabled, ensure that the data is fully loaded in the K direction and double buffering is enabled in the M/N direction.
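For example, a configuration that enables M-direction preloading could look as follows (a sketch based on the prototype's parameter order; the K-direction full load and M-direction double buffering noted above must be ensured in the tiling):

// doMTE2Preload = 1: enable preloading in the M direction
constexpr MatmulConfig MM_CFG = GetMDLConfig(false, false, 1);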

isVecND2NZ

Input

Sets the enVecND2NZ parameter.

Whether to enable ND2NZ (converting data from ND format to NZ format) using the vector unit. To enable this function, you must also call SetLocalWorkspace to set a local workspace. Values:

  • false (default): disables ND2NZ using the vector.
  • true: enables ND2NZ using the vector.
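For example (a sketch; SetLocalWorkspace must additionally be called, as described above, to provide the workspace):

// isVecND2NZ = true: convert ND-format data to NZ format using the vector unit
constexpr MatmulConfig MM_CFG = GetMDLConfig(false, false, 0, true);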

isPerTensor

Input

Sets the isPerTensor parameter.

Whether quantization of matrix B is conducted per tensor or per channel in the scenario where matrix A's input type is half and matrix B's input type is int8. Values:

  • false (default): per-channel quantization.
  • true: per-tensor quantization.

hasAntiQuantOffset

Input

Sets the hasAntiQuantOffset parameter.

Whether to use the offset coefficient when matrix B quantization is enabled in the scenario where matrix A's input type is half and matrix B's input type is int8.

enUnitFlag

Input

Sets the enUnitFlag parameter.

Whether to enable the unitflag function to allow parallel execution of computation and data movement for performance improvement. By default, the function is enabled when the Norm and IBShare templates are used and disabled when the MDL template is used. Values:

  • false: disables the unitflag function.
  • true: enables the unitflag function.
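For example, to explicitly enable unitflag for the MDL template, where it is disabled by default (a sketch based on the prototype's parameter order):

// enUnitFlag = true: allow computation and data movement to run in parallel
constexpr MatmulConfig MM_CFG = GetMDLConfig(false, false, 0, false, false, false, true);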

isMsgReuse

Input

Sets the enableReuse parameter.

Whether SetSelfDefineData directly passes computation data. Values:

  • true (default): passes computation data directly. Only a single value is supported.
  • false: passes the address information of data stored on GM.

enableUBReuse

Input

Sets the enableUBReuse parameter.

Whether to enable Unified Buffer reuse. Values:

  • true (default): enables Unified Buffer reuse.
  • false: disables Unified Buffer reuse.

enableL1CacheUB

Input

Sets the enableL1CacheUB parameter.

Whether to cache Unified Buffer computing blocks in L1 Buffer. Values:

  • true: caches Unified Buffer computing blocks in L1 Buffer.
  • false (default): does not cache Unified Buffer computing blocks in L1 Buffer.

To cache Unified Buffer computing blocks in L1 Buffer, you must call SetMatmulConfigParams in the tiling implementation to configure related information.
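For example, on the device side (a sketch based on the prototype's parameter order; the host-side tiling implementation must also call SetMatmulConfigParams as noted above):

// enableL1CacheUB = true (last parameter); isMsgReuse and enableUBReuse keep their default value true
constexpr MatmulConfig MM_CFG = GetMDLConfig(false, false, 0, false, false, false, false, true, true, true);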

Availability

Precautions

None

Example

constexpr MatmulConfig MM_CFG = GetMDLConfig();              // MDL template with default parameters
Matmul<A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE, MM_CFG> mm;        // declare the Matmul object with this configuration
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling); // register the object with the pipe and tiling data
mm.SetTensorA(gm_a);                                         // set the left matrix A
mm.SetTensorB(gm_b);                                         // set the right matrix B
mm.SetBias(gm_bias);                                         // set the bias
mm.IterateAll(gm_c);                                         // compute and write the entire result to gm_c