GetMDLConfig

Applicability

Product	Supported
Atlas A3 training products / Atlas A3 inference products	√
Atlas A2 training products / Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	x
Atlas inference product's AI Core	√
Atlas inference product's Vector Core	x
Atlas training products	x

Function

Configures the parameters of the MDL template and obtains the custom MDL template. For details about the MDL template, see Table 1.

Prototype

__aicore__ constexpr MatmulConfig GetMDLConfig(
    const bool intrinsicsLimit = false, const bool batchLoop = false,
    const uint32_t doMTE2Preload = 0, const bool isVecND2NZ = false,
    bool isPerTensor = false, bool hasAntiQuantOffset = false,
    const bool enUnitFlag = false, const bool isMsgReuse = true,
    const bool enableUBReuse = true, const bool enableL1CacheUB = false,
    const bool enableMixDualMaster = false, const bool enableKdimReorderLoad = false)

Parameters

Each parameter of this API sets a member of the MatmulConfig structure and has the same meaning as that member.

**Table 1** API parameters
Parameter	Input/Output	Description
intrinsicsLimit	Input	Sets the intrinsicsCheck parameter, which specifies whether data is moved cyclically from the Global Memory to L1 Buffer when the inner axis (last axis) of the left or right matrix on a single core is greater than or equal to 65535 elements. For example, for the left matrix A [M, K], if singleCoreK (the inner axis on a single core) is greater than or equal to 65535 and this parameter is set to true, the API moves the data in cyclically. Values: false (default): data is not moved in cyclically when the inner axis of the left or right matrix on a single core is greater than or equal to 65535. true: data is moved in cyclically when the inner axis of the left or right matrix on a single core is greater than or equal to 65535.
batchLoop	Input	Sets the isNBatch parameter. Whether to enable multi-batch input and output. This parameter is valid only for BatchMatmul. After this parameter is enabled, only the Norm template is supported, and IterateNBatch needs to be called to implement multi-batch input and output. Values: false (default): disables the multi-batch function. true: enables the multi-batch function.
doMTE2Preload	Input	Sets the doMTE2Preload parameter, which specifies whether to enable preloading in the M/N direction. Preloading is intended for cases where the MTE2 pipeline gap and the M/N value are large; enabling it reduces the MTE2 pipeline gap and improves performance. The preloading function is valid only for the MDL template. Values: 0 (default): disables the function. 1: enables preloading in the M direction. 2: enables preloading in the N direction. Note: When preloading in the M/N direction is enabled, ensure that the data is fully loaded in the K direction and that DoubleBuffer is enabled in the M/N direction. The condition for full load in the M direction is that singleCoreK/baseK is less than or equal to stepKa, and that in the N direction is that singleCoreK/baseK is less than or equal to stepKb. For details about how to use this parameter, see Matmul operator sample for preloading in the M and N directions.
isVecND2NZ	Input	Sets the enVecND2NZ parameter, which specifies whether to perform ND2NZ (converting data from ND format to NZ format) using the vector unit. To enable this function, you need to call SetLocalWorkspace. Values: false (default): disables ND2NZ using the vector unit. true: enables ND2NZ using the vector unit. For Atlas inference product's AI Core, when the Unified Buffer space is sufficient (greater than twice the value of transLength of TCubeTiling), you are advised to enable this parameter for better data movement.
isPerTensor	Input	Sets the isPerTensor parameter, which specifies whether quantization for matrix B is conducted per tensor or per channel in the scenario where matrix A's input type is half and matrix B's input type is int8_t. Values: true: quantization is conducted per tensor. false (default): quantization is conducted per channel.
hasAntiQuantOffset	Input	Sets the hasAntiQuantOffset parameter. Whether to use the offset coefficient when matrix B quantization is enabled in the scenario where matrix A's input type is half and matrix B's input type is int8_t.
enUnitFlag	Input	Sets the enUnitFlag parameter. Whether to enable the UnitFlag function to allow parallel execution of computation and data movement for performance improvement. By default, the function is enabled when the Norm and IBShare templates are used and disabled when the MDL template is used. Values: false: disables the UnitFlag function. true: enables the UnitFlag function.
isMsgReuse	Input	Sets the enableReuse parameter, which specifies whether the dataPtr set through the SetSelfDefineData function directly carries the computation data. If the SetSelfDefineData function is not called to set dataPtr, this parameter can only be set to the default value true. Values: true (default): dataPtr passes the computation data itself. Only a single value is supported. false: dataPtr passes the address information of data stored on GM.
enableUBReuse	Input	Sets the enableUBReuse parameter. Whether to enable Unified Buffer reuse. When the Unified Buffer has sufficient capacity (its size is greater than four times the value of transLength of TCubeTiling), enabling this parameter divides the Unified Buffer into two non-overlapping regions that store the data for two consecutive Matmul iterations. With Unified Buffer reuse enabled, the data of the next iteration can be loaded into the second region without waiting for the previous iteration's Unified Buffer region to be released. This optimizes the pipeline and improves overall performance. Values: true (default): enables Unified Buffer reuse. false: disables Unified Buffer reuse. For Atlas A3 training products / Atlas A3 inference products, this parameter is not supported. For Atlas A2 training products / Atlas A2 inference products, this parameter is not supported. For Atlas inference product's AI Core, this parameter is supported. For Atlas 200I/500 A2 inference products, this parameter is not supported.
enableL1CacheUB	Input	Sets the enableL1CacheUB parameter. Whether to cache Unified Buffer computing blocks in L1 Buffer. It is recommended that this parameter be used in scenarios where the MTE3 and MTE2 pipelines frequently run serially. Values: true: caches Unified Buffer computing blocks in L1 Buffer. false (default): does not cache Unified Buffer computing blocks in L1 Buffer. To cache Unified Buffer computing blocks in L1 Buffer, you must also call SetMatmulConfigParams in the tiling implementation to set enableL1CacheUBIn to true. For Atlas A3 training products / Atlas A3 inference products, this parameter is not supported. For Atlas A2 training products / Atlas A2 inference products, this parameter is not supported. For Atlas inference product's AI Core, this parameter is supported. For Atlas 200I/500 A2 inference products, this parameter is not supported.
enableMixDualMaster	Input	Sets the enableMixDualMaster parameter. Whether to enable MixDualMaster (dual-master mode). Unlike the MIX mode (which includes cube computation and vector computation) that drives the AIC through the message mechanism, the dual-master mode lets the AIC and AIV run independently without depending on the message mechanism. The default value is false. This parameter can be set to true only in the following scenarios: (1) The kernel function type is MIX, and the ratio of AIC cores to AIV cores is 1:1. (2) The kernel function type is MIX, the ratio of AIC cores to AIV cores is 1:2, and the IBSHARE parameter is enabled for both matrix A and matrix B. In addition, the following conditions must be met to enable MixDualMaster: (1) The value of this parameter must be the same for all Matmul objects in the same operator. (2) Matrix A, matrix B, and the bias can be moved in only from GM. (3) Only the IterateAll API can be called to obtain the cube computation result and output it to the GlobalTensor, that is, the computation result is stored in a Global Memory address. The GetTensorC API cannot be called to obtain the result. For Atlas A3 training products / Atlas A3 inference products, this parameter is supported. For Atlas A2 training products / Atlas A2 inference products, this parameter is supported. For Atlas inference product's AI Core, this parameter is not supported. For Atlas 200I/500 A2 inference products, this parameter is not supported.
enableKdimReorderLoad	Input	Sets the enableKdimReorderLoad parameter. Whether to enable staggered loading of data on the K axis. During Matmul computation based on the same tiling parameters, if the left or right matrices of multiple cores are the same and stored in the Global Memory, multiple cores may access the same address at the same time to load matrix data, causing access conflicts and affecting performance. After this parameter is enabled, during multi-core Matmul computation, the cores try to access different Global Memory addresses at the same time, which reduces the probability of address access conflicts and improves performance. This parameter is supported only for the MDL template. You are advised to enable this parameter when the K axis is large and the left and right matrices are not fully loaded. For details, see operator sample for staggered data loading along the K axis. Values: false (default): disables the staggered data loading function on the K axis. true: enables the staggered data loading function on the K axis. For Atlas A3 training products / Atlas A3 inference products, this parameter is supported. For Atlas A2 training products / Atlas A2 inference products, this parameter is supported. For Atlas inference product's AI Core, this parameter is not supported. For Atlas 200I/500 A2 inference products, this parameter is not supported.
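
The full-load conditions given for doMTE2Preload can be checked on the host before a preload direction is chosen. The following is a minimal sketch in plain C++, not part of the Ascend C API. The helper names (CeilDiv, IsKFullLoad, ChoosePreload) and the PRELOAD_* constants are hypothetical, the tiling values are passed in as plain integers, and rounding singleCoreK/baseK up is an assumption. The DoubleBuffer requirement is not checked here.

```cpp
#include <cstdint>

// Hypothetical helper (not an Ascend C API): integer division rounded up.
// Precondition: b > 0.
constexpr uint32_t CeilDiv(uint32_t a, uint32_t b) {
    return a / b + (a % b != 0 ? 1u : 0u);
}

// True when the K direction is fully loaded, i.e. singleCoreK/baseK <= stepK.
// Pass stepKa to test the M direction and stepKb to test the N direction.
// Assumption: the quotient is rounded up; a baseK of 0 is treated as "not fully loaded".
constexpr bool IsKFullLoad(uint32_t singleCoreK, uint32_t baseK, uint32_t stepK) {
    return baseK != 0 && CeilDiv(singleCoreK, baseK) <= stepK;
}

// Values accepted by doMTE2Preload, as listed in the parameter table.
constexpr uint32_t PRELOAD_OFF = 0;
constexpr uint32_t PRELOAD_M = 1;
constexpr uint32_t PRELOAD_N = 2;

// Picks a doMTE2Preload value from the tiling values.
// Preferring M when both directions qualify is an arbitrary choice made for this sketch.
constexpr uint32_t ChoosePreload(uint32_t singleCoreK, uint32_t baseK,
                                 uint32_t stepKa, uint32_t stepKb) {
    return IsKFullLoad(singleCoreK, baseK, stepKa) ? PRELOAD_M
         : IsKFullLoad(singleCoreK, baseK, stepKb) ? PRELOAD_N
         : PRELOAD_OFF;
}

// singleCoreK = 512 and baseK = 64 give 8 K-steps, which fits stepKa = 8.
static_assert(ChoosePreload(512, 64, 8, 4) == PRELOAD_M, "M direction qualifies");
```

The value returned by such a helper would then be passed as the third argument of GetMDLConfig. Whether preloading actually helps still depends on the MTE2 pipeline gap of the specific operator.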

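The two scenarios in which enableMixDualMaster may be set to true can be written as a single predicate. The sketch below is plain C++ rather than Ascend C. The KernelSetup structure, its fields, and CanEnableMixDualMaster are hypothetical names introduced only for illustration. The predicate covers only the two scenario rules; it does not cover product support or the additional usage conditions (same value for all Matmul objects, inputs moved in only from GM, result obtained only through IterateAll).

```cpp
#include <cstdint>

// Hypothetical description of an operator's launch setup (illustration only).
struct KernelSetup {
    bool isMixKernel;    // kernel function type is MIX
    uint32_t aivPerAic;  // AIV cores per AIC core: 1 for a 1:1 ratio, 2 for a 1:2 ratio
    bool ibShareA;       // IBSHARE enabled for matrix A
    bool ibShareB;       // IBSHARE enabled for matrix B
};

// True when one of the two documented scenarios applies:
//   - MIX kernel with an AIC:AIV ratio of 1:1, or
//   - MIX kernel with an AIC:AIV ratio of 1:2 and IBSHARE enabled for both A and B.
constexpr bool CanEnableMixDualMaster(const KernelSetup& s) {
    return s.isMixKernel &&
           (s.aivPerAic == 1 ||
            (s.aivPerAic == 2 && s.ibShareA && s.ibShareB));
}

static_assert(CanEnableMixDualMaster(KernelSetup{true, 1, false, false}),
              "MIX kernel with a 1:1 ratio qualifies");
static_assert(!CanEnableMixDualMaster(KernelSetup{true, 2, true, false}),
              "a 1:2 ratio needs IBSHARE on both matrices");
```
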
Returns

A MatmulConfig structure configured for the MDL template.

Restrictions

None

Example

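// Obtain the MDL template configuration with all parameters at their default values.
constexpr MatmulConfig MM_CFG = GetMDLConfig();
// Instantiate the Matmul object with this configuration and register it.
AscendC::Matmul<A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE, MM_CFG> mm;
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);
// Set the input matrices and the bias, then compute the whole result into gm_c.
mm.SetTensorA(gm_a);
mm.SetTensorB(gm_b);
mm.SetBias(gm_bias);
mm.IterateAll(gm_c);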
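
Because GetMDLConfig takes its options as positional parameters with default values, changing a later option means spelling out every earlier one. The sketch below illustrates this with host-side stand-ins that compile without the Ascend C headers. SketchConfig and SketchGetMDLConfig are hypothetical mirrors of the prototype and the member names in the parameter table. They are not the real MatmulConfig or GetMDLConfig, and the real structure may contain further members.

```cpp
#include <cstdint>

// Hypothetical stand-in for MatmulConfig: only the members named in the parameter table.
struct SketchConfig {
    bool intrinsicsCheck;
    bool isNBatch;
    uint32_t doMTE2Preload;
    bool enVecND2NZ;
    bool isPerTensor;
    bool hasAntiQuantOffset;
    bool enUnitFlag;
    bool enableReuse;
    bool enableUBReuse;
    bool enableL1CacheUB;
    bool enableMixDualMaster;
    bool enableKdimReorderLoad;
};

// Hypothetical stand-in for GetMDLConfig with the same parameter order and defaults.
constexpr SketchConfig SketchGetMDLConfig(
    bool intrinsicsLimit = false, bool batchLoop = false,
    uint32_t doMTE2Preload = 0, bool isVecND2NZ = false,
    bool isPerTensor = false, bool hasAntiQuantOffset = false,
    bool enUnitFlag = false, bool isMsgReuse = true,
    bool enableUBReuse = true, bool enableL1CacheUB = false,
    bool enableMixDualMaster = false, bool enableKdimReorderLoad = false) {
    return SketchConfig{intrinsicsLimit, batchLoop, doMTE2Preload, isVecND2NZ,
                        isPerTensor, hasAntiQuantOffset, enUnitFlag, isMsgReuse,
                        enableUBReuse, enableL1CacheUB, enableMixDualMaster,
                        enableKdimReorderLoad};
}

// Preloading in the M direction: the two preceding arguments are restated.
constexpr SketchConfig PRELOAD_M_CFG = SketchGetMDLConfig(false, false, 1);

// Staggered K-axis loading is the last parameter, so all eleven preceding
// arguments are restated with their default values.
constexpr SketchConfig KDIM_CFG = SketchGetMDLConfig(
    false, false, 0, false, false, false, false, true, true, false, false, true);

static_assert(PRELOAD_M_CFG.doMTE2Preload == 1, "third argument sets doMTE2Preload");
static_assert(KDIM_CFG.enableKdimReorderLoad, "twelfth argument sets enableKdimReorderLoad");
```

With the real API, the first case would correspond to GetMDLConfig(false, false, 1). Evaluating the configuration as a constexpr value, as in the example above, lets it be used as a template argument of the Matmul object.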
