GetNormalConfig

Applicability

Product	Supported
Atlas A3 training products / Atlas A3 inference products	√
Atlas A2 training products / Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	√
Atlas inference product's AI Core	√
Atlas inference product's Vector Core	x
Atlas training products	x

Function

Configures the parameters of the Norm template and obtains the custom Norm template. For details about the Norm template, see Table 1.

Prototype

      
__aicore__ constexpr MatmulConfig GetNormalConfig(
    const bool intrinsicsLimit = false,
    const bool batchLoop = false,
    const bool isVecND2NZ = false,
    const BatchMode bmmMode = BatchMode::BATCH_LESS_THAN_L1,
    const bool isMsgReuse = true,
    const IterateOrder iterateOrder = IterateOrder::UNDEF,
    const ScheduleType scheduleType = ScheduleType::INNER_PRODUCT,
    const bool enUnitFlag = true,
    const bool enableMixDualMaster = false,
    const BatchOutMode bmmOutMode = BatchOutMode::SINGLE_BATCH)

Parameters

Each parameter of this API sets one field of the MatmulConfig structure and has the same function as that field.

Table 1 API parameters

Parameter

Input/Output

Description

intrinsicsLimit

Input

Sets the intrinsicsCheck parameter.

Whether to enable cyclic data move-in from Global Memory to L1 Buffer when the inner axis (last axis) of the left or right matrix on a single core is greater than or equal to 65535 elements. For example, for the left matrix A [M, K], if the inner axis singleCoreK on a single core reaches this threshold and this parameter is set to true, the API moves the data in cyclically. Values:

false (default): When the inner axis of the left or right matrix on a single core is greater than or equal to 65535, data is not moved in cyclically.
true: When the inner axis of the left or right matrix on a single core is greater than or equal to 65535, data is moved in cyclically.
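As a host-side illustration of the threshold described above, the following sketch decides whether intrinsicsLimit would need to be true for a given pair of inner-axis lengths. The helper NeedsIntrinsicsLimit and the constant kInnerAxisLimit are hypothetical names that are not part of Ascend C; only the 65535-element threshold comes from the description.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the Ascend C API.
// Threshold in elements, taken from the intrinsicsLimit description.
constexpr std::uint32_t kInnerAxisLimit = 65535;

// Returns true when the inner (last) axis of either matrix on a single core
// reaches the threshold, which is the case where cyclic move-in applies.
constexpr bool NeedsIntrinsicsLimit(std::uint32_t innerAxisA, std::uint32_t innerAxisB)
{
    return innerAxisA >= kInnerAxisLimit || innerAxisB >= kInnerAxisLimit;
}

static_assert(!NeedsIntrinsicsLimit(4096, 4096), "small axes: the default false is enough");
static_assert(NeedsIntrinsicsLimit(65535, 128), "axis at the threshold: pass true");
```

When the shapes are known at compile time, the result could be passed as the first argument, for example GetNormalConfig(NeedsIntrinsicsLimit(k, n)).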

batchLoop

Input

Sets the isNBatch parameter.

Whether to enable multi-batch input and output. This parameter is valid only for BatchMatmul. After this parameter is enabled, only the Norm template is supported, and IterateNBatch needs to be called to implement multi-batch input and output. Values:

false (default): disables the multi-batch function.
true: enables the multi-batch function.

isVecND2NZ

Input

Sets the enVecND2NZ parameter.

Whether to enable ND2NZ (conversion of data from the ND format to the NZ format) using vector. To enable this function, you must call SetLocalWorkspace. Values:

false (default): disables ND2NZ using the vector.
true: enables ND2NZ using the vector.

For Atlas inference product's AI Core, when the Unified Buffer space is sufficient (greater than twice the value of transLength in TCubeTiling), you are advised to enable this parameter to improve data movement efficiency.
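The sufficiency condition above can be written as a one-line check. This is a minimal sketch: UbFitsVecNd2Nz is a hypothetical helper, and it assumes that the available Unified Buffer space and transLength are expressed in the same unit, which the description does not state.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the Ascend C API.
// ubSpace:     Unified Buffer space available to the operator.
// transLength: the transLength value from TCubeTiling.
// Assumption: both values use the same unit.
constexpr bool UbFitsVecNd2Nz(std::uint64_t ubSpace, std::uint64_t transLength)
{
    // "Greater than twice the value of transLength" is a strict comparison.
    return ubSpace > 2 * transLength;
}

static_assert(UbFitsVecNd2Nz(1000, 400), "1000 > 800: enabling isVecND2NZ is advised");
static_assert(!UbFitsVecNd2Nz(800, 400), "800 is not greater than 800");
```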

bmmMode

Input

Sets the batchMode parameter. This parameter is used in the BatchMatmul scenario. For details about BatchMatmul, see Basic Functions of Batch Matmul.

Relationship between the total amount of multi-batch data of input matrices A and B and the size of L1 Buffer when the layout type is set to Normal in the BatchMatmul scenario. Values:

BatchMode::BATCH_LESS_THAN_L1 (default): Total amount of multi-batch data < Size of L1 Buffer
BatchMode::BATCH_LARGE_THAN_L1: Total amount of multi-batch data > Size of L1 Buffer
BatchMode::SINGLE_LARGE_THAN_L1: Total amount of single-batch data > Size of L1 Buffer
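The three cases above can be read as a small decision rule. The sketch below is illustrative only: BatchModeSketch is a stand-in for the real BatchMode enum, PickBatchMode is a hypothetical helper, and the handling of the equality case (data size exactly equal to the L1 Buffer size) is an assumption, because the list above does not define it.

```cpp
#include <cstdint>

// Stand-in for the three BatchMode values listed above. In a kernel, the real
// enum comes from the Ascend C headers.
enum class BatchModeSketch { BATCH_LESS_THAN_L1, BATCH_LARGE_THAN_L1, SINGLE_LARGE_THAN_L1 };

// Hypothetical helper. All sizes must use the same unit.
// Assumption: a total exactly equal to the L1 size is treated as "large".
constexpr BatchModeSketch PickBatchMode(std::uint64_t totalBatchSize,
                                        std::uint64_t singleBatchSize,
                                        std::uint64_t l1Size)
{
    return singleBatchSize > l1Size
               ? BatchModeSketch::SINGLE_LARGE_THAN_L1
               : (totalBatchSize < l1Size ? BatchModeSketch::BATCH_LESS_THAN_L1
                                          : BatchModeSketch::BATCH_LARGE_THAN_L1);
}

static_assert(PickBatchMode(1000, 100, 4096) == BatchModeSketch::BATCH_LESS_THAN_L1,
              "all batches fit in L1");
static_assert(PickBatchMode(20000, 5000, 4096) == BatchModeSketch::SINGLE_LARGE_THAN_L1,
              "one batch alone exceeds L1");
```

The single-batch test comes first because a single batch that exceeds L1 Buffer also makes the total exceed it.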

isMsgReuse

Input

Sets the enableReuse parameter.

Whether the dataPtr set by the SetSelfDefineData function directly passes the computation data. If SetSelfDefineData is not called to set dataPtr, this parameter can only be set to the default value true. Values:

true (default): passes computation data. Only a single value is supported.
false: passes data address information stored on GM.

iterateOrder

Input

Sets the iterateOrder parameter.

Iteration sequence for Matmul to perform cube computation. The meaning of this parameter is the same as that of iterateOrder in Table 1. This parameter is valid only when ScheduleType is set to ScheduleType::OUTER_PRODUCT. Values:

           
enum class IterateOrder {
    ORDER_M = 0,   // Offset to the M-axis direction and then to the N-axis direction.
    ORDER_N,       // Offset to the N-axis direction and then to the M-axis direction.
    UNDEF,         // Invalid currently.
};

Note: When the Norm template (Matmul scenario) and the MDL template are used, if IterateOrder is set to ORDER_M, the value of stepN in the TCubeTiling structure must be greater than 1. If IterateOrder is set to ORDER_N, the value of stepM in the TCubeTiling structure must be greater than 1.

For details about how to use this parameter, see Matmul operator sample for pipeline parallelism in the M and N directions.

For Atlas A3 training products / Atlas A3 inference products, this parameter is supported.

For Atlas A2 training products / Atlas A2 inference products, this parameter is supported.

For Atlas inference product's AI Core, this parameter is not supported.

For Atlas 200I/500 A2 inference products, this parameter is not supported.
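The stepM and stepN constraint from the note in this parameter description can be checked before a configuration is used. The sketch below is a host-side illustration: IterateOrderSketch mirrors the enum shown above, and StepsMatchIterateOrder is a hypothetical helper that is not part of Ascend C.

```cpp
// Stand-in that mirrors the IterateOrder enum shown above.
enum class IterateOrderSketch { ORDER_M = 0, ORDER_N, UNDEF };

// Hypothetical helper. stepM and stepN are the values from TCubeTiling.
// Applies to the Norm template (Matmul scenario) and the MDL template.
constexpr bool StepsMatchIterateOrder(IterateOrderSketch order, int stepM, int stepN)
{
    return order == IterateOrderSketch::ORDER_M
               ? (stepN > 1)                        // ORDER_M requires stepN > 1
               : (order == IterateOrderSketch::ORDER_N
                      ? (stepM > 1)                 // ORDER_N requires stepM > 1
                      : true);                      // UNDEF: the note adds no constraint
}

static_assert(StepsMatchIterateOrder(IterateOrderSketch::ORDER_M, 1, 2), "ORDER_M with stepN = 2");
static_assert(!StepsMatchIterateOrder(IterateOrderSketch::ORDER_N, 1, 2), "ORDER_N needs stepM > 1");
```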

scheduleType

Input

Sets the scheduleType parameter.

Matmul data movement mode. Values:

ScheduleType::INNER_PRODUCT (default): performs MTE1 cyclic movement in the K direction.
ScheduleType::OUTER_PRODUCT: performs MTE1 cyclic movement in the M or N direction. After being enabled, this parameter must be used together with IterateOrder.
Its configuration takes effect only in the Norm template (BatchMatmul and Matmul scenarios) and the MDL template.
- If the value of IterateOrder is set to ORDER_M, cyclic movement is performed in the N direction, that is, data in matrix B is moved in parallel using MTE1. (The performance may be improved when the value of singleCoreN is greater than that of baseN.)
- If the value of IterateOrder is set to ORDER_N, cyclic movement is performed in the M direction, that is, data in matrix A is moved in parallel using MTE1. (The performance may be improved when the value of singleCoreM is greater than that of baseM.)
- The cyclic movement in the M direction and N direction cannot be enabled at the same time.

Note:

In the Norm template (BatchMatmul scenario) or the MDL template, when singleCoreK is greater than baseK, ScheduleType::OUTER_PRODUCT cannot be enabled and the default mode must be used.
In the Matmul scenario of the Norm or MDL template, ScheduleType::OUTER_PRODUCT can be configured only in CUBE_ONLY mode (with only Cube computation).
This parameter can be set to ScheduleType::OUTER_PRODUCT only when the MDL template calls IterateAll for computation.
This parameter can be set to ScheduleType::OUTER_PRODUCT only when matrix C is output to GM.

For Atlas A3 training products / Atlas A3 inference products, this parameter is supported.

For Atlas A2 training products / Atlas A2 inference products, this parameter is supported.

For Atlas inference product's AI Core, this parameter is not supported.

For Atlas 200I/500 A2 inference products, this parameter is not supported.
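Because the parameters are positional, selecting ScheduleType::OUTER_PRODUCT means spelling out the five arguments that precede iterateOrder. The sketch below shows such a call on a host compiler. Everything inside namespace sketch is a stand-in written for this illustration: the enums list only the values named on this page, the MatmulConfig struct holds only the fields named in Table 1, and the function body is an assumption about how the arguments map to those fields. In a kernel, the real declarations from the Ascend C headers would be used instead.

```cpp
namespace sketch {  // stand-ins only; not the real Ascend C declarations
enum class BatchMode { BATCH_LESS_THAN_L1, BATCH_LARGE_THAN_L1, SINGLE_LARGE_THAN_L1 };
enum class IterateOrder { ORDER_M = 0, ORDER_N, UNDEF };
enum class ScheduleType { INNER_PRODUCT, OUTER_PRODUCT };
enum class BatchOutMode { SINGLE_BATCH };

struct MatmulConfig {  // reduced to the fields named in Table 1
    bool intrinsicsCheck;
    bool isNBatch;
    bool enVecND2NZ;
    BatchMode batchMode;
    bool enableReuse;
    IterateOrder iterateOrder;
    ScheduleType scheduleType;
    bool enUnitFlag;
    bool enableMixDualMaster;
};

// Same parameter order and defaults as the prototype on this page.
constexpr MatmulConfig GetNormalConfig(
    bool intrinsicsLimit = false, bool batchLoop = false, bool isVecND2NZ = false,
    BatchMode bmmMode = BatchMode::BATCH_LESS_THAN_L1, bool isMsgReuse = true,
    IterateOrder iterateOrder = IterateOrder::UNDEF,
    ScheduleType scheduleType = ScheduleType::INNER_PRODUCT,
    bool enUnitFlag = true, bool enableMixDualMaster = false,
    BatchOutMode /*bmmOutMode, reserved*/ = BatchOutMode::SINGLE_BATCH)
{
    return MatmulConfig{intrinsicsLimit, batchLoop, isVecND2NZ, bmmMode, isMsgReuse,
                        iterateOrder, scheduleType, enUnitFlag, enableMixDualMaster};
}
}  // namespace sketch

// Outer-product scheduling with ORDER_M (cyclic movement in the N direction).
// The first five arguments repeat the defaults.
constexpr sketch::MatmulConfig MM_CFG_OUTER = sketch::GetNormalConfig(
    false, false, false, sketch::BatchMode::BATCH_LESS_THAN_L1, true,
    sketch::IterateOrder::ORDER_M, sketch::ScheduleType::OUTER_PRODUCT);

static_assert(MM_CFG_OUTER.scheduleType == sketch::ScheduleType::OUTER_PRODUCT,
              "outer-product scheduling selected");
static_assert(MM_CFG_OUTER.enUnitFlag, "trailing parameters keep their defaults");
```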

enUnitFlag

Input

Sets the enUnitFlag parameter.

Whether to enable the UnitFlag function to allow parallel execution of computation and data movement for performance improvement. By default, the function is enabled when the Norm and IBShare templates are used and disabled when the MDL template is used. Values:

false: disables the UnitFlag function.
true: enables the UnitFlag function.

enableMixDualMaster

Input

Sets the enableMixDualMaster parameter.

Whether to enable MixDualMaster (dual-master mode). Different from the MIX mode (including cube computation and vector computation) that drives the AIC to run using the message mechanism, the dual-master mode enables the AIC and AIV to run independently without depending on the message mechanism. The default value is false. This parameter can be set to true only in the following scenarios:

The kernel function type is MIX, and the ratio of AIC cores to AIV cores is 1:1.
The kernel function type is MIX, the ratio of AIC cores to AIV cores is 1:2, and the IBSHARE parameter is enabled for both matrix A and matrix B.

Note that the following conditions must be met to enable MixDualMaster:

The value of this parameter must be the same for all Matmul objects in the same operator.
Matrix A, matrix B, and the bias can be moved in only from GM.
Only the IterateAll API can be called to obtain the cube computation result and output it to the GlobalTensor. That is, the computation result is stored in the Global Memory address. The GetTensorC API cannot be called to obtain the result.

For Atlas A3 training products / Atlas A3 inference products, this parameter is supported.

For Atlas A2 training products / Atlas A2 inference products, this parameter is supported.

For Atlas inference product's AI Core, this parameter is not supported.

For Atlas 200I/500 A2 inference products, this parameter is not supported.
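The two scenarios in which enableMixDualMaster may be set to true can be expressed as a single predicate. This is a minimal sketch with a hypothetical helper name and hypothetical parameters; it covers only the kernel-type and core-ratio scenarios, not the additional conditions on data sources and IterateAll listed above.

```cpp
// Hypothetical helper, not part of the Ascend C API.
// isMixKernel: the kernel function type is MIX.
// aicCores, aivCores: numbers of AIC and AIV cores, used only for their ratio.
// ibShareA, ibShareB: whether IBSHARE is enabled for matrix A and matrix B.
constexpr bool MixDualMasterAllowed(bool isMixKernel, int aicCores, int aivCores,
                                    bool ibShareA, bool ibShareB)
{
    return isMixKernel && aicCores > 0 &&
           (aivCores == aicCores ||                                  // ratio 1:1
            (aivCores == 2 * aicCores && ibShareA && ibShareB));     // ratio 1:2 with IBSHARE on A and B
}

static_assert(MixDualMasterAllowed(true, 1, 1, false, false), "MIX kernel, 1:1");
static_assert(!MixDualMasterAllowed(true, 1, 2, true, false), "1:2 needs IBSHARE on both matrices");
```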

bmmOutMode

Input

Reserved parameter

Returns

MatmulConfig structure

Restrictions

None

Example

      
// Obtain the Norm template configuration with all default parameters.
constexpr MatmulConfig MM_CFG = GetNormalConfig();
AscendC::Matmul<A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE, MM_CFG> mm;
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);
mm.SetTensorA(gm_a);
mm.SetTensorB(gm_b);
mm.SetBias(gm_bias);
// Compute the full result and output it to gm_c.
mm.IterateAll(gm_c);
