GetIBShareNormConfig

Function Usage

Configures the parameters of the MatmulConfig structure and returns a user-defined configuration for the IBShare template.

Prototype

__aicore__ constexpr MatmulConfig GetIBShareNormConfig(const bool intrinsicsLimit = false, const bool batchLoop = false, const bool isVecND2NZ = false, const BatchMode bmmMode = BatchMode::BATCH_LESS_THAN_L1, const bool isDoubleCache = false, const bool enUnitFlag = true)

Parameters

All parameters of this API set the corresponding fields of the MatmulConfig structure; each parameter has the same function as the field it sets.

Table 1 API parameters

Parameter

Input/Output

Description

intrinsicsLimit

Input

Sets the intrinsicsCheck parameter.

Whether to enable cyclic data move-in when the inner axis (last axis) of the left or right matrix on a single core is greater than or equal to 65535. For example, for a left matrix A [M, K], if the single-core inner-axis length singleCoreK is greater than or equal to 65535 and this parameter is set to true, the API moves the data in cyclically. Values:

  • false (default): When the inner axis of the left or right matrix on a single core is greater than or equal to 65535, data is not moved in cyclically.
  • true: When the inner axis of the left or right matrix on a single core is greater than or equal to 65535, data is moved in cyclically.

batchLoop

Input

Sets the isNBatch parameter.

Whether to enable multi-batch input and output. This parameter is valid only for BatchMatmul. Values:

  • false (default): disables multi-batch input and output.
  • true: enables the multi-batch function.

isVecND2NZ

Input

Sets the enVecND2NZ parameter.

Whether to enable ND2NZ (converting data from ND format to NZ format) using the vector unit. To enable this function, you must also call SetLocalWorkspace to set the temporary buffer. Values:

  • false (default): disables ND2NZ using the vector.
  • true: enables ND2NZ using the vector.

bmmMode

Input

Sets the batchMode parameter.

Specifies the relationship between the total amount of multi-batch data of input matrices A and B in a BatchMatmul operation and the size of the L1 buffer when the layout mode is set to NORMAL. Values:

  • BatchMode::BATCH_LESS_THAN_L1: Total amount of multi-batch data < Size of L1 Buffer
  • BatchMode::BATCH_LARGE_THAN_L1: Total amount of multi-batch data > Size of L1 Buffer
  • BatchMode::SINGLE_LARGE_THAN_L1: Total amount of single-batch data > Size of L1 Buffer

isDoubleCache

Input

Sets the enableDoubleCache parameter.

Whether to cache two blocks in the L1 buffer when the IBShare template is enabled. Keep the base block small enough that the two cached blocks together do not exceed the L1 buffer size. Values:

  • false (default): caches one block in L1 Buffer.
  • true: caches two blocks in L1 Buffer.

enUnitFlag

Input

Sets the enUnitFlag parameter.

Whether to enable the unitflag function to allow parallel execution of computation and data movement for performance improvement. By default, the function is enabled when the Norm and IBShare templates are used and disabled when the MDL template is used. Values:

  • false: disables the unitflag function.
  • true: enables the unitflag function.

Availability

Precautions

None

Example

// Obtain the IBShare Norm configuration with default parameters.
constexpr MatmulConfig MM_CFG = GetIBShareNormConfig();
typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half> A_TYPE;
typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half, true, LayoutMode::NONE, true> B_TYPE;
typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> C_TYPE;
typedef matmul::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float> BIAS_TYPE;
// Create and register a matmul object that uses the configuration.
Matmul<A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE, MM_CFG> mm;
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);
mm.SetTensorA(gm_a);
mm.SetTensorB(gm_b);
mm.SetBias(gm_bias);
mm.IterateAll(gm_c);