MatmulConfig

Configures the Matmul template and related parameters. If no template parameter is set, the Norm template is enabled by default. For details about the template features, see Table 1. For details about the parameters, see Table 2. MatmulConfig can be defined in the following ways:

Table 1 Template features

Norm

  • Implementation: L1 can cache multiple base blocks. MTE2 moves base blocks from GM to L1 multiple times, one base block per move. Base blocks that have already been moved are not cleared. For example, if depthA1 in the tiling structure is set to 6, six base blocks of matrix A are moved to L1 one at a time, so MTE2 performs six moves.
  • Advantage: The MTE1 pipeline can start early, because MTE1 can begin its work as soon as one base block has been moved.
  • Applicable scenario: The Norm template is enabled by default.

MDL and SpecialMDL

  • Implementation: L1 can cache multiple base blocks. MTE2 moves data from GM to L1 in a single "large-packet" move. For example, if depthA1 in the tiling structure is set to 6, six base blocks of matrix A are moved to L1 in one MTE2 move. For details about the differences between the MDL template and the SpecialMDL template, see Table 2.
  • Advantage: In common large-shape scenarios, this reduces the number of MTE2 moves and improves performance.
  • Applicable scenario: Large-shape scenarios.

IBShare

  • Implementation: In the MIX scenario, when multiple AIVs use the same GM address for matrix A or matrix B, L1 Buffer is shared to reduce MTE2 movement.
  • Advantage: Reduces MTE2 movement and improves performance.
  • Applicable scenario: The MIX scenario in which multiple AIVs use the same GM address for matrix A or matrix B.
    Note: To use the IBShare template, the matrix A or matrix B reused by multiple AIVs must be fully loaded into L1 Buffer.

BasicBlock

  • Implementation: If there is no tail block and the base block size is fixed, the GetBasicConfig API can be used to configure the input base block size. This fixes the size of the matrix moved by MTE1 each time and the size of the matrix computed by each matrix multiplication, which reduces the parameter computation workload.
  • Advantage: Reduces the parameter computation overhead during MTE1 matrix movement and matrix multiplication.
  • Applicable scenario: There is no tail block, and the base block size (baseM, baseN) is fixed.
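
The difference between the Norm and MDL movement patterns in Table 1 can be illustrated with a small host-side model. This is only a sketch: the `Template` enum and the `PlanMte2Moves` function are hypothetical names introduced for illustration and are not part of the Matmul API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the two movement patterns; not an Ascend C API.
enum class Template { NORM, MDL };

// Returns the number of base blocks carried by each MTE2 move from GM to L1
// when `depth` base blocks are cached (for example, depthA1 = 6).
std::vector<uint32_t> PlanMte2Moves(Template t, uint32_t depth)
{
    assert(depth > 0);
    if (t == Template::NORM) {
        // Norm: one base block per move, so `depth` separate moves.
        return std::vector<uint32_t>(depth, 1U);
    }
    // MDL: a single "large-packet" move that carries all `depth` blocks.
    return std::vector<uint32_t>{depth};
}
```

With depth set to 6, the Norm plan contains six moves of one block each, and the MDL plan contains one move of six blocks, which matches the depthA1 example in the table.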

Table 2 MatmulConfig parameters

Parameter

Description

Supported Templates: Norm, MDL, SpecialMDL, IBShare, and BasicBlock

doNorm

Whether to enable the Norm template. Values:

  • true: enables the Norm template.
  • false: disables the Norm template.

If no value is specified, the Norm template is enabled by default.

Norm

doBasicBlock

Whether to enable the BasicBlock template. Values:

  • true: enables the BasicBlock template.
  • false: disables the BasicBlock template.

When GetBasicConfig is called to obtain the BasicBlock template, this parameter is set to true. Notes:

  • Currently, the BasicBlock template supports matrices A and B only when their input type is half, bfloat16_t, or float. The int8_t and int4_t types are not supported.
  • Currently, the BasicBlock template does not support matrix A in scalar data or vector data format.
  • Currently, the BasicBlock template does not support the ScheduleType::OUTER_PRODUCT data movement mode.

For the Atlas A2 training products / Atlas A2 inference products, this parameter is supported.

For the Atlas A3 training products / Atlas A3 inference products, this parameter is supported.

For the Atlas inference product's AI Core, this parameter cannot be set to true.

For the Atlas 200I/500 A2 inference products, this parameter cannot be set to true.

BasicBlock

doMultiDataLoad

Whether to enable the MDL template. Values:

  • true: enables the MDL template.
  • false: disables the MDL template.

MDL

basicM

Equivalent to the baseM parameter in the TCubeTiling structure. It indicates the length of the M axis of a base block during Matmul computation. The unit is element.

BasicBlock

basicN

Equivalent to the baseN parameter in the TCubeTiling structure. It indicates the length of the N axis of a base block during Matmul computation. The unit is element.

BasicBlock

basicK

Equivalent to the baseK parameter in the TCubeTiling structure. It indicates the length of the K axis of a base block during Matmul computation. The unit is element.

BasicBlock

intrinsicsCheck

Whether to enable cyclic data move-in from Global Memory to L1 Buffer when the inner axis (last axis) of the left or right matrix on a single core is greater than or equal to 65535 elements. For example, for the left matrix A [M, K], if singleCoreK (the inner axis on a single core) is greater than or equal to 65535 and this parameter is set to true, the API moves the data in cyclically. Values:

  • false (default): When the inner axis of the left or right matrix on a single core is greater than or equal to 65535, data is not moved in cyclically.
  • true: When the inner axis of the left or right matrix on a single core is greater than or equal to 65535, data is moved in cyclically.

All templates

isNBatch

Whether to enable multi-batch input and output. This parameter is valid only for BatchMatmul. After this parameter is enabled, only the Norm template is supported, and IterateNBatch needs to be called to implement multi-batch input and output. Values:

  • false (default): disables the multi-batch function.
  • true: enables the multi-batch function.

Norm

enVecND2NZ

Whether to enable ND2NZ (conversion of data from ND format to NZ format) using the vector unit. To use this function, you must call SetLocalWorkspace. Values:

  • false (default): disables ND2NZ using the vector.
  • true: enables ND2NZ using the vector.

For the Atlas inference product's AI Core, when the Unified Buffer space is sufficient (greater than twice the value of transLength of TCubeTiling), you are advised to enable this parameter to improve data movement performance.

All templates

doSpecialBasicBlock

Whether to enable the SpecialBasicBlock template. Values:

  • true: enables the SpecialBasicBlock template.
  • false: disables the SpecialBasicBlock template.

The SpecialBasicBlock template is a variant of the BasicBlock template that also eliminates scalar computation overhead.

Reserved parameter

doMTE2Preload

Whether to enable preloading in the M or N direction. Preloading is intended for cases where the MTE2 pipeline gaps are large and the M/N value is large. After this function is enabled, the MTE2 pipeline gaps are reduced and performance is improved. The preloading function is valid only for the MDL template. Values:

  • 0 (default): disables the function.
  • 1: enables preloading in the M direction.
  • 2: enables preloading in the N direction.

Note: When preloading in the M/N direction is enabled, ensure that data is fully loaded in the K direction and that DoubleBuffer is enabled in the M/N direction. For preloading in the M direction, the full-load condition is that singleCoreK/baseK is less than or equal to stepKa. For preloading in the N direction, the full-load condition is that singleCoreK/baseK is less than or equal to stepKb.

For details about how to use this parameter, see Matmul operator sample for preloading in the M and N directions.

MDL
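
The full-load conditions stated for doMTE2Preload can be checked on the host before a configuration is chosen. The following is a minimal sketch under stated assumptions: `PreloadTiling`, `KBlocks`, and `KFullyLoadedForPreload` are hypothetical names, and rounding singleCoreK/baseK up to a whole number of base blocks is an assumption that the text does not confirm.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type; the field names mirror the tiling parameters
// mentioned in the text.
struct PreloadTiling {
    uint32_t singleCoreK;
    uint32_t baseK;
    uint32_t stepKa;
    uint32_t stepKb;
};

// Number of base blocks along K, rounded up (the rounding is an assumption).
constexpr uint32_t KBlocks(uint32_t singleCoreK, uint32_t baseK)
{
    return (singleCoreK + baseK - 1U) / baseK;
}

// doMTE2Preload: 0 = disabled, 1 = preload in M, 2 = preload in N.
// Returns true if the K direction is fully loaded for the requested mode.
bool KFullyLoadedForPreload(const PreloadTiling& t, uint32_t doMTE2Preload)
{
    assert(t.baseK > 0);
    const uint32_t kBlocks = KBlocks(t.singleCoreK, t.baseK);
    switch (doMTE2Preload) {
        case 0: return true;                 // preloading disabled, nothing to check
        case 1: return kBlocks <= t.stepKa;  // M-direction preload
        case 2: return kBlocks <= t.stepKb;  // N-direction preload
        default: return false;               // not a documented value
    }
}
```

The sketch does not check the DoubleBuffer requirement in the M/N direction, which must be verified separately.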

singleCoreM

Shape size of a single core on the M axis, in elements.

Reserved parameter

singleCoreN

Shape size of a single core on the N axis, in elements.

Reserved parameter

singleCoreK

Shape size of a single core on the K axis, in elements.

Reserved parameter

stepM

Multiple of baseM of the left matrix that is buffered in A1 in the bufferM direction.

Reserved parameter

stepN

Multiple of baseN of the right matrix that is buffered in B1 in the bufferN direction.

Reserved parameter

baseMN

Size of baseM × baseN.

Reserved parameter

singleCoreMN

Size of singleCoreM × singleCoreN.

Reserved parameter

enUnitFlag

Whether to enable the UnitFlag function to allow parallel execution of computation and data movement for performance improvement. By default, the function is enabled when the Norm and IBShare templates are used and disabled when the MDL template is used. Values:

  • false: disables the UnitFlag function.
  • true: enables the UnitFlag function.

For details about how to use this parameter, see matmul_unitflag operator sample.

MDL, Norm, and IBShare

isPerTensor

Whether matrix B is quantized per tensor or per channel when matrix A's input type is half and matrix B's input type is int8_t. Values:

  • true: quantization conducted per tensor
  • false: quantization conducted per channel

MDL and SpecialMDL

hasAntiQuantOffset

Whether to use the offset coefficient when matrix B quantization is enabled in the scenario where matrix A's input type is half and matrix B's input type is int8_t.

MDL and SpecialMDL

doIBShareNorm

Whether to enable the IBShare template. Values:

  • false: disables the IBShare template.
  • true: enables the IBShare template.

IBShare is used to reuse the same matrix A or B data on L1. After IBShare is enabled, repeated data movement to L1 can be avoided for data reuse.

IBShare

doSpecialMDL

Whether to enable the SpecialMDL template. Values:

  • true: enables the SpecialMDL template.
  • false: disables the SpecialMDL template.

The SpecialMDL template covers a special scenario of the MDL template. When data cannot be fully loaded in the Matmul K direction (singleCoreK/baseK > stepKb), the MDL template supports only stepN = 1 by default. After the SpecialMDL template is enabled, stepN can be set to 2.

Note: When the SpecialMDL template is enabled, the value of doMultiDataLoad must be false.

SpecialMDL

enableInit

Whether to enable the Init function. Disabling the Init function improves constant propagation and can therefore improve performance. The Init function is enabled by default. Values:

  • false: disables the Init function.
  • true: enables the Init function.

All templates

batchMode

Relationship between the total amount of multi-batch data for input matrices A and B in a BatchMatmul operation and the size of L1 Buffer when the layout type is set to Normal in the BatchMatmul scenario. Values:

  • BatchMode::BATCH_LESS_THAN_L1: Total amount of multi-batch data < Size of L1 Buffer
  • BatchMode::BATCH_LARGE_THAN_L1: Total amount of multi-batch data > Size of L1 Buffer
  • BatchMode::SINGLE_LARGE_THAN_L1: Total amount of single-batch data > Size of L1 Buffer

Norm
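
The three batchMode values compare data sizes against the L1 Buffer size, so the choice can be sketched as a simple host-side decision. This is an illustrative sketch only: the local `BatchMode` enum is a stand-in that mirrors the value names in the text, `SelectBatchMode` is a hypothetical helper, and treating the equal-size case as "fits in L1" is an assumption, because the text defines only the strict inequalities.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in that mirrors the BatchMode values named in the text.
enum class BatchMode { BATCH_LESS_THAN_L1, BATCH_LARGE_THAN_L1, SINGLE_LARGE_THAN_L1 };

// Hypothetical helper that picks a mode from sizes in bytes.
// singleBatchBytes: data of matrices A and B for one batch.
// batchNum: number of batches; l1Bytes: size of L1 Buffer.
BatchMode SelectBatchMode(uint64_t singleBatchBytes, uint64_t batchNum, uint64_t l1Bytes)
{
    assert(batchNum > 0);
    if (singleBatchBytes > l1Bytes) {
        return BatchMode::SINGLE_LARGE_THAN_L1;  // one batch alone exceeds L1
    }
    if (singleBatchBytes * batchNum > l1Bytes) {
        return BatchMode::BATCH_LARGE_THAN_L1;   // all batches together exceed L1
    }
    // Equality is not covered by the text; treating it as "fits" is an assumption.
    return BatchMode::BATCH_LESS_THAN_L1;
}
```

The L1 Buffer size depends on the product and is passed in as a parameter rather than hard-coded.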

enableEnd

Whether to call the End function during Matmul computation. This parameter can be used to optimize performance. Values:

  • true (default): The End function needs to be called during Matmul computation.
  • false: The End function does not need to be called. The code related to End processing is deleted during compilation to optimize performance. For example, if the End function does not need to be called in the asynchronous scenario, set this parameter to false.

All templates

enableGetTensorC

Whether to call the GetTensorC function during Matmul computation. This parameter can be used to optimize performance. Values:

  • true (default): The GetTensorC function needs to be called during Matmul computation.
  • false: The GetTensorC function does not need to be called. The code related to GetTensorC processing is deleted during compilation to optimize performance.

All templates

enableSetOrgShape

Whether to call the SetOrgShape function during Matmul computation. This parameter can be used to optimize performance. Values:

  • true (default): The SetOrgShape function needs to be called during Matmul computation.
  • false: The SetOrgShape function does not need to be called. The code related to SetOrgShape processing is deleted during compilation to optimize performance.

All templates

enableSetBias

Whether to compute bias. This parameter can be used to optimize performance. Values:

  • true (default): enables bias computation. If the input contains bias, the bias data is moved in and included in the computation.
  • false: disables bias computation. The code related to bias processing is deleted during compilation to optimize performance.

MDL

enableSetTail

Whether to call the SetTail function during Matmul computation. This parameter can be used to optimize performance. Values:

  • true (default): The SetTail function needs to be called during Matmul computation.
  • false: The SetTail function does not need to be called. The code related to SetTail processing is deleted during compilation to optimize performance.

All templates

enableQuantVector

Whether to call the SetQuantVector and SetQuantScalar functions during Matmul computation. This parameter can be used to optimize performance. Values:

  • true (default): The SetQuantVector and SetQuantScalar functions need to be called during Matmul computation.
  • false: The SetQuantVector and SetQuantScalar functions do not need to be called. The code related to SetQuantVector and SetQuantScalar processing is deleted during compilation to optimize performance.

All templates

enableSetDefineData

Whether to enable the setting of information such as the computation data required by the callback function or the data address stored on GM when MatmulCallBack (custom callback function) is enabled.

Values:

  • true (default): The setting is allowed.
  • false: The setting is not allowed. The code related to SetSelfDefineData processing is deleted during compilation to optimize performance.

MDL

iterateMode

Iteration mode, used to optimize the Matmul computation overhead. Specifically, it is used for the optimization through Iterate APIs (including Iterate, IterateAll, IterateBatch, and IterateNBatch). When a mode is enabled, only one Iterate API corresponding to the mode is called during the Matmul computation, and the code related to other Iterate APIs is deleted during compilation to optimize performance. This parameter is of the IterateMode type. Values:

  • ITERATE_MODE_NORMAL: For Iterate APIs, only Iterate is called during Matmul computation.
  • ITERATE_MODE_ALL: For Iterate APIs, only IterateAll is called during Matmul computation.
  • ITERATE_MODE_BATCH: For Iterate APIs, only IterateBatch is called during Matmul computation.
  • ITERATE_MODE_N_BATCH: For Iterate APIs, only IterateNBatch is called during Matmul computation.
  • ITERATE_MODE_DEFAULT: default value. The number of Iterate APIs to be called is not limited, and the optimization of the computation overhead is disabled.

For the Atlas A3 training products / Atlas A3 inference products, this parameter is supported.

For the Atlas A2 training products / Atlas A2 inference products, this parameter is supported.

For the Atlas inference product's AI Core, this parameter is not supported.

For the Atlas 200I/500 A2 inference products, this parameter is not supported.

All templates
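
The effect of iterateMode on which Iterate APIs remain available can be summarized as a compile-time predicate. The following sketch is not the library's implementation, which this section does not describe: the local `IterateMode` enum is a stand-in for the values listed above, and `IsIterateApiKept` is a hypothetical function.

```cpp
#include <cassert>

// Local stand-in for the IterateMode values listed above; not the library type.
enum class IterateMode {
    ITERATE_MODE_NORMAL,
    ITERATE_MODE_ALL,
    ITERATE_MODE_BATCH,
    ITERATE_MODE_N_BATCH,
    ITERATE_MODE_DEFAULT
};

// Hypothetical predicate: is the code for `api` kept when `mode` is configured?
// `api` names an Iterate API by its matching mode value (for example,
// ITERATE_MODE_ALL stands for IterateAll).
constexpr bool IsIterateApiKept(IterateMode mode, IterateMode api)
{
    // DEFAULT keeps every Iterate API; any other mode keeps only its own API.
    return mode == IterateMode::ITERATE_MODE_DEFAULT || mode == api;
}

// The predicate is constexpr, so these checks are evaluated by the compiler.
static_assert(IsIterateApiKept(IterateMode::ITERATE_MODE_ALL, IterateMode::ITERATE_MODE_ALL),
              "IterateAll is kept in ITERATE_MODE_ALL");
static_assert(!IsIterateApiKept(IterateMode::ITERATE_MODE_ALL, IterateMode::ITERATE_MODE_BATCH),
              "IterateBatch is removed in ITERATE_MODE_ALL");
```

A constant known at compile time is what allows unused code paths to be removed, which is the optimization that the parameter description refers to.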

enableReuse

Whether dataPtr in the callback function, which is set by the SetSelfDefineData function, directly passes the computation data. If the SetSelfDefineData function is not called to set dataPtr, this parameter can only be set to the default value true. Values:

  • true: passes computation data. Only a single value is supported.
  • false: passes data address information stored on GM.

Norm and MDL

enableUBReuse

Whether to enable Unified Buffer reuse. When the Unified Buffer has sufficient capacity (its size is greater than four times the value of transLength of TCubeTiling), enabling this parameter divides the Unified Buffer into two non-overlapping regions. These two regions store the data for two consecutive Matmul iterations. With Unified Buffer reuse enabled, the data of the next iteration can be loaded into the second region without waiting for the previous iteration's Unified Buffer region to be released. This improves pipelining and overall performance. Values:

  • true: enables Unified Buffer reuse.
  • false: disables Unified Buffer reuse.

For Atlas A3 training products / Atlas A3 inference products, this parameter is not supported.

For Atlas A2 training products / Atlas A2 inference products, this parameter is not supported.

For Atlas inference product's AI Core, this parameter is supported.

For Atlas 200I/500 A2 inference products, this parameter is not supported.

MDL

enableL1CacheUB

Whether to cache Unified Buffer computing blocks in L1 Buffer. It is recommended that this parameter be used in scenarios where the MTE3 and MTE2 pipelines are frequently used in serial mode. Values:

  • true: caches Unified Buffer computing blocks in L1 Buffer.
  • false: does not cache Unified Buffer computing blocks in L1 Buffer.

To cache Unified Buffer computing blocks in L1 Buffer, you must call SetMatmulConfigParams in the tiling implementation to set enableL1CacheUBIn to true.

For Atlas A3 training products / Atlas A3 inference products, this parameter is not supported.

For Atlas A2 training products / Atlas A2 inference products, this parameter is not supported.

For Atlas inference product's AI Core, this parameter is supported.

For Atlas 200I/500 A2 inference products, this parameter is not supported.

MDL

intraBlockPartSum

Whether to enable the accumulation of a single computation result (matrix slices with the size of baseM × baseN) of two AIV cores on L0C Buffer in the case of fused vector and cube computation on the separated architecture. Values:

  • false (default): The compute results of two AIV cores are not accumulated on L0C Buffer.
  • true: The compute results of two AIV cores are accumulated on L0C Buffer.

For the Atlas A3 training products / Atlas A3 inference products, this parameter is supported.

For the Atlas A2 training products / Atlas A2 inference products, this parameter is supported.

For the Atlas inference product's AI Core, this parameter is not supported.

For the Atlas 200I/500 A2 inference products, this parameter is not supported.

Norm

IterateOrder

Iteration sequence for Matmul to perform cube computation. The meaning of this parameter is the same as that of iterateOrder in Table 1. This parameter is valid only when ScheduleType is set to ScheduleType::OUTER_PRODUCT. Values:

enum class IterateOrder {
    ORDER_M = 0,   // Offset to the M-axis direction and then to the N-axis direction.
    ORDER_N,       // Offset to the N-axis direction and then to the M-axis direction.
    UNDEF,         // Invalid currently.
};

Note: When the Norm template (Matmul scenario) and the MDL template are used, if IterateOrder is set to ORDER_M, the value of stepN in the TCubeTiling structure must be greater than 1. If IterateOrder is set to ORDER_N, the value of stepM in the TCubeTiling structure must be greater than 1.

For details about how to use this parameter, see Matmul operator sample for pipeline parallelism in the M and N directions.

For Atlas A3 training products / Atlas A3 inference products, this parameter is supported.

For Atlas A2 training products / Atlas A2 inference products, this parameter is supported.

For Atlas inference product's AI Core, this parameter is not supported.

For Atlas 200I/500 A2 inference products, this parameter is not supported.

Norm and MDL

scheduleType

Matmul data movement mode. Values:

  • ScheduleType::INNER_PRODUCT (default): performs MTE1 cyclic movement in the K direction.
  • ScheduleType::OUTER_PRODUCT: performs MTE1 cyclic movement in the M or N direction. After being enabled, this parameter must be used together with IterateOrder.
    Its configuration takes effect only in the Norm template (BatchMatmul and Matmul scenarios) and the MDL template.
    • If the value of IterateOrder is set to ORDER_M, cyclic movement is performed in the N direction, that is, data in matrix B is moved in parallel using MTE1. (The performance may be improved when the value of singleCoreN is greater than that of baseN.)
    • If the value of IterateOrder is set to ORDER_N, cyclic movement is performed in the M direction, that is, data in matrix A is moved in parallel using MTE1. (The performance may be improved when the value of singleCoreM is greater than that of baseM.)
    • The cyclic movement in the M direction and N direction cannot be enabled at the same time.

Note:

  • In the Norm template (BatchMatmul scenario) or the MDL template, when singleCoreK is greater than baseK, ScheduleType::OUTER_PRODUCT cannot be enabled and the default mode must be used.
  • In the Matmul scenario of the Norm or MDL template, ScheduleType::OUTER_PRODUCT can be configured only in CUBE_ONLY mode (with only Cube computation).
  • This parameter can be set to ScheduleType::OUTER_PRODUCT only when the MDL template calls IterateAll for computation.
  • This parameter can be set to ScheduleType::OUTER_PRODUCT only when matrix C is output to GM.

For Atlas A3 training products / Atlas A3 inference products, this parameter is supported.

For Atlas A2 training products / Atlas A2 inference products, this parameter is supported.

For Atlas inference product's AI Core, this parameter is not supported.

For Atlas 200I/500 A2 inference products, this parameter is not supported.

Norm and MDL
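
Several of the constraints on ScheduleType::OUTER_PRODUCT can be collected into one host-side check. This is a sketch that models only the MDL template case and only a subset of the rules (the singleCoreK limit, GM output, and the stepM/stepN requirement tied to IterateOrder). It does not model the CUBE_ONLY or IterateAll requirements. The two enums are local stand-ins that mirror names from the text, and `ScheduleInputs` and `IsScheduleConfigValid` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins that mirror the enumerators named in the text.
enum class ScheduleType { INNER_PRODUCT, OUTER_PRODUCT };
enum class IterateOrder { ORDER_M = 0, ORDER_N, UNDEF };

// Hypothetical summary of the inputs that the notes refer to.
struct ScheduleInputs {
    ScheduleType scheduleType;
    IterateOrder iterateOrder;
    uint32_t singleCoreK;
    uint32_t baseK;
    uint32_t stepM;
    uint32_t stepN;
    bool outputToGm;  // matrix C is written to GM
};

// Returns true if the combination passes the checks modeled in this sketch.
bool IsScheduleConfigValid(const ScheduleInputs& in)
{
    assert(in.baseK > 0);
    if (in.scheduleType == ScheduleType::INNER_PRODUCT) {
        return true;   // default mode; no extra constraints are modeled here
    }
    if (in.singleCoreK > in.baseK || !in.outputToGm) {
        return false;  // OUTER_PRODUCT needs singleCoreK <= baseK and GM output
    }
    switch (in.iterateOrder) {
        case IterateOrder::ORDER_M: return in.stepN > 1;  // cyclic movement in N (matrix B)
        case IterateOrder::ORDER_N: return in.stepM > 1;  // cyclic movement in M (matrix A)
        default: return false;                            // UNDEF is invalid
    }
}
```

Because IterateOrder takes a single value, the sketch also reflects the rule that cyclic movement in the M direction and the N direction cannot be enabled at the same time.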

enableDoubleCache

Whether to cache two blocks in L1 Buffer after the IBShare template is enabled. Values:

  • false (default): caches one block in L1 Buffer.
  • true: caches two blocks in L1 Buffer.

Note: If this parameter is set to true, the base block size must be controlled to ensure that the cached data blocks do not exceed the L1 Buffer capacity.

IBShare

isBiasBatch

Whether the bias size includes batch axes in the BatchMatmul scenario. Values:

  • true (default): The bias size involves batch axes. The bias size is Batch × N.
  • false: The bias size does not involve batch axes. The bias size is N. The bias is reused in the BatchMatmul computation.

    Note: In the BatchMode::SINGLE_LARGE_THAN_L1 scenario, this parameter can only be set to true.

For Atlas A2 training products / Atlas A2 inference products, this parameter is supported.

For Atlas A3 training products / Atlas A3 inference products, this parameter is supported.

For Atlas inference product's AI Core, this parameter cannot be set to false.

For Atlas 200I/500 A2 inference products, this parameter cannot be set to false.

Norm

enableStaticPadZeros

Whether to automatically pad zeros based on the sizes of singleM, singleN, singleK, baseM, baseN, and baseK when the static tiling parameters are used and the left and right matrices are moved to L1 Buffer. For details about the static tiling parameters, see GetMatmulApiTiling.

Only the ND2NZ format of the GM input supports padding zeros. In other scenarios, you need to pad zeros manually. Values:

  • false (default): does not pad zeros automatically during data movement. You need to pad zeros manually.
  • true: automatically pads zeros based on the sizes of constant singleM/singleN/singleK and baseM/baseN/baseK during data movement.

Norm and MDL

isPartialOutput

Whether to enable the PartialOutput function. This parameter controls how Matmul computes and outputs base blocks along the K axis. In other words, this parameter determines whether to accumulate the partial results along the K axis when Matmul runs one Iterate step. Values:

  • true: enables the PartialOutput function. The K-axis partial results computed in a single Iterate computation are not accumulated. Each Matmul iteration outputs a local matrix fragment of size baseM × baseN that corresponds to the current baseK slice.
  • false: disables the PartialOutput function. The K-axis partial results computed in a single Iterate computation are accumulated. Each Matmul iteration outputs a matrix fragment of size baseM × baseN that corresponds to the current singleCoreK slice.

For Atlas A3 training products / Atlas A3 inference products, this parameter is supported.

For Atlas A2 training products / Atlas A2 inference products, this parameter is supported.

For Atlas inference product's AI Core, this parameter is not supported.

For Atlas 200I/500 A2 inference products, this parameter is not supported.

MDL
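
The relationship between partial output and accumulated output along the K axis can be shown with a plain reference computation on the host. This sketch does not use the Matmul API. `Matrix`, `MultiplyKSlice`, and `PartialOutputs` are hypothetical names, and the code only demonstrates the arithmetic: each baseK slice produces its own fragment, and summing the fragments gives the accumulated result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Multiplies columns [k0, k1) of A by rows [k0, k1) of B and returns the
// M x N partial result for that K slice.
Matrix MultiplyKSlice(const Matrix& a, const Matrix& b, std::size_t k0, std::size_t k1)
{
    const std::size_t m = a.size();
    const std::size_t n = b[0].size();
    Matrix c(m, std::vector<float>(n, 0.0f));
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t k = k0; k < k1; ++k) {
            for (std::size_t j = 0; j < n; ++j) {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    return c;
}

// Partial-output style: one fragment per baseK slice, with no accumulation
// across slices.
std::vector<Matrix> PartialOutputs(const Matrix& a, const Matrix& b, std::size_t baseK)
{
    assert(baseK > 0 && !a.empty() && !b.empty());
    const std::size_t kTotal = b.size();
    std::vector<Matrix> fragments;
    for (std::size_t k0 = 0; k0 < kTotal; k0 += baseK) {
        const std::size_t k1 = (k0 + baseK < kTotal) ? k0 + baseK : kTotal;
        fragments.push_back(MultiplyKSlice(a, b, k0, k1));
    }
    return fragments;
}
```

For a 1 × 4 matrix A = [1, 2, 3, 4], a 4 × 1 matrix B of ones, and baseK = 2, the two fragments are 3 and 7, and the accumulated result over the whole K range is 10.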

enableMixDualMaster

Whether to enable MixDualMaster (dual-master mode). Different from the MIX mode (including cube computation and vector computation) that drives the AIC to run using the message mechanism, the dual-master mode enables the AIC and AIV to run independently without depending on the message mechanism. The default value is false. This parameter can be set to true only in the following scenarios:

  • The kernel function type is MIX, and the ratio of AIC cores to AIV cores is 1:1.
  • The kernel function type is MIX, the ratio of AIC cores to AIV cores is 1:2, and the IBSHARE parameter is enabled for both matrix A and matrix B.

Note that the following conditions must be met to enable MixDualMaster:

  • The value of this parameter must be the same for all Matmul objects in the same operator.
  • Matrix A, matrix B, and the bias can be moved in only from GM.
  • Only the IterateAll API can be called to obtain the cube computation result and output it to the GlobalTensor. That is, the computation result is stored in the Global Memory address. The GetTensorC API cannot be called to obtain the result.

For details about how to use this parameter, see the operator sample for enabling the dual-master mode.

For Atlas A3 training products / Atlas A3 inference products, this parameter is supported.

For Atlas A2 training products / Atlas A2 inference products, this parameter is supported.

For Atlas inference product's AI Core, this parameter is not supported.

For Atlas 200I/500 A2 inference products, this parameter is not supported.

Norm
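
The two scenarios in which enableMixDualMaster can be set to true can be written as a small predicate. This is a sketch under assumptions: `KernelSetup` and `CanEnableMixDualMaster` are hypothetical names, and expressing the AIC-to-AIV ratio through core counts is a modeling choice made for illustration. The additional conditions (the same value for all Matmul objects, GM-only inputs, and IterateAll-only output) are not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a kernel setup; not an Ascend C type.
struct KernelSetup {
    bool isMixKernel;   // kernel function type is MIX
    uint32_t aicCores;  // number of AIC cores
    uint32_t aivCores;  // number of AIV cores
    bool ibShareA;      // IBSHARE enabled for matrix A
    bool ibShareB;      // IBSHARE enabled for matrix B
};

// Returns true if enableMixDualMaster may be set to true for this setup.
bool CanEnableMixDualMaster(const KernelSetup& k)
{
    assert(k.aicCores > 0);
    if (!k.isMixKernel) {
        return false;
    }
    if (k.aivCores == k.aicCores) {
        return true;                      // AIC:AIV = 1:1
    }
    if (k.aivCores == 2U * k.aicCores) {
        return k.ibShareA && k.ibShareB;  // AIC:AIV = 1:2 needs IBSHARE on A and B
    }
    return false;
}
```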

isA2B2Shared

Whether to enable the global management of A2 and B2, that is, whether all Matmul objects share the double buffering mechanism of A2 and B2. As this is a global configuration, the parameter values for all Matmul objects must be the same. When it is enabled, the base block sizes of matrix A and matrix B cannot exceed 32 KB.

Values:

  • true: enabled
  • false (default): disabled

For Atlas A3 training products / Atlas A3 inference products, this parameter is supported.

For Atlas A2 training products / Atlas A2 inference products, this parameter is supported.

For Atlas inference product's AI Core, this parameter is not supported.

For Atlas 200I/500 A2 inference products, this parameter is not supported.

When this parameter is set to true, you are advised to set enUnitFlag to true so that data movement and computation can run in parallel, which improves performance. For an example of using this parameter, see the sample for global management of Matmul A2 and B2.

Norm and MDL

isEnableChannelSplit

Whether to enable the channel_split function. Normally, matrix C computed by Matmul in CubeFormat::NZ format has a fractal size of 16 × 16. Assume that the number of fractals is x. With channel_split enabled, the fractal size of matrix C becomes 16 × 8, and the number of fractals becomes 2x. This parameter can be enabled only when the format of matrix C computed by Matmul is CubeFormat::NZ, the type is float, and the output is written to Global Memory. Values:

  • false (default): The channel_split function is disabled, and the output fractal size is 16 × 16.
  • true: The channel_split function is enabled, and the output fractal size is 16 × 8.

For Atlas A3 training products / Atlas A3 inference products, this parameter is supported.

For Atlas A2 training products / Atlas A2 inference products, this parameter is supported.

For Atlas inference product's AI Core, this parameter is not supported.

For Atlas 200I/500 A2 inference products, this parameter is not supported.

All templates
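
The change in fractal count described for isEnableChannelSplit follows directly from halving the fractal width. The sketch below is a host-side calculation, not an API call. `FractalCount` is a hypothetical function, it assumes that "16 × 8" means 16 rows by 8 columns, and it assumes that M and N are multiples of 16 so that no padding is involved.

```cpp
#include <cassert>
#include <cstdint>

// Number of fractals in an M x N float matrix C stored in NZ format.
// Each fractal has 16 rows. It has 16 columns normally and 8 columns with
// channel_split (reading "16 x 8" as rows x columns is an assumption).
uint64_t FractalCount(uint64_t m, uint64_t n, bool channelSplit)
{
    // Padding of partial fractals is not modeled in this sketch.
    assert(m % 16 == 0 && n % 16 == 0);
    const uint64_t fractalCols = channelSplit ? 8 : 16;
    return (m / 16) * (n / fractalCols);
}
```

For a 32 × 64 matrix, the count is 8 without channel_split and 16 with it, which is the x to 2x relationship stated in the description.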

enableKdimReorderLoad

Whether to enable staggered loading of data on the K axis. During Matmul computation based on the same tiling parameters, if the left or right matrices of multiple cores are the same and stored in the global memory, multiple cores may access the same address at the same time to load matrix data, causing access conflicts and affecting performance. After this parameter is enabled, during multi-core Matmul computation, the multiple cores try to access different global memory addresses at the same time to reduce the probability of address access conflicts and improve performance. This parameter is supported only for the MDL template. You are advised to enable this parameter when the K axis is large and the left and right matrices are not fully loaded. For details, see operator sample for staggered data loading along the K axis. Values:

  • false (default): disables the staggered data loading function on the K axis.
  • true: enables the staggered data loading function on the K axis.

For Atlas A3 training products / Atlas A3 inference products, this parameter is supported.

For Atlas A2 training products / Atlas A2 inference products, this parameter is supported.

For Atlas inference product's AI Core, this parameter is not supported.

For Atlas 200I/500 A2 inference products, this parameter is not supported.

MDL

isCO1Shared

Whether to enable the CO1 memory sharing. This parameter and sharedCO1BufferSize specify the number of blocks allocated to CO1. The number of data blocks cached in CO1 must not exceed that of blocks allocated to CO1, that is, the number of results computed by Iterate that are not obtained by GetTensorC must not exceed the number of blocks allocated to CO1. As this parameter is a global configuration, the parameter value for all Matmul objects must be the same. Values:

  • true: enables CO1 memory sharing.
  • false (default): disables CO1 memory sharing.

For the Atlas inference product's AI Core, this parameter is not supported.

For the Atlas 200I/500 A2 inference products, this parameter is not supported.

Norm and IBShare

sharedCO1BufferSize

Size of a shared buffer of CO1. The value is of the uint32_t type and can be 32*1024, 64*1024, or 128*1024.

For the Atlas inference product's AI Core, this parameter is not supported.

For Atlas 200I/500 A2 inference products, this parameter is not supported.

Norm and IBShare

bmmOutMode

Reserved parameter

Reserved parameter

enableL1BankConflictOptimise

Whether to enable bank conflict optimization on L1. The Tiling module determines whether this parameter can be enabled by calling EnableL1BankConflictOptimise. Combined with TilingKey, the kernel inserts the corresponding code path when this optimization is enabled. When this parameter is enabled, for Matmul operations that use identical tiling parameters, matrices A and B (and the ScaleA and ScaleB matrices in MxMatmul scenarios) are no longer allocated contiguously in L1 Buffer. In the DoubleBuffer scenario, the data used for parallel computation is allocated in two separate regions of L1 Buffer: the upper half and the lower half. In the non-DoubleBuffer scenario, data is allocated in the upper half of L1 Buffer. Bias is always allocated in the upper half of L1 Buffer. In vector quantization/dequantization scenarios, the quantization coefficients are allocated in the lower half of L1 Buffer. Values:

  • false (default): disables bank conflict optimization on L1.
  • true: enables bank conflict optimization on L1.

For Atlas A3 training products / Atlas A3 inference products, this parameter is not supported.

For Atlas A2 training products / Atlas A2 inference products, this parameter is not supported.

For Atlas inference product's AI Core, this parameter is not supported.

For Atlas 200I/500 A2 inference products, this parameter is not supported.

MDL

enableRelu

Whether to apply a ReLU (Rectified Linear Unit) activation function to the output matrix C after the matrix multiplication is completed. After this function is enabled, negative values in the output matrix are set to 0. Values:

  • false (default): disables the ReLU activation function for output matrix C.
  • true: enables the ReLU activation function for output matrix C.

For Atlas A3 training products / Atlas A3 inference products, this parameter is not supported.

For Atlas inference product's AI Core, this parameter is not supported.

For Atlas 200I/500 A2 inference products, this parameter is not supported.

All templates