Batch Matmul Reusing Bias Matrix

Functions

In the Batch Matmul scenario, the Matmul API can compute multiple matrices C, each with the size of singleCoreM x singleCoreN, at a time. By default, when Batch Matmul has a bias input, the bias matrix contains the Batch axis, that is, the bias size is Batch x N. If the bias data is the same for every batch, you can enable the bias reuse function and input a single bias matrix without the Batch axis. The bias reuse function of Batch Matmul is disabled by default. To enable it, set the isBiasBatch parameter in MatmulConfig to false.

Figure 1 Bias compute with the batch axis

As shown in the preceding figure, when the bias matrix is not reused in Batch Matmul, each batch has its own bias: every computed matrix C with the size of singleCoreM x singleCoreN is added to the 1 x singleCoreN bias matrix of that batch. If the bias data is the same for all batches, one bias matrix can be reused across them. As shown in the following figure, in this scenario only one bias matrix with the size of 1 x singleCoreN needs to be set when the SetBias API is called.

Figure 2 Bias reuse for compute
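
The difference between the two figures can be sketched with a host-side reference in plain C++. This is only an illustration of the arithmetic, not Ascend C kernel code: the function name BatchMatmulWithBiasRef, the row-major layout, and the biasHasBatchAxis flag are assumptions made for this example and are not part of the Matmul API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference only (hypothetical helper, not part of the Matmul API).
// Computes C[b] = A[b] * B[b] + bias for every batch b, with row-major data.
// biasHasBatchAxis == true : bias holds batch x n values, one row per batch.
// biasHasBatchAxis == false: bias holds a single 1 x n row reused by every batch.
std::vector<float> BatchMatmulWithBiasRef(const std::vector<float>& a,     // batch x m x k
                                          const std::vector<float>& b,     // batch x k x n
                                          const std::vector<float>& bias,
                                          std::size_t batch, std::size_t m,
                                          std::size_t k, std::size_t n,
                                          bool biasHasBatchAxis)
{
    assert(a.size() == batch * m * k);
    assert(b.size() == batch * k * n);
    assert(bias.size() == (biasHasBatchAxis ? batch * n : n));

    std::vector<float> c(batch * m * n, 0.0f);
    for (std::size_t bi = 0; bi < batch; ++bi) {
        // Select the bias row: the row of this batch, or the single shared row.
        const float* biasRow = bias.data() + (biasHasBatchAxis ? bi * n : 0);
        for (std::size_t i = 0; i < m; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                float acc = biasRow[j];  // the same bias row is added to every row of C
                for (std::size_t p = 0; p < k; ++p) {
                    acc += a[(bi * m + i) * k + p] * b[(bi * k + p) * n + j];
                }
                c[(bi * m + i) * n + j] = acc;
            }
        }
    }
    return c;
}
```

When every row of a Batch x N bias is identical, both modes produce the same output, which is the case where the reused 1 x N bias can replace the batched one.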

Use Case

The Matmul compute of every batch in Batch Matmul uses the same bias matrix.

Restrictions

When the layout type of matrices A, B, and C is NORMAL, the batchMode parameter cannot be set to SINGLE_LARGE_THAN_L1. That is, in the bias reuse scenario, the total data size of matrices A and B in a single batch cannot exceed the size of the L1 buffer.
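
The size condition in this restriction can be checked on the host before a configuration is chosen. The following is a minimal sketch under stated assumptions: SingleBatchFitsInL1 is a hypothetical helper, not a Matmul API; it ignores any alignment or padding that the hardware may require; and the L1 capacity l1Bytes must be obtained from the target platform, because no value is assumed here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of the Matmul API.
// Returns true when matrix A (m x k) and matrix B (k x n) of ONE batch
// together occupy no more than l1Bytes. Alignment and padding are ignored.
inline bool SingleBatchFitsInL1(std::size_t m, std::size_t n, std::size_t k,
                                std::size_t bytesPerElemA, std::size_t bytesPerElemB,
                                std::size_t l1Bytes)
{
    assert(bytesPerElemA > 0 && bytesPerElemB > 0);
    const std::size_t bytesA = m * k * bytesPerElemA;  // one batch of A
    const std::size_t bytesB = k * n * bytesPerElemB;  // one batch of B
    return bytesA + bytesB <= l1Bytes;
}
```

If the check fails for the intended shapes, the bias reuse function cannot be combined with the NORMAL layout for that workload as described above.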

Examples

For a complete operator example, see BatchMatmul operator sample for reusing the bias.

      
// Customize the MatmulConfig parameter and set isBiasBatch to false to enable the bias reuse function of Batch Matmul.
constexpr MatmulConfigMode configMode = MatmulConfigMode::CONFIG_NORM;
constexpr MatmulBatchParams batchParams = {
  false, BatchMode::BATCH_LESS_THAN_L1, false /* isBiasBatch */
};
constexpr MatmulConfig CFG_MM = GetMMConfig<configMode>(batchParams);
AscendC::Matmul<A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE, CFG_MM> mm;

REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling); // Initialize the matmul object.
mm.SetTensorA(gm_a);    // Set the left matrix A.
mm.SetTensorB(gm_b);    // Set the right matrix B.
mm.SetBias(gm_bias);    // Set the bias. The matrix size is 1 x singleCoreN.
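mm.IterateBatch(gm_c, batchA, batchB, false); // Compute all batches. The same bias is added to the result of each batch.
mm.End();               // End the matmul computation.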
