Batch Matmul bias matrix reuse

Overview

In the Batch Matmul scenario, the Matmul API can compute multiple C matrices of size singleCoreM x singleCoreN in a single call. When Batch Matmul has a bias input, the bias matrix includes the batch axis by default, that is, its size is Batch x N. If every batch uses the same bias data, you can enable bias reuse so that only a single bias matrix without the batch axis needs to be passed in. Bias reuse in Batch Matmul is disabled by default. To enable it, set isBiasBatch in MatmulConfig to false.
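The difference in bias shape between the two modes can be sketched in plain host-side C++. This is not the Ascend C API: BiasElementCount and BiasOffset are hypothetical helpers, the isBiasBatch argument only mirrors the MatmulConfig field of the same name, and a row-major bias with N elements per batch is assumed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Ascend C): number of bias elements the
// caller must provide. isBiasBatch mirrors the MatmulConfig field:
//   true  -> one bias row per batch (Batch x N, the default)
//   false -> a single shared bias row (1 x N, bias reuse enabled)
std::size_t BiasElementCount(std::size_t batch, std::size_t n, bool isBiasBatch) {
    return isBiasBatch ? batch * n : n;
}

// Hypothetical helper: offset of the bias element added to column col of
// batch batchIdx, assuming a row-major bias layout.
std::size_t BiasOffset(std::size_t batchIdx, std::size_t col, std::size_t n,
                       bool isBiasBatch) {
    assert(col < n);
    // With reuse, every batch reads the same row, so the batch index is ignored.
    return isBiasBatch ? batchIdx * n + col : col;
}
```

For example, with Batch = 4 and N = 64, the default mode needs 256 bias elements, while bias reuse needs only 64, and every batch reads the same offsets 0 to 63.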

Figure 1 Bias calculation with the batch axis

As shown in the preceding figure, when Batch Matmul does not reuse the bias matrix, each C matrix of size singleCoreM x singleCoreN is added to its own bias matrix of size 1 x singleCoreN as it is computed. If all batches use the same bias data, a single bias matrix can be reused across the batches, as shown in the following figure. In this case, only one bias matrix of size 1 x singleCoreN needs to be set when the SetBias API is called.
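The per-batch bias addition can be illustrated with a naive host-side reference, for example to check kernel output on the host. BatchMatmulWithBias is a hypothetical function, not the Ascend C Matmul API. It assumes float data, row-major layouts, and whole matrices rather than the per-core singleCoreM x singleCoreN tiles used on device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not the Ascend C Matmul API).
// Layout assumptions: row-major; A is batch x m x k, B is batch x k x n,
// C is batch x m x n. If isBiasBatch is true, bias holds batch x n values
// (one row per batch); if false, bias holds n values shared by all batches.
std::vector<float> BatchMatmulWithBias(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       const std::vector<float>& bias,
                                       std::size_t batch, std::size_t m,
                                       std::size_t k, std::size_t n,
                                       bool isBiasBatch) {
    assert(a.size() == batch * m * k);
    assert(b.size() == batch * k * n);
    assert(bias.size() == (isBiasBatch ? batch * n : n));
    std::vector<float> c(batch * m * n, 0.0f);
    for (std::size_t bi = 0; bi < batch; ++bi) {
        const float* aBatch = a.data() + bi * m * k;
        const float* bBatch = b.data() + bi * k * n;
        // With reuse, every batch reads the same 1 x n bias row.
        const float* biasRow = bias.data() + (isBiasBatch ? bi * n : 0);
        float* cBatch = c.data() + bi * m * n;
        for (std::size_t i = 0; i < m; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                float acc = 0.0f;
                for (std::size_t p = 0; p < k; ++p) {
                    acc += aBatch[i * k + p] * bBatch[p * n + j];
                }
                // The bias row is added to every row i of this batch's C.
                cBatch[i * n + j] = acc + biasRow[j];
            }
        }
    }
    return c;
}
```

Because the shared row is identical for every batch, the result matches a Batch x N bias whose rows are all copies of that row, while the bias input shrinks from Batch x N to 1 x N elements.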

Figure 2 Computation with a reused bias

Use Case

Batch Matmul computations in which every batch uses the same bias matrix.

Restrictions

When the layout types of both matrix B and matrix C are NORMAL, the batchMode parameter cannot be set to SINGLE_LARGE_THAN_L1. That is, in the bias reuse scenario, the total size of matrix A and matrix B for a single batch must not exceed the size of the L1 buffer.
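The size condition in this restriction is simple arithmetic, sketched below. SingleBatchFitsInL1 is a hypothetical helper, not an Ascend C API. The L1 capacity is chip-specific and must be supplied by the caller (the 512 KB used in the example is only an assumed value), and the check counts only one batch of A and B, not any other data that may share L1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an Ascend C API): checks whether one batch of
// A (m x k) plus one batch of B (k x n) fits in the L1 buffer.
// elemBytes is the element size (e.g. 2 for half); l1Bytes is the L1
// capacity, which depends on the chip and is an input here.
bool SingleBatchFitsInL1(std::uint64_t m, std::uint64_t k, std::uint64_t n,
                         std::uint64_t elemBytes, std::uint64_t l1Bytes) {
    assert(elemBytes > 0);
    const std::uint64_t aBytes = m * k * elemBytes;  // one batch of A
    const std::uint64_t bBytes = k * n * elemBytes;  // one batch of B
    return aBytes + bBytes <= l1Bytes;
}
```

For example, with half-precision data and M = K = N = 128, one batch of A and B takes 2 x 128 x 128 x 2 = 65,536 bytes, which fits in an assumed 512 KB L1; with M = K = N = 512 it takes 1,048,576 bytes, which does not.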

Examples

For a complete operator example, see the BatchMatmul operator sample for bias reuse.

// Customize the MatmulConfig parameter and set isBiasBatch to false to enable the bias reuse function of BatchMatmul.
constexpr MatmulConfigMode configMode = MatmulConfigMode::CONFIG_NORM;
constexpr MatmulBatchParams batchParams = {
  false, BatchMode::BATCH_LESS_THAN_L1, false /* isBiasBatch */
};
constexpr MatmulConfig CFG_MM = GetMMConfig<configMode>(batchParams);
AscendC::Matmul<A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE, CFG_MM> mm;

REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling); // Initialize the matmul object.
mm.SetTensorA(gm_a);    // Set the left matrix A.
mm.SetTensorB(gm_b);    // Set the right matrix B.
mm.SetBias(gm_bias);    // Set the bias. With bias reuse, the matrix size is 1 x singleCoreN.
mm.IterateBatch(gm_c, batchA, batchB, false); // Compute all batches; the same bias is added for each batch.
mm.End();               // End the Matmul computation.