Partial Output for a Single Matrix Multiplication

Overview

The partial output of a single matrix multiplication is referred to as Partial Output. As described in the basics, each Iterate computation performs one or more basic block computations along the K direction. Each basic block computation multiplies a baseM × baseK input by a baseK × baseN input to produce a baseM × baseN result. Accumulating the results of all these computations yields the final baseM × baseN output, which corresponds to inputs of baseM × singleCoreK and singleCoreK × baseN and is the final result of one Iterate.

After the Partial Output function is enabled, each call to the Iterate API performs only a single basic block computation, and the results are not accumulated along the K axis. You can call the GetTensorC API to obtain each slice of data and then accumulate the slices along the K axis yourself.

Figure 1 Computation when the Partial Output function is disabled

Figure 2 Computation when the Partial Output function is enabled
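The difference between the two modes can be modeled on the host in plain C++. The following is a minimal sketch, not Ascend C code: the names BasicBlock, IteratePartial, and IterateAccumulated are hypothetical, matrices are assumed to be row-major, and singleCoreK is assumed to be a multiple of baseK so that no tail block has to be handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix stored in a flat vector (assumption for this sketch).
using Matrix = std::vector<float>;

// One basic block computation: multiplies the baseM x baseK slice of `a`
// starting at column k0 by the baseK x baseN slice of `b` starting at row k0.
// `a` is baseM x singleCoreK and `b` is singleCoreK x baseN.
Matrix BasicBlock(const Matrix& a, const Matrix& b, std::size_t baseM, std::size_t baseN,
                  std::size_t singleCoreK, std::size_t k0, std::size_t baseK)
{
    Matrix c(baseM * baseN, 0.0f);
    for (std::size_t m = 0; m < baseM; ++m) {
        for (std::size_t n = 0; n < baseN; ++n) {
            for (std::size_t k = k0; k < k0 + baseK; ++k) {
                c[m * baseN + n] += a[m * singleCoreK + k] * b[k * baseN + n];
            }
        }
    }
    return c;
}

// Models Partial Output enabled: one unaccumulated baseM x baseN slice per K block.
std::vector<Matrix> IteratePartial(const Matrix& a, const Matrix& b, std::size_t baseM,
                                   std::size_t baseN, std::size_t singleCoreK, std::size_t baseK)
{
    assert(baseK > 0 && singleCoreK % baseK == 0);  // simplification: no tail block
    std::vector<Matrix> slices;
    for (std::size_t k0 = 0; k0 < singleCoreK; k0 += baseK) {
        slices.push_back(BasicBlock(a, b, baseM, baseN, singleCoreK, k0, baseK));
    }
    return slices;
}

// Models Partial Output disabled: the slices are summed along K into one result.
Matrix IterateAccumulated(const Matrix& a, const Matrix& b, std::size_t baseM,
                          std::size_t baseN, std::size_t singleCoreK, std::size_t baseK)
{
    Matrix c(baseM * baseN, 0.0f);
    for (const Matrix& slice : IteratePartial(a, b, baseM, baseN, singleCoreK, baseK)) {
        for (std::size_t i = 0; i < c.size(); ++i) {
            c[i] += slice[i];
        }
    }
    return c;
}
```

In this model, summing the slices returned by IteratePartial reproduces the result of IterateAccumulated, which is the accumulation the caller takes over once Partial Output is enabled.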

Application Scenarios

Use this function when the matrix multiplication results do not need to be accumulated internally, and only the baseM × baseN result computed from one baseM × baseK input and one baseK × baseN input needs to be output. For example, the result of each basic block computation must first be obtained for dequantization, and the dequantized results are then accumulated to obtain the final result.
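The dequantization scenario can be illustrated with a small host-side sketch in plain C++. It assumes, purely for illustration, that each basic block computation yields an int32 slice and that each K block has its own dequantization scale. The helper name DequantAndAccumulate and the per-block scale layout are hypothetical and are not part of the Matmul API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: dequantizes each int32 partial slice with its own scale
// (one scale per K block is an assumption), then accumulates in float.
std::vector<float> DequantAndAccumulate(const std::vector<std::vector<int32_t>>& partials,
                                        const std::vector<float>& scales)
{
    assert(!partials.empty() && partials.size() == scales.size());
    std::vector<float> out(partials[0].size(), 0.0f);
    for (std::size_t i = 0; i < partials.size(); ++i) {
        assert(partials[i].size() == out.size());
        for (std::size_t j = 0; j < out.size(); ++j) {
            // Dequantize first, then accumulate along K.
            out[j] += static_cast<float>(partials[i][j]) * scales[i];
        }
    }
    return out;
}
```

Because the scales differ between K blocks in this model, summing the int32 slices first and scaling afterwards would give a different result, which is why the unaccumulated slices are needed.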

Restrictions

This function applies to the MDL template only.
Only the continuous write mode of the Iterate and GetTensorC APIs can be used to obtain the matrix multiplication results. The discontinuous write mode and the IterateAll API cannot be used to obtain the results. For details about the continuous write mode, see GetTensorC.
This function does not support Matmul computation with a bias matrix. In other words, no bias matrix can be input.

Example

For a complete operator example, see operator sample for enabling the partial output function.

// Configure the MDL template and enable Partial Output.
constexpr static MatmulConfigMode configMode = MatmulConfigMode::CONFIG_MDL;
constexpr static MatmulFuncParams funcParams = {
  false, false, false, false, 0, IterateOrder::UNDEF, ScheduleType::INNER_PRODUCT, true, true,
  true /* isPartialOutput */
};
constexpr static MatmulConfig CFG_PARTIAL = GetMMConfig<configMode>(funcParams);
Matmul<A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE, CFG_PARTIAL> mm;
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm);
mm.Init(&tiling);
mm.SetTensorA(gmA, isTransposeA);
mm.SetTensorB(gmB, isTransposeB);
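// Element offset into tmpGmC. The declaration is assumed here because the snippet does not show it.
uint32_t dstOffset = 0;
// With Partial Output enabled, each Iterate performs a single basic block computation.
while (mm.Iterate()) {
    // Store one baseM x baseN slice in continuous write mode (third argument set to true).
    mm.GetTensorC(tmpGmC[dstOffset], false, true);
    dstOffset += baseM * baseN;
    // Other operations.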
}
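
The loop above stores the slices back to back, each occupying baseM * baseN elements. A later accumulation step over such a buffer could look like the following host-side sketch in plain C++. It assumes that the kIter slices belonging to the same output block are stored consecutively. That ordering is an assumption of this sketch and is not stated in this section, and the name ReduceSlices is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of accumulating contiguous slices along K.
// tmp:       slices of blockSize (= baseM * baseN) elements stored back to back.
// kIter:     number of K-direction slices per output block.
// Assumption: the kIter slices of one output block are consecutive in tmp.
std::vector<float> ReduceSlices(const std::vector<float>& tmp, std::size_t blockSize,
                                std::size_t kIter)
{
    assert(blockSize > 0 && kIter > 0);
    assert(tmp.size() % (blockSize * kIter) == 0);
    const std::size_t numBlocks = tmp.size() / (blockSize * kIter);
    std::vector<float> out(numBlocks * blockSize, 0.0f);
    for (std::size_t b = 0; b < numBlocks; ++b) {
        for (std::size_t k = 0; k < kIter; ++k) {
            // Same offset arithmetic as dstOffset += baseM * baseN in the example.
            const std::size_t srcOffset = (b * kIter + k) * blockSize;
            for (std::size_t e = 0; e < blockSize; ++e) {
                out[b * blockSize + e] += tmp[srcOffset + e];
            }
        }
    }
    return out;
}
```

If the actual slice order differs, only the srcOffset computation in this model would need to change.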

          

        

      
     
