Partial output of single matrix multiplication
Overview
Partial output of a single matrix multiplication (partial output for short) changes what one Iterate call returns. As described in Basic Knowledge, one Iterate call normally performs one or more basic block calculations along the K direction. Each basic block calculation takes baseM x baseK and baseK x baseN input data and produces a baseM x baseN result. The results of all basic block calculations are accumulated, which gives the baseM x baseN result for the baseM x singleCoreK and singleCoreK x baseN input data. This accumulated result is the final result of one iteration.
After the partial output function is enabled, an Iterate call no longer accumulates along the K axis. It performs only a single basic block calculation. You can call the GetTensorC API to obtain the result of that single basic block and then accumulate the results along the K axis yourself.
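The relationship between the two modes can be illustrated with a small host-side reference model in plain C++. This is only a sketch and does not use the Matmul API: PartialBlock and AccumulateOverK are hypothetical helper names, matrices are assumed to be row-major std::vector<float>, and singleCoreK is assumed to be a multiple of baseK. PartialBlock models what one Iterate call produces when partial output is enabled, and AccumulateOverK models the K-axis accumulation that the caller then performs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model, not part of the Matmul API.
// a is baseM x singleCoreK and b is singleCoreK x baseN, both row-major.
// Returns the baseM x baseN product of the K slice [kStart, kStart + baseK).
std::vector<float> PartialBlock(const std::vector<float>& a, const std::vector<float>& b,
                                std::size_t baseM, std::size_t baseN, std::size_t singleCoreK,
                                std::size_t kStart, std::size_t baseK)
{
    assert(kStart + baseK <= singleCoreK);
    std::vector<float> c(baseM * baseN, 0.0f);
    for (std::size_t m = 0; m < baseM; ++m) {
        for (std::size_t n = 0; n < baseN; ++n) {
            for (std::size_t k = kStart; k < kStart + baseK; ++k) {
                c[m * baseN + n] += a[m * singleCoreK + k] * b[k * baseN + n];
            }
        }
    }
    return c;
}

// Sums the partial blocks of all K slices, which reproduces the result
// that one iteration yields when partial output is disabled.
std::vector<float> AccumulateOverK(const std::vector<float>& a, const std::vector<float>& b,
                                   std::size_t baseM, std::size_t baseN, std::size_t singleCoreK,
                                   std::size_t baseK)
{
    assert(baseK > 0 && singleCoreK % baseK == 0);  // assumption: no tail block
    std::vector<float> c(baseM * baseN, 0.0f);
    for (std::size_t kStart = 0; kStart < singleCoreK; kStart += baseK) {
        const std::vector<float> part =
            PartialBlock(a, b, baseM, baseN, singleCoreK, kStart, baseK);
        for (std::size_t i = 0; i < c.size(); ++i) {
            c[i] += part[i];
        }
    }
    return c;
}
```

Calling AccumulateOverK with baseK equal to singleCoreK gives the same values as calling it with a smaller baseK, which is the sense in which the partial blocks add up to the normal iteration result.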
Use Case
Use this function when the matrix multiplication results must not be accumulated directly, and only the baseM x baseN result of each baseM x baseK by baseK x baseN calculation is to be output. For example, the result of each single basic block calculation must first be dequantized, and the dequantized results are then accumulated to obtain the final result.
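The dequantization case can be sketched on the host as follows. This is a plain C++ illustration, not Matmul API code: DequantAndAccumulate is a hypothetical helper, each partial block is assumed to be an int32 result with its own float scale (one scale per K step), and the blocks are assumed to be stored back to back with blockSize = baseM * baseN elements each. Because the scale differs per K step in this assumed setup, the int32 blocks cannot simply be summed first, which is why the unaccumulated output is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model, not part of the Matmul API.
// partialBlocks holds scales.size() blocks of blockSize int32 values each.
// Each block is dequantized with its own scale and then accumulated.
std::vector<float> DequantAndAccumulate(const std::vector<std::int32_t>& partialBlocks,
                                        const std::vector<float>& scales,
                                        std::size_t blockSize)
{
    assert(blockSize > 0);
    assert(partialBlocks.size() == scales.size() * blockSize);
    std::vector<float> result(blockSize, 0.0f);
    for (std::size_t step = 0; step < scales.size(); ++step) {
        const std::size_t offset = step * blockSize;  // start of this K step's block
        for (std::size_t i = 0; i < blockSize; ++i) {
            result[i] += static_cast<float>(partialBlocks[offset + i]) * scales[step];
        }
    }
    return result;
}
```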
Restrictions
- This function applies only to the MDL template.
- Only the continuous write mode of the Iterate and GetTensorC APIs can be used to obtain the matrix multiplication result. The discontinuous write mode and the IterateAll API cannot be used to obtain the result. For details about the continuous write mode, see GetTensorC.
- This function does not support Matmul computation with a bias matrix. That is, a bias matrix cannot be passed as input.
Examples
For a complete operator example, see the operator sample for enabling the partial output function.
// Configure the MDL template and enable partial output.
constexpr static MatmulConfigMode configMode = MatmulConfigMode::CONFIG_MDL;
constexpr static MatmulFuncParams funcParams = {
    false, false, false, false, 0, IterateOrder::UNDEF, ScheduleType::INNER_PRODUCT, true, true,
    true /* isPartialOutput */
};
constexpr static MatmulConfig CFG_PARTIAL = GetMMConfig<configMode>(funcParams);
Matmul<A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE, CFG_PARTIAL> mm;
REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm);
mm.Init(&tiling);
mm.SetTensorA(gmA, isTransposeA);
mm.SetTensorB(gmB, isTransposeB);
while (mm.Iterate()) {
    mm.GetTensorC(tmpGmC[dstOffset], false, true);
    dstOffset += baseM * baseN;
    // Other operations
}
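Because the loop advances dstOffset by baseM * baseN on every Iterate call, the temporary buffer behind tmpGmC has to hold one basic block per call. The following host-side sketch estimates that size. It rests on an assumption that is not confirmed here: that with partial output enabled, Iterate returns true once per combination of M tile, N tile, and K step. CeilDiv and PartialOutputElementCount are hypothetical helper names, and the result should be read as an upper bound, since tail blocks may contain fewer valid elements.

```cpp
#include <cassert>
#include <cstddef>

// Rounds the quotient up, so that a tail block counts as a full block.
std::size_t CeilDiv(std::size_t value, std::size_t divisor)
{
    assert(divisor > 0);
    return (value + divisor - 1) / divisor;
}

// Hypothetical sizing helper, not part of the Matmul API.
// Assumes one Iterate call per (M tile, N tile, K step) and
// baseM * baseN elements written per call.
std::size_t PartialOutputElementCount(std::size_t singleCoreM, std::size_t singleCoreN,
                                      std::size_t singleCoreK, std::size_t baseM,
                                      std::size_t baseN, std::size_t baseK)
{
    const std::size_t blockCount = CeilDiv(singleCoreM, baseM) *
                                   CeilDiv(singleCoreN, baseN) *
                                   CeilDiv(singleCoreK, baseK);
    return blockCount * baseM * baseN;
}
```

For example, under these assumptions singleCoreM = 128, singleCoreN = 256, singleCoreK = 512 with baseM = 64, baseN = 128, baseK = 64 gives 2 * 2 * 8 = 32 blocks of 8192 elements each.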