4:2 sparse matrix multiplication
Overview
4:2 sparse matrix multiplication is also called Sparse Matmul. In this scenario, the original left matrix A and right matrix B are sparse matrices, and at least two of every four elements in sparse matrix B are zero. Before the Matmul computation, you must perform 4:2 densification on matrix B, that is, remove two zero elements from every four elements of the original sparse matrix B so that it becomes a dense matrix. The Matmul API is then called to multiply matrix A by the densified matrix B. Sparse Matmul skips the zero elements in sparse matrix B and transfers, stores, and computes only the non-zero elements, which reduces memory usage and computing workload during matrix multiplication and improves performance.
Procedure
- Data preprocessing
In the data preparation phase, you need to densify the sparse matrix B. For details about the densification process, see Densification Algorithms. Densification produces two outputs: the 4:2 densified right matrix B and an index matrix, index. Both are used as inputs to the computation in the Sparse Matmul scenario.
Figure 1 4:2 densification process of the original sparse matrix B
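The densification step can be sketched on the host in plain C++. This is a minimal illustration, not the algorithm from Densification Algorithms: the helper name Densify4to2, the DensifiedRow struct, and the index encoding (the position 0 to 3 of each kept element inside its group of four) are assumptions made for this example, and the real index encoding may differ. The row length is assumed to be a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of densifying one row (hypothetical type for this sketch).
struct DensifiedRow {
    std::vector<int8_t> values;  // two kept values per group of four inputs
    std::vector<uint8_t> index;  // assumed encoding: position (0-3) inside the group
};

// Hypothetical helper: 4:2 densification of one row.
// Assumes the row length is a multiple of 4.
DensifiedRow Densify4to2(const std::vector<int8_t>& row) {
    assert(row.size() % 4 == 0);
    DensifiedRow out;
    for (std::size_t g = 0; g < row.size(); g += 4) {
        std::size_t kept = 0;
        // Keep the first two non-zero elements of the group.
        for (std::size_t p = 0; p < 4 && kept < 2; ++p) {
            if (row[g + p] != 0) {
                out.values.push_back(row[g + p]);
                out.index.push_back(static_cast<uint8_t>(p));
                ++kept;
            }
        }
        // Pad with zero values when the group has fewer than two non-zero elements.
        for (; kept < 2; ++kept) {
            out.values.push_back(0);
            out.index.push_back(0);
        }
    }
    return out;
}
```

Under these assumptions, the row 0 3 0 5 7 0 0 0 densifies to the values 3 5 7 0 with the positions 1 3 0 0.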
During densification, two 2-bit indexes are generated in the index matrix for every four elements in sparse matrix B. Each index points to the relative position of the corresponding non-zero element. For details, see the densification algorithm description. The index matrix generated during densification has the int2 data type. Before being loaded to Matmul, the indexes must be packed into the int8 data type, four 2-bit indexes per int8. Within each int8 address, the indexes are arranged in reverse order. For example, the index matrix 1 2 0 1 0 2 1 0 is arranged as 1 0 2 1 0 1 2 0 in the address, where 1 0 2 1 (the first four indexes 1 2 0 1 of the index matrix, reversed) is one int8, and 0 1 2 0 (the last four indexes 0 2 1 0 of the index matrix, reversed) is another int8.
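The packing of 2-bit indexes into bytes can be sketched as follows. The helper name PackIndexes is hypothetical, and the sketch assumes that the arrangement shown in the example is written with the most significant bits first, so that the first index of each group of four lands in the lowest two bits of the byte. It uses uint8_t for the bit manipulation; the resulting bit pattern is the same as for int8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pack 2-bit indexes into bytes, four indexes per byte.
// The first index of each group of four occupies the lowest two bits, so the
// group reads in reverse order when the byte is written high bits first.
std::vector<uint8_t> PackIndexes(const std::vector<uint8_t>& idx) {
    assert(idx.size() % 4 == 0);
    std::vector<uint8_t> packed;
    for (std::size_t i = 0; i < idx.size(); i += 4) {
        uint8_t byte = 0;
        for (std::size_t j = 0; j < 4; ++j) {
            assert(idx[i + j] < 4);  // each index must fit in 2 bits
            byte |= static_cast<uint8_t>(idx[i + j] << (2 * j));
        }
        packed.push_back(byte);
    }
    return packed;
}
```

With this assumption, the indexes 1 2 0 1 pack into the bit pattern 01 00 10 01 (0x49), which reads 1 0 2 1 from the high bits down, and 0 2 1 0 pack into 00 01 10 00 (0x18), which reads 0 1 2 0.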
- Enabling the Sparse Matmul scenario
On the host, before obtaining the tiling, call the SetSparse API to enable the Sparse Matmul scenario.

    auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
    matmul_tiling::MatmulApiTiling tiling(ascendcPlatform);
    tiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_INT8);
    tiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_INT8);
    tiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_INT32);
    tiling.SetBiasType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_INT32);
    // Enable the Sparse Matmul scenario.
    tiling.SetSparse(true);
    ... // Other implementation content
    optiling::TCubeTiling tilingData;
    int ret = tiling.GetTiling(tilingData);

- Creating a Matmul object
When you create a Matmul object on the kernel side, use MatmulType to define the parameter types of matrix A, matrix C, and Bias, including the logical memory location, data format, and data type. Use SparseMatmulType to define the parameter types of matrix B, including the logical memory location of matrix B, the logical memory location of the index matrix, the data format, and the data type.

    #include "lib/matmul_intf.h"

    using A_TYPE = AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, ATYPE, false>;
    // Use SparseMatmulType to define the parameter types of matrix B.
    using B_TYPE = AscendC::SparseMatmulType<AscendC::TPosition::GM, AscendC::TPosition::GM, CubeFormat::ND, BType, true>;
    using C_TYPE = AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, CType>;
    using BIAS_TYPE = AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, BiasType>;
    AscendC::Matmul<A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE, CFG_MDL> mm;

- Setting the index matrix
Pass the index matrix generated during the densification process through the SetSparseIndex API.

    mm.SetTensorA(gm_a);          // Set the left matrix A.
    mm.SetTensorB(gm_b);          // Set the right matrix B.
    mm.SetSparseIndex(gm_index);  // Pass the index matrix generated during the densification process.
    mm.SetBias(gm_bias);          // Set the bias.

- Executing the matrix multiplication
On the kernel side, the matrix multiplication is completed based on the index matrix set through the SetSparseIndex API in the previous step. The Matmul API internally densifies matrix A, that is, it selects two elements from every four elements in matrix A according to the index matrix for computation.

    // Call the Iterate and GetTensorC or IterateAll APIs to complete the matrix multiplication.
    while (mm.Iterate()) {
        mm.GetTensorC(gm_c);
    }
    // mm.IterateAll(gm_c);
    mm.End();

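To make the selection from matrix A concrete, the following CPU reference sketches the arithmetic that the sparse multiplication is equivalent to. It is an illustration only and does not describe how the Matmul API is implemented. The function name SparseMatmulRef is hypothetical, the index values are assumed to be the unpacked position (0 to 3) of each kept element inside its group of four along K, and matrix B is assumed to be stored transposed (N x K/2 after densification), in line with ISTRANS being true.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for C = A * B^T with a 4:2 densified B.
// a:      M x K, row-major.
// bDense: N x K/2, row-major (densified B, stored transposed).
// index:  N x K/2, assumed position (0-3) of each kept element in its group of four.
std::vector<int32_t> SparseMatmulRef(const std::vector<int8_t>& a,
                                     const std::vector<int8_t>& bDense,
                                     const std::vector<uint8_t>& index,
                                     std::size_t m, std::size_t k, std::size_t n) {
    assert(k % 4 == 0);
    const std::size_t kd = k / 2;  // densified length along K
    assert(a.size() == m * k);
    assert(bDense.size() == n * kd && index.size() == n * kd);

    std::vector<int32_t> c(m * n, 0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            int32_t acc = 0;
            for (std::size_t t = 0; t < kd; ++t) {
                assert(index[j * kd + t] < 4);
                // Two kept elements per group, so element t belongs to group t / 2.
                const std::size_t col = 4 * (t / 2) + index[j * kd + t];
                // Select the element of A that lines up with the kept element of B.
                acc += static_cast<int32_t>(a[i * k + col]) *
                       static_cast<int32_t>(bDense[j * kd + t]);
            }
            c[i * n + j] = acc;
        }
    }
    return c;
}
```

For example, with A = 1 2 3 4 (M = 1, K = 4) and original B rows 0 3 0 5 and 7 0 0 0 (N = 2), the densified values 3 5 7 0 with positions 1 3 0 0 give C = 26 7, the same result as the dense multiplication.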
Parameters
| Parameter | Description |
|---|---|
| POSITION | Logical memory location. Matrix B can only be set to TPosition::GM. |
| INDEX_POSITION | Logical memory location of the index matrix. This parameter can only be set to TPosition::GM. |
| CubeFormat | Physical layout format of data. For details, see data format. |
| TYPE | Matrix B can only be set to the int8_t data type. |
| ISTRANS | Whether to enable the matrix transpose function. Currently, this parameter can only be set to true, indicating that the matrix transpose function is enabled. |
| LAYOUT | Data layout format. In the Sparse Matmul scenario, only the value LAYOUT::NONE is supported. NONE (default): BatchMatmul is not used. |
| IBSHARE | Whether to enable IBShare (IntraBlock Share). The IBShare function is to reuse the same matrix A or B data in the L1 buffer. When IBShare is enabled for both matrix A and matrix B, it indicates that both matrix A and matrix B in the L1 buffer are reused. In the Sparse Matmul scenario, only the value false is supported, indicating that IBShare is disabled. |
Use Case
The left matrix A is a sparse matrix, and the right matrix B is a matrix after 4:2 densification.
Restrictions
- In this scenario, only the pure Cube mode (only matrix computation) in the MDL template is supported.
- The index matrix passed through the SetSparseIndex API supports only the int8 data type and the NZ data layout.
- Ensure that every four elements of the original sparse matrix B contain at most two non-zero elements, that is, at least two zero elements. If a group of four contains three or more non-zero elements, only the first two non-zero elements are used.
- M, K, and N cannot be set to 0.
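A host-side check of the 4:2 restriction before densification might look like the sketch below. The function name Satisfies4to2 is hypothetical, and the sketch assumes that the groups of four run along each row of the matrix as stored (row-major, rows x cols). It also rejects a zero dimension, in line with the restriction that M, K, and N cannot be 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical check: returns true when every group of four consecutive
// elements in each row holds at most two non-zero elements.
bool Satisfies4to2(const std::vector<int8_t>& b, std::size_t rows, std::size_t cols) {
    assert(b.size() == rows * cols);
    if (rows == 0 || cols == 0) {
        return false;  // zero dimensions are not allowed
    }
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t g = 0; g < cols; g += 4) {
            int nonZero = 0;
            for (std::size_t p = g; p < cols && p < g + 4; ++p) {
                if (b[r * cols + p] != 0) {
                    ++nonZero;
                }
            }
            if (nonZero > 2) {
                return false;  // this group would lose data during densification
            }
        }
    }
    return true;
}
```

Running a check like this first makes silent data loss visible, because a group with three or more non-zero elements would otherwise have everything after its first two non-zero elements dropped.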