4:2 Sparse Matrix Multiplication

Overview

4:2 sparse matrix multiplication is also called Sparse Matmul. In this scenario, the original left matrix A and right matrix B are both sparse matrices, and at least two of every four elements in sparse matrix B are zero. Before the Matmul computation, you need to perform 4:2 densification on matrix B, that is, remove two zero elements from every four elements of the original sparse matrix B. The Matmul API is then called to multiply matrix A by matrix B after 4:2 densification. Sparse Matmul skips the zero elements in sparse matrix B and moves, stores, and computes non-zero elements only. This reduces memory usage and computation workload during matrix multiplication and thus improves performance.

Implementation Procedure

Data preprocessing
During the data preparation phase before computation, you need to densify matrix B, which is originally a sparse matrix. For details about the densification process, see the dense algorithm description. Densification produces two outputs: the right matrix B after 4:2 densification and an index matrix (index). Both are used as computation inputs in the Sparse Matmul scenario.
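The densification of one row can be sketched on the host as follows. This is a minimal illustration, not the documented dense algorithm: the names `DenseRow` and `DensifyRow4to2` are hypothetical, indexes are kept one per byte for readability, and each index is assumed to be the element's position (0 to 3) inside its group of four. The actual index encoding is defined in the dense algorithm description and may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Densified row: K/2 kept values plus K/2 indexes (one index per byte here).
struct DenseRow {
    std::vector<int8_t> values;
    std::vector<uint8_t> indexes;
};

// Hypothetical helper: keep two elements out of every group of four.
// ASSUMPTION: an index is the absolute position (0..3) inside the group.
DenseRow DensifyRow4to2(const std::vector<int8_t>& row) {
    assert(row.size() % 4 == 0);  // simplifying assumption: whole groups only
    DenseRow out;
    for (std::size_t g = 0; g < row.size(); g += 4) {
        std::array<std::size_t, 2> pos{};
        int kept = 0;
        // First pass: take up to two non-zero elements, in order.
        for (std::size_t i = 0; i < 4 && kept < 2; ++i) {
            if (row[g + i] != 0) pos[kept++] = i;
        }
        // Second pass: if fewer than two were found, fill with zero elements.
        for (std::size_t i = 0; i < 4 && kept < 2; ++i) {
            if (row[g + i] == 0) pos[kept++] = i;
        }
        if (pos[0] > pos[1]) std::swap(pos[0], pos[1]);  // keep positions ascending
        for (std::size_t p : pos) {
            out.values.push_back(row[g + p]);
            out.indexes.push_back(static_cast<uint8_t>(p));
        }
    }
    return out;
}
```

For example, under these assumptions the row `0 3 0 7 5 0 0 0` yields the values `3 7 5 0` and the indexes `1 3 0 1`.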

Figure 1 4:2 densification process of the original sparse matrix B

During densification, for every four elements in the sparse matrix B, two 2-bit indexes are generated in the index matrix index. Each index indicates the relative location of the corresponding non-zero element. For details, see the dense algorithm description. The data type of the index matrix generated during densification is int2. Before being loaded to Matmul, the index matrix needs to be converted to the int8 data type, with four 2-bit indexes stored in each int8. Within each int8, the indexes are arranged in reverse order. For example, the index matrix 1 2 0 1 0 2 1 0 is arranged as 1 0 2 1 0 1 2 0 in the address, where 1 0 2 1 (corresponding to the first four indexes 1 2 0 1 of the index matrix) is one int8, and 0 1 2 0 (corresponding to the last four indexes 0 2 1 0 of the index matrix) is another int8.
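The int2-to-int8 conversion can be sketched as follows. This is a minimal illustration under one assumption: "reverse order" is read as placing the first index of each group of four in the lowest two bits of the byte, which reproduces the example above when the byte is written from the most significant bits down. The helper name `PackIndexes` is hypothetical and is not part of the Matmul API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pack four 2-bit indexes into each int8.
// ASSUMPTION: the first index of a group occupies bits 0-1, the second bits 2-3,
// and so on, so the group reads in reverse order from the most significant bits.
std::vector<int8_t> PackIndexes(const std::vector<uint8_t>& idx) {
    assert(idx.size() % 4 == 0);  // four indexes per output byte
    std::vector<int8_t> packed;
    packed.reserve(idx.size() / 4);
    for (std::size_t i = 0; i < idx.size(); i += 4) {
        uint8_t byte = 0;
        for (std::size_t j = 0; j < 4; ++j) {
            assert(idx[i + j] < 4);  // each index must fit in 2 bits
            byte |= static_cast<uint8_t>(idx[i + j] << (2 * j));
        }
        packed.push_back(static_cast<int8_t>(byte));
    }
    return packed;
}
```

With this layout, the indexes `1 2 0 1` become the bit pattern `01 00 10 01` (0x49), and `0 2 1 0` become `00 01 10 00` (0x18).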

Enable the Sparse Matmul scenario.

Before obtaining tiling parameters on the host, you need to call the SetSparse API to enable the Sparse Matmul scenario.

auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
matmul_tiling::MatmulApiTiling tiling(ascendcPlatform); 
tiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_INT8); 
tiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_INT8);  
tiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_INT32);
tiling.SetBiasType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_INT32);  
// Enable the Sparse Matmul scenario.
tiling.SetSparse(true);
... // Other implementation details.
optiling::TCubeTiling tilingData;   
int ret = tiling.GetTiling(tilingData);

Create a Matmul object.

When creating a Matmul object on the kernel, MatmulType is used to define the parameter types of A, C, and Bias, including the logical memory location, data format, and data type. SparseMatmulType is used to define the parameter type of matrix B, including the logical memory location of matrix B, logical memory location of the index matrix, data format, and data type.

#include "lib/matmul_intf.h"

using A_TYPE = AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, ATYPE, false>;
// Use SparseMatmulType to define the parameter type information about matrix B.
using B_TYPE = AscendC::SparseMatmulType<AscendC::TPosition::GM, AscendC::TPosition::GM, CubeFormat::ND, BType, true>;
using C_TYPE = AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, CType>;
using BIAS_TYPE =  AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, BiasType>;
AscendC::Matmul<A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE, CFG_MDL> mm;

Set the index matrix.

The SetSparseIndex API is used to pass the index matrix generated during the densification process.

mm.SetTensorA(gm_a);    // Set the left matrix A.
mm.SetTensorB(gm_b);    // Set the right matrix B.
mm.SetSparseIndex(gm_index); // Pass the index matrix generated during the densification process.
mm.SetBias(gm_bias);    // Set the bias.

Execute the matrix multiplication operation.

On the kernel, the matrix multiplication operation is completed based on the index matrix set in step 4 (Set the index matrix). The Matmul API densifies matrix A accordingly: for every four elements in matrix A, the two elements at the locations given by the index matrix are selected for computation.

// Call the Iterate and GetTensorC or IterateAll APIs to complete matrix multiplication.
while (mm.Iterate()) {   
    mm.GetTensorC(gm_c); 
}
// mm.IterateAll(gm_c);
mm.End();
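The element selection performed in this step can be modeled on the CPU as follows. This is a reference sketch for understanding and result checking only, not how the Matmul API is implemented. The function name `SparseMatmulRef` is hypothetical, and the sketch assumes that matrix B is stored transposed as N x K/2 after densification, that the index matrix is unpacked to one index per byte, and that each index is the absolute position (0 to 3) inside its group of four. The actual index encoding is defined in the dense algorithm description. Bias is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// a:      M x K   row-major, original (not densified) left matrix.
// bDense: N x K/2 row-major, right matrix after 4:2 densification (transposed).
// index:  N x K/2 row-major, one index per byte (ASSUMPTION: position 0..3 in group).
// Returns C as an M x N row-major int32 matrix.
std::vector<int32_t> SparseMatmulRef(const std::vector<int8_t>& a,
                                     const std::vector<int8_t>& bDense,
                                     const std::vector<uint8_t>& index,
                                     std::size_t m, std::size_t k, std::size_t n) {
    assert(k % 4 == 0);
    const std::size_t kd = k / 2;  // densified K dimension
    assert(a.size() == m * k);
    assert(bDense.size() == n * kd && index.size() == n * kd);
    std::vector<int32_t> c(m * n, 0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            int32_t acc = 0;
            for (std::size_t p = 0; p < kd; ++p) {
                // Every two densified elements map back to one group of four in A.
                const std::size_t col = (p / 2) * 4 + index[j * kd + p];
                acc += static_cast<int32_t>(a[i * k + col]) *
                       static_cast<int32_t>(bDense[j * kd + p]);
            }
            c[i * n + j] = acc;
        }
    }
    return c;
}
```

Only K/2 multiply-accumulate operations are performed per output element, which is where the reduction in computation comes from.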

Parameters

**Table 1** Parameters for the SparseMatmulType type
Parameter	Description
POSITION	Logical memory location of matrix B. This parameter can be set to TPosition::GM only.
INDEX_POSITION	Logical memory location of the index matrix. This parameter can be set to TPosition::GM only.
CubeFormat	Physical layout format of data. For details, see the data formats. For matrix B, this parameter can be set to CubeFormat::ND or CubeFormat::NZ.
TYPE	For matrix B, this parameter can be set to the int8_t data type only.
ISTRANS	Whether to enable the matrix transpose function. Currently, this parameter can be set to true only, indicating that the matrix transpose function is enabled.
LAYOUT	Data layout format. In the Sparse Matmul scenario, this parameter can be set to LAYOUT::NONE only. NONE (default): BatchMatmul is not used.
IBSHARE	Whether to enable IBShare (IntraBlock Share). IBShare can reuse the same matrix A or matrix B data on L1 Buffer. When IBShare is enabled for both matrix A and matrix B, matrix A and matrix B on L1 Buffer are reused at the same time. In the Sparse Matmul scenario, this parameter can be set to false only, indicating that IBShare is disabled.

Application Scenarios

Matmul computation in which the left matrix A is a sparse matrix and the right matrix B is a matrix that has undergone 4:2 densification.

Restrictions

In this scenario, only CUBE_ONLY mode (including matrix computation only) in the MDL template is supported.
The index matrix passed by the SetSparseIndex API supports the int8 data type and NZ data layout only.
In the original sparse matrix B, there should be a maximum of two non-zero elements (that is, at least two zero elements) in every four elements. If there are three or more non-zero elements, only the first two non-zero elements are used.
The values of M, K, and N cannot be 0.
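The 4:2 restriction on the original sparse matrix B can be checked on the host before densification, so that no non-zero element is dropped silently. The following is a minimal sketch and not part of the Matmul API: the helper name `Satisfies4to2` is hypothetical, and it assumes that groups of four are aligned along one row and that the row length is a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if every aligned group of four consecutive
// elements contains at most two non-zero elements.
bool Satisfies4to2(const std::vector<int8_t>& row) {
    assert(row.size() % 4 == 0);  // simplifying assumption: whole groups only
    for (std::size_t g = 0; g < row.size(); g += 4) {
        int nonZero = 0;
        for (std::size_t i = 0; i < 4; ++i) {
            if (row[g + i] != 0) ++nonZero;
        }
        if (nonZero > 2) return false;  // a third non-zero element would be ignored
    }
    return true;
}
```

A row such as `1 2 3 0` fails the check, because the third non-zero element would not take part in the computation.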

Example

For the complete example in the Sparse Matmul scenario, see the operator sample for the Sparse Matmul scenario.
