--sparsity
Applicability
| Product | Supported |
|---|---|
| | x |
| | √ |
| | √ |
| | x |
| | x |
Description
Enables global sparsity.
In a model that AMCT outputs after 2:4 structured sparsity, at least two of every four contiguous weight elements along the Cin dimension are forced to zero. Enabling global sparsity during model conversion filters out two elements from each group of four, which reduces the computation required for inference and improves inference performance. It works as follows:
Two 2-bit indexes are generated for every four contiguous weight elements. Each index takes a value in {0, 1, 2}. The first index gives the position of the first non-zero element among the first three elements (ele0 to ele2), and the second index gives the position of the last non-zero element among the last three elements (ele1 to ele3), counted from ele1. The following table lists the index values for each scenario; as it shows, an index with no non-zero element left to select is 2'b00.
| Scenario | ele0 | ele1 | ele2 | ele3 | Index[0] | Index[1] |
|---|---|---|---|---|---|---|
| Two non-zero elements | 0 | 0 | X | Y | 2'b10 | 2'b10 |
| | 0 | X | 0 | Y | 2'b01 | 2'b10 |
| One non-zero element | 0 | 0 | 0 | X | 2'b00 | 2'b10 |
| | 0 | 0 | X | 0 | 2'b10 | 2'b00 |
| All zero | 0 | 0 | 0 | 0 | 2'b00 | 2'b00 |
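The index rule can be illustrated with a minimal Python sketch that reproduces the rows of the table above. The function name `encode_group` is hypothetical and is not part of ATC or AMCT. The behavior for patterns that the table does not list (for example, a single non-zero element at ele1) is an assumption: the second index falls back to the first zero-valued slot among ele1 to ele3 so that no element is selected twice.

```python
def encode_group(group):
    """Return (index0, index1, kept) for four contiguous weight elements.

    Illustrative only: reproduces the documented table rows; the handling
    of patterns not listed in the table is an assumption.
    """
    if len(group) != 4:
        raise ValueError("expected exactly four elements")
    nonzero = [i for i, v in enumerate(group) if v != 0]
    if len(nonzero) > 2:
        raise ValueError("group is not 2:4 sparse")

    # Index[0]: first non-zero element among ele0..ele2, or 0 if there is none.
    first = next((i for i in nonzero if i <= 2), None)
    index0 = first if first is not None else 0

    # Index[1]: last non-zero element among ele1..ele3 that Index[0] did not
    # already select, counted from ele1.
    rest = [i for i in nonzero if i >= 1 and i != first]
    if rest:
        index1 = rest[-1] - 1
    else:
        # Assumption: point at the first zero-valued slot in ele1..ele3.
        index1 = next(i for i in (1, 2, 3) if group[i] == 0) - 1

    kept = (group[index0], group[index1 + 1])
    return index0, index1, kept


if __name__ == "__main__":
    for g in ([0, 0, 7, 9], [0, 5, 0, 9], [0, 0, 0, 4], [0, 0, 4, 0], [0, 0, 0, 0]):
        i0, i1, kept = encode_group(g)
        print(g, f"Index[0]=2'b{i0:02b}", f"Index[1]=2'b{i1:02b}", "kept:", kept)
```

The `kept` pair is the two elements that the indexes select, which corresponds to the two dense weight elements retained per group.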
For every four int8 elements in the sparse matrix, two 2-bit indexes and two int8 dense weight elements are generated, and four 2-bit indexes are packed into one int8 element. Each group therefore yields 2 bytes of dense weight and 4 bits of index, so the output index matrix is one fourth the size of the dense weight. Note that:
The index matrix records the indexes of the two elements selected from every four elements of the sparse weight. The indexes are read by hardware during inference and used as identifiers for element filtering.
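The size relationship can be checked with a short Python sketch. The helper names `pack_indexes` and `sizes` are hypothetical, and the bit order inside each packed byte (first index in the lowest two bits) is an assumption, since the layout used by the hardware is not stated here.

```python
def pack_indexes(indexes):
    """Pack 2-bit indexes into bytes, four per byte.

    Assumption: the first index of each group of four occupies the lowest
    two bits. The real hardware layout may differ.
    """
    if len(indexes) % 4 != 0:
        raise ValueError("number of indexes must be a multiple of 4")
    packed = bytearray()
    for start in range(0, len(indexes), 4):
        byte = 0
        for slot, idx in enumerate(indexes[start:start + 4]):
            if not 0 <= idx <= 2:
                raise ValueError("index must be in {0, 1, 2}")
            byte |= idx << (2 * slot)
        packed.append(byte)
    return bytes(packed)


def sizes(num_sparse_elements):
    """Return (dense_weight_bytes, index_bytes) for an int8 sparse weight."""
    if num_sparse_elements % 8 != 0:
        raise ValueError("use a multiple of 8 so the index fills whole bytes")
    groups = num_sparse_elements // 4
    dense_bytes = groups * 2           # two int8 elements kept per group
    index_bytes = groups * 2 * 2 // 8  # two 2-bit indexes per group
    return dense_bytes, index_bytes


if __name__ == "__main__":
    dense, index = sizes(64)
    print(dense, index, index / dense)
    print(pack_indexes([2, 2, 1, 2]).hex())
```

For 64 sparse int8 elements this gives 32 bytes of dense weight and 8 bytes of index, matching the one-fourth ratio.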
The following figure shows the interaction process.

- AMCT provides a structured sparsity API that converts a dense model into a structured sparse model through retraining and outputs a framework-based sparse model. The sparse model has the same structure as the dense model, but its parameters are structurally sparse: two of every four contiguous weight elements along the Cin dimension are forced to zero.
- If --sparsity is set to 1, FE traverses the weights of all operator types that support structured sparsity (conv2d, matmulV2, and fc) and checks whether the parameter distribution follows 2:4 structured sparsity. If it does, conv2d, matmulV2, and fc are replaced with Conv2dCompress, MatmulV2Compress, and FCCompress, respectively. The Alg attribute is added to the prototypes of these three operators to indicate whether weight compression or 2:4 structured sparsity is used.
- CompressOp is inserted to perform weight rearrangement and index generation for structured sparsity. It reuses the CompressOp operator prototype for weight compression, with the Alg attribute added to distinguish weight compression from structured sparsity.
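The check and replacement described in the steps above can be sketched in Python as follows. This is an illustration only, not FE's implementation: the function names are hypothetical, each weight is modeled as a list of rows laid out along the Cin dimension, and requiring Cin to be a multiple of 4 is an assumption.

```python
# Operator replacement named in the description above.
COMPRESS_OPS = {
    "conv2d": "Conv2dCompress",
    "matmulV2": "MatmulV2Compress",
    "fc": "FCCompress",
}


def is_2_4_sparse(cin_values):
    """True if every group of four contiguous values has at least two zeros."""
    if len(cin_values) % 4 != 0:
        return False  # assumption: Cin must be a multiple of 4
    return all(
        sum(1 for v in cin_values[i:i + 4] if v == 0) >= 2
        for i in range(0, len(cin_values), 4)
    )


def choose_operator(op_type, weight_rows, sparsity):
    """Return the operator type to use after the (hypothetical) sparsity check."""
    if sparsity != 1 or op_type not in COMPRESS_OPS:
        return op_type
    if all(is_2_4_sparse(row) for row in weight_rows):
        return COMPRESS_OPS[op_type]
    return op_type


if __name__ == "__main__":
    sparse_rows = [[0, 0, 1, 2, 0, 3, 0, 4]]
    dense_rows = [[1, 2, 3, 0]]
    print(choose_operator("conv2d", sparse_rows, 1))
    print(choose_operator("conv2d", dense_rows, 1))
```

An operator whose weights do not meet the 2:4 pattern is left unchanged in this sketch.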
See Also
Due to hardware restrictions, this option cannot be used together with --compress_weight_conf.
Arguments
Arguments:
- 1: indicates that 2:4 structured sparsity is enabled.
- 0: indicates that sparsity is disabled.
Default: 0
Suggestions and Benefits
None
Examples
--sparsity=1
Restrictions
When using this option, ensure that the input model is a sparse model. You are advised to generate it with the combined compression function of AMCT (TensorFlow) or AMCT (PyTorch). The combined compression must use 2:4 structured sparsity together with quantization aware training.