--sparsity
Applicability
| Product | Supported |
|---|---|
| | √ |
| | √ |
| | √ |
| | x |
| | x |
Description
Enables global sparsity.
In a model output by the Ascend Model Compression Toolkit (AMCT) after 2:4 structured sparsity, at least two of every four contiguous weight elements in the Cin dimension may be forced to zero. Enabling global sparsity during model conversion filters out two elements from each such group, which reduces the computation required for inference and improves inference performance. The working principles are as follows:
Two 2-bit indexes are generated for every four contiguous weight elements. Each index takes a value in {0, 1, 2}. The first index gives the position of the first non-zero element among the first three elements (ele0 to ele2), and the second index gives the position of the last non-zero element among the last three elements (ele1 to ele3). The following table lists the index values for several cases.
| Scenario | ele0 | ele1 | ele2 | ele3 | index[0] | index[1] |
|---|---|---|---|---|---|---|
| Two non-zero elements | 0 | 0 | X | Y | 2'b10 | 2'b10 |
| Two non-zero elements | 0 | X | 0 | Y | 2'b01 | 2'b10 |
| One non-zero element | 0 | 0 | 0 | X | 2'b00 | 2'b10 |
| One non-zero element | 0 | 0 | X | 0 | 2'b10 | 2'b01 |
| All zero | 0 | 0 | 0 | 0 | 2'b00 | 2'b00 |
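The index rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this section, not the toolchain's implementation: the function names `sparsity_indexes` and `as_2bit` are hypothetical, and the fallback value 0 when no non-zero element is found is an assumption inferred from the table rows.

```python
def sparsity_indexes(group):
    """Return (index0, index1) for four contiguous weight elements."""
    if len(group) != 4:
        raise ValueError("expected exactly four elements")
    first_three = group[0:3]  # ele0, ele1, ele2
    last_three = group[1:4]   # ele1, ele2, ele3

    # index[0]: position of the first non-zero element in the first three.
    # Assumption: 0 is used when no non-zero element exists (per the table).
    index0 = next((i for i, v in enumerate(first_three) if v != 0), 0)

    # index[1]: position of the last non-zero element in the last three.
    # Assumption: 0 is used when no non-zero element exists (per the table).
    index1 = next((i for i in (2, 1, 0) if last_three[i] != 0), 0)
    return index0, index1


def as_2bit(value):
    """Format an index in the 2'bXX notation used in the table."""
    return "2'b{:02b}".format(value)


if __name__ == "__main__":
    X, Y = 5, 7  # arbitrary non-zero placeholder weights
    for row in ([0, 0, X, Y], [0, X, 0, Y], [0, 0, 0, X],
                [0, 0, X, 0], [0, 0, 0, 0]):
        i0, i1 = sparsity_indexes(row)
        print(row, as_2bit(i0), as_2bit(i1))
```

Running the script prints one line per table row, so the output can be compared against the index[0] and index[1] columns directly.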
For every four int8 elements in the sparse matrix, two 2-bit indexes and two int8 dense weight elements are generated. Four 2-bit indexes are packed into one int8 element, so the output index matrix is 1/4 the size of the dense weight. Note that the index matrix records the indexes of the two elements selected from every four elements of the sparse weight. The hardware reads these indexes during inference and uses them as identifiers for element filtering.
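The size relationship can be checked with a short Python sketch. The helper names `pack_indexes` and `compressed_sizes` are hypothetical, and the bit order inside each packed byte (earliest index in the lowest two bits) is an assumption made for illustration; this section does not specify the actual layout.

```python
def pack_indexes(indexes):
    """Pack 2-bit indexes into bytes, four indexes per byte."""
    if len(indexes) % 4 != 0:
        raise ValueError("number of indexes must be a multiple of 4")
    packed = bytearray()
    for start in range(0, len(indexes), 4):
        byte = 0
        for slot, idx in enumerate(indexes[start:start + 4]):
            if not 0 <= idx <= 3:
                raise ValueError("index does not fit in 2 bits")
            # Assumption: the earliest index occupies the lowest two bits.
            byte |= idx << (2 * slot)
        packed.append(byte)
    return bytes(packed)


def compressed_sizes(num_sparse_int8):
    """Return (dense_weight_bytes, index_bytes) for an int8 sparse weight."""
    if num_sparse_int8 % 8 != 0:
        raise ValueError("element count must be a multiple of 8 in this sketch")
    dense_bytes = num_sparse_int8 // 2   # two of every four elements are kept
    index_count = num_sparse_int8 // 2   # two 2-bit indexes per four elements
    index_bytes = index_count // 4       # four 2-bit indexes fill one byte
    return dense_bytes, index_bytes


if __name__ == "__main__":
    dense, index = compressed_sizes(1024)
    print(dense, index, index / dense)
```

For any valid element count, `index_bytes` is one quarter of `dense_bytes`, which is the ratio stated above.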
The following figure shows the interaction process.

- AMCT provides a structured sparsity API that converts dense models into structured sparse models through retraining and outputs framework-based sparse models. A sparse model has the same structure as the dense model, but its parameters are structurally sparse: two of every four contiguous weight elements in the Cin dimension are forced to zero.
- If --sparsity is set to 1, GE traverses the weights of all operator types that support structured sparsity (Conv2D, MatMulV2, and FullyConnection) and checks whether the current parameter distribution follows 2:4 structured sparsity. GE then inserts CompressOp to complete weight rearrangement and index generation for structured sparsity. The CompressOp operator prototype for weight compression is reused, and the Alg attribute is added to distinguish weight compression from structured sparsity.
- If the execution conditions of 2:4 structured sparsity are met, GE replaces the operators that support sparsity in the original framework, for example, Conv2D is replaced with Conv2DCompress. The Alg attribute added to the operator prototype determines whether the weight compression feature or the 2:4 structured sparsity feature is used.
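The check described in the steps above can be illustrated with the following Python sketch. It is not GE code: `is_2_4_sparse` and `needs_sparsity_processing` are hypothetical names, the weights are assumed to be a flat sequence along the Cin dimension, and rejecting a Cin length that is not a multiple of four is an assumption, since this section does not say how such cases are handled.

```python
# Operator types that this section lists as supporting structured sparsity.
SPARSITY_CAPABLE_OPS = {"Conv2D", "MatMulV2", "FullyConnection"}


def is_2_4_sparse(cin_weights):
    """Check that every four contiguous values contain at least two zeros."""
    if len(cin_weights) % 4 != 0:
        # Assumption: incomplete groups are treated as not sparse.
        return False
    for start in range(0, len(cin_weights), 4):
        group = cin_weights[start:start + 4]
        zero_count = sum(1 for value in group if value == 0)
        if zero_count < 2:
            return False
    return True


def needs_sparsity_processing(op_type, cin_weights, sparsity):
    """Decide whether an operator would qualify for 2:4 sparsity handling."""
    return (
        sparsity == 1
        and op_type in SPARSITY_CAPABLE_OPS
        and is_2_4_sparse(cin_weights)
    )


if __name__ == "__main__":
    weights = [0, 0, 5, 7, 0, 3, 0, 0]
    print(needs_sparsity_processing("Conv2D", weights, sparsity=1))
```

In this sketch an operator qualifies only when all three conditions hold: the option is set to 1, the operator type is one of the three listed types, and every group of four weights satisfies the 2:4 pattern.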
See Also
None
Argument
- 1: Enables 2:4 structured sparsity.
- 0 (default): Disables sparsity.
Suggestions and Benefits
None
Example
--sparsity=1
Restrictions
When using this option, ensure that the input model is a sparse model. You are advised to use the compression combination function of AMCT (TensorFlow) or AMCT (PyTorch). The compression combination requires both 2:4 structured sparsity and quantization-aware training.