--sparsity

Applicability

Product	Supported
Atlas A3 training products/Atlas A3 inference products	x
Atlas A2 training products/Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	√
Atlas inference products	x
Atlas training products	x

Description

Enables global sparsity.

This option is not supported on the Atlas inference products or the Atlas training products.

In a model produced by AMCT with 2:4 structured sparsity, at least two of every four contiguous weight elements along the Cin dimension may be forced to zero. Enabling global sparsity during model conversion filters out two elements from each such group of four, which reduces the computation required for inference and improves inference performance. It works as follows:

Two 2-bit indexes are generated for every four contiguous weight elements. Each index takes a value in {0, 1, 2}. The first index gives the position of the first non-zero element within the first three elements, and the second index gives the position of the last non-zero element within the last three elements. The following table lists the index values produced by filtering.

Table 1 Filter rules
Scenario	ele1	ele2	ele3	Index[0]	Index[1]
Two non-zero elements	0	X	Y	2'b10	2'b10
Two non-zero elements	X	0	Y	2'b01	2'b10
One non-zero element	0	0	X	2'b00	2'b10
One non-zero element	0	X	0	2'b10	2'b00
All zero	0	0	0	2'b00	2'b00
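
The index rule described above can be sketched in Python. This is an illustration only, not the hardware encoding: the function names are hypothetical, the handling of a single non-zero element is inferred from the rows of Table 1, and the rows are assumed to describe four-element groups whose leading element is zero.

```python
def sparse_indexes(group):
    """Return (index0, index1) for four contiguous weight elements.

    Assumed reading of the rule in the text:
      index0 = position of the first kept element within elements 0..2
      index1 = position of the last kept element within elements 1..3
    An index that is not needed defaults to 0.
    """
    if len(group) != 4:
        raise ValueError("expected exactly four elements")
    nonzero = [pos for pos, value in enumerate(group) if value != 0]
    if len(nonzero) > 2:
        raise ValueError("group does not follow the 2:4 pattern")
    if len(nonzero) == 2:
        first, last = nonzero
        # 'last' is counted from element 1, hence the offset of one.
        return first, last - 1
    if len(nonzero) == 1:
        pos = nonzero[0]
        # Assumption: a lone non-zero in the final slot is addressed by index1.
        return (0, 2) if pos == 3 else (pos, 0)
    return 0, 0


def as_2bit(index):
    """Format an index in the 2'bXX notation used by Table 1."""
    return "2'b" + format(index, "02b")


if __name__ == "__main__":
    for group in ([0, 0, 5, 7], [0, 3, 0, 7], [0, 0, 0, 9], [0, 0, 4, 0], [0, 0, 0, 0]):
        index0, index1 = sparse_indexes(group)
        print(group, as_2bit(index0), as_2bit(index1))
```

Under these assumptions, the five groups in the usage loop correspond to the five rows of Table 1.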

For every four int8 elements in the sparse matrix, two 2-bit indexes and two int8 dense weight elements are generated. Four 2-bit indexes are packed into one int8 element, so the output index matrix is one fourth the size of the dense weight. Note that:

The index matrix records, for every four elements of the sparse weight, the indexes of the two selected elements. During inference, the hardware reads these indexes and uses them to filter elements.
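
The size relationship can be checked with a short Python sketch. The function names are hypothetical, and the bit order inside each packed byte (first index in the two least significant bits) is an assumption, since the text does not specify it.

```python
def pack_indexes(indexes):
    """Pack 2-bit indexes into bytes, four indexes per byte.

    Assumption: the first index of each group of four occupies the two
    least significant bits of the byte.
    """
    if len(indexes) % 4 != 0:
        raise ValueError("number of indexes must be a multiple of four")
    packed = bytearray()
    for start in range(0, len(indexes), 4):
        byte = 0
        for slot, index in enumerate(indexes[start:start + 4]):
            if not 0 <= index <= 3:
                raise ValueError("index does not fit in two bits")
            byte |= index << (2 * slot)
        packed.append(byte)
    return bytes(packed)


def compressed_sizes(sparse_elements):
    """Return (dense_bytes, index_bytes) for an int8 sparse weight."""
    if sparse_elements % 8 != 0:
        raise ValueError("element count must be a multiple of eight")
    groups = sparse_elements // 4
    dense_bytes = groups * 2           # two int8 values kept per group of four
    index_bytes = groups * 2 * 2 // 8  # two 2-bit indexes per group of four
    return dense_bytes, index_bytes


if __name__ == "__main__":
    dense, index = compressed_sizes(64)
    print(dense, index, dense // index)
```

For 64 sparse int8 elements this gives 32 bytes of dense weight and 8 bytes of index, a ratio of four to one.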

The following figure shows the interaction process.

Figure 1 Interaction process

AMCT provides a structured sparsity API that converts a dense model into a structured sparse model through retraining and outputs a framework-based sparse model. The sparse model has the same structure as the dense model, but its parameters are structurally sparse, that is, two of every four contiguous weight elements along the Cin dimension are forced to zero.
If --sparsity is set to 1, FE traverses the weights of all operators whose types support structured sparsity (conv2d, matmulV2, and fc) and checks whether each weight follows the 2:4 structured sparsity pattern. If it does, conv2d, matmulV2, and fc are replaced with Conv2dCompress, MatmulV2Compress, and FCCompress, respectively. The Alg attribute is added to the prototypes of these three operators to indicate whether weight compression or 2:4 structured sparsity is used.
A CompressOp is then inserted to rearrange the weights and generate the indexes for structured sparsity. The CompressOp operator prototype for weight compression is reused, with the Alg attribute added to distinguish weight compression from structured sparsity.
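
The pattern check and operator replacement described above can be approximated as follows. This is a hypothetical sketch, not the FE implementation: the function names are invented for illustration, weights are represented as a flat list along Cin, and a trailing group of fewer than four elements is assumed to be padded with zeros.

```python
# Operator names taken from the text above.
REPLACEMENTS = {
    "conv2d": "Conv2dCompress",
    "matmulV2": "MatmulV2Compress",
    "fc": "FCCompress",
}


def follows_2_4_pattern(cin_weights):
    """Return True if every four contiguous elements hold at most two non-zeros.

    Assumption: a trailing group shorter than four elements is treated as if
    it were padded with zeros.
    """
    for start in range(0, len(cin_weights), 4):
        group = cin_weights[start:start + 4]
        if sum(1 for value in group if value != 0) > 2:
            return False
    return True


def select_operator(op_type, cin_weights):
    """Return the compressed operator name if the weight qualifies."""
    if op_type in REPLACEMENTS and follows_2_4_pattern(cin_weights):
        return REPLACEMENTS[op_type]
    return op_type


if __name__ == "__main__":
    print(select_operator("conv2d", [0, 3, 0, 7, 1, 0, 0, 2]))
    print(select_operator("conv2d", [1, 3, 5, 7, 1, 0, 0, 2]))
```

In this sketch, an operator whose weight does not satisfy the pattern, or whose type is not in the supported list, is left unchanged.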

Arguments

Arguments:

1: indicates that 2:4 structured sparsity is enabled.
0: indicates that sparsity is disabled.

Default: 0

Suggestions and Benefits

None

Examples

--sparsity=1

Restrictions

When using this option, ensure that the input model is a sparse model. You are advised to use the compression combination function of AMCT (TensorFlow) or AMCT (PyTorch). The compression combination requires both 2:4 structured sparsity and quantization aware training.
