--sparsity

Applicability

Product                                                   Supported
Atlas A3 training products/Atlas A3 inference products    √
Atlas A2 training products/Atlas A2 inference products    √
Atlas 200I/500 A2 inference products                      √
Atlas inference products                                  x
Atlas training products                                   x

Description

Enables global sparsity.

In a model output by the Ascend Model Compression Toolkit (AMCT) after 2:4 structured sparsity, at least two of every four contiguous weight elements along the Cin dimension are zero. If you enable global sparsity during model conversion, two elements out of every four are filtered out, which reduces the computation required for inference and improves inference performance. The working principles are as follows:

Two 2-bit indexes are generated for every four contiguous elements in the weight, and each index takes a value in {0, 1, 2}. The first index gives the position of the first non-zero element among the first three elements (ele0 to ele2). The second index gives the position of the last non-zero element among the last three elements (ele1 to ele3), counted from ele1. Table 1 lists the index values for each case, where X and Y denote non-zero elements.

Table 1 Filter rules

Scenario                 ele0  ele1  ele2  ele3  index[0]  index[1]
Two non-zero elements    0     0     X     Y     2'b10     2'b10
                         0     X     0     Y     2'b01     2'b10
One non-zero element     0     0     0     X     2'b00     2'b10
                         0     0     X     0     2'b10     2'b01
All zero                 0     0     0     0     2'b00     2'b00
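The index rule above can be sketched in Python as follows. This is an illustration that reproduces Table 1, not the hardware or GE implementation: the function name `sparsity_indexes` is hypothetical, and it assumes an index is 0 when its three-element window contains no non-zero element, which matches the "One non-zero element" and "All zero" rows.

```python
from typing import List, Tuple


def sparsity_indexes(group: List[int]) -> Tuple[int, int]:
    """Return (index[0], index[1]) for one group of four weight elements.

    index[0]: position of the first non-zero element among ele0..ele2.
    index[1]: position of the last non-zero element among ele1..ele3,
              counted from ele1 (0 = ele1, 1 = ele2, 2 = ele3).
    Assumption: an index is 0 when its window holds no non-zero element.
    """
    if len(group) != 4:
        raise ValueError("a group must contain exactly four elements")
    first_three = group[0:3]
    last_three = group[1:4]
    # Scan ele0..ele2 forward for the first non-zero element.
    idx0 = next((i for i, v in enumerate(first_three) if v != 0), 0)
    # Scan ele1..ele3 backward for the last non-zero element.
    idx1 = next((i for i in range(2, -1, -1) if last_three[i] != 0), 0)
    return idx0, idx1


if __name__ == "__main__":
    # Rows of Table 1, with X = 5 and Y = 7.
    for group in ([0, 0, 5, 7], [0, 5, 0, 7], [0, 0, 0, 7], [0, 0, 5, 0], [0, 0, 0, 0]):
        i0, i1 = sparsity_indexes(group)
        print(group, f"index[0]=2'b{i0:02b}", f"index[1]=2'b{i1:02b}")
```

Running the script prints one line per table row in the same 2'bXX notation, so the output can be compared with Table 1 directly.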

For every four int8 elements in the sparse matrix, two 2-bit indexes and two int8 dense weight elements are generated, and four 2-bit indexes are packed into one int8 element. Each group of four therefore yields 16 bits of dense weight and 4 bits of index, so the output index matrix is 1/4 the size of the dense weight.

The index matrix records the positions of the two selected elements (out of every four) in the sparse weight. The hardware reads these indexes during inference and uses them to identify which elements to keep.
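The packing and size arithmetic above can be checked with a short Python sketch. The function names are hypothetical, and the bit order inside each byte (first index in the lowest two bits) is an assumption made for illustration; the actual layout used by the hardware is not specified here.

```python
from typing import List, Tuple


def pack_indexes(indexes: List[int]) -> bytes:
    """Pack 2-bit indexes (values 0..2), four per byte.

    Hypothetical bit layout: the first index of each set of four occupies
    bits 1..0, the second bits 3..2, the third bits 5..4, the fourth bits 7..6.
    """
    if len(indexes) % 4 != 0:
        raise ValueError("number of indexes must be a multiple of 4")
    packed = bytearray()
    for start in range(0, len(indexes), 4):
        byte = 0
        for slot, idx in enumerate(indexes[start:start + 4]):
            if idx not in (0, 1, 2):
                raise ValueError(f"index must be 0, 1, or 2, got {idx}")
            byte |= idx << (2 * slot)
        packed.append(byte)
    return bytes(packed)


def compressed_sizes(num_weights: int) -> Tuple[int, int]:
    """Return (dense weight bytes, index bytes) for num_weights int8 elements.

    Each group of four int8 weights keeps two int8 values (2 bytes) and
    two 2-bit indexes (4 bits), so the index data is 1/4 of the dense data.
    """
    if num_weights % 8 != 0:
        raise ValueError("num_weights must be a multiple of 8 for whole index bytes")
    groups = num_weights // 4
    dense_bytes = 2 * groups
    index_bytes = (2 * 2 * groups) // 8
    return dense_bytes, index_bytes
```

For 1024 int8 weights, `compressed_sizes(1024)` returns `(512, 128)`: 512 bytes of dense weight and 128 bytes of index, the 1/4 ratio stated above.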

The following figure shows the interaction process.

Figure 1 Interaction process
  1. AMCT provides a structured sparsity API that converts dense models into structured sparse models through retraining and outputs sparse models in the original framework format. The sparse model has the same structure as the dense model, but structured sparsity has been applied to its parameters: in every four contiguous weight elements along the Cin dimension, two are forced to zero.
  2. If --sparsity is set to 1, GE traverses the weights of all operator types that support structured sparsity (Conv2D, MatMulV2, and FullyConnection) and checks whether the parameter distribution follows 2:4 structured sparsity. It then inserts a CompressOp operator to rearrange the weights and generate the indexes for structured sparsity. CompressOp reuses the operator prototype used for weight compression, with an added Alg attribute that distinguishes weight compression from structured sparsity.
  3. If the execution conditions of 2:4 structured sparsity are met, the operators that support sparsity in the original framework model are replaced, for example, Conv2D is replaced with Conv2DCompress. The Alg attribute added to the operator prototype determines whether the weight compression or the 2:4 structured sparsity feature is used.
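The decision described in steps 2 and 3 can be modeled roughly in Python. This is a simplified sketch, not GE code: the function names are hypothetical, the weight is assumed to be a 2D list laid out as [Cout][Cin], and real convolution weights carry additional spatial dimensions and internal layouts that GE handles itself.

```python
from typing import Sequence

# Operator types named above as supporting structured sparsity.
SPARSE_CAPABLE_OPS = {"Conv2D", "MatMulV2", "FullyConnection"}


def is_2_4_structured_sparse(weight: Sequence[Sequence[float]]) -> bool:
    """Return True if every four contiguous Cin elements contain at least two zeros.

    Hypothetical layout: weight[cout][cin], grouped along the innermost (Cin) axis.
    A row whose Cin length is not a multiple of 4 is treated as not sparse.
    """
    for row in weight:
        if len(row) % 4 != 0:
            return False
        for start in range(0, len(row), 4):
            zeros = sum(1 for v in row[start:start + 4] if v == 0)
            if zeros < 2:
                return False
    return True


def should_apply_sparsity(op_type: str, weight: Sequence[Sequence[float]], sparsity: int) -> bool:
    """Model of steps 2 and 3: flag enabled, op type supported, weights 2:4 sparse."""
    return (
        sparsity == 1
        and op_type in SPARSE_CAPABLE_OPS
        and is_2_4_structured_sparse(weight)
    )
```

For example, a Conv2D weight row `[0, 0, 1.5, -2.0, 0, 3.0, 0, 0]` passes the check with `sparsity=1`, while a row such as `[1.0, 0, 2.0, 3.0]` (only one zero in its group of four) does not.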

See Also

None

Argument

  • 1: Enables 2:4 structured sparsity.
  • 0 (default): Disables 2:4 structured sparsity.

Suggestions and Benefits

None

Example

--sparsity=1

Applicability

Atlas 200I/500 A2 inference products

Atlas A2 training products/Atlas A2 inference products

Atlas A3 training products/Atlas A3 inference products

Restrictions

When using this option, ensure that the input model is a sparse model. You are advised to generate it with the compression combination function of AMCT (TensorFlow) or AMCT (PyTorch). The compression combination must include both 2:4 structured sparsity and quantization aware training.