--fusion_switch_file

Applicability

Product	Supported
Atlas A3 training products/Atlas A3 inference products	√
Atlas A2 training products/Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	√
Atlas inference products	√
Atlas training products	√

Description

Sets the path (including the file name) of the fusion switch configuration file for graph fusion and UB fusion patterns. In this file, you can disable selected fusion patterns.

Graph fusion: the process in which FE modifies a graph according to given fusion patterns. The base operators in the graph are replaced by fused operators to improve compute efficiency. Graph fusion improves operator compute efficiency in the following ways:
- Saves compute time by reducing the mathematical workload of operators. For example, Conv and BiasAdd can be fused into one operator, so that accumulation is completed directly in the L0C Buffer and the separate Add computation is avoided.
- Accelerates post-fusion computation by utilizing hardware instructions. In the preceding example, graph fusion moves the accumulation workload of the "Conv+BiasAdd" composite to the L0C Buffer, which accelerates computation by using the accumulation capability of the L0C Buffer.
UB fusion: Unified Buffer (UB) is an important on-chip buffer in the Ascend AI Processor. Without UB fusion, the compute result of operator A is stored in Unified Buffer and must be moved to Global Memory. To run operator B, the output of operator A must be moved from Global Memory back to Unified Buffer. After operator B completes, the result is moved from Unified Buffer back to Global Memory.
Throughout this process, data is moved along the sequence Unified Buffer->Global Memory->Unified Buffer->Global Memory. With UB fusion, operators A and B are fused so that the intermediate result stays in Unified Buffer and the unnecessary detour through Global Memory is removed. By reducing data movement between Global Memory and Unified Buffer, UB fusion greatly improves compute efficiency and lowers bandwidth consumption.
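
The saving described above can be illustrated with a toy count of data movements. This is only a sketch under simplifying assumptions (a linear chain of operators, inputs already resident in Unified Buffer, one result per operator); the function name gm_transfers is hypothetical and does not model real hardware behavior.

```python
def gm_transfers(num_ops: int, fused: bool) -> int:
    """Count moves between Unified Buffer and Global Memory in a toy model.

    Assumption: operators form a linear chain. Without fusion, each operator
    writes its result to Global Memory and the next operator reads it back.
    """
    if num_ops < 1:
        raise ValueError("num_ops must be at least 1")
    if fused:
        # Only the final result leaves Unified Buffer.
        return 1
    # Every operator writes once; every operator after the first reads once.
    return num_ops + (num_ops - 1)


# Operators A and B: UB->GM, GM->UB, UB->GM without fusion.
print(gm_transfers(2, fused=False))  # 3
print(gm_transfers(2, fused=True))   # 1
```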

Argument

Argument: Path of the configuration file, including the file name.

Format: The path (including the file name) can contain letters, digits, underscores (_), hyphens (-), periods (.), and Chinese characters.
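
A pre-check of the path before calling ATC might look like the following Python sketch. It is not part of ATC. Two points are assumptions: that "/" is accepted as the path separator, and that "Chinese characters" can be approximated by the CJK Unified Ideographs range U+4E00 to U+9FFF. The function name is_valid_switch_path is hypothetical.

```python
import re

# Assumptions: "/" is allowed as a separator, and Chinese characters are
# approximated by the CJK Unified Ideographs block (U+4E00-U+9FFF).
_ALLOWED_PATH = re.compile(r"[A-Za-z0-9_\-./\u4e00-\u9fff]+")


def is_valid_switch_path(path: str) -> bool:
    """Return True if every character of the path is in the documented set."""
    return _ALLOWED_PATH.fullmatch(path) is not None


print(is_valid_switch_path("/home/user/module/fusion_switch.cfg"))  # True
print(is_valid_switch_path("/home/user/my module/switch.cfg"))      # False (space)
```

The check applies to the path after shell expansion, so a value such as $HOME/module/fusion_switch.cfg should be expanded first.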

Restrictions:

The built-in graph fusion and UB fusion patterns are enabled by default. You can disable selected fusion patterns in the configuration file. Some fusion patterns are not switchable due to functionality restrictions. For the full list of switchable fusion patterns, see Graph Fusion and UB Fusion Patterns.

Suggestions and Benefits

None

Example

1. Disabling selected fusion patterns
The following is a sample configuration file (fusion_switch.cfg). You can switch on or off selected fusion patterns as indicated by the field before each colon (:).
```
xxxFusionPass:off
yyyFusionPass:off
....
```
2. Disabling all fusion patterns
This configuration disables all fusion patterns at once.
- Configuration file example:
```
{
    "Switch":{
        "GraphFusion":{
            "ALL":"off"
        },
        "UBFusion":{
            "ALL":"off"
        }
    }
}
```
Remarks:
1. Some built-in fusion patterns are not switchable due to functionality restrictions. These patterns remain enabled regardless of the user's switch settings.
2. To disable all fusion patterns except selected ones, refer to the following example.
- Configuration file example:
```
{
    "Switch":{
        "GraphFusion":{
            "ALL":"off",
            "SoftmaxFusionPass":"on"
        },
        "UBFusion":{
            "ALL":"off",
            "TbePool2dQuantFusionPass":"on"
        }
    }
}
```
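
The two configuration formats shown above can also be generated by a script. The following is a minimal Python sketch, not an ATC tool; the helper names render_switch_lines and render_switch_json are hypothetical, and the pass names are taken from the examples in this section.

```python
import json


def render_switch_lines(switches: dict) -> str:
    """Render the 'PassName:on|off' format, one fusion pattern per line."""
    return "".join(
        f"{name}:{'on' if enabled else 'off'}\n"
        for name, enabled in switches.items()
    )


def render_switch_json(graph_on=(), ub_on=()) -> str:
    """Render the JSON format: everything off except the named patterns."""

    def section(keep):
        entries = {"ALL": "off"}
        entries.update({name: "on" for name in keep})
        return entries

    config = {
        "Switch": {
            "GraphFusion": section(graph_on),
            "UBFusion": section(ub_on),
        }
    }
    return json.dumps(config, indent=4)


# Disable two selected patterns (line format).
print(render_switch_lines({
    "ConvConcatFusionPass": False,
    "TbePool2dQuantFusionPass": False,
}))

# Disable everything except two patterns (JSON format).
print(render_switch_json(
    graph_on=["SoftmaxFusionPass"],
    ub_on=["TbePool2dQuantFusionPass"],
))
```

Write the returned string to fusion_switch.cfg. Patterns that are not switchable remain enabled regardless of what the file contains.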

Upload the configured fusion_switch.cfg file to any directory (for example, $HOME/module) on the server where ATC is located, then pass its path to ATC:

--fusion_switch_file=$HOME/module/fusion_switch.cfg
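
When ATC is driven from a script, the option can be appended to the argument list as in the sketch below. The helper build_atc_command is hypothetical, and the model path, framework value, output name, and SoC version are placeholders to be replaced with the values for your own conversion. The sketch only assembles and prints the command; it does not run ATC.

```python
import os
import shlex


def build_atc_command(switch_file: str, other_args: list) -> list:
    """Return an argv list for atc with --fusion_switch_file appended."""
    # Expand $HOME and ~ here, because no shell does it for an argv list.
    path = os.path.expanduser(os.path.expandvars(switch_file))
    return ["atc", *other_args, f"--fusion_switch_file={path}"]


cmd = build_atc_command(
    "$HOME/module/fusion_switch.cfg",
    [
        "--model=model.onnx",           # placeholder model file
        "--framework=5",                # placeholder framework value
        "--output=model_out",           # placeholder output name
        "--soc_version=<soc_version>",  # placeholder, set for your device
    ],
)
print(shlex.join(cmd))
```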

After the model is converted, fusion_result.json, which is the result file of operator fusion information, is generated based on the value of --export_compile_stat.

The file records the enabled fusion patterns (those not disabled in the fusion_switch.cfg file) during graph build, where the match_times field indicates the number of times that a fusion pattern is hit during model conversion and the effect_times field indicates the number of times that a fusion pattern takes effect. If --fusion_switch_file is not set, the generated fusion_result.json file records all fusion patterns that are hit during model conversion.
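
The match_times and effect_times fields can be summarized with a short script such as the sketch below. The exact layout of fusion_result.json is not described here, so the code makes no assumption about nesting: it walks the whole JSON tree and collects every object that carries both fields. The sample data and the pass names ExamplePassA and ExamplePassB are invented for illustration only.

```python
import json


def collect_fusion_stats(node, found=None):
    """Collect {pattern_name: (match_times, effect_times)} from nested JSON.

    Counts for a name that appears more than once are summed.
    """
    if found is None:
        found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            if (isinstance(value, dict)
                    and "match_times" in value
                    and "effect_times" in value):
                matched, effective = found.get(key, (0, 0))
                found[key] = (matched + int(value["match_times"]),
                              effective + int(value["effect_times"]))
            else:
                collect_fusion_stats(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_fusion_stats(item, found)
    return found


# Invented sample; a real file would be read with json.load(open(path)).
sample = json.loads("""
{
    "graph_fusion": {"ExamplePassA": {"match_times": "2", "effect_times": "0"}},
    "ub_fusion": {"ExamplePassB": {"match_times": 3, "effect_times": 3}}
}
""")

stats = collect_fusion_stats(sample)
# Patterns that were hit but never took effect.
ineffective = [name for name, (m, e) in stats.items() if m > 0 and e == 0]
print(stats)
print(ineffective)
```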

Restrictions

If the value of the group attribute of the Convolution operator in the network model is equal to the value of the num_output attribute in the .prototxt file, VxxxRequantFusionPass in the preceding configuration file must be enabled.
Ascend Model Compression Toolkit (AMCT) inserts quant and dequant operators into the original model, and ATC fuses the inserted operators during model conversion. To compare accuracy between the quantized model and the original one, you must use --fusion_switch_file to disable certain fusion patterns in the configuration file. The fusion patterns to be disabled are listed below:
For Atlas training products, the fusion patterns that must be disabled are as follows:
```
V100RequantFusionPass:off
ConvConcatFusionPass:off
SplitConvConcatFusionPass:off
TbeEltwiseQuantFusionPass:off
TbeConvDequantVaddReluQuantFusionPass:off
TbeConvDequantVaddReluFusionPass:off
TbeConvDequantQuantFusionPass:off
TbeDepthwiseConvDequantFusionPass:off
TbeFullyconnectionElemwiseDequantFusionPass:off
TbeConv2DAddMulQuantPass:off
TbePool2dQuantFusionPass:off
TbeCommonRules0FusionPass:off
TbeCommonRules2FusionPass:off
```
For Atlas inference products, the fusion patterns that must be disabled are as follows:
```
V200RequantFusionPass:off
ConvConcatFusionPass:off
SplitConvConcatFusionPass:off
TbeEltwiseQuantFusionPass:off
TbeConvDequantVaddReluQuantFusionPass:off
TbeConvDequantVaddReluFusionPass:off
TbeConvDequantQuantFusionPass:off
TbeDepthwiseConvDequantFusionPass:off
TbeFullyconnectionElemwiseDequantFusionPass:off
TbeConv2DAddMulQuantPass:off
TbePool2dQuantFusionPass:off
TbeCommonRules0FusionPass:off
TbeCommonRules2FusionPass:off
```
For Atlas 200I/500 A2 inference products and Atlas A2 training products/Atlas A2 inference products, the fusion patterns that must be disabled are as follows:
```
ConvConcatFusionPass:off
SplitConvConcatFusionPass:off
TbeEltwiseQuantFusionPass:off
TbeConvDequantVaddReluQuantFusionPass:off
TbeConvDequantVaddReluFusionPass:off
TbeConvDequantQuantFusionPass:off
TbeDepthwiseConvDequantFusionPass:off
TbeFullyconnectionElemwiseDequantFusionPass:off
TbeConv2DAddMulQuantPass:off
TbePool2dQuantFusionPass:off
TbeCommonRules0FusionPass:off
TbeCommonRules2FusionPass:off
```
For Atlas A3 training products/Atlas A3 inference products, the fusion patterns that must be disabled are as follows:
```
ConvConcatFusionPass:off
SplitConvConcatFusionPass:off
TbeEltwiseQuantFusionPass:off
TbeConvDequantVaddReluQuantFusionPass:off
TbeConvDequantVaddReluFusionPass:off
TbeConvDequantQuantFusionPass:off
TbeDepthwiseConvDequantFusionPass:off
TbeFullyconnectionElemwiseDequantFusionPass:off
TbeConv2DAddMulQuantPass:off
TbePool2dQuantFusionPass:off
TbeCommonRules0FusionPass:off
TbeCommonRules2FusionPass:off
```
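
The four lists above share twelve patterns and differ only in the requant pattern, so they can be generated from one table. The following Python sketch is a convenience under that observation, not an ATC feature; the function name amct_disable_list is hypothetical.

```python
# Patterns common to every product list in this section.
COMMON_PASSES = [
    "ConvConcatFusionPass",
    "SplitConvConcatFusionPass",
    "TbeEltwiseQuantFusionPass",
    "TbeConvDequantVaddReluQuantFusionPass",
    "TbeConvDequantVaddReluFusionPass",
    "TbeConvDequantQuantFusionPass",
    "TbeDepthwiseConvDequantFusionPass",
    "TbeFullyconnectionElemwiseDequantFusionPass",
    "TbeConv2DAddMulQuantPass",
    "TbePool2dQuantFusionPass",
    "TbeCommonRules0FusionPass",
    "TbeCommonRules2FusionPass",
]

# Product-specific requant pattern (None where the list has none).
REQUANT_PASS_BY_PRODUCT = {
    "Atlas training products": "V100RequantFusionPass",
    "Atlas inference products": "V200RequantFusionPass",
    "Atlas 200I/500 A2 inference products": None,
    "Atlas A2 training products/Atlas A2 inference products": None,
    "Atlas A3 training products/Atlas A3 inference products": None,
}


def amct_disable_list(product: str) -> str:
    """Return the 'PassName:off' lines for a product; KeyError if unknown."""
    requant = REQUANT_PASS_BY_PRODUCT[product]
    passes = ([requant] if requant else []) + COMMON_PASSES
    return "".join(f"{name}:off\n" for name in passes)


print(amct_disable_list("Atlas inference products"))
```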
The following outlines the fusion patterns. For details, see Graph Fusion and UB Fusion Patterns.
- V100RequantFusionPass
  A graph fusion pattern, which inserts the RequantHostCpuOpV2 operator into the input of AscendDequant.
- V200RequantFusionPass
  A graph fusion pattern, which merges AscendDequant and AscendQuant into AscendRequant and inserts the RequantHostCpuOpV2Re operator into the input of AscendDequant.
- ConvConcatFusionPass
  A graph fusion pattern, which supports Conv2D*N+concat operator fusion. The dequant and ReLU operators can be connected to Conv2D.
- SplitConvConcatFusionPass
  A graph fusion pattern, which supports split+Conv2D*N+concat operator fusion. The dequant and ReLU operators can be connected to Conv2D.
- TbeEltwiseQuantFusionPass
  A UB fusion pattern, which supports elemwise+quant operator fusion. The quant operator is optional.
- TbeConvDequantVaddReluQuantFusionPass
  A UB fusion pattern, which applies UB fusion on consecutive Conv-dequant-vadd-relu-quant nodes to improve inference performance for a quantized model.
- TbeConvDequantVaddReluFusionPass
  A UB fusion pattern, which supports Conv2D+dequant+Vadd+ReLU or Conv2D+dequant+(LeakyRelu)+Vadd operator fusion.
- TbeConvDequantQuantFusionPass
  A UB fusion pattern, which applies UB fusion on consecutive Conv-dequant-quant nodes to improve inference performance for a quantized model.
- TbeDepthwiseConvDequantFusionPass
  A UB fusion pattern, which supports DepthwiseConv2d+dequant+(ReLU/mul)+quant, DepthwiseConv2d+dequant+(sigmoid)+mul, DepthwiseConv2d+requant, or DepthwiseConv2d+(power+relu6+power)+elemwise+(quant) operator fusion.
- TbeFullyconnectionElemwiseDequantFusionPass
  A UB fusion pattern, which supports the following forms of fusion:
  1. BatchMatMul/BatchMatMulV2 + elemwise fusion in the static shape scenario.
  2. MatMul/MatMulV2/BatchMatMul/BatchMatMulV2 + AscendDequant + elemwise1(+ elemwise2) fusion in the static shape scenario.
- TbeConv2DAddMulQuantPass
  A UB fusion pattern, which supports Conv+dequant+add+quant fusion. The add operator can be fused only if it has two other outputs in addition to quant.
- TbePool2dQuantFusionPass
  A UB fusion pattern, which applies UB fusion on consecutive Pool2d-quant nodes to improve inference performance for a quantized model.
- TbeCommonRules0FusionPass
  A UB fusion pattern, which supports StridedRead+Conv2D+dequant+elemwise+quant+StridedWrite operator fusion. The nodes, except for Conv2D, are optional.
- TbeCommonRules2FusionPass
  A UB fusion pattern, which supports StridedRead+Conv2D+dequant+elemwise+quant+StridedWrite operator fusion. The nodes, except for Conv2D, are optional. The elemwise node supports the fusion in the multi-output scenario.
