--fusion_switch_file

Description

Sets the path (including the file name) of the fusion switch configuration file for graph fusion and UB fusion patterns. You can disable selected fusion patterns in the configuration file.

Graph fusion: the process in which FE modifies a graph according to given fusion patterns, replacing base operators in the graph with fused operators to improve compute efficiency. Graph fusion improves operator compute efficiency in the following ways:
- Saves compute time by reducing the mathematical workload of operators. For example, Conv and BiasAdd can be fused into one operator so that accumulation completes directly in the L0C Buffer, sparing the separate Add computation.
- Accelerates post-fusion computation by utilizing hardware instructions. In the preceding example, graph fusion moves the accumulation workload of the "Conv+BiasAdd" composite to the L0C Buffer, using the accumulation capability of the L0C Buffer to accelerate computation.

UB fusion: Unified Buffer (UB) is an important on-chip buffer in Ascend AI Processor. Assume that the compute result of operator A is stored in Unified Buffer and is then moved to Global Memory. To run operator B, the output of operator A must be moved from Global Memory back to Unified Buffer. After operator B finishes computing, its output is moved from Unified Buffer back to Global Memory.
Throughout the process, the data moves along the sequence Unified Buffer->Global Memory->Unified Buffer->Global Memory. With UB fusion, operators A and B are fused to remove the unnecessary detour through Global Memory. UB fusion greatly improves compute efficiency and reduces bandwidth consumption by reducing data movement between Global Memory and Unified Buffer.
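
The saving described above can be illustrated with a toy count of transfers between Unified Buffer and Global Memory. This is only a sketch: the function name count_gm_transfers is hypothetical, the model assumes a simple linear chain of operators whose first input is already in Unified Buffer, and it says nothing about how FE actually schedules data movement.

```python
def count_gm_transfers(num_ops: int, fused: bool) -> int:
    """Count Unified Buffer <-> Global Memory moves for a linear operator chain.

    Toy model (assumption): the first operator's input is already in
    Unified Buffer, and every unfused operator writes its result to
    Global Memory before the next operator reads it back.
    """
    if num_ops < 1:
        raise ValueError("num_ops must be at least 1")
    if fused:
        # Intermediate results stay in Unified Buffer; only the final
        # result is written to Global Memory.
        return 1
    # Unfused: num_ops writes to Global Memory, plus one read back into
    # Unified Buffer for every operator after the first.
    return num_ops + (num_ops - 1)


unfused = count_gm_transfers(2, fused=False)  # A -> GM, GM -> B, B -> GM
fused = count_gm_transfers(2, fused=True)     # B -> GM only
print(f"unfused: {unfused} transfers, fused: {fused} transfer")
```

For operators A and B, the unfused chain needs three transfers and the fused one needs a single transfer, which matches the Unified Buffer->Global Memory->Unified Buffer->Global Memory sequence in the text.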

Argument

Argument: Path of the configuration file, including the file name.

Format: The path (including the file name) can contain letters, digits, underscores (_), hyphens (-), periods (.), and Chinese characters.
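
As a rough pre-check before invoking ATC, the character rule above could be expressed as a regular expression. This is a sketch under stated assumptions: the helper name is_valid_switch_path is hypothetical, "/" is assumed to be accepted as the path separator, and "Chinese characters" is approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF). ATC's own validation may differ.

```python
import re

# Allowed: ASCII letters, digits, "_", "-", ".", and Chinese characters
# (approximated by the CJK Unified Ideographs block U+4E00-U+9FFF).
# Assumption: "/" is also accepted as the path separator.
_ALLOWED_PATH = re.compile(r"[A-Za-z0-9_\-./\u4e00-\u9fff]+")


def is_valid_switch_path(path: str) -> bool:
    """Return True if every character of path is in the allowed set."""
    return _ALLOWED_PATH.fullmatch(path) is not None


print(is_valid_switch_path("/home/user/module/fusion_switch.cfg"))
print(is_valid_switch_path("/home/user/my module/fusion_switch.cfg"))
```

The first call passes; the second fails because of the space in the directory name.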

Restrictions:

The built-in graph fusion and UB fusion patterns are enabled by default. You can disable selected fusion patterns in the configuration file. Some fusion patterns are not switchable due to functionality restrictions. For the full list of switchable fusion patterns, see Graph Fusion and UB Fusion Patterns.

Suggestions and Benefits

None

Example

1. Disabling Selected Fusion Patterns
The following is a sample configuration file (fusion_switch.cfg). Each line names a fusion pattern before the colon (:) and sets it to on or off after the colon.
```
xxxFusionPass:off
yyyFusionPass:off
....
```
2. Disabling All Fusion Patterns
To disable all fusion patterns at once, refer to this configuration file example.
```
{
    "Switch":{
        "GraphFusion":{
            "ALL":"off"
        },
        "UBFusion":{
            "ALL":"off"
        }
    }
}
```
Remarks:
1. Some built-in fusion patterns are not switchable due to functionality restrictions. These fusion patterns remain enabled regardless of the user's switch settings.
2. To disable all fusion patterns except selected ones, refer to the following example.
```
{
    "Switch":{
        "GraphFusion":{
            "ALL":"off",
            "SoftmaxFusionPass":"on"
        },
        "UBFusion":{
            "ALL":"off",
            "TbePool2dQuantFusionPass":"on"
        }
    }
}
```
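
If the list of patterns to keep enabled changes often, the "all off except selected" layout can be generated instead of edited by hand. The following is a minimal sketch; the helper name build_switch_config is hypothetical, and the only keys used are the ones shown in the examples above.

```python
import json


def build_switch_config(graph_on=(), ub_on=()):
    """Build a config that disables all fusion patterns except the named ones."""
    graph = {"ALL": "off"}
    graph.update({name: "on" for name in graph_on})
    ub = {"ALL": "off"}
    ub.update({name: "on" for name in ub_on})
    return {"Switch": {"GraphFusion": graph, "UBFusion": ub}}


config = build_switch_config(
    graph_on=["SoftmaxFusionPass"],
    ub_on=["TbePool2dQuantFusionPass"],
)
text = json.dumps(config, indent=4)
print(text)
```

Writing the printed text to fusion_switch.cfg reproduces the example above. Patterns that are not switchable stay enabled whatever the file says.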

Upload the configured fusion_switch.cfg file to any directory (for example, $HOME/module) on the server where ATC is located.

--fusion_switch_file=$HOME/module/fusion_switch.cfg

After model conversion is complete, the value of the --export_compile_stat parameter determines whether the operator fusion result file fusion_result.json is generated.

The file records the enabled fusion patterns (or those not disabled by the fusion_switch.cfg file), where the match_times field indicates the number of times that a fusion pattern is hit during model conversion and the effect_times field indicates the number of times that a fusion pattern takes effect. If --fusion_switch_file is not set, the generated fusion_result.json file records all fusion patterns that are hit during model conversion.
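
To check which patterns were hit but did not take effect, the two fields can be pulled out of fusion_result.json with a small script. This is a sketch only: the exact layout of the file is not described here, so the walker below makes no assumption about nesting and simply collects every JSON object that carries both match_times and effect_times. The function name collect_fusion_stats and the sample data (including the graph_0, graph_fusion, and ub_fusion keys) are hypothetical.

```python
import json


def collect_fusion_stats(node, stats=None):
    """Recursively collect {pattern_name: (match_times, effect_times)}."""
    if stats is None:
        stats = {}
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, dict) and value.keys() >= {"match_times", "effect_times"}:
                # Sum counts if the same pattern name appears more than once.
                matched, effective = stats.get(key, (0, 0))
                stats[key] = (
                    matched + int(value["match_times"]),
                    effective + int(value["effect_times"]),
                )
            else:
                collect_fusion_stats(value, stats)
    elif isinstance(node, list):
        for item in node:
            collect_fusion_stats(item, stats)
    return stats


# Hypothetical sample: the nesting and every key other than
# match_times/effect_times are illustrative only.
sample = json.loads("""
{
    "graph_0": {
        "graph_fusion": {
            "SoftmaxFusionPass": {"match_times": "2", "effect_times": "1"}
        },
        "ub_fusion": {
            "TbePool2dQuantFusionPass": {"match_times": "3", "effect_times": "3"}
        }
    }
}
""")

for name, (matched, effective) in sorted(collect_fusion_stats(sample).items()):
    print(f"{name}: matched {matched}, took effect {effective}")
```

For a real file, replace the sample with json.load() on the generated fusion_result.json.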

Applicability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Restrictions

If the value of the group attribute of the Convolution operator in the network model is equal to the value of the num_output attribute in the .prototxt file, VxxxRequantFusionPass in the preceding configuration file must be enabled.
Ascend Model Compression Toolkit (AMCT) inserts quant and dequant operators into the original model, and ATC fuses the inserted operators during model conversion. To compare accuracy between the AMCT-quantized model and the original one, use --fusion_switch_file to switch off the relevant fusion patterns in the configuration file.
For Atlas 200/300/500 Inference Products and Atlas Training Series Products, the fusion patterns that must be disabled are as follows:
```
V100RequantFusionPass:off
ConvConcatFusionPass:off
SplitConvConcatFusionPass:off
TbeEltwiseQuantFusionPass:off
TbeConvDequantVaddReluQuantFusionPass:off
TbeConvDequantVaddReluFusionPass:off
TbeConvDequantQuantFusionPass:off
TbeDepthwiseConvDequantFusionPass:off
TbeFullyconnectionElemwiseDequantFusionPass:off
TbeConv2DAddMulQuantPass:off
TbePool2dQuantFusionPass:off
TbeCommonRules0FusionPass:off
TbeCommonRules2FusionPass:off
```
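
A list this long is easy to mistype, so it may help to generate the file from a list of names and parse it back as a sanity check. The sketch below assumes only the PatternName:off line format shown above; the helper names render_switch_lines and parse_switch_lines are hypothetical and are not part of ATC.

```python
def render_switch_lines(pattern_names, state="off"):
    """Render one 'PatternName:state' line per fusion pattern."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    return "".join(f"{name}:{state}\n" for name in pattern_names)


def parse_switch_lines(text):
    """Parse 'PatternName:state' lines into a dict, skipping blank lines."""
    switches = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, sep, state = line.partition(":")
        if not sep or not name or state not in ("on", "off"):
            raise ValueError(f"malformed line: {line!r}")
        switches[name] = state
    return switches


# Pattern names copied from the list above.
AMCT_COMPARE_OFF = [
    "V100RequantFusionPass",
    "ConvConcatFusionPass",
    "SplitConvConcatFusionPass",
    "TbeEltwiseQuantFusionPass",
    "TbeConvDequantVaddReluQuantFusionPass",
    "TbeConvDequantVaddReluFusionPass",
    "TbeConvDequantQuantFusionPass",
    "TbeDepthwiseConvDequantFusionPass",
    "TbeFullyconnectionElemwiseDequantFusionPass",
    "TbeConv2DAddMulQuantPass",
    "TbePool2dQuantFusionPass",
    "TbeCommonRules0FusionPass",
    "TbeCommonRules2FusionPass",
]

cfg_text = render_switch_lines(AMCT_COMPARE_OFF)
# Round-trip check: every listed pattern must come back as "off".
assert parse_switch_lines(cfg_text) == dict.fromkeys(AMCT_COMPARE_OFF, "off")
print(cfg_text, end="")
```

Saving the printed text as fusion_switch.cfg yields the same content as the listing above.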
The fusion patterns are described as follows:
- V100RequantFusionPass
  A graph fusion pattern. In V100 quantization, graph fusion is applied where the dequant and quant patterns are matched, improving inference performance.
- ConvConcatFusionPass
  A graph fusion pattern, which supports Conv2D*N+concat operator fusion. The dequant and ReLU operators can be connected to Conv2D.
- SplitConvConcatFusionPass
  A graph fusion pattern, which supports split+Conv2D*N+concat operator fusion. The dequant and ReLU operators can be connected to Conv2D.
- TbeEltwiseQuantFusionPass
  A UB fusion pattern, which supports elemwise+quant operator fusion. The quant operator is optional.
- TbeConvDequantVaddReluQuantFusionPass
  A UB fusion pattern. For a quantized model, apply UB fusion on "Conv-dequant-vadd-relu-quant" composites to improve inference performance.
- TbeConvDequantVaddReluFusionPass
  A UB fusion pattern, which supports Conv2D+dequant+Vadd+ReLU or Conv2D+dequant+(LeakyReLU)+Vadd operator fusion.
- TbeConvDequantQuantFusionPass
  A UB fusion pattern. For a quantized model, apply UB fusion on "Conv-dequant-quant" composites to improve inference performance.
- TbeDepthwiseConvDequantFusionPass
  A UB fusion pattern, which supports DepthwiseConv2d+dequant+(ReLU/mul)+quant, DepthwiseConv2d+dequant+(sigmoid)+mul, DepthwiseConv2d+requant, or DepthwiseConv2d+(power+relu6+power)+elemwise+(quant) operator fusion.
- TbeFullyconnectionElemwiseDequantFusionPass
  A UB fusion pattern, which supports the following fusion:
  1. BatchMatMul/BatchMatMulV2 + elemwise fusion in the static shape scenario.
  2. MatMul/MatMulV2/BatchMatMul/BatchMatMulV2 + AscendDequant + elemwise1(+ elemwise2) fusion in the static shape scenario.
- TbeConv2DAddMulQuantPass
  A UB fusion pattern, which supports Conv+dequant+add+quant fusion. The add operator can be fused only if it has two outputs other than quant.
- TbePool2dQuantFusionPass
  A UB fusion pattern. For a quantized model, apply UB fusion on "Pool2d-quant" composites to improve inference performance.
- TbeCommonRules0FusionPass
  A UB fusion pattern, which supports StridedRead+Conv2D+dequant+elemwise+quant+StridedWrite operator fusion. The nodes, except for Conv2D, are optional.
- TbeCommonRules2FusionPass
  A UB fusion pattern, which supports StridedRead+Conv2D+dequant+elemwise+quant+StridedWrite operator fusion. The nodes, except for Conv2D, are optional. The elemwise node supports the fusion in the multi-output scenario.
