Introduction
Overview
Operator fusion, an important means to improve network performance, can be implemented by graph fusion or Unified Buffer fusion (UB fusion).
The system provides a range of built-in graph fusion and UB fusion patterns. These patterns are enabled by default and can be disabled; any pattern that is disabled by default is explicitly noted. This document describes only some of the fusion patterns.
Graph Fusion
Operator performance can be improved with hardware-independent fusion strategies when a network model is built using graphs.
Graph fusion refers to the process in which the Fusion Engine (FE) modifies a graph according to the fusion patterns, replacing base operators with fused operators. Graph fusion improves operator compute efficiency in the following ways:
- Saves compute time by reducing the mathematical workload of operators. For example, Conv and BiasAdd can be fused into one operator so that the accumulation is completed directly in the L0C Buffer, sparing a separate Add computation.
- Accelerates post-fusion computation by utilizing hardware instructions. In the preceding example, graph fusion moves the accumulation of the Conv+BiasAdd structure into the L0C Buffer, accelerating the compute process by utilizing the accumulation capability of the L0C Buffer.
Graph fusion includes fusion of individual graphs and fusion of partitioned subgraphs.
- Graph fusion: Operators in a graph are mathematically fused into one or more operators. This process is hardware-independent.
As shown in Figure 1, the Conv2D and Batchnorm operators are fused into Conv2D after mathematical calculation.
- Fusion by partition: An operator is split into two or more operators.
As shown in Figure 2, operator X is split into two operators: X1 and X2.
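To make the graph fusion idea concrete, consider the Conv2D+Batchnorm fusion mentioned above: the batch-normalization parameters can be folded into the convolution's weight and bias ahead of time. The sketch below is an illustrative Python model of that algebra for a single scalar channel, not the FE implementation:

```python
import math

def conv_then_bn(x, w, b, gamma, beta, mean, var, eps=1e-5):
    # Reference path: convolution (modeled as a scalar multiply-add)
    # followed by batch normalization.
    y = w * x + b
    return gamma * (y - mean) / math.sqrt(var + eps) + beta

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    # Fold the BN parameters into the conv weight and bias, so a single
    # (fused) conv reproduces the conv+BN result.
    s = gamma / math.sqrt(var + eps)
    return w * s, (b - mean) * s + beta

w, b = 2.0, 0.5
gamma, beta, mean, var = 1.5, -0.3, 0.2, 4.0
wf, bf = fold_bn_into_conv(w, b, gamma, beta, mean, var)
for x in (-1.0, 0.0, 3.0):
    # The fused operator (wf * x + bf) matches the two-operator path.
    assert abs(conv_then_bn(x, w, b, gamma, beta, mean, var) - (wf * x + bf)) < 1e-9
```

After folding, the Batchnorm node disappears from the graph and only a Conv2D with adjusted weights remains, which is exactly the structural change shown in Figure 1.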
UB Fusion
Operator performance can be improved with hardware-dependent fusion strategies when a network model is built using graphs, for example, performing UB fusion at graph build time.
UB is the Unified Buffer on the Ascend AI Processor. UB fusion is the process of fusing operators in a graph based on the hardware UB. Assume that two operators run independently: the compute result of operator 1 is stored in the UB and must be moved to the DDR. To run operator 2, the output of operator 1 must be moved from the DDR back to the UB. After the compute process of operator 2 is complete, the output of operator 2 is moved from the UB back to the DDR.
The data therefore travels UB -> DDR -> UB -> DDR. The intermediate round trip through the DDR is unnecessary, so operators 1 and 2 can be fused into one operator. After fusion, the output of operator 1 is retained in the UB and operator 2 reads it directly from the UB for computation. This improves performance by saving one DDR write transaction and one DDR read transaction.
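The saving can be made concrete with a toy accounting model. The sketch below is illustrative Python that only counts DDR transactions under the stated assumptions (operator 2 consumes operator 1's output, and the final result must land in DDR); it does not model any real runtime API:

```python
def ddr_transactions(fused: bool) -> int:
    # Count DDR read/write transactions needed to run op1 then op2.
    count = 0
    if not fused:
        count += 1  # op1 result: UB -> DDR (write)
        count += 1  # op1 result: DDR -> UB, read back for op2
    # When fused, op1's output stays in the UB and op2 reads it directly.
    count += 1      # op2 result: UB -> DDR (write)
    return count

assert ddr_transactions(fused=False) == 3
assert ddr_transactions(fused=True) == 1   # one DDR read and one DDR write saved
```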
Enabling/Disabling Fusion Patterns
You can enable or disable some of the fusion patterns before building a model to improve build performance as needed. Note, however, that changing the fusion switches does not necessarily improve computing performance. The methods are as follows:
- When converting a model with ATC, use --fusion_switch_file to configure the directory and name of the fusion switch file. For example:
--fusion_switch_file=/home/fusion_switch.cfg
- When building an IR model, use FUSION_SWITCH_FILE to configure the directory and name of the fusion switch file. For example:
std::map<AscendString, AscendString> global_options = {
    {ge::ir_option::FUSION_SWITCH_FILE, "/home/fusion_switch.cfg"},
};
auto status = aclgrphBuildInitialize(global_options);
- During model training and online inference, use fusion_switch_file to configure the directory and name of the fusion switch file. For example:
custom_op.parameter_map["fusion_switch_file"].s = tf.compat.as_bytes("/home/fusion_switch.cfg")
The fusion_switch.cfg file used in the examples is for reference only; you create and name the file yourself. The following shows an example of its content, in which on indicates that a fusion pattern is enabled and off indicates that it is disabled.
{
"Switch":{
"GraphFusion":{
"ConvToFullyConnectionFusionPass":"on",
"SoftmaxFusionPass":"on",
"ConvConcatFusionPass":"on",
"MatMulBiasAddFusionPass":"on",
"PoolingFusionPass":"on",
"ZConcatv2dFusionPass":"on",
"ZConcatExt2FusionPass":"on",
"TfMergeSubFusionPass":"on"
},
"UBFusion":{
"FusionVirtualOpSetSwitch":"on"
}
}
}
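Since the switch file is plain JSON, it can also be generated or validated programmatically before being passed to the build. A minimal sketch (the output path is arbitrary and the pass names are just the examples from above):

```python
import json
import os
import tempfile

switches = {
    "Switch": {
        "GraphFusion": {
            "SoftmaxFusionPass": "on",
            "MatMulBiasAddFusionPass": "on",
        },
        "UBFusion": {
            "FusionVirtualOpSetSwitch": "on",
        },
    }
}

# Write the switch file; its path is what you would pass via
# --fusion_switch_file, FUSION_SWITCH_FILE, or fusion_switch_file.
path = os.path.join(tempfile.gettempdir(), "fusion_switch.cfg")
with open(path, "w") as f:
    json.dump(switches, f, indent=2)

# Sanity check: the file round-trips and every value is "on" or "off".
with open(path) as f:
    loaded = json.load(f)
assert loaded == switches
assert all(v in ("on", "off")
           for group in loaded["Switch"].values()
           for v in group.values())
```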
To disable all fusion patterns at once, refer to the following configuration file example.
{
"Switch":{
"GraphFusion":{
"ALL":"off"
},
"UBFusion":{
"ALL":"off"
}
}
}
Notes:
- Even with the preceding configuration, only some of the built-in fusion patterns are disabled, because disabling certain patterns may lead to functionality problems.
- You can disable all fusion patterns except selected ones.
{
    "Switch":{
        "GraphFusion":{
            "ALL":"off",
            "SoftmaxFusionPass":"on"
        },
        "UBFusion":{
            "ALL":"off",
            "FusionVirtualOpSetSwitch":"on"
        }
    }
}
- You can enable all fusion patterns except selected ones.
{
    "Switch":{
        "GraphFusion":{
            "ALL":"on",
            "SoftmaxFusionPass":"off"
        },
        "UBFusion":{
            "ALL":"on",
            "FusionVirtualOpSetSwitch":"off"
        }
    }
}
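These examples rely on explicit per-pass entries overriding the ALL default. The helper below is an illustrative Python model of that documented precedence (explicit entry wins over ALL, which wins over the built-in default), not the actual parser used by the system:

```python
def pass_enabled(switches: dict, group: str, name: str,
                 default: bool = True) -> bool:
    # Resolve whether a fusion pass is enabled: an explicit entry wins,
    # otherwise the group's "ALL" entry applies, otherwise the built-in
    # default (patterns are enabled by default).
    entries = switches.get("Switch", {}).get(group, {})
    value = entries.get(name, entries.get("ALL"))
    if value is None:
        return default
    return value == "on"

cfg = {"Switch": {"GraphFusion": {"ALL": "off", "SoftmaxFusionPass": "on"}}}
assert pass_enabled(cfg, "GraphFusion", "SoftmaxFusionPass") is True   # explicit override
assert pass_enabled(cfg, "GraphFusion", "PoolingFusionPass") is False  # falls back to ALL
```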

