Operator Delivery in Fallback Mode

Overview

The GE provides two common model scheduling modes (offload scheduling and host scheduling) to implement efficient collaboration between the host and device.

Offload scheduling applies to static shape models. Because the shape of the input tensor remains constant, memory orchestration and tiling calculation can be completed in the build phase. Therefore, operators in the model can be delivered to the device as an entire graph. During execution, you only need to deliver a model execution task on the host to trigger model scheduling on the device, improving model scheduling performance.

Host scheduling applies to dynamic shape models. Because the shape of the input tensor is not constant, information such as the input shape of the next operator can be determined only after the output shape of the previous operator is inferred. Therefore, model execution cannot be offloaded to the device as a whole. Generally, the operator kernels need to be delivered to the device one at a time for execution.
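The difference between the two modes can be pictured as the number of host-to-device deliveries per model run. The following is a toy model only, not GE code: Op, Shape, CountOffloadDeliveries, and CountHostDeliveries are hypothetical names, and each operator is assumed to launch exactly one kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Shape = std::vector<int64_t>;

// Hypothetical stand-in for an operator; only its shape inference rule matters here.
struct Op {
    std::function<Shape(const Shape&)> inferShape;
};

// Offload scheduling (static shape): the whole graph is delivered as a single
// model execution task, no matter how many operators it contains.
std::size_t CountOffloadDeliveries(const std::vector<Op>& model) {
    return model.empty() ? 0 : 1;
}

// Host scheduling (dynamic shape): the input shape of each operator is known
// only after the previous operator's output shape has been inferred, so the
// kernels are delivered one at a time.
std::size_t CountHostDeliveries(const std::vector<Op>& model, Shape shape) {
    std::size_t deliveries = 0;
    for (const Op& op : model) {
        assert(op.inferShape);         // every operator must provide a rule
        shape = op.inferShape(shape);  // output shape feeds the next operator
        ++deliveries;                  // one delivery per operator (assumption)
    }
    return deliveries;
}
```

For a model with N operators, this sketch counts one delivery under offload scheduling and N deliveries under host scheduling, which is the overhead that offload scheduling avoids for static shape models.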

Fallback-based operator delivery is a type of host scheduling. In this mode, a fallback function on the host executes the operator, for example, by directly calling the single-operator API aclnnXX. You only need to call the EnableFallBack API of Ascend C during operator development. The system then generates the fallback function automatically. The GE calls back this function, which converts the GE inputs, outputs, and attributes into the parameter format required by the single-operator API aclnnXX, and then calls the API to execute the operator.

Figure 1 Comparison between common host scheduling and fallback delivery

Application Scenario

Fallback delivery applies to training and online inference scenarios. Offline inference is not supported.
You are advised to enable fallback delivery only for multi-kernel operators in dynamic shape models, for the following reasons:
- In the static shape scenario, the aclnnXX API contains host operations, so the operators cannot be delivered as an entire graph. Instead, the graph is broken and delivered as subgraphs, which may greatly degrade performance.
- In rare cases, developers implement an operator with multiple kernels (such as the built-in Matmul and MC² operators) to achieve optimal performance. All of these kernels need to be launched during execution. Because the number of kernels is uncertain, the GE cannot process such operators in a unified manner with common host scheduling. Therefore, these operators need to be delivered in fallback mode.
- For single-kernel operators, fallback-based delivery may degrade performance. Currently, the aclnnXX API automatically generated from an Ascend C custom operator project is a single-kernel operator API. Therefore, you are advised not to enable fallback-based delivery for such operators.
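The recommendation above can be condensed into a small decision helper. This is only a sketch of the guidance in this section, not a GE API: Scenario, OpTraits, and ShouldEnableFallback are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// Hypothetical description of where the model runs.
enum class Scenario { kTraining, kOnlineInference, kOfflineInference };

// Hypothetical summary of an operator's properties.
struct OpTraits {
    bool dynamicShape;  // true if the operator runs in a dynamic shape model
    int kernelCount;    // number of kernels launched by the operator
};

// Returns true only for the case recommended in this section:
// a multi-kernel operator in a dynamic shape model, outside offline inference.
bool ShouldEnableFallback(Scenario scenario, const OpTraits& op) {
    assert(op.kernelCount >= 1);
    if (scenario == Scenario::kOfflineInference) {
        return false;  // offline inference is not supported
    }
    if (!op.dynamicShape) {
        return false;  // static shape: graph break may hurt performance
    }
    return op.kernelCount > 1;  // single-kernel operators may get slower
}
```

For example, a three-kernel operator in a dynamic shape training model yields true, while the same operator in a static shape model, or any single-kernel operator, yields false.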

Applicability

Product	Supported or Not
Atlas A3 training products / Atlas A3 inference products	√
Atlas A2 training products / Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	×
Atlas inference products	√
Atlas training products	√

Principles

Figure 2 shows the process of fallback-based delivery of multi-kernel operators in the dynamic shape model.

Figure 2 Delivery and execution process of multi-kernel operators in the dynamic shape model

The fallback function converts the GE inputs, outputs, and attributes into the parameter format required by the single-operator API aclnnXX, and then calls the API to execute the operator. Take the concat operator as an example. The fallback function has the following signature:

    static graphStatus ConcatExecuteFunc(OpExecuteContext* host_api_ctx)

The input OpExecuteContext pointer provides the information required by the fallback function, such as the input and output shapes and data types. For details, see OpImplSpaceRegistryV2 Class.

You do not need to manually implement the fallback function. During operator prototype registration, you only need to call the EnableFallBack API. The system automatically generates the fallback function and registers it with the GE.
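To make the conversion step concrete, the following self-contained mock shows the general shape of such a function: read the inputs, outputs, and attributes from the context, then hand them to a single-operator call. Everything here is an assumption made for illustration. MockTensor, MockExecuteContext, MockAclnnConcat, and MockConcatExecuteFunc are not GE or aclnn types, and the generated fallback function and the real aclnnXX signatures may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Mock tensor description (not an aclnn or GE type).
struct MockTensor {
    std::vector<int64_t> shape;
    std::string dtype;
};

// Mock of the information a fallback function reads from its context.
struct MockExecuteContext {
    std::vector<MockTensor> inputs;
    std::vector<MockTensor> outputs;
    int64_t concatDim;  // operator attribute
};

enum MockStatus { kMockSuccess = 0, kMockFailed = 1 };

// Mock single-operator API standing in for aclnnXX. It only computes the
// output shape of a concatenation along `dim`; no device work is done.
MockStatus MockAclnnConcat(const std::vector<MockTensor>& tensors, int64_t dim,
                           MockTensor* out) {
    if (tensors.empty() || out == nullptr) return kMockFailed;
    const std::size_t rank = tensors[0].shape.size();
    if (dim < 0 || static_cast<std::size_t>(dim) >= rank) return kMockFailed;
    std::vector<int64_t> shape = tensors[0].shape;
    shape[dim] = 0;
    for (const MockTensor& t : tensors) {
        if (t.shape.size() != rank) return kMockFailed;
        shape[dim] += t.shape[dim];  // sizes add up along the concat axis
    }
    assert(shape.size() == rank);
    out->shape = shape;
    out->dtype = tensors[0].dtype;
    return kMockSuccess;
}

// Mock fallback function: unpack the context into the argument list that the
// single-operator API expects, then call the API.
MockStatus MockConcatExecuteFunc(MockExecuteContext* ctx) {
    if (ctx == nullptr || ctx->outputs.empty()) return kMockFailed;
    return MockAclnnConcat(ctx->inputs, ctx->concatDim, &ctx->outputs[0]);
}
```

In this mock, concatenating inputs of shapes {2, 3} and {4, 3} along dimension 0 fills the output with shape {6, 3}.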

How to Use

In the operator development phase, call the EnableFallBack API of Ascend C during operator prototype registration. The fallback function is then generated automatically.

    class CustomOp : public OpDef {
    public:
        explicit CustomOp(const char* name) : OpDef(name)
        {
            // Define the inputs and outputs of the operator: whether each one is required,
            // and the data types and formats it supports.
            this->Input("x").ParamType(REQUIRED).DataType({ge::DT_FLOAT}).Format({ge::FORMAT_ND});
            this->Input("y").ParamType(REQUIRED).DataType({ge::DT_FLOAT}).Format({ge::FORMAT_ND});
            this->Output("z").ParamType(REQUIRED).DataType({ge::DT_FLOAT}).Format({ge::FORMAT_ND});
            // Register the shape inference and data type inference functions.
            this->SetInferShape(ge::InferShapeFunc);
            this->SetInferDataType(ge::InferDataTypeFunc);
            // Register the AI processor models supported by the operator by calling AddConfig.
            this->AICore().AddConfig("ascendxxx");
            // Enable fallback-based delivery. The fallback function is generated automatically.
            this->EnableFallBack();
        }
    };
    OP_ADD(CustomOp);

The fallback function supports two calling modes, support_aclnn and aclnn_only, which are configured through the aclnnSupport.value parameter of the ExtendCfgInfo API. For details, see the related API description.

- support_aclnn: In the static shape scenario, the operator is executed through model offload. In the dynamic shape scenario, the fallback function is called on the host to deliver the operator. If EnableFallBack is called, this mode is used by default.
- aclnn_only: Fallback-based operator delivery is used in both the dynamic and static shape scenarios. This mode is not recommended and will be discontinued in later versions.
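The behavior of the two modes can be summarized as a small lookup. This is a sketch of the description above, not the GE implementation: DeliveryPath and SelectDeliveryPath are hypothetical names, and only the two mode strings come from this document.

```cpp
#include <cassert>
#include <string>

// Hypothetical label for how an operator ends up being executed.
enum class DeliveryPath { kModelOffload, kFallback };

// Maps an aclnnSupport.value setting and the shape scenario to the delivery
// path described in this section.
DeliveryPath SelectDeliveryPath(const std::string& aclnnSupportValue,
                                bool dynamicShape) {
    assert(aclnnSupportValue == "support_aclnn" ||
           aclnnSupportValue == "aclnn_only");
    if (aclnnSupportValue == "aclnn_only") {
        return DeliveryPath::kFallback;  // fallback in both scenarios
    }
    // support_aclnn: fallback only when the shape is dynamic.
    return dynamicShape ? DeliveryPath::kFallback : DeliveryPath::kModelOffload;
}
```

With support_aclnn, a static shape model keeps the performance of model offload, which is one reason this mode is the default when EnableFallBack is called.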

During training or online inference in the GE graph mode, the fallback function is automatically called to execute related operators.

Debugging

The key log printed in both the support_aclnn and aclnn_only modes is as follows:

    xxx, setting fallback attribute
