Workflow for Operator Building and Running

Logical Architecture

A complete CANN operator consists of four parts: operator prototype definition, framework-specific operator plugin, operator information library definition, and operator implementation file.

Figure 1 and Figure 2 show the logical architecture of building and running a developed operator on the Ascend AI Processor hardware platform.

Figure 1 Logical architecture of operator building
Figure 2 Logical architecture of operator execution

The Framework Adapter is used only when training is performed in an original framework (such as TensorFlow or PyTorch) and the model is migrated to the CANN platform.

The following deliverables need to be implemented during the development of CANN operators.

Operator prototype library

The operator prototype definition file specifies the constraints on an operator that runs on the Ascend AI Processor, mainly reflecting the mathematical meaning of the operator. It defines the inputs, outputs, attributes, and value ranges of the operator, and can be used to verify arguments and infer the output shape. During network execution, GE calls the verification API of the operator prototype library to verify operator arguments. If the verification passes, GE infers the output shape and dtype of each node by using the inference function of the operator prototype library and allocates static memory for the result tensor.

Note: The operator prototype definition does not distinguish between operator types (TBE or AI CPU); it is a global restriction on the operators that can run on the Ascend AI Processor.
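
For illustration, the following is a minimal sketch of a prototype definition for a hypothetical AddCustom operator, based on the REG_OP, IMPLEMT_COMMON_INFERFUNC, and IMPLEMT_VERIFIER macros of the GE operator prototype headers. The operator name and the checks are illustrative, and details vary with the CANN version.

    // Minimal sketch of an operator prototype for a hypothetical
    // element-wise operator "AddCustom" (names are illustrative).
    #include "graph/operator_reg.h"

    namespace ge {
    // Declare the inputs, output, and allowed data types.
    REG_OP(AddCustom)
        .INPUT(x1, TensorType({DT_FLOAT16, DT_FLOAT}))
        .INPUT(x2, TensorType({DT_FLOAT16, DT_FLOAT}))
        .OUTPUT(y, TensorType({DT_FLOAT16, DT_FLOAT}))
        .OP_END_FACTORY_REG(AddCustom)

    // Inference function: the output inherits shape and dtype from the
    // first input; GE uses the result to allocate static memory.
    IMPLEMT_COMMON_INFERFUNC(AddCustomInferShape) {
      op.UpdateOutputDesc("y", op.GetInputDesc("x1"));
      return GRAPH_SUCCESS;
    }

    // Verification function: GE calls it to validate operator arguments;
    // here it checks that both inputs share the same dtype.
    IMPLEMT_VERIFIER(AddCustom, AddCustomVerify) {
      if (op.GetInputDesc("x1").GetDataType() !=
          op.GetInputDesc("x2").GetDataType()) {
        return GRAPH_FAILED;
      }
      return GRAPH_SUCCESS;
    }

    COMMON_INFER_FUNC_REG(AddCustom, AddCustomInferShape);
    VERIFY_FUNC_REG(AddCustom, AddCustomVerify);
    }  // namespace ge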

Operator implementation

Describes the computation function of an operator.
  • For TBE operators, a Python file is used to describe the computation and Schedule implementation.
  • For AI CPU operators, a C++ file is used to describe the operator class definition and computation implementation (see the sketch after this list).
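
As an example of the AI CPU flavor (the TBE flavor is a Python DSL file and is not shown here), the following minimal C++ sketch is modeled on the structure of the CANN AI CPU samples: a kernel class derived from CpuKernel whose Compute function does the work, registered under its OpType. The operator name is hypothetical, only float inputs are assumed, and validation is omitted.

    // Minimal sketch of an AI CPU implementation for the hypothetical
    // "AddCustom" operator; assumes float data and omits validation.
    #include "cpu_kernel.h"

    namespace aicpu {
    class AddCustomCpuKernel : public CpuKernel {
     public:
      // Compute() is the entry point invoked when the operator runs.
      uint32_t Compute(CpuKernelContext &ctx) override {
        Tensor *x1 = ctx.Input(0);
        Tensor *x2 = ctx.Input(1);
        Tensor *y = ctx.Output(0);
        auto *a = static_cast<float *>(x1->GetData());
        auto *b = static_cast<float *>(x2->GetData());
        auto *out = static_cast<float *>(y->GetData());
        const int64_t n = x1->NumElements();
        for (int64_t i = 0; i < n; ++i) {
          out[i] = a[i] + b[i];  // element-wise addition
        }
        return 0;  // 0 denotes success
      }
    };

    // Bind the kernel class to the OpType used in the graph.
    REGISTER_CPU_KERNEL("AddCustom", AddCustomCpuKernel);
    }  // namespace aicpu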

Operator information library

Includes the TBE and AI CPU operator information libraries. An operator information library describes the implementation specifications of an operator implementation file, that is, the implementation restrictions of the operator on the Ascend AI Processor, including the supported input and output data types, formats, and input shapes. During network build, the Graph Compiler matches against the operator information library to locate the operator implementation file.
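
For reference, a TBE operator information library entry is an .ini section keyed by the OpType. The following minimal sketch for the hypothetical AddCustom operator follows the key naming seen in CANN samples; exact keys vary by version, and opFile.value/opInterface.value name the Python implementation file and its entry function.

    [AddCustom]
    input0.name=x1
    input0.dtype=float16,float
    input0.format=ND,ND
    input0.paramType=required
    input1.name=x2
    input1.dtype=float16,float
    input1.format=ND,ND
    input1.paramType=required
    output0.name=y
    output0.dtype=float16,float
    output0.format=ND,ND
    output0.paramType=required
    opFile.value=add_custom
    opInterface.value=add_custom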

Operator plugin

In custom operator development based on a third-party framework (such as TensorFlow or Caffe), in addition to developing the operator implementation code, you also need to develop a plugin that maps the third-party operator to an operator supported by the Ascend AI Processor. To run a network trained in a third-party framework, GE first loads and calls the operator plugin to parse the operators on the network and map them to operators supported by the Ascend AI Processor.

Note: If the custom operator does not need to be integrated into a network of the original framework, the operator plugin is unnecessary.
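
For illustration, the following is a minimal sketch of a plugin that maps the TensorFlow Add operator to the hypothetical AddCustom operator, using the REGISTER_CUSTOM_OP registration interface; AutoMappingFn is the stock TensorFlow attribute-mapping callback, and newer CANN versions also offer operator-based variants of the parse callback.

    // Minimal sketch of a framework plugin mapping the TensorFlow "Add"
    // operator to the hypothetical CANN operator "AddCustom".
    #include "register/register.h"

    namespace domi {
    REGISTER_CUSTOM_OP("AddCustom")     // OpType on the Ascend AI Processor
        .FrameworkType(TENSORFLOW)      // original framework
        .OriginOpType("Add")            // OpType in the original framework
        .ParseParamsFn(AutoMappingFn)   // stock attribute-mapping callback
        .ImplyType(ImplyType::TVM);     // TVM for TBE, AI_CPU for AI CPU
    }  // namespace domi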

When both a TBE operator and an AI CPU operator of the same OpType are available, GE preferentially runs the TBE operator.

Building an Operator

  • TBE operator building flowchart

    Figure 3 shows the building process of a TBE operator.

    Figure 3 TBE operator building flowchart
    1. Deliver the open-source framework network model to GE.

      If online training is performed on the TensorFlow or PyTorch framework, the TF Adapter or PT Adapter API is called to generate an original network model and then the model is delivered to GE. If the original network framework is MindSpore, the original network model is directly delivered to GE. If an AscendCL application is used for model inference, the original network model is directly delivered to GE.

      The topology of a network model is referred to as a graph.

    2. GE calls the operator plugin to map operators in the original network model to operators supported by the Ascend AI Processor, so that the original graph can be parsed into a graph supported by the Ascend AI Processor.

      If the original network framework is MindSpore, operators have already been parsed and mapped in MindSpore. Therefore, you do not need to call the plugin again in GE for parsing.

    3. GE calls the verification API of the operator prototype library to verify operator arguments. If the verification passes, GE infers the output shape and dtype of each node by using the inference function of the operator prototype library and allocates static memory for the result tensor.
    4. GE sends a graph optimization request and the graph to FE. FE fuses the operators according to the fusion patterns and selects the operator implementation with the highest priority (by default, custom operators have the highest priority). Finally, an optimized graph is returned to GE.
    5. GE partitions the graph into subgraphs and sends the subgraphs to FE. FE inserts Transformation Operators into the subgraphs, prebuilds the TBE operators based on the data flow of the subgraphs, performs UB fusion on the operators in the subgraphs according to the fusion patterns, finds the operator implementation files based on the operator information library, builds them into operator kernel files (.o and .json), and returns the optimized subgraphs to GE.
    6. GE merges the subgraphs into a graph and further optimizes the graph.
    7. GE builds the graph, allocates memory and stream resources, and sends a genTask request to FE. FE returns the task information to GE. After the graph building is complete, a model adapted to the Ascend AI Processor is generated.
  • AI CPU operator building flowchart

    Figure 4 shows the building process of an AI CPU operator.

    Figure 4 AI CPU operator building flowchart
    1. Deliver the open-source framework network model to GE.

      If online training is performed on the TensorFlow or PyTorch framework, the TF Adapter or PT Adapter API is called to generate an original network model and then the model is delivered to GE. If the original network framework is MindSpore, the original network model is directly delivered to GE. If an AscendCL application is used for model inference, the original network model is directly delivered to GE.

      The topology of a network model is referred to as a graph.

    2. GE calls the operator plugin to map operators in the original network model to operators supported by the Ascend AI Processor, so that the original graph can be parsed into a graph supported by the Ascend AI Processor.

      If the original network framework is MindSpore, operators have already been parsed and mapped in MindSpore. Therefore, you do not need to call the plugin again in GE for parsing.

    3. GE calls the verification API of the operator prototype library to verify operator arguments. If the verification passes, GE infers the output shape and dtype of each node by using the inference function of the operator prototype library and allocates static memory for the result tensor.
    4. GE delivers the entire graph to AI CPU Engine. AI CPU Engine reads the operator information library, looks up an appropriate format for the operator, and returns the format to GE.
    5. GE partitions the graph into subgraphs and delivers the subgraphs to AI CPU Engine. AI CPU Engine optimizes the subgraphs and returns the optimized subgraphs to GE.
    6. GE merges the subgraphs into a graph and further optimizes the graph.
    7. GE builds the graph (including memory and stream allocation) and sends a genTask request to AI CPU Engine. AI CPU Engine returns the task information of the operator to GE. After the graph is built, a model adapted to the Ascend AI Processor is generated.

Running an Operator

  • TBE operator execution flowchart

    Figure 5 shows the execution process of a TBE operator.

    Figure 5 TBE operator execution flowchart
    1. GE loads the built model and delivers an operator execution request.
    2. Runtime delivers the task request to Task Schedule.
    3. Task Schedule schedules the task and calls the operator compute APIs.
    4. The operator is executed on the AI Core.
  • AI CPU operator execution flowchart

    Figure 6 shows the execution process of an AI CPU operator.

    Figure 6 AI CPU operator execution flowchart
    1. GE delivers an operator execution request.
    2. Runtime delivers the corresponding task to AI CPU Schedule.
    3. AI CPU Schedule schedules the task and calls the operator compute API.
    4. The AI CPU operator library parses and instantiates the operator implementation and executes the operator by calling its Compute function.
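
Both execution flows above are typically driven by an AscendCL application: the application loads the offline model produced by the build flow and issues an execute request, which GE, Runtime, and the scheduler then dispatch as operator tasks. The following is a minimal sketch; the model path is hypothetical, and dataset population and error handling are omitted for brevity.

    // Minimal sketch of triggering operator execution through AscendCL:
    // load a built offline model and execute it synchronously.
    #include "acl/acl.h"

    int main() {
      aclInit(nullptr);                  // initialize AscendCL
      aclrtSetDevice(0);                 // bind device 0

      uint32_t model_id = 0;
      // Load the model generated by the build flow described above
      // (the file name is hypothetical).
      aclmdlLoadFromFile("add_custom.om", &model_id);

      // In a real application, the datasets would be populated with
      // device buffers before the execute call.
      aclmdlDataset *inputs = aclmdlCreateDataset();
      aclmdlDataset *outputs = aclmdlCreateDataset();

      // GE delivers the execution request; Runtime and the task
      // scheduler then dispatch each operator task to its engine.
      aclmdlExecute(model_id, inputs, outputs);

      aclmdlDestroyDataset(inputs);
      aclmdlDestroyDataset(outputs);
      aclmdlUnload(model_id);
      aclrtResetDevice(0);
      aclFinalize();
      return 0;
    }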