Principles

Overview

The operator prototype definition file specifies the constraints on an operator that runs on the Ascend AI Processor and mainly reflects the mathematical meaning of the operator. It defines the operator's inputs, outputs, and attributes, and is used to verify arguments and infer shapes. The defined prototype is registered with GE's operator prototype library. During model generation, GE calls the verification API of the operator prototype library to verify the operator arguments. If verification passes, GE uses the inference function of the operator prototype library to infer the output shape and dtype of each node, and then allocates static memory for the result tensors.

Figure 1 shows the role of the operator prototype library in the model generation workflow.

Figure 1 Registering the operator prototype with GE

Operator registration includes the registration of the operator, InferShape function, and Verify function. During registration, the operator type (OpType) is used as the key.

  1. GE receives a graph, which represents the topology of an original network model trained under a third-party framework. Then, GE is initialized.
  2. The operator prototype library management module loads the operator prototype library (.so) of the corresponding OS and architecture from the opp/built-in/op_proto/lib directory of the CANN operator library.
  3. The operator prototype library management module registers the operator with OperatorFactory based on the information in the .so file. This covers the basic operator information, the InferShape function, and the Verify function. The three parts are registered in three separate maps, each keyed by the operator type (OpType).
  4. In the graph preparation phase, GE requests the graph to call the InferShape and Verify functions. The InferShape function infers the output shapes so that static memory can be allocated, while the Verify function performs basic verification of parameters.
  5. All nodes in the graph are traversed.
  6. Each node requests OpDesc to call the InferShape and Verify functions.
  7. OpDesc obtains InferShape and Verify functions from OperatorFactory based on OpType.
  8. The Verify function is executed on OpDesc. If the verification succeeds, go to the next step. If the verification fails, return to the previous step.
  9. The InferShape function is executed on OpDesc to infer the shape of the output tensor.
  10. The InferShape result on OpDesc is returned to GE. Then, GE allocates static memory for the output tensor based on the InferShape result.
  11. GE performs the remaining operations.
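
The registration and lookup flow in steps 3 and 5 to 10 can be pictured with a small, self-contained C++ sketch. It is not GE code: OpDescSketch, OpInfoSketch, FactorySketch, PrepareGraph, and RegisterAdd are hypothetical names introduced only for illustration, and the real OperatorFactory and OpDesc interfaces may differ. The sketch also assumes that a failed verification is simply reported to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Shape = std::vector<int64_t>;

// Hypothetical stand-in for GE's OpDesc; the field names are illustrative.
struct OpDescSketch {
    std::string op_type;
    std::vector<Shape> input_shapes;
    std::vector<Shape> output_shapes;
};

// Basic operator information stored at registration time (illustrative).
struct OpInfoSketch {
    std::size_t num_inputs;
    std::size_t num_outputs;
};

using VerifyFn = std::function<bool(const OpDescSketch&)>;
using InferShapeFn = std::function<bool(OpDescSketch&)>;

// Hypothetical factory: three maps, each keyed by OpType (step 3).
class FactorySketch {
public:
    void Register(const std::string& op_type, OpInfoSketch info,
                  VerifyFn verify, InferShapeFn infer) {
        assert(!op_type.empty());
        infos_[op_type] = info;
        verify_fns_[op_type] = std::move(verify);
        infer_fns_[op_type] = std::move(infer);
    }

    // Steps 7 to 9 for one node: look up by OpType, verify, then infer.
    bool VerifyAndInfer(OpDescSketch& op) const {
        auto info = infos_.find(op.op_type);
        auto verify = verify_fns_.find(op.op_type);
        auto infer = infer_fns_.find(op.op_type);
        if (info == infos_.end() || verify == verify_fns_.end() ||
            infer == infer_fns_.end()) {
            return false;  // this OpType was never registered
        }
        if (op.input_shapes.size() != info->second.num_inputs) {
            return false;  // input count does not match the prototype
        }
        if (!verify->second(op)) {
            return false;  // assumption: failure is reported to the caller
        }
        return infer->second(op);
    }

private:
    std::map<std::string, OpInfoSketch> infos_;
    std::map<std::string, VerifyFn> verify_fns_;
    std::map<std::string, InferShapeFn> infer_fns_;
};

// Steps 5 to 10: traverse all nodes and stop at the first failure.
bool PrepareGraph(const FactorySketch& factory,
                  std::vector<OpDescSketch>& nodes) {
    for (OpDescSketch& node : nodes) {
        if (!factory.VerifyAndInfer(node)) {
            return false;
        }
    }
    return true;
}

// Example registration of an element-wise "Add" with two same-shaped inputs.
void RegisterAdd(FactorySketch& factory) {
    factory.Register(
        "Add", OpInfoSketch{2, 1},
        [](const OpDescSketch& op) {
            return op.input_shapes.size() == 2 &&
                   op.input_shapes[0] == op.input_shapes[1];
        },
        [](OpDescSketch& op) {
            op.output_shapes.assign(1, op.input_shapes[0]);
            return true;
        });
}
```

Keeping the three maps separate but keyed by the same OpType mirrors the description above: a node only needs its OpType to retrieve both functions, and the caller can size the output buffers from output_shapes once PrepareGraph returns true.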

About InferShape

Before getting started, you need to know the following concepts:

  • A graph is the basic structure used for build. It consists of nodes (operators) and edges. The edges represent the data to be transferred, that is, tensors. A tensor has three main attributes: dtype, shape, and format.
  • An operator may have more than one input and more than one output. The numbers of inputs and outputs are represented by the numbers of input TensorDesc and output TensorDesc, respectively.
  • If two operators are connected by an edge in a graph, the output TensorDesc of the upstream operator should be the same as the input TensorDesc of the downstream operator. Therefore, once the input TensorDescs are passed to an operator, all of its output TensorDescs can be inferred at build time. However, for operators whose output TensorDescs depend on the actual input values, the inferred output TensorDesc may be inaccurate if the input is not a constant node.

As described above, once all the TensorDescs at the input layer of a graph are determined, data flow can be established along the edges, and all the remaining input and output TensorDescs can be inferred.
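
As a rough illustration of this propagation, the following C++ sketch pushes one input description through a linear chain of nodes, where each node's output description becomes the next node's input description. TensorDescSketch, InferCastToFloat16, InferFlatten, and PropagateChain are hypothetical names, and the two inference rules are made up for the example; they are not GE operators or APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Illustrative tensor description with the three attributes named above.
struct TensorDescSketch {
    std::string dtype;
    std::vector<int64_t> shape;
    std::string format;
};

// A node infers its output description from its input description.
using InferFn = std::function<TensorDescSketch(const TensorDescSketch&)>;

// Hypothetical rule: a cast changes only the dtype.
TensorDescSketch InferCastToFloat16(const TensorDescSketch& in) {
    TensorDescSketch out = in;
    out.dtype = "float16";
    return out;
}

// Hypothetical rule: keep the first dimension, collapse all the others.
TensorDescSketch InferFlatten(const TensorDescSketch& in) {
    assert(!in.shape.empty());
    TensorDescSketch out = in;
    int64_t rest = 1;
    for (std::size_t i = 1; i < in.shape.size(); ++i) {
        rest *= in.shape[i];
    }
    out.shape.assign(1, in.shape[0]);
    out.shape.push_back(rest);
    return out;
}

// Walk a linear chain: the output of one node is the input of the next.
std::vector<TensorDescSketch> PropagateChain(
    const TensorDescSketch& graph_input, const std::vector<InferFn>& chain) {
    std::vector<TensorDescSketch> outputs;
    TensorDescSketch current = graph_input;
    for (const InferFn& infer : chain) {
        current = infer(current);
        outputs.push_back(current);
    }
    return outputs;
}
```

For an input described as float32, shape [8, 3, 4, 4], format NCHW, a chain of InferCastToFloat16 followed by InferFlatten yields float16 with shape [8, 3, 4, 4] and then float16 with shape [8, 48]. A real graph is not limited to a linear chain, but the same idea applies when nodes are visited in topological order.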

Note the following restrictions:

  • InferShape can be called only when all the TensorDescs at the input layer of a graph are determined. Otherwise, the InferShape call fails.
  • The InferShape function in GE infers the dtype and shape of each TensorDesc. After the inference is complete, the dtype and shape specifications of the entire graph are consistent. If, after the network model is generated, the dtype and shape specifications in the dump graph (ge_proto_000000xx_after_infershape.txt) are not globally consistent, the InferShape call has failed.

    To generate a dump graph of GE, set the following environment variable for network model generation:

    export DUMP_GE_GRAPH=1

  • In GE, the output tensor of the upstream operator is passed as the input to the downstream operator, and the downstream operator infers and updates its output tensor based on the passed input. If no custom InferShape function is implemented for an operator, the original output tensor is retained.
  • For operators whose output TensorDescs are inferred from the actual input values, the inputs should be const nodes. Otherwise, the inferred shape may be wrong.
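
The last restriction can be illustrated with a reshape-like operator whose output shape is given by the values of a second input. The C++ sketch below is only an assumption-laden illustration: ShapeInputSketch, kUnknownDimSketch, and InferReshapeOutput are hypothetical names, and using -1 to mark a dimension that cannot be determined at build time is a convention chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of the input that carries the target shape.
struct ShapeInputSketch {
    bool is_const;               // true if the values are known at build time
    std::vector<int64_t> value;  // meaningful only when is_const is true
    int64_t num_elements;        // element count, known from the input's shape
};

// Assumption for this sketch: -1 marks a dimension unknown at build time.
const int64_t kUnknownDimSketch = -1;

// Infer the output shape of a reshape-like operator.
// With a const input, the values are readable and the result is exact.
// Otherwise only the output rank is known, and every dimension is unknown.
std::vector<int64_t> InferReshapeOutput(const ShapeInputSketch& shape_input) {
    assert(shape_input.num_elements >= 0);
    if (shape_input.is_const) {
        return shape_input.value;
    }
    return std::vector<int64_t>(
        static_cast<std::size_t>(shape_input.num_elements), kUnknownDimSketch);
}
```

With a const input holding {2, 12}, the sketch returns [2, 12]. With a non-const input of two elements, it can only return [-1, -1], which is why such inputs should be const nodes when an exact shape is required at build time.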

Implementation File Description

Defining an operator prototype involves two files:
  • A header file, in which the operator IR prototype is registered.
  • A .cc file, in which the verification function and the shape inference function are implemented.