Development Mode Selection

Overview

As described in CANN Operator Types, CANN operators can be developed in TBE DSL, TBE TIK, or AI CPU mode. When developing an operator, you need to first select an implementation mode for your use case.

This section describes how to select an operator development mode with examples.

Procedure

Figure 1 Procedure of selecting an operator development mode
  1. Analyze the principle of the operator algorithm and extract the mathematical expression of the operator.
  2. Analyze the instruction process and select a proper operator development mode.
    1. Based on the mathematical expression and the Operator Acceleration Library API Reference, check whether the target function can be implemented by assembling CANN built-in operators.
      • If yes, assemble CANN built-in operators for function implementation.
        • Using the Ascend AI Processor to accelerate a computation (such as searching for the maximum value or converting data types) in an inference scenario, without applying the operator to the original framework network: You can use the Ascend Graph APIs to build the function into a graph, load the built model into your AscendCL application, and run inference. For details about graph building, see the Ascend Graph Developer Guide.
        • Applying an operator to the original framework network: You only need to map the operator in the original framework network to the corresponding CANN operator by referring to Framework-based Adaptation.
      • If no, go to 2.b.
    2. Analyze whether TBE DSL compute APIs are helpful.

      TBE DSL compute APIs are highly encapsulated and user-friendly. You only need to focus on expressing the computation process; the subsequent scheduling, optimization, and building are completed by the APIs directly.

      However, in most cases, TBE DSL compute APIs are suited only to combinations of vector operations. In addition, the complex matrix compute APIs cannot currently be used together with other compute APIs (for details, see the API restrictions in TBE DSL API).

      If the compute logic cannot be expressed through the TBE DSL APIs, go to 2.c.

    3. If the operator logic cannot be expressed using the first two methods, analyze whether TBE TIK APIs would be helpful.

      This mode applies to the development of all kinds of operators and supports complex computing (such as sorting) that cannot be described using lambda expressions. However, it is the most demanding mode: you need to use the TBE TIK APIs to describe both the computation and its scheduling, and to manually control data transfer parameters and scheduling strategies.

      • If the TBE TIK APIs cannot implement an operator because they do not support its data types, use the AI CPU mode.
      • If building a network quickly with the TBE TIK mode is difficult, use the AI CPU mode first and later convert the AI CPU operators into TBE TIK operators for performance tuning.
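The distinction between lambda-expressible computation and computation such as sorting can be illustrated with a plain NumPy sketch (illustrative only, not TBE code):

```python
import numpy as np

x = np.array([3.0, 1.0, 4.0, 1.5])

# Elementwise logic fits the per-element lambda form used by DSL-style
# compute definitions: output[i] depends only on x[i].
square = np.array([(lambda v: v * v)(v) for v in x])

# Sorting cannot be expressed per element: every output position depends
# on the whole input, so it needs a mode (such as TIK) that allows
# explicit control over comparisons and data movement.
sorted_x = np.sort(x)
```

Here `square` is computable independently per element, while `sorted_x` is not, which is why the former suits TBE DSL and the latter calls for TBE TIK.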

Development Mode Comparison

Table 1 Operator development modes

CANN Built-in Operator Assembly
  • Language: C++
  • Preference: Most preferred
  • Advantages:
    • You only need to understand the mathematical logic of the operators and how to assemble them.
    • The operator implementation logic is invisible to you, and you do not need to provide custom operator deliverables (such as the operator implementation, prototype, and information library) in CANN.
    • CANN automatically fuses graphs after graph construction.
  • Use case: Scenarios where the target function can be realized by assembling CANN built-in operators.
  • Disadvantage: Limited use cases.

TBE DSL
  • Language: Python
  • Preference: Preferred
  • Advantage: You do not need to pay attention to hardware logic, which is friendly to beginners.
  • Use case: Vector operations with simple logic.
  • Disadvantages:
    • Complex operator logic cannot be expressed.
    • Matrix, convolution, and pooling APIs cannot currently be used together with other compute APIs. For details, see the API restrictions in TBE DSL API.

TBE TIK
  • Language: Python
  • Preference: Less preferred
  • Advantage: This mode offers flexible control over data transfer parameters and scheduling, and delivers high operator performance if you are familiar with the architecture of the Ascend AI Processor.
  • Use case: Development of various operators, including complex computing that cannot be described using lambda expressions, such as sorting.
  • Disadvantage: The barrier to entry is high. You must understand the architecture of the Ascend AI Processor and be familiar with the TIK APIs.

AI CPU
  • Language: C++
  • Preference: Least preferred
  • Advantages:
    • Native C++ APIs are used, which is friendly to developers with C++ programming experience.
    • You do not need to pay attention to hardware logic.
  • Use cases: Development of operators that are not supported by or not suitable for the AI Core, or quick network building.
    • Operators not suitable for running on the AI Core, including operators for non-matrix complex computation and selection-intensive operators with complex logic. For example, control operators such as Dump and Profiling, resource status operators such as Queue and Stack, and search operators such as TopK and Where.
    • Operators not supported by the AI Core, for example, operators that require data types the AI Core does not support.
  • Disadvantages:
    • AI CPU performance is low.
    • The AI CPU does not provide encapsulated APIs, so operator implementation is complex.

Examples

  • Example 1: To multiply an [M, K] matrix by a [K, N] matrix and calculate the top n values of the resulting [M, N] matrix.

    These functions can be implemented by combining the built-in CANN operators MatMul and TopK. Therefore, built-in operator assembly is preferred.
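The composition in Example 1 can be checked against a NumPy reference (a sketch of the math only; the actual implementation would assemble the CANN MatMul and TopK operators into a graph):

```python
import numpy as np

def matmul_topn_reference(a, b, n):
    """Reference for [M, K] x [K, N] followed by the per-row top-n values."""
    c = a @ b                        # MatMul step: result shape [M, N]
    # TopK-style selection: the n largest values in each row, descending.
    return -np.sort(-c, axis=1)[:, :n]

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])          # [M=2, K=2]
b = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])     # [K=2, N=3]
print(matmul_topn_reference(a, b, 2))  # [[ 4.  2.] [10.  4.]]
```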

  • Example 2: To implement an Rsqrt operator that does not exist in the operator library. The Rsqrt operator calculates the reciprocal of the square root of x. The mathematical expression is as follows: y = 1/sqrt(x).

    First analyze whether DSL APIs can be used to implement this operator. sqrt(x) can be implemented by using the tbe.dsl.vsqrt API, and division can be implemented by using the tbe.dsl.vdiv API. Therefore, this operator can be developed in TBE DSL mode.
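A NumPy reference makes the two-step decomposition concrete (illustrative only; the real operator would call the tbe.dsl.vsqrt and tbe.dsl.vdiv APIs mentioned above):

```python
import numpy as np

def rsqrt_reference(x):
    """Reference for y = 1 / sqrt(x), mirroring the two DSL steps."""
    s = np.sqrt(x)                  # step 1: square root (cf. tbe.dsl.vsqrt)
    return np.ones_like(x) / s      # step 2: division     (cf. tbe.dsl.vdiv)

x = np.array([1.0, 4.0, 16.0])
print(rsqrt_reference(x))  # [1.   0.5  0.25]
```

Because each output element depends only on the corresponding input element, this computation fits the TBE DSL vector-operation model.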

  • Example 3: To implement a ScatterNdAdd operator that does not exist in the operator library. This operator has three inputs: var, indices, and updates. It uses updates to update the data specified by indices in var, that is, it adds the values of updates to the specified positions in var.

    Because this operator updates elements at scattered positions of a tensor concurrently, it cannot be implemented with TBE DSL APIs. Therefore, the TIK mode is recommended.
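The semantics of Example 3 can be captured by a NumPy reference implementation (a sketch of the math only; a real TIK kernel describes the computation and data movement explicitly):

```python
import numpy as np

def scatter_nd_add_reference(var, indices, updates):
    """Reference for ScatterNdAdd: add updates into var at the positions
    selected by indices. Duplicate indices accumulate."""
    out = var.copy()
    np.add.at(out, indices, updates)  # unbuffered add, so repeated indices add up
    return out

var = np.zeros(5)
indices = np.array([0, 2, 2])          # position 2 is updated twice
updates = np.array([1.0, 2.0, 3.0])
print(scatter_nd_add_reference(var, indices, updates))  # [1. 0. 5. 0. 0.]
```

The scattered, possibly duplicated target positions are exactly what rules out a per-element lambda expression in TBE DSL.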