Operator Code Implementation

Call the TBE DSL APIs to implement the Add operator in the tbe/impl/add.py file, covering the operator function definition, operator argument verification, compute process implementation, scheduling, and building.

Code Template Introduction

MindStudio generates the following template code in tbe/impl/add.py.
# Import the dependent Python modules.
import tbe.dsl as tbe
from tbe import tvm
from tbe.common.register import register_op_compute
from topi import generic


# Operator compute function
@register_op_compute("Add")
def add_compute(x, y, z, kernel_name="add"):
    """
    To do: Implement the operator by referring to the
           TBE Operator Development Guide.
    """

    res = tbe.XXX(x, y)
    return res


# Operator definition function
def add(x, y, z, kernel_name="add"):
    """
    To do: Implement the operator by referring to the
           TBE Operator Development Guide.
    """
    # Input placeholder
    data_x = tvm.placeholder(x.get("shape"), dtype=x.get("dtype"), name="data_x")
    data_y = tvm.placeholder(y.get("shape"), dtype=y.get("dtype"), name="data_y")

    # Call the operator compute function.
    res = add_compute(data_x, data_y, z, kernel_name)

    # Auto schedule
    with tvm.target.cce():
        schedule = tbe.auto_schedule(res)

    # Build
    config = {"name": kernel_name,
              "tensor_list": [data_x, data_y, res]}
    tbe.build(schedule, config)

  • The Python modules for operator development are as follows:
    • tbe.dsl: imports the DSL APIs supported by TBE, including common ones such as vmuls, vadds, and matmul.

      For details about the API definition, see the Python functions in the Ascend-CANN-Toolkit installation directory/ascend-toolkit/latest/compiler/python/site-packages/tbe/dsl directory.

    • tbe.tvm: imports the code generation mechanism of TVM.

      For details about the API definition, see the Python functions in the Ascend-CANN-Toolkit installation directory/ascend-toolkit/latest/compiler/python/site-packages/tbe/tvm directory. For details about the usage, visit https://docs.tvm.ai/.

    • tbe.common.register.register_op_compute: implements automatic UB fusion for operators.

      For details about the API definition, see the definition of the register_op_compute function in the Ascend-CANN-Toolkit installation directory/ascend-toolkit/latest/compiler/python/site-packages/tbe/common/register/register_api.py file.

  • The template generates a compute function declaration named operatorName_compute.
    • If Sample Template or Tensorflow Template is selected during operator project creation, the input and output parameters and attributes are automatically generated based on the prototype definition.
    • If Empty Template is selected during operator project creation, an input and an output without attributes are generated by default.
  • The template generates the declaration and part of the implementation of the definition function named operatorName. The sample code in the implementation function template performs the following steps:
    • Obtains the shape and dtype of the input tensors.
    • Verifies the arguments.
    • Creates placeholders for the input tensors.
    • Calls the compute function of the operator for computing, scheduling, and compilation.

Implementation of the Operator Definition Function

You need to implement the operator compute function based on the template code generated by MindStudio, and add verification code for the operator inputs, outputs, and attributes to the operator definition function. The Add operator allows its two inputs to have different shapes, but the compute API tbe.dsl.vadd() requires both inputs to have the same shape. Therefore, the two input shapes need to be broadcast to a common shape and verified, so that faults can be located during operator build. The modified code is as follows.
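The broadcast rule applied by shape_util.broadcast_shapes() can be sketched in plain Python. This is a NumPy-style approximation for intuition only, not the actual CANN implementation; the function name broadcast_shape is hypothetical:

```python
def broadcast_shape(shape_x, shape_y):
    """Return the broadcast shape of two shapes, NumPy-style:
    shapes are right-aligned, and two dimensions are compatible
    if they are equal or one of them is 1."""
    # Pad the shorter shape with leading 1s so both have the same rank.
    rank = max(len(shape_x), len(shape_y))
    sx = [1] * (rank - len(shape_x)) + list(shape_x)
    sy = [1] * (rank - len(shape_y)) + list(shape_y)
    out = []
    for dim_x, dim_y in zip(sx, sy):
        if dim_x != dim_y and dim_x != 1 and dim_y != 1:
            raise RuntimeError(
                "shapes %s and %s cannot be broadcast" % (shape_x, shape_y))
        out.append(max(dim_x, dim_y))
    return out
```

For example, broadcast_shape((16, 1, 32), (8, 32)) yields [16, 8, 32], while incompatible dimensions such as (2, 3) and (4, 3) raise an error, mirroring the verification that broadcast_shapes performs.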

If you remotely start MindStudio on Windows, the following code may fail to be copied. For details about how to solve the problem, see What Do I Do If the Copied Content Cannot Be Pasted to the Editor Window When MindStudio Is Opened Remotely on Windows?

from __future__ import absolute_import
import tbe.dsl as tbe
from functools import reduce
from tbe import tvm
from tbe.common.register import register_op_compute
from tbe.common.utils import para_check
from tbe.common.utils import shape_util


SHAPE_SIZE_LIMIT = 2147483648

# Implement the compute logic of the Add operator.
@register_op_compute("Add", op_mode="dynamic", support_fusion=True)
def add_compute(input_x, input_y, output_z, kernel_name="add"):
    # Convert the shape to a list.
    shape_x = shape_util.shape_to_list(input_x.shape)    
    shape_y = shape_util.shape_to_list(input_y.shape) 

    # Assign the larger value of each dimension of shape_x and shape_y to shape_max.
    shape_x, shape_y, shape_max = shape_util.broadcast_shapes(shape_x, shape_y,
                                                              param_name_input1="input_x",
                                                              param_name_input2="input_y")
    shape_size = reduce(lambda x, y: x * y, shape_max[:])
    if shape_size > SHAPE_SIZE_LIMIT:
        raise RuntimeError("the shape is too large to calculate")

    # Broadcast the shape of input_x as shape_max.
    input_x = tbe.broadcast(input_x, shape_max)
    input_y = tbe.broadcast(input_y, shape_max)

    # Add input_x and input_y.
    res = tbe.vadd(input_x, input_y)

    return res

# Operator definition function
@para_check.check_op_params(para_check.REQUIRED_INPUT, para_check.REQUIRED_INPUT,
                            para_check.REQUIRED_OUTPUT, para_check.KERNEL_NAME)
def add(input_x, input_y, output_z, kernel_name="add"):
    # Obtain the shape and data type of the operator input tensor.
    shape_x = input_x.get("shape")
    shape_y = input_y.get("shape")

    # Verify the operator input type.
    check_tuple = ("float16", "float32", "int32")
    input_data_type = input_x.get("dtype").lower()
    para_check.check_dtype(input_data_type, check_tuple, param_name="input_x")

    # Assign the larger value of each dimension of shape_x and shape_y to shape_max.
    shape_x, shape_y, shape_max = shape_util.broadcast_shapes(shape_x, shape_y,
                                                              param_name_input1="input_x",
                                                              param_name_input2="input_y")

    # If the last dimension of shape_x, shape_y, and shape_max is 1, it carries no data (for example, a 2 x 3 x 1 tensor is equivalent to a 2 x 3 tensor), so the last dimension can be removed to improve scheduling efficiency. A shape of length 1 is kept unchanged.
    if shape_x[-1] == 1 and shape_y[-1] == 1 and shape_max[-1] == 1:
        shape_x = shape_x if len(shape_x) == 1 else shape_x[:-1]
        shape_y = shape_y if len(shape_y) == 1 else shape_y[:-1]
        shape_max = shape_max if len(shape_max) == 1 else shape_max[:-1]

    # Call the placeholder API of TVM to create a placeholder for each input tensor, returning a tensor object, respectively.
    data_x = tvm.placeholder(shape_x, name="data_1", dtype=input_data_type)
    data_y = tvm.placeholder(shape_y, name="data_2", dtype=input_data_type)

    # Call the compute implementation function.
    res = add_compute(data_x, data_y, output_z, kernel_name)

    # Auto schedule
    with tvm.target.cce():
        schedule = tbe.auto_schedule(res)
    # Build configuration
    config = {"name": kernel_name,
              "tensor_list": (data_x, data_y, res)}
    tbe.build(schedule, config)
  1. The operator definition function declaration contains the operator input information, output information, and kernel name.
    def add(input_x, input_y, output_z, kernel_name="add"):
    • input_x, input_y: two input tensors of the Add operator. Each tensor must be defined in dictionary format, including information such as shape and data type. The number of input tensors must be consistent with that defined in the operator information library definition file (tbe/op_info_cfg/ai_core/add.ini).
    • output_z: output tensor. The tensor information must be defined in dictionary format, including the shape, data type, and more. This field is reserved.

      The number of output tensors must be consistent with that defined in the operator information library definition file (tbe/op_info_cfg/ai_core/add.ini).

    • kernel_name: unique name of the operator in the kernel, that is, the name of the generated binary file and operator description file. The value can contain a maximum of 200 characters, which must be a combination of letters, digits, and underscores (_), beginning with a letter or underscore (_).
  2. Verify the operator input and infer the output shape.

    The Add operator needs to verify the shapes of the two input tensors. Only data types float16, float32, and int32 are supported. Since the two input tensors of the Add operator may have different shapes, shape_util.broadcast_shapes() needs to be called to generate and verify the broadcast shape.

  3. Create placeholders of the two input tensors.

    Call the placeholder API of TVM to create a placeholder for each input tensor, returning a tensor object respectively.

    tensor_list described in 5 is a list of the tensor objects returned by calls to the tvm.placeholder API. Therefore, do not overwrite these objects in subsequent computation.

  4. Call the add_compute function.
    res = add_compute(data_x, data_y, output_z, kernel_name)  

    data_x and data_y are the tensor objects generated in 3.

    For details about how to implement the compute function, see Implementation of the Compute Function.

  5. Implement operator scheduling and building.

    tensor_list:

    "tensor_list": (data_x, data_y, res)

    The arguments are the two input tensors and one output tensor.
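The kernel_name constraint described in 1 (at most 200 characters; letters, digits, and underscores only; beginning with a letter or underscore) can be checked with a short helper. The helper name check_kernel_name and its regular expression are illustrative assumptions, not part of the para_check API:

```python
import re

# Pattern for the documented rule: first character is a letter or
# underscore, remaining characters are letters, digits, or underscores.
KERNEL_NAME_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def check_kernel_name(kernel_name):
    """Raise RuntimeError if kernel_name violates the documented rule."""
    if len(kernel_name) > 200 or not KERNEL_NAME_PATTERN.match(kernel_name):
        raise RuntimeError("invalid kernel_name: %s" % kernel_name)
```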

Implementation of the Compute Function

You need to customize the compute function of an operator based on the compute logic. The compute function of the Add operator is implemented as follows.

@register_op_compute("Add", op_mode="dynamic", support_fusion=True)
def add_compute(input_x, input_y, output_z, kernel_name="add"):
    # Convert the shape to a list.
    shape_x = shape_util.shape_to_list(input_x.shape)    
    shape_y = shape_util.shape_to_list(input_y.shape) 

    # Assign the larger value of each dimension of shape_x and shape_y to shape_max.
    shape_x, shape_y, shape_max = shape_util.broadcast_shapes(shape_x, shape_y,
                                                              param_name_input1="input_x",
                                                              param_name_input2="input_y")
    shape_size = reduce(lambda x, y: x * y, shape_max[:])
    if shape_size > SHAPE_SIZE_LIMIT:
        raise RuntimeError("the shape is too large to calculate")

    # Broadcast the shape of input_x as shape_max.
    input_x = tbe.broadcast(input_x, shape_max)
    input_y = tbe.broadcast(input_y, shape_max)

    # Add input_x and input_y.
    res = tbe.vadd(input_x, input_y)

    return res
  1. The add_compute function is declared as follows.
    @register_op_compute("Add", op_mode="dynamic", support_fusion=True)
    def add_compute(input_x, input_y, output_z, kernel_name="add")

    The decorator @register_op_compute("Add", op_mode="dynamic", support_fusion=True) is required in the DSL operator development mode. It enables automatic UB fusion for operators during network running, so that the compute function of the current custom operator can be automatically fused with other operators in the UB according to UB fusion patterns.

    Note the following:

    • input_x, input_y: indicate the arguments passed to the compute function, that is, the placeholders for the input tensors declared in 3, including information such as the shape and data type.
    • output_z: indicates the dictionary returned by the call to the operator API function in 1.
    • kernel_name: indicates the operator name in the kernel.
  2. Implement the compute logic of the Add operator.

    The Add operator requires that shapes of the two tensors to be added be the same. Therefore, the tbe.broadcast API is called to broadcast the two input tensors to the same shape, and then the tbe.vadd API is called to add the input tensors and return the result tensor.
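For intuition, the broadcast-then-add flow can be mirrored on the host with NumPy (an analogue only; tbe.broadcast and tbe.vadd operate on device tensors, not NumPy arrays):

```python
import numpy as np

# A (2, 1) tensor and a (3,) tensor: different shapes, as the Add
# operator allows.
x = np.ones((2, 1), dtype=np.float16)
y = np.arange(3, dtype=np.float16)

# Compute the common shape, then broadcast both inputs to it before
# the element-wise addition -- the same flow as add_compute.
shape_max = np.broadcast_shapes(x.shape, y.shape)   # (2, 3)
res = np.broadcast_to(x, shape_max) + np.broadcast_to(y, shape_max)
```

Here both inputs are expanded to the common shape (2, 3) before the addition, matching what the compute function does with shape_max.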

Operator Build Verification

  1. At the bottom of the Python file of the operator, add the main function to call the operator, and build the operator implementation file by using MindStudio for simple syntax verification of the single-operator code. A code example is as follows:
    # Call the operator.
    if __name__ == '__main__':
    input_output_dict = {"shape": (5, 6, 7), "format": "ND", "ori_shape": (5, 6, 7), "ori_format": "ND", "dtype": "float16"}
        add(input_output_dict, input_output_dict, input_output_dict, kernel_name="add")
    
  2. Right-click tbe/impl/add.py and choose Run 'add' from the shortcut menu to build the operator.
    If no build error is reported and a kernel_meta folder containing the following files is generated in the tbe/impl directory, the operator code is built successfully.
    • Binary file of the operator (.o)
    • Operator description file (.json): defines operator attributes and resources required for running the operator.
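The presence and integrity of these build artifacts can also be checked with a small script. The helper below is a sketch; it assumes the files are named kernel_name.o and kernel_name.json, which may differ depending on the build configuration:

```python
import json
import os

def check_kernel_meta(kernel_meta_dir, kernel_name):
    """Hypothetical post-build check: confirm kernel_meta contains the
    operator binary (.o) and description file (.json), and that the
    .json file parses."""
    obj_path = os.path.join(kernel_meta_dir, kernel_name + ".o")
    json_path = os.path.join(kernel_meta_dir, kernel_name + ".json")
    if not (os.path.isfile(obj_path) and os.path.isfile(json_path)):
        return False
    with open(json_path) as f:
        json.load(f)  # raises an error if the description file is malformed
    return True
```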