TIK Function
After finishing the preceding sections, you should have mastered data movement and Vector computation for TIK operators, understood the essence of TIK as a code generator, and be able to write basic TIK operators.
This section mainly introduces the concepts and usage of the TIK functions.
A TIK operator consists of TIK functions and other common Python functions, and must contain a TIK entry point function. A TIK function is encapsulated as a Python function in the format function(parameter_list). The parameter list contains a group of parameter definitions separated by commas (,). Parameters are defined as in Python and can be passed positionally or in name=value format. Given the nature of the TIK code generator, certain tricks can be used when writing TIK functions, as long as the Python syntax rules are not violated; otherwise, the interpreter's syntax check will fail.
TIK Entry Point Function
Each TIK operator needs a TIK entry point function, which is called at operator runtime. The declaration of the TIK entry point function is as follows:
def operationname(input_x1, input_x2, output_y, attribute1=None, attribute2=None,..., kernel_name="KernelName")
- The name of the entry point function must be the same as that of the Python operator implementation file. For details about the naming rules, see Naming Rules for Operator Definition File.
- input_x1, input_x2: input tensors of the operator. A tensor must be defined in dictionary format, including the shape, ori_shape, format, ori_format, and dtype information. See the following example:
input_x1 = {'shape': (2, 2), 'ori_shape': (2, 2), 'format': 'ND', 'ori_format': 'ND', 'dtype': 'float16'}
The sequence and number of input tensors must be the same as those in Operator Prototype Definition. Optional inputs also need to be defined here; whether their data is actually passed in and processed is determined by the compute logic.
- output_y: output tensor of the operator (reserved), defined as a dictionary including the shape and dtype information.
The sequence and number of output tensors must be the same as those in Operator Prototype Definition. Optional outputs also need to be defined here.
- attribute1, attribute2...: operator attributes. The sequence and number of operator attributes must be the same as those in Operator Prototype Definition.
Omit these parameters if the operator has no attributes; assign default values to them if the attributes are optional.
- kernel_name: unique name of the operator in the kernel, that is, the name of the generated binary file and operator description file. The value can contain a maximum of 200 characters, must start with a letter or underscore (_), and can contain only letters, digits, and underscores.
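Putting the rules above together, a minimal entry point function might look like the following sketch. The operator name add_custom, the attribute alpha, and the body comments are illustrative assumptions, not a fixed API; a real entry point must match the operator prototype definition.

```python
# Hypothetical TIK entry point skeleton following the parameter rules above.
# The operator name and the alpha attribute are illustrative only.
def add_custom(input_x1, input_x2, output_y, alpha=1.0, kernel_name="add_custom"):
    # Inputs and outputs arrive as dictionaries describing the tensors.
    shape = input_x1["shape"]
    dtype = input_x1["dtype"]
    # ... construct a tik.Tik() instance, define tensors, compute, BuildCCE ...
    return shape, dtype  # returned here only so the skeleton can be exercised
```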
Other TIK functions can be called directly in the TIK entry point function to generate the CCE code of the TIK operator.
TIK Class Constructor
A TIK class constructor is used to create a TIK DSL container and return a TIK instance as follows:
from tbe import tik
from tbe.common.platform import set_current_compile_soc_info
# Set this parameter based on the Ascend AI Processor version.
soc_version="xxx"
set_current_compile_soc_info(soc_version,core_type="AiCore")
tik_instance = tik.Tik()
- set_current_compile_soc_info is used to set the Ascend AI Processor version and the type of the target core. If this API is not called, Ascend 310 will be used as the Ascend AI Processor version and AI Core will be used as the target core by default. For details about the API, see set_current_compile_soc_info.
- tik_instance = tik.Tik() is used to construct a TIK container and return a TIK instance.
The tik.Tik() constructor can also be used to enable the TIK debugging function and set the print level of build error messages.
TIK Building Function
After the compute logic of the TIK operator is implemented, call the BuildCCE function to build the TIK description language into a binary that can be executed on the Ascend AI Processor. Call it as follows:
tik_instance.BuildCCE(kernel_name="KernelName", inputs=(data_input1_gm, data_input2_gm), outputs=(data_output_gm,)) or tik_instance.BuildCCE(kernel_name="KernelName", inputs=[data_input1_gm, data_input2_gm], outputs=[data_output_gm])
inputs and outputs are a list or tuple of Tensors whose scope is scope_gm. The ordering of input and output Tensors must be the same as that of input and output parameters in Operator Prototype Definition and TIK Entry Point Function.
Smart Use of TIK Functions
from tbe import tik

VLENINT32 = 64

def tik_func(tik_instance, data_input_ub, data_output_ub):
    tik_instance.vec_add(VLENINT32, data_output_ub, data_input_ub, data_output_ub, 4, 8, 8, 8)

def tik_vadd(data_input_gm, data_output_gm, dtype, kernel_name):
    tik_instance = tik.Tik()
    # Define GM and UB tensors.
    data_input_gm = tik_instance.Tensor(dtype, (256,), name="data_input_gm", scope=tik.scope_gm)
    data_output_gm = tik_instance.Tensor(dtype, (256,), name="data_output_gm", scope=tik.scope_gm)
    data_input_ub = tik_instance.Tensor(dtype, (256,), name="data_input_ub", scope=tik.scope_ubuf)
    data_output_ub = tik_instance.Tensor(dtype, (256,), name="data_output_ub", scope=tik.scope_ubuf)
    # Perform data movement.
    tik_instance.data_move(data_input_ub, data_input_gm, 0, 1, 32, 0, 0)
    # Perform Vector computation.
    tik_func(tik_instance, data_input_ub, data_output_ub)
    # Perform data movement.
    tik_instance.data_move(data_output_gm, data_output_ub, 0, 1, 32, 0, 0)
    # Generate CCE.
    tik_instance.BuildCCE(kernel_name="tik_vadd", inputs=(data_input_gm,), outputs=(data_output_gm,))
    return tik_instance

if __name__ == "__main__":
    dtype = "int32"
    kernel_name = "tik_vadd"
    data_input_gm = None
    data_output_gm = None
    tik_instance = tik_vadd(data_input_gm, data_output_gm, dtype, kernel_name)
In the preceding example, vec_add is extracted and encapsulated into tik_func, which takes the TIK instance (tik_instance) as a parameter.
Given the Python syntax rules, the operands data_input_ub and data_output_ub for the Vector computation must be passed in the call. However, tik_func does not return data_output_ub; the modification to data_output_ub is instead recorded directly in the generated IR.
This reflects the nature of the TIK code generator: when the Python interpreter executes the vec_add TIK statement, it generates the corresponding IR instead of performing the computation directly, so whether data_output_ub is returned makes no difference.
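The point can be illustrated with a toy code generator written in plain Python. This is only an analogy for how the TIK container behaves, not TIK itself: executing a "statement" records IR in the container, so the helper has nothing meaningful to return.

```python
# A toy "code generator" analogous to the TIK container: executing a
# statement only records IR; it performs no computation.
class ToyContainer:
    def __init__(self):
        self.ir = []  # recorded statements, analogous to TIK's IR

    def vec_add(self, dst, src0, src1):
        # Executing this "statement" appends IR instead of computing.
        self.ir.append(f"vec_add {dst}, {src0}, {src1}")

def helper(container, src_name, dst_name):
    # Like tik_func: the statement lands in the container, so there is
    # no need to return dst_name to the caller.
    container.vec_add(dst_name, src_name, dst_name)

container = ToyContainer()
helper(container, "data_input_ub", "data_output_ub")
print(container.ir)  # the IR was recorded even though helper returned nothing
```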
To print a temporary tensor of the operator for debugging purposes, you can take advantage of this Python behavior and use a non-intrusive test tensor.
Non-intrusive Test
from tbe import tik

class Test:
    __instance = None

    def __init__(self):
        pass

    @classmethod
    def get_instance(cls):
        if not cls.__instance:
            cls.__instance = Test()
        return cls.__instance

    def initial_test(self, tik_instance, dtype, **kwargs):
        self.data_test = tik_instance.Tensor(dtype, kwargs['test_gm_shape'],
                                             name="data_test", scope=tik.scope_gm)
        self.data_test_ub = tik_instance.Tensor(dtype, kwargs['test_ub_shape'],
                                                name="data_test_ub", scope=tik.scope_ubuf)

    def get_test_gm(self):
        return self.data_test

    def get_test_ub(self):
        return self.data_test_ub
Encapsulate a Test class and define the required UB and GM tensors in it.
from tbe import tik

VLENINT32 = 64
# Test Use
from test_tensor import Test
test = Test.get_instance()

def tik_func(tik_instance, data_input_ub, data_output_ub):
    tik_instance.vec_add(VLENINT32, data_output_ub, data_input_ub, data_output_ub, 4, 8, 8, 8)
    # Test Use
    tik_instance.vec_add(VLENINT32, test.get_test_ub(), data_input_ub, data_output_ub, 4, 8, 8, 8)

def tik_vadd(data_input_gm, data_output_gm, dtype, kernel_name):
    tik_instance = tik.Tik()
    test.initial_test(tik_instance, dtype, test_gm_shape=(256,), test_ub_shape=(256,))
    data_input_gm = tik_instance.Tensor(dtype, (256,), name="data_input_gm", scope=tik.scope_gm)
    data_output_gm = tik_instance.Tensor(dtype, (256,), name="data_output_gm", scope=tik.scope_gm)
    data_input_ub = tik_instance.Tensor(dtype, (256,), name="data_input_ub", scope=tik.scope_ubuf)
    data_output_ub = tik_instance.Tensor(dtype, (256,), name="data_output_ub", scope=tik.scope_ubuf)
    tik_instance.data_move(data_input_ub, data_input_gm, 0, 1, 32, 0, 0)
    tik_func(tik_instance, data_input_ub, data_output_ub)
    tik_instance.data_move(data_output_gm, data_output_ub, 0, 1, 32, 0, 0)
    # Test Use
    tik_instance.data_move(test.get_test_gm(), test.get_test_ub(), 0, 1, 32, 0, 0)
    tik_instance.BuildCCE(kernel_name="tik_vadd",
                          inputs=(data_input_gm,),
                          # Test Use
                          outputs=(data_output_gm, test.get_test_gm()))
    return tik_instance

if __name__ == "__main__":
    dtype = "int32"
    kernel_name = "tik_vadd"
    data_input_gm = None
    data_output_gm = None
    tik_instance = tik_vadd(data_input_gm, data_output_gm, dtype, kernel_name)
Import the Test instance as a global variable and initialize it in the entry point function. In this way, the Test class can be used throughout the operator logic in a non-intrusive manner. Note that the code marked with "Test Use" comments must be modified or deleted before operator delivery.
With this approach, the Python interpreter executes the TIK statements and inserts the new IR at the proper positions, facilitating debugging of the operator. By contrast, if the test UB or GM tensors were defined in the entry point function instead of in the encapsulated Test class, Python's scoping rules would complicate debugging: the test tensors would have to be passed to every TIK function call, introducing a heavy modification workload.
The procedure of TIK function debugging will be detailed in the following sections. This example only illustrates a possible debugging solution for your reference.
Wrap-up
TIK functions are essentially Python functions containing TIK statements. Due to the nature of the TIK code generator, when the Python interpreter executes a TIK statement, the corresponding IR is generated; therefore, TIK statements do not require the modified TIK variables to be returned. However, if a value is a common Python variable or a TIK expression (of the Expr type), it must be returned according to the Python syntax rules.
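As a plain-Python sketch of that last rule (the helper name and parameters are illustrative, not part of the TIK API): an index expression computed inside a helper is an ordinary value, so it must be returned to be visible to the caller.

```python
# Illustrative helper: an index/offset expression is an ordinary value
# (in TIK, an Expr), so Python scoping requires it to be returned.
def compute_offset(block_idx, elems_per_block):
    offset = block_idx * elems_per_block
    return offset  # without this return, the caller could not see the offset

# Usage: the returned expression can then be used, e.g., to index a tensor slice.
print(compute_offset(2, 64))
```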