Quick Start
Objectives
This section uses a simple Add operator as an example to describe how to write a TIK operator.
In this example, both inputs of the Add operator have shape (128,) and data type float16. The operator runs on the AI Core of the Ascend AI Processor.
Operator Analysis
During operator analysis, specify the mathematical expression, inputs, outputs, scheduling design, and implementation function name of the operator.
- Specify the mathematical expression of the Add operator as follows:
y=x1+x2
The Add operator adds two inputs and returns the result.
- Specify the inputs and output.
- The Add operator has two inputs and one output.
- The supported input data type is float16, and so is the output data type.
- The supported input shape is (128,), and so is the output shape.
- The operator input supports the following formats: NCHW, NC1HWC0, NHWC, and ND.
You do not need to pay too much attention to the format during operator implementation, as the TIK mode is insensitive to the format.
- Design the scheduling of the Add operator.
- Define data in the storage inside and outside the AI Core.
- Transfer data from the external storage to the Unified Buffer in the AI Core. Each input has shape (128,) and data type float16, so its size is 128 x 2 = 256 bytes. This is far less than the size of the Unified Buffer (for example, 256 KB on the Ascend 310 AI Processor). Therefore, each input needs only one transfer.
- Perform the calculation.
Call the vec_add API to perform addition.
The Vector Unit can process up to 256 bytes of data per iteration. In this example, each input is 256 bytes (128 float16 elements), so a single iteration that uses all 128 elements completes the calculation.
- Specify the operator implementation file name, operator implementation function name, and OpType.
- Name OpType in upper camel case, that is, start each word with a single capital letter.
- Name the operator implementation file and operator definition function in either of the following ways:
- To create user-defined names, configure opFile.value and opInterface.value in TBE Operator Information Library.
- If opFile.value and opInterface.value in the TBE Operator Information Library are not configured, FE derives the operator file name and function name from the OpType according to the following rules:
- Replace the first uppercase letter with a lowercase letter.
- Replace each uppercase letter following lowercase letters with an underscore (_) and the corresponding lowercase letter.
- Uppercase letters following a digit or an uppercase letter are regarded as a semantic string. If there is a lowercase letter after this string, replace the last uppercase letter in this string with an underscore (_) and the corresponding lowercase letter, and replace the other uppercase letters with corresponding lowercase letters. If there is no lowercase letter after the string, directly replace the string with lowercase letters.
Examples: ABCDef -> abc_def; Abc2DEf -> abc2d_ef; Abc2DEF -> abc2def; ABC2dEF -> abc2d_ef
In this example, the OpType of the operator is defined as AddTik, and both the operator implementation file name and the implementation function name are add_tik.
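The default naming rules above can be sketched in a few lines of Python. This is only an illustration of the rules as written in this section, not the FE implementation; the function name op_type_to_name is hypothetical.

```python
def op_type_to_name(op_type):
    """Convert an OpType such as 'AddTik' to the default file/function name.

    Illustrative sketch of the documented rules; not the actual FE code.
    """
    out = []
    for i, ch in enumerate(op_type):
        if not ch.isupper():
            # Lowercase letters and digits are copied unchanged.
            out.append(ch)
            continue
        lower = ch.lower()
        if i == 0:
            # Rule 1: the first uppercase letter simply becomes lowercase.
            out.append(lower)
            continue
        prev = op_type[i - 1]
        nxt = op_type[i + 1] if i + 1 < len(op_type) else ""
        if prev.islower():
            # Rule 2: uppercase after a lowercase letter starts a new word.
            out.append("_" + lower)
        elif nxt.islower():
            # Rule 3: last uppercase letter of a run that is followed by a
            # lowercase letter starts a new word.
            out.append("_" + lower)
        else:
            # Rule 3: other letters of an uppercase run are only lowercased.
            out.append(lower)
    return "".join(out)


for name in ("ABCDef", "Abc2DEf", "Abc2DEF", "ABC2dEF", "AddTik"):
    print(name, "->", op_type_to_name(name))
```

The loop reproduces the four examples listed above and maps AddTik to add_tik.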
Based on the preceding analysis, the design specifications of the Add operator are as follows.
Table 1 Add operator's design specifications

| Item | Name | Shape | Data type | Format |
|---|---|---|---|---|
| OpType | AddTik | - | - | - |
| Operator Input | x1 | (128,) | float16 | NCHW, NC1HWC0, NHWC, ND |
| Operator Input | x2 | (128,) | float16 | NCHW, NC1HWC0, NHWC, ND |
| Operator Output | y | (128,) | float16 | NCHW, NC1HWC0, NHWC, ND |
| TIK Compute API | vec_add | - | - | - |
| Implementation Function Name | add_tik | - | - | - |
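The scheduling arithmetic from the analysis can be checked with a short, hardware-independent sketch. The constants are the figures quoted in this section (256 KB Unified Buffer on the Ascend 310 AI Processor, 256 bytes per Vector Unit iteration); the helper name plan_schedule and the assumption that two inputs and one output reside in the Unified Buffer at the same time are illustrative only.

```python
import math

UB_SIZE_BYTES = 256 * 1024          # Unified Buffer size quoted for Ascend 310
VECTOR_BYTES_PER_ITERATION = 256    # Vector Unit limit stated in the analysis
DTYPE_BYTES = {"float16": 2}


def plan_schedule(num_elements, dtype="float16", num_ub_tensors=3):
    """Return (bytes per tensor, fits in Unified Buffer, vector iterations).

    Hypothetical helper: assumes x1, x2, and y are held in the Unified
    Buffer simultaneously (num_ub_tensors=3).
    """
    tensor_bytes = num_elements * DTYPE_BYTES[dtype]
    fits_in_ub = tensor_bytes * num_ub_tensors <= UB_SIZE_BYTES
    iterations = math.ceil(tensor_bytes / VECTOR_BYTES_PER_ITERATION)
    return tensor_bytes, fits_in_ub, iterations


tensor_bytes, fits_in_ub, iterations = plan_schedule(128)
print(tensor_bytes, fits_in_ub, iterations)  # prints: 256 True 1
```

For the (128,) float16 case, the sketch confirms that one transfer per tensor and one vector iteration are enough.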
Operator Code Implementation
The following uses the AddTik operator as an example to describe how to write a TIK program.
- Import Python modules.
from tbe import tik
import tbe.common.platform as tbe_platform
import numpy as np
tbe.tik: provides all TIK-related Python functions. For details, see python/site-packages/tbe/tik in the CANN component directory.
- Define the operator implementation function.
def add_tik():
Note: To keep the sample easy to follow, it uses a static data type and shape. In practice, however, data is not fed until the operator is executed, so the operator implementation function must take the basic information about the inputs and outputs as parameters. For an example, see TIK Entry Point Function.
- Set the Ascend AI Processor version and specify the target to run the operator.
# Set soc_version to the Ascend AI Processor version in use.
tbe_platform.set_current_compile_soc_info(soc_version)
The core_type parameter of the set_current_compile_soc_info API specifies the core type. The default value is AiCore, indicating that the target to run the operator is the AI Core.
- Create a TIK DSL container.
tik_instance = tik.Tik(disable_debug=False)
This example also debugs the operator functions, so disable_debug is set to False.
- Insert TIK DSL statements into the created container.
- Define the input and output data in the external and internal storage of the AI Core.
data_A = tik_instance.Tensor("float16", (128,), name="data_A", scope=tik.scope_gm)
data_B = tik_instance.Tensor("float16", (128,), name="data_B", scope=tik.scope_gm)
data_C = tik_instance.Tensor("float16", (128,), name="data_C", scope=tik.scope_gm)
data_A_ub = tik_instance.Tensor("float16", (128,), name="data_A_ub", scope=tik.scope_ubuf)
data_B_ub = tik_instance.Tensor("float16", (128,), name="data_B_ub", scope=tik.scope_ubuf)
data_C_ub = tik_instance.Tensor("float16", (128,), name="data_C_ub", scope=tik.scope_ubuf)
- Move data in the external storage to the internal storage (such as the Unified Buffer) of the AI Core.
tik_instance.data_move(data_A_ub, data_A, 0, 1, 128*2 //32, 0, 0)
tik_instance.data_move(data_B_ub, data_B, 0, 1, 128*2 //32, 0, 0)
- Perform addition.
tik_instance.vec_add(128, data_C_ub[0], data_A_ub[0], data_B_ub[0], 1, 8, 8, 8)
- Move data from the internal storage of the AI Core to the external storage.
tik_instance.data_move(data_C, data_C_ub, 0, 1, 128*2 //32, 0, 0)
- Compile the statements in the TIK DSL container into code that can run on the Ascend AI Processor, that is, the operator .o file and .json file (operator description file).
tik_instance.BuildCCE(kernel_name="simple_add",inputs=[data_A,data_B],outputs=[data_C])
In the preceding code:
- kernel_name: indicates the kernel name of the function in the generated binary code.
- inputs: input tensors of the program, which hold the data loaded from the external storage. The tensors must be defined in the Global Memory (scope=tik.scope_gm).
- outputs: output tensors of the program, which hold the compute results moved to the external storage. The tensors must be defined in the Global Memory (scope=tik.scope_gm).
- Return a TIK instance.
return tik_instance
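The numeric arguments passed to data_move and vec_add above follow from the tensor size. The sketch below derives them in plain Python, without importing tbe. It assumes, based on the positional order used in this sample, that data_move counts the burst length in 32-byte blocks and that vec_add takes a mask, a repeat count, and three repeat strides counted in 32-byte blocks. The helper derive_add_args and the dictionary keys are hypothetical labels, so check them against the TIK API reference before relying on them.

```python
BLOCK_BYTES = 32               # assumed unit for burst length and repeat strides
VECTOR_BYTES_PER_REPEAT = 256  # Vector Unit limit per iteration (see analysis)
DTYPE_BYTES = {"float16": 2}


def derive_add_args(num_elements, dtype="float16"):
    """Derive the numeric arguments used by the sample (hypothetical helper).

    Only handles sizes that are a whole number of 256-byte vector iterations.
    """
    elem_bytes = DTYPE_BYTES[dtype]
    total_bytes = num_elements * elem_bytes
    if total_bytes % VECTOR_BYTES_PER_REPEAT:
        raise ValueError("size must be a multiple of 256 bytes in this sketch")
    stride_blocks = VECTOR_BYTES_PER_REPEAT // BLOCK_BYTES
    return {
        # One continuous burst covering the whole tensor, no gaps.
        "data_move": {"sid": 0, "nburst": 1,
                      "burst": total_bytes // BLOCK_BYTES,
                      "src_stride": 0, "dst_stride": 0},
        # All elements of one iteration enabled; consecutive iterations
        # advance by one full 256-byte chunk.
        "vec_add": {"mask": VECTOR_BYTES_PER_REPEAT // elem_bytes,
                    "repeat_times": total_bytes // VECTOR_BYTES_PER_REPEAT,
                    "dst_rep_stride": stride_blocks,
                    "src0_rep_stride": stride_blocks,
                    "src1_rep_stride": stride_blocks},
    }


args = derive_add_args(128)
print(args["data_move"]["burst"], args["vec_add"]["mask"],
      args["vec_add"]["repeat_times"])  # prints: 8 128 1
```

For 128 float16 elements this yields a burst length of 8 (128*2 // 32), a mask of 128, one repeat, and strides of 8, matching the literal values in the sample.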
Debugging
- Append the following debugging code to the operator implementation file for operator implementation verification.
if __name__ == "__main__":
    # Call the TIK operator implementation function.
    tik_instance = add_tik()
    # Initialize data to a 1D matrix composed of 128 float16 ones.
    data = np.ones((128,), dtype=np.float16)
    feed_dict = {"data_A": data, "data_B": data}
    # Start debugging.
    data_C, = tik_instance.tikdb.start_debug(feed_dict=feed_dict, interactive=True)
    # Print output data.
    print(data_C)
- Run the TIK Python program.
python3 add_tik.py
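The inputs data_A and data_B are 1D arrays of 128 float16 ones. Because interactive is set to True, the debugger stops at the [TIK]> prompt. After c (continue) is entered, the output data_C is printed as follows: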
[TIK]>c [2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
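To confirm that the printed result is what an element-wise float16 addition should produce, the expected output can be computed on the host. The sketch below uses only the Python standard library and is independent of TIK; the helper names to_float16, reference_add, and matches are hypothetical, and the tolerance is an arbitrary illustrative value.

```python
import math
import struct


def to_float16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def reference_add(x1, x2):
    """Element-wise float16 addition used as a host-side reference."""
    return [to_float16(to_float16(a) + to_float16(b)) for a, b in zip(x1, x2)]


def matches(actual, expected, rel_tol=1e-3):
    """Compare operator output with the reference within a tolerance."""
    return len(actual) == len(expected) and all(
        math.isclose(a, e, rel_tol=rel_tol) for a, e in zip(actual, expected)
    )


data = [1.0] * 128
expected = reference_add(data, data)
print(len(expected), set(expected))  # prints: 128 {2.0}
```

In a real check, the values returned by start_debug would be passed to matches as the actual argument.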
The preceding describes only the general programming procedure. The involved APIs and parameters will be detailed in subsequent sections.