TIK Statements
What Is a TIK Statement?
In the previous section, we talked about the concept of expressions. A TIK expression does not generate an IR, but a TIK statement does. When a TIK statement contains an expression, the expression is instantiated for computation.
This section describes how TIK and Python statements can be combined flexibly during TIK operator programming to meet different operator requirements. It also explains which statement features are determined at compile time and which at execution time.
A statement is a unit of code that has an effect, like creating a variable or displaying a value.
TIK statements include assignment statements, iteration statements, condition statements, compute statements, print statements, and container management statements.
TIK statements take the form of common Python language elements. What distinguishes a TIK API call from other Python code is that it directly or indirectly affects the TIK DSL program.
In addition to the TIK API, you can also use common Python language elements, such as Python classes and collections, in front-end TIK programming. The TIK front end therefore contains both TIK statements and Python statements. In some cases, this improves coding efficiency and helps prevent bugs.
Smart Use of Iteration Statements
for_range is used to complete TIK iterations. To use the iteration variable i in a TIK statement, the for_range statement must be used. You can also enable double buffering to optimize the performance.
Case 1
The following TIK operator code snippet involves four tensor definitions and four vec_dup instructions.
input_ub0 = tik_instance.Tensor("int32", (64,), name="input_ub0", scope=tik.scope_ubuf)
input_ub1 = tik_instance.Tensor("int32", (64,), name="input_ub1", scope=tik.scope_ubuf)
input_ub2 = tik_instance.Tensor("int32", (64,), name="input_ub2", scope=tik.scope_ubuf)
input_ub3 = tik_instance.Tensor("int32", (64,), name="input_ub3", scope=tik.scope_ubuf)
tik_instance.vec_dup(64, input_ub0, 0, 1, 8)
tik_instance.vec_dup(64, input_ub1, 0, 1, 8)
tik_instance.vec_dup(64, input_ub2, 0, 1, 8)
tik_instance.vec_dup(64, input_ub3, 0, 1, 8)
The preceding definitions and instructions must be written out repeatedly. Adjusting the variables in each statement by hand is error-prone, and the repetition violates the open-closed principle.
Introducing a Python iteration statement achieves the same result with far less code.
repeat_times = 4
input_ub_list = []
for i in range(repeat_times):
    tmp_ub = tik_instance.Tensor("int32", (64,), name="input_ub"+str(i), scope=tik.scope_ubuf)
    input_ub_list.append(tmp_ub)
    tik_instance.vec_dup(64, input_ub_list[i], 0, 1, 8)
A list named input_ub_list is defined to store the tensor variables. The Python iteration statement then executes the TIK statements in the loop body repeatedly. Each time a TIK statement is executed, an IR is generated, resulting in the same effect as the repeated code lines.
In a TIK operator, the Python interpreter executes statements in sequence based on the code logic. Each time a TIK statement is executed, an IR is generated.
This TIK front-end programming method is easy to extend. You simply change the value of repeat_times in the example to get a different result.
TIK is actually a code generator. The Python interpreter emits an instruction to generate the corresponding IR only when executing a TIK statement. Therefore, TIK statements are obviously different from Python statements.
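The code-generator behavior can be illustrated without TIK installed. The following is a minimal sketch in plain Python: IRRecorder is a hypothetical stand-in for a TIK instance (it is not part of the TIK API), and its vec_dup method only records the call instead of generating real IR.

```python
# Hypothetical stand-in for a TIK instance. It only records the calls it
# receives, mimicking how a TIK statement emits IR when Python executes it.
class IRRecorder:
    def __init__(self):
        self.ir = []  # one entry per "TIK statement" executed by Python

    def vec_dup(self, mask, dst, scalar, repeat_times, dst_rep_stride):
        self.ir.append(("vec_dup", mask, dst, scalar, repeat_times, dst_rep_stride))


recorder = IRRecorder()
repeat_times = 4
for i in range(repeat_times):  # ordinary Python loop, runs while building
    recorder.vec_dup(64, "input_ub" + str(i), 0, 1, 8)

# The Python loop itself leaves no trace in the recorded output.
# Only the four unrolled vec_dup entries remain.
for entry in recorder.ir:
    print(entry)
```

Changing repeat_times changes the number of recorded entries, which mirrors how the real loop in case 1 controls the number of generated IRs.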
Smart Use of Conditional Statements
if_scope and else_scope are used for selections within the operator compute logic. During TIK operator programming, Python conditional statements are also useful. The following case extends case 1.
Case 2
need_clear = False
repeat_times = 4
input_ub_list = []
for i in range(repeat_times):
    tmp_ub = tik_instance.Tensor("int32", (64,), name="input_ub"+str(i), scope=tik.scope_ubuf)
    input_ub_list.append(tmp_ub)
for i in range(repeat_times):
    if need_clear:
        tik_instance.vec_dup(64, input_ub_list[i], 0, 1, 8)
    else:
        tik_instance.vec_dup(64, input_ub_list[i], 1, 1, 8)
When using an operator, the user sets both runtime and compile-time parameters. Compile-time parameters are known at compile time. Python conditional statements can therefore decide which TIK statements are executed, and hence which IRs are generated. This reduces the volume of CCE code compiled for the TIK operator.
Assume that need_clear and repeat_times are compile-time parameters, which need to be provided when you build the TIK operator. According to the preceding rules, only the TIK statements executed by the Python interpreter generate IRs.
Use Python conditional statements to route TIK selections based on the operator's compile-time parameters. This reduces redundant code generated at TIK build time and improves operator performance.
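The effect of a Python conditional on the generated output can be sketched in plain Python. In the sketch below, build is a hypothetical helper that records tuples in a list in place of real TIK statements, and need_clear and repeat_times play the role of compile-time parameters.

```python
def build(need_clear, repeat_times):
    """Record the statements emitted for one set of compile-time parameters."""
    ir = []
    for i in range(repeat_times):
        if need_clear:
            # Only this statement is recorded when need_clear is True.
            ir.append(("vec_dup", "input_ub" + str(i), 0))
        else:
            # Only this statement is recorded when need_clear is False.
            ir.append(("vec_dup", "input_ub" + str(i), 1))
    return ir


ir_clear = build(need_clear=True, repeat_times=4)
ir_fill = build(need_clear=False, repeat_times=2)

print(ir_clear)  # every entry duplicates the scalar 0
print(ir_fill)   # every entry duplicates the scalar 1
```

Neither result contains a trace of the branch that was not taken. The Python if statement is resolved while building, so each build records only one of the two vec_dup variants.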
Compile-time Parameters and Runtime Parameters
The phase for determining the compile-time parameters is different from that for determining the runtime parameters. A defined TIK operator is compiled according to the configured compile-time parameters. The compiled CCE operator needs to be further compiled into an .o file by using the CCE Compiler (CCEC) before the operator .o file can be run with runtime parameters specified. Note that the operation defined by a TIK statement does not occur when Python is called. Instead, it occurs when the program moves on from the definition phase to the compilation phase. The call only generates the corresponding IR.
Compile-time parameters are static parameters. You can use Python statements to process them in the TIK operator. The inputs and outputs parameters in BuildCCE are runtime parameters.
Case 3: compile-time parameters
VLENFP32 = 64
# model_point_num is static.
model_point_num = 150
# model_point_repeat and model_point_residual can be directly calculated during TIK-to-IR conversion.
model_point_repeat = model_point_num // VLENFP32
model_point_residual = model_point_num % VLENFP32
input_ub = tik_instance.Tensor("float32", (256,), name="input_ub", scope=tik.scope_ubuf)
output_ub = tik_instance.Tensor("float32", (256,), name="output_ub", scope=tik.scope_ubuf)
# Use a Python conditional statement.
if model_point_repeat != 0:
    tik_instance.vec_add(VLENFP32, output_ub, input_ub, output_ub, model_point_repeat, 8, 8, 8)
# Use a Python conditional statement.
if model_point_residual != 0:
    tik_instance.vec_add(model_point_residual, output_ub[model_point_repeat*VLENFP32:], input_ub[model_point_repeat*VLENFP32:], input_ub[model_point_repeat*VLENFP32:], 1, 8, 8, 8)
In the entry point function of a given operator, model_point_num is a static compile-time parameter, so it already has a concrete value when the TIK statements are converted into IR. VLENFP32 is the maximum number of float32 elements (64) that the Vector Unit can compute in parallel. The Vector Unit processes up to 256 bytes at a time, so this maximum depends on the data type.
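The relationship between the 256-byte limit and the element count is simple arithmetic. The sketch below is plain Python; the helper name max_parallel_elements and the byte-size table are illustrative assumptions and are not part of the TIK API.

```python
VECTOR_BYTES = 256  # bytes the Vector Unit can process in parallel

# Size in bytes of one element for a few common data types.
DTYPE_BYTES = {"float16": 2, "float32": 4, "int32": 4}


def max_parallel_elements(dtype):
    """Return how many elements of the given dtype fit into 256 bytes."""
    return VECTOR_BYTES // DTYPE_BYTES[dtype]


VLENFP32 = max_parallel_elements("float32")
print(VLENFP32)                          # 64
print(max_parallel_elements("float16"))  # 128
```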
The Python conditional statements are used to control which TIK statements need to be executed and which do not. In this way, the corresponding CCE operator is generated.
See the following examples:
- model_point_num = 32, model_point_repeat = 0: The first vec_add statement is not generated.
- model_point_num = 128, model_point_residual = 0: The second vec_add statement is not generated.
In a TIK Python program, the Python code (the non-TIK API part) is closely related to the constantization of the operator. In practice, configuration properties are computed using ordinary Python variables. In this way, different CCE operators can be compiled based on different input parameters.
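The compile-time arithmetic from case 3 can be run on its own to see which statements each input would produce. This sketch is plain Python and emits no IR; split_points is a hypothetical helper name introduced only for illustration.

```python
VLENFP32 = 64  # float32 elements per vector computation, as in case 3


def split_points(model_point_num):
    """Split a compile-time element count into full repeats and a tail."""
    model_point_repeat = model_point_num // VLENFP32
    model_point_residual = model_point_num % VLENFP32
    return model_point_repeat, model_point_residual


for num in (32, 128, 150):
    repeat, residual = split_points(num)
    print(num, "-> first vec_add generated:", repeat != 0,
          "| second vec_add generated:", residual != 0)
```

For 150, both statements are generated: two full repeats of 64 elements and a tail of 22 elements.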
Case 4: runtime parameters
The most intuitive runtime parameter is the float32 value stored in input_ub in this case. The value cannot be obtained before the operator is executed. Therefore, if the logic of the operator depends on a runtime parameter, that logic cannot be omitted from the operator. The following is an example:
VLENFP32 = 64
# model_point_ub is a runtime parameter.
model_point_ub = tik_instance.Tensor("float32", (64,), name="model_point_ub", scope=tik.scope_ubuf)
# model_point_num is obtained from model_point_ub. The value of model_point_num is unknown when the TIK operator is compiled.
model_point_num = tik_instance.Scalar("int32", "model_point_num", init_value=model_point_ub[0])
# The values of model_point_repeat and model_point_residual are also unknown.
model_point_repeat = model_point_num // VLENFP32
model_point_residual = model_point_num % VLENFP32
input_ub = tik_instance.Tensor("float32", (256,), name="input_ub", scope=tik.scope_ubuf)
output_ub = tik_instance.Tensor("float32", (256,), name="output_ub", scope=tik.scope_ubuf)
# Python conditional statements do not work here.
with tik_instance.if_scope(model_point_repeat != 0):
    tik_instance.vec_add(VLENFP32, output_ub, input_ub, output_ub, model_point_repeat, 8, 8, 8)
# Python conditional statements do not work here.
with tik_instance.if_scope(model_point_residual != 0):
    tik_instance.vec_add(model_point_residual, output_ub[model_point_repeat*VLENFP32:], input_ub[model_point_repeat*VLENFP32:], input_ub[model_point_repeat*VLENFP32:], 1, 8, 8, 8)
Because model_point_num is obtained from the model_point_ub parameter at run time and its exact value cannot be obtained at compile time, whether the if statement is executed is unknown. Therefore, Python conditional statements do not work here. Instead, TIK conditional statements should be used.
In case 4, the generated CCE code is longer because the value of model_point_num is known only at run time, so the code for both selections must be kept. Runtime parameters are therefore more flexible, at the cost of performance.
Is it possible to use TIK statements for compile-time parameters? The answer is yes, but redundant CCE code will also be a problem. Try to move the parameter determination as far forward as possible to maximize the operator performance.
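The difference in generated code size between the two approaches can be sketched in plain Python. ScopedRecorder below is a hypothetical stand-in for a TIK instance: its if_scope context manager only records a line of text, whereas the real if_scope generates a runtime conditional. The emit method and the two build functions are likewise illustrative names.

```python
from contextlib import contextmanager


class ScopedRecorder:
    """Hypothetical recorder that mimics how statements accumulate as IR."""

    def __init__(self):
        self.ir = []
        self._depth = 0

    @contextmanager
    def if_scope(self, condition):
        # A runtime condition is recorded as-is; its body is always recorded.
        self.ir.append("  " * self._depth + "if " + condition + ":")
        self._depth += 1
        try:
            yield
        finally:
            self._depth -= 1

    def emit(self, statement):
        self.ir.append("  " * self._depth + statement)


def build_static(model_point_num):
    """Compile-time parameter: Python decides which statements are recorded."""
    rec = ScopedRecorder()
    if model_point_num // 64 != 0:
        rec.emit("vec_add(full)")
    if model_point_num % 64 != 0:
        rec.emit("vec_add(tail)")
    return rec.ir


def build_dynamic():
    """Runtime parameter: both guarded statements must be recorded."""
    rec = ScopedRecorder()
    with rec.if_scope("model_point_repeat != 0"):
        rec.emit("vec_add(full)")
    with rec.if_scope("model_point_residual != 0"):
        rec.emit("vec_add(tail)")
    return rec.ir


print(build_static(128))  # one statement, no condition left in the output
print(build_dynamic())    # two conditions and two statements
```

In this sketch the static build for 128 records a single statement, while the dynamic build always records both conditions and both bodies, regardless of the value seen at run time.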
Summary
Python statements are executed in sequence based on the sequential logic, iteration logic, or condition logic. When the Python interpreter executes a TIK statement, an IR is generated. Although Python statements cannot directly generate IRs, they function as the controller for generating IRs.
When selecting between TIK and Python statements during coding, you must consider when the statement takes effect, in the build phase or in the execution phase. Python statements are recommended for the build phase while TIK statements are recommended for the execution phase.
If you combine Python and TIK statements well when programming TIK operators, you can avoid many bugs, improve your programming efficiency, and optimize your code, thereby improving your operators' performance.
Exercise
In case 3, what conditions does model_point_num need to meet so that both selections are taken? So that only the first selection is taken? So that only the second selection is taken?
[Key]
- model_point_num is greater than 64 and not exactly divisible by 64.
- model_point_num is exactly divisible by 64 but is not 0.
- model_point_num∈(0,64). When model_point_num is 0, neither selection is taken.