TIK Operator Generalization

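In the TIK introduction, we supported only a fixed number of input elements and a fixed data type: every tensor was allocated with a fixed size, and every TIK API parameter was set to a fixed value. To generalize over shapes and support inputs of different data types, there are generally two implementation approaches: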

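Approach 1: Make the tensor sizes and instruction parameters dynamic values computed from the input shape and data type. The tensor allocations, loop counts, and instruction parameters are then derived automatically for each input, so the operator adapts to different shapes and data types. At compile time the shape is passed in as an argument, and the same code compiles to a separate .o file for each input. Each .o file handles only the shape it was compiled for; at run time no extra arguments are needed, only the input and output addresses.
Approach 2: Allocate space for the maximum memory requirement and pass the shape range as a compile-time parameter. The compiled .o file supports a whole class of shapes; at run time, the input and output shapes must be passed as run-time arguments in addition to the input and output addresses. For a usage example, see the topic on dynamic shapes for TIK custom operators.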
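
The derivation in Approach 1 can be sketched in plain Python. This is only an illustration of computing the buffer size, loop count, and tail from a shape and data type. The helper `plan_tiling`, the `TilingPlan` type, the `ub_bytes` parameter, and the 32-byte alignment unit are assumptions made for this example and are not part of the TIK API.

```python
from dataclasses import dataclass
from functools import reduce

# Bytes per element for a few common dtypes (illustrative subset).
DTYPE_BYTES = {"float16": 2, "float32": 4, "int8": 1, "int32": 4}

BLOCK_BYTES = 32  # assumed alignment unit for data movement


@dataclass(frozen=True)
class TilingPlan:
    elems_per_loop: int  # elements processed per full loop iteration
    loop_count: int      # number of full iterations
    tail_elems: int      # leftover elements handled after the loops


def plan_tiling(shape, dtype, ub_bytes):
    """Derive buffer size and loop counts from shape and dtype (hypothetical helper)."""
    elem_bytes = DTYPE_BYTES[dtype]
    total = reduce(lambda x, y: x * y, shape, 1)
    elems_per_block = BLOCK_BYTES // elem_bytes
    # Largest block-aligned element count that fits in the on-chip buffer.
    elems_per_loop = (ub_bytes // BLOCK_BYTES) * elems_per_block
    return TilingPlan(elems_per_loop, total // elems_per_loop, total % elems_per_loop)


# The same code yields different plans for different shapes and dtypes.
plan_fp16 = plan_tiling((4, 1024), "float16", ub_bytes=2048)
plan_fp32 = plan_tiling((4, 1000), "float32", ub_bytes=2048)
```

In a real operator, values like these would feed the tensor allocation sizes, the loop bounds, and the instruction parameters.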

Comparison of the two approaches:

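Approach 1: Each compilation handles only one shape. With constant-based compilation, all tensor sizes, the loop order, and instruction placement are decided at compile time. This makes full use of the available space, avoids unnecessary scalar operations, and gives the best performance.
Approach 2: A limited number of .o files covers all cases, using run-time arguments for branch decisions and similar logic. This adds scalar operations and lowers performance somewhat.
Neither approach is strictly better; developers can choose the compilation and execution mode that fits their situation.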
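
The trade-off can be shown with a plain-Python analogy, in which building a closure stands in for compilation. This is not TIK code: `build_static` and `build_dynamic` are hypothetical names, and doubling each element is a placeholder for the real computation.

```python
def build_static(length, chunk):
    """Approach 1 sketch: loop count and tail are fixed when the kernel is built."""
    loops, tail = divmod(length, chunk)  # resolved once, at build time

    def kernel(src):
        out = []
        for i in range(loops):  # constant trip count
            out.extend(v * 2 for v in src[i * chunk:(i + 1) * chunk])
        out.extend(v * 2 for v in src[loops * chunk:loops * chunk + tail])
        return out

    return kernel


def build_dynamic(max_length, chunk):
    """Approach 2 sketch: one kernel serves every length up to max_length."""

    def kernel(src, length):
        if length > max_length:  # run-time check on the shape argument
            raise ValueError("length exceeds the compiled range")
        loops, tail = divmod(length, chunk)  # extra scalar work on every call
        out = []
        for i in range(loops):
            out.extend(v * 2 for v in src[i * chunk:(i + 1) * chunk])
        if tail:  # run-time branch
            out.extend(v * 2 for v in src[loops * chunk:length])
        return out

    return kernel


data = list(range(10))
static_kernel = build_static(length=10, chunk=4)         # handles length 10 only
dynamic_kernel = build_dynamic(max_length=64, chunk=4)   # handles lengths up to 64
```

The static kernel takes only the data, while the dynamic kernel also needs the length on every call and repeats the `divmod` and the tail branch each time.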

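In addition, for tensors that accept inputs of arbitrary rank, if the address offset of a data-movement or compute instruction depends on every dimension, a recursive function (one that calls itself once per axis) can iterate over any number of dimensions and compute the movement offset.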
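
As a minimal sketch of the idea, the generator below folds one axis into a running offset at each recursion level, so it works for any rank. It is plain Python, and `flat_offsets` is a hypothetical name used only for illustration.

```python
def flat_offsets(shape, base=0):
    """Yield the row-major offset of every element, one recursion level per axis."""
    if not shape:  # termination: all axes consumed
        yield base
        return
    for index in range(shape[0]):
        # Fold this axis into the running offset, then recurse on the remaining axes.
        yield from flat_offsets(shape[1:], base * shape[0] + index)


offsets_2d = list(flat_offsets((2, 3)))
offsets_4d = list(flat_offsets((2, 1, 3, 2)))
```

The same function handles a 2-D and a 4-D shape without any rank-specific branches.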

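For example, the reverse operator must reverse the order of one or more arbitrary axes of an input with at most 8 dimensions. Generalizing over shapes involves two variables: the rank of the input tensor is not fixed, and neither are the number and positions of the axes to reverse. Handling this directly with ordinary nested for loops would require many branches. In practice, the dimensions can be handled recursively. In the code below, the termination condition is whether the last axis has been reached: if not, another for loop is nested and the movement index is updated; if so, the corresponding instructions are executed.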

# The functools_reduce name used below is assumed to come from this import.
from functools import reduce as functools_reduce


# Recursive loop function reverse_big_shape (a method of the operator class)
def reverse_big_shape(self, outer_loop_shape, move_in_index, move_out_index, loop_axis):
    """
    Traverse the outer loop of tensor

    Parameters
    ----------
    outer_loop_shape:
        the shape of outer loop
    move_in_index:
        index for moving input data from gm to ub
    move_out_index:
        index for moving output data from ub to gm
    loop_axis:
        loop index currently traversed

    Returns
    -------
    None
    """
    inner_data_num = functools_reduce(lambda x, y: x * y, self.inner_shape)
    if loop_axis == 0 and inner_data_num > 32 and self.shape_x[0] < 65536:
        # Outermost axis: block_num splits this loop across cores.
        with self.tik_instance.for_range(0, outer_loop_shape[0], block_num=self.outer_shape[0]) as index:
            # Refresh the movement indexes for this iteration.
            move_in_index, move_out_index = self.get_move_index(
                loop_axis, move_in_index, move_out_index, outer_loop_shape, index)
            # Not yet at the last axis: nest another for loop by recursing.
            if len(outer_loop_shape) > 1:
                self.reverse_big_shape(outer_loop_shape[1:], move_in_index, move_out_index, loop_axis + 1)
            # Last axis reached: run the instruction computation.
            else:
                self.reverse_last_axis(move_in_index, move_out_index)
    else:
        with self.tik_instance.for_range(0, outer_loop_shape[0]) as index:
            # Refresh the movement indexes for this iteration.
            move_in_index, move_out_index = self.get_move_index(
                loop_axis, move_in_index, move_out_index, outer_loop_shape, index)
            # Not yet at the last axis: nest another for loop by recursing.
            if len(outer_loop_shape) > 1:
                self.reverse_big_shape(outer_loop_shape[1:], move_in_index, move_out_index, loop_axis + 1)
            # Last axis reached: run the instruction computation.
            else:
                self.reverse_last_axis(move_in_index, move_out_index)


# Index refresh function get_move_index (a method of the operator class)
def get_move_index(self, loop_axis, move_in_index, move_out_index, outer_loop_shape, index):
    """
    Get the offset of reading and writing UB

    Parameters
    ----------
    loop_axis:
        the number of the axis currently traversed
    move_in_index:
        the offset to read data from data_x_gm
    move_out_index:
        the offset to write data to data_x_ub
    outer_loop_shape:
        the outer loop shape of the current traversal
    index:
        current traversed index

    Returns
    -------
    move_in_index:
        the offset to read data from data_x_gm
    move_out_index:
        the offset to write data to data_x_ub
    """
    if loop_axis in self.axis:
        # Reversed axis: read from the mirrored position.
        move_in_index = move_in_index * outer_loop_shape[0] + outer_loop_shape[0] - 1 - index
    else:
        move_in_index = move_in_index * outer_loop_shape[0] + index
    # The write offset advances on every axis, reversed or not.
    move_out_index = move_out_index * outer_loop_shape[0] + index

    return move_in_index, move_out_index
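
To check the index arithmetic without a TIK environment, the same update rule can be run on a plain Python list. This is a stand-in sketch, not the operator itself: `reverse_axes` and its inner `walk` function are hypothetical, a flat row-major list replaces the GM and UB buffers, and the write index is assumed to advance on every axis.

```python
def reverse_axes(flat, shape, axes):
    """Reverse the given axes of a row-major flat list (plain-Python stand-in)."""
    out = [None] * len(flat)

    def walk(loop_shape, in_index, out_index, loop_axis):
        for index in range(loop_shape[0]):
            if loop_axis in axes:
                # Reversed axis: read from the mirrored position.
                next_in = in_index * loop_shape[0] + loop_shape[0] - 1 - index
            else:
                next_in = in_index * loop_shape[0] + index
            next_out = out_index * loop_shape[0] + index
            if len(loop_shape) > 1:
                # Not the last axis yet: nest one more loop level.
                walk(loop_shape[1:], next_in, next_out, loop_axis + 1)
            else:
                # Last axis: perform the element copy.
                out[next_out] = flat[next_in]

    walk(shape, 0, 0, 0)
    return out


# 2 x 3 tensor [[0, 1, 2], [3, 4, 5]] stored row-major
src = [0, 1, 2, 3, 4, 5]
rev_last = reverse_axes(src, (2, 3), axes={1})      # [[2, 1, 0], [5, 4, 3]]
rev_first = reverse_axes(src, (2, 3), axes={0})     # [[3, 4, 5], [0, 1, 2]]
rev_both = reverse_axes(src, (2, 3), axes={0, 1})   # [[5, 4, 3], [2, 1, 0]]
```

The real operator moves whole inner blocks at the last axis rather than single elements, but the offset bookkeeping follows the same pattern.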
