API List

Table 1 TIK API overview

Function

API

Description

TIK container management

TIK Constructor

Creates a TIK DSL container.

BuildCCE

Generates the TIK description language code defined for the target machine and builds it into a binary that is executable on the Ascend AI Processor, together with the corresponding configuration file.

Data definition

Tensor

Defines a Tensor variable.

Scalar

Defines a Scalar variable.

InputScalar

Defines a Scalar variable of the InputScalar type for operator implementation. An InputScalar passed in the inputs argument of the BuildCCE call takes its value at operator run time.

Scalar management

set_as

Sets the value of a Scalar.

Tensor management

reshape

Reshapes a Tensor.

reinterpret_cast_to

Reinterprets a Tensor into a specified data type.

Obtaining Partial Tensor Data

Obtains partial data of a Tensor by the index.

Changing the Tensor Content

Sets the content of a Tensor by index.

shape

Obtains the shape of a Tensor.

set_as

Sets the value of a Tensor.

Program control

if_scope

Defines the if_scope code block, which is executed if the specified condition is true.

elif_scope

Defines an elif_scope code block, which is executed if all preceding if_scope and elif_scope conditions are false and its own condition is true.

else_scope

Defines the else_scope code block, which is executed if all preceding if_scope and elif_scope conditions are false.

for_range

Defines a TIK for loop. N buffers and multiple blocks can be enabled in the loop.

new_stmt_scope

Indicates a new scope (equivalent to the curly brackets in C language). The disable_sync parameter specifies whether to automatically insert a synchronization instruction in the current scope.

any

Returns True if at least one argument is true; returns False if all arguments are false.

all

Returns True if all arguments are true; returns False if at least one argument is false.

negate

Returns True if the argument is false; False otherwise.

tik_continue

Skips the rest of the current iteration and proceeds to the next one. Used in TIK's for_range loop.

tik_break

Terminates the enclosing for_range loop, that is, stops executing the current loop.

tik_return

Returns at the instruction or layer position as required, which is often used to set a breakpoint.

Function debugging

start_debug

TIK performs debugging in the simulation environment using the tik.tikdb object. Similar to the Python Debugger (PDB), the tikdb supports setting breakpoints, single-step debugging, and printing variables.

debug_print

The tik.tikdb object defines the debug_print statement to facilitate data printing at operator run time. When the debugger executes this line of code, it evaluates the expression and prints the result on the screen.

set_printf_params

Sets printf parameters.

printf

Prints the value of a Scalar, an Expr, a ScalarArray, or a Tensor. This API is supported even in the functional debugging environment.

Scalar computation (single-operand)

scalar_abs

Obtains the absolute value of a Scalar.

scalar_sqrt

Computes the square root of a Scalar.

scalar_countbit0

Counts the number of bits whose values are 0 in the 64-bit binary format of the source operand.

scalar_countbit1

Counts the number of bits whose values are 1 in the 64-bit binary format of the source operand.

scalar_countleading0

Counts the number of consecutive leading bits whose values are 0 in the 64-bit binary format of the source operand.

scalar_conv

Converts the data type of a Scalar.
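
The three bit-counting entries above can be modeled in plain Python. This is a minimal sketch of the described semantics, not TIK code: the function names are hypothetical, and it assumes that "leading" means counting from the most significant bit of the 64-bit representation.

```python
MASK64 = (1 << 64) - 1  # keep only the low 64 bits of a Python int


def count_bit1(value: int) -> int:
    """Number of 1 bits in the 64-bit representation of value."""
    return bin(value & MASK64).count("1")


def count_bit0(value: int) -> int:
    """Number of 0 bits in the 64-bit representation of value."""
    return 64 - count_bit1(value)


def count_leading0(value: int) -> int:
    """Consecutive 0 bits counted from the most significant bit (assumption)."""
    return 64 - (value & MASK64).bit_length()


print(count_bit1(0b1011), count_bit0(0b1011), count_leading0(0b1011))  # 3 61 60
```

Masking with MASK64 makes negative inputs behave like their 64-bit two's-complement form, so count_bit1(-1) yields 64.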

Scalar computation (dual-operand)

scalar_max

Compares two source operands and returns the maximum.

scalar_min

Compares two source operands and returns the minimum.

Vector computation (single-operand)

vec_relu

Performs ReLU element-wise.

vec_abs

Computes the absolute value element-wise.

vec_not

Performs bitwise NOT element-wise.

vec_exp

Computes the natural exponential element-wise.

vec_expm1_high_preci

Computes the natural exponential minus 1 (e^x - 1) element-wise. This API has a higher precision than vec_exp.

vec_ln

Computes the natural logarithm element-wise.

vec_ln_high_preci

Computes the natural logarithm element-wise. This API has a higher precision than vec_ln.

vec_rec

Computes the reciprocal element-wise.

vec_rec_high_preci

Computes the reciprocal element-wise. This API has a higher precision than vec_rec.

vec_rsqrt

Computes the reciprocal of the square root element-wise.

vec_rsqrt_high_preci

Computes the reciprocal of the square root element-wise. This API has a higher precision than vec_rsqrt.

Vector computation (dual-operand)

vec_add

Performs addition element-wise.

vec_sub

Performs subtraction element-wise.

vec_mul

Performs multiplication element-wise.

vec_max

Computes the maximum element-wise.

vec_min

Computes the minimum element-wise.

vec_and

Performs bitwise AND element-wise.

vec_or

Performs bitwise OR element-wise.

Vector computation > Scalar dual-operand

vec_adds

Performs addition between a vector and a scalar element-wise.

vec_muls

Performs multiplication between a vector and a scalar element-wise.

Vector computation > Scalar triple-operand

vec_axpy

Performs multiplication-accumulation between a vector and a scalar element-wise.

Vector computation > Comparison selection

vec_cmpv_xx

Compares two tensors element-wise and writes each truth value to the corresponding bit of dst. Multiple comparison modes are supported.

vec_sel

Selects elements according to the bits of sel: if a bit is 1, the corresponding element of src0 is selected; if it is 0, the corresponding element of src1 is selected. The selections form an intermediate result dst_temp, which is then filtered by mask: elements that pass the mask are written to dst, and filtered-out elements retain dst's original value.
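
The two-stage behavior described for vec_sel (select by sel, then filter by mask) can be illustrated with a plain-Python model. This is only a sketch of the semantics in the description, not the TIK call itself: the function name and the list-of-bits arguments are hypothetical, and it assumes a mask bit of 1 means the element is written.

```python
def select_with_mask(dst, src0, src1, sel_bits, mask_bits):
    """Return the new contents of dst.

    sel_bits[i] == 1 picks src0[i]; 0 picks src1[i].
    mask_bits[i] == 1 writes the pick to dst[i]; 0 keeps the old dst[i]
    (the meaning of the mask bit is an assumption of this sketch).
    """
    # Stage 1: element-wise selection into an intermediate result.
    dst_temp = [a if s else b for a, b, s in zip(src0, src1, sel_bits)]
    # Stage 2: the mask decides which picks actually reach dst.
    return [t if m else d for t, d, m in zip(dst_temp, dst, mask_bits)]


old_dst = [9, 9, 9, 9]
result = select_with_mask(old_dst, [1, 2, 3, 4], [5, 6, 7, 8],
                          sel_bits=[1, 0, 1, 0], mask_bits=[1, 1, 0, 0])
print(result)  # [1, 6, 9, 9]
```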

Vector computation > Data precision conversion

vec_conv

Converts the src tensor with one data type to the dst tensor with a different data type.

Vector computation > Pair reduction

vec_cpadd

Adds the two elements (odd and even) of each adjacent pair.
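
As a rough illustration of pair reduction, the sketch below adds each adjacent pair of a Python list. The function name is hypothetical, and the sketch assumes an even number of elements; it models the arithmetic only, not the TIK instruction or its parameters.

```python
def pair_add(src):
    """Add each adjacent (even-index, odd-index) pair of src."""
    if len(src) % 2:
        raise ValueError("pair reduction needs an even number of elements")
    return [src[i] + src[i + 1] for i in range(0, len(src), 2)]


print(pair_add([1, 2, 3, 4, 5, 6]))  # [3, 7, 11]
```

The output has half as many elements as the input.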

Vector computation > Reduction

vec_reduce_add

Sums all input data. Elements are added pairwise in binary tree mode.

vec_reduce_max

Obtains the maximum value and its corresponding index position among the input data.

vec_reduce_min

Obtains the minimum value and its corresponding index position among the input data.
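
The reduction entries above can be modeled in plain Python as follows. This is a sketch under stated assumptions rather than TIK code: the function names are hypothetical, an unpaired trailing element is simply carried to the next tree level, and ties in the maximum resolve to the first occurrence.

```python
def tree_reduce_add(values):
    """Sum values by repeatedly adding adjacent pairs (binary-tree order)."""
    level = list(values)
    if not level:
        raise ValueError("need at least one element")
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # unpaired last element moves up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


def reduce_max_with_index(values):
    """Return (maximum, index of its first occurrence)."""
    best = max(range(len(values)), key=lambda i: values[i])
    return values[best], best


print(tree_reduce_add([1, 2, 3, 4, 5]))      # 15
print(reduce_max_with_index([3, 9, 2, 9]))   # (9, 1)
```

For floating-point data the pairwise order can give a result that differs slightly from a left-to-right sum, which is why the summation order is worth modeling explicitly.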

Matrix computation

conv2d

Performs 2D convolution on an input tensor and a weight tensor and outputs a result tensor.

fixpipe

Processes the matrix compute result, for example, adding an offset to and quantizing the compute result, and moving the data from the L1OUT Buffer to the Global Memory.

matmul

Multiplies tensor a by tensor b and outputs a result tensor.

Data conversion

vec_trans

Transposes a 16 x 16 2D matrix stored at contiguous addresses, repeating the operation repeat_times times. Each iteration operates on 256 elements in a contiguous address space. The addresses used by different iterations need not be contiguous; the address stride between adjacent iterations is specified by dst_rep_stride and src_rep_stride.

vec_trans_scatter

Converts NCHW into NC1HWC0. If the data type is float32, int32, uint32, int16, uint16, or float16, then C0 is 16. If the data type is uint8 or int8, then C0 is 32.
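
The NCHW to NC1HWC0 layout change can be sketched in plain Python on a flat list. This models only the index mapping and the C0 rule stated above; it is not the vec_trans_scatter call. The helper names are hypothetical, and padding the unused channel slots with zeros when C is not a multiple of C0 is an assumption of this sketch.

```python
def c0_for(dtype):
    """C0 size for a data type, following the rule in the description."""
    if dtype in ("uint8", "int8"):
        return 32
    if dtype in ("float32", "int32", "uint32", "int16", "uint16", "float16"):
        return 16
    raise ValueError(f"unsupported dtype: {dtype}")


def nchw_to_nc1hwc0(data, n, c, h, w, c0, pad=0):
    """Reorder a flat NCHW list into a flat NC1HWC0 list."""
    c1 = (c + c0 - 1) // c0  # number of channel groups, rounded up
    out = [pad] * (n * c1 * h * w * c0)
    for ni in range(n):
        for ci in range(c):
            c1i, c0i = divmod(ci, c0)  # split channel into (group, slot)
            for hi in range(h):
                for wi in range(w):
                    src = ((ni * c + ci) * h + hi) * w + wi
                    dst = (((ni * c1 + c1i) * h + hi) * w + wi) * c0 + c0i
                    out[dst] = data[src]
    return out


# Tiny example with c0=2 to keep the output readable: 3 channels, 1x2 image.
print(nchw_to_nc1hwc0([0, 1, 2, 3, 4, 5], n=1, c=3, h=1, w=2, c0=2))
# [0, 2, 1, 3, 4, 0, 5, 0]
```

In the output, channels 0 and 1 are interleaved per pixel in the first group, and channel 2 shares the second group with a padded slot.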

Data padding

vec_dup

Copies a Scalar variable or an immediate value multiple times to fill the vector (PAR indicates the degree of parallelism).

Data movement

data_move

Moves data from src to dst. Both src and dst can be Tensors.