TIK Scope
What Is a TIK Scope?
A TIK scope determines where and when a TIK variable is valid or active. A TIK scope is equivalent to the lifetime of a TIK variable.
The lifetime of a TIK variable begins from its allocation and ends with its destruction at the end of the code block. A TIK variable is active and accessible only during its lifetime.
When a TIK variable is accessed, the scopes are searched from the innermost to the outermost to locate the variable. If the variable is not found in any scope, an undefined-variable error is reported.
Scope of a Scalar
The scope or lifetime of a Scalar complies with the following rules:
- The lifetime of a Scalar variable begins from its allocation and ends with its destruction at the end of the code block.
- A Scalar variable is active and accessible only during its lifetime.
Figure 1 shows an example of the scopes or lifetimes of Scalar variables S0, S1, and S2.
In TIK, a new scope is introduced when a function, a for loop, or an if statement is defined. Variables defined in a scope are not accessible from outside that scope. See the following example.
g_scalar = tik_instance.Scalar(dtype="int16", init_value=1)  # Global scope. g_scalar is globally accessible.
def tik_func_1():
    func_scalar = tik_instance.Scalar(dtype="int16", init_value=2)  # func_scalar is accessible within the scope of the tik_func_1 function. Here, g_scalar and func_scalar are accessible.
    with tik_instance.if_scope(func_scalar < 5):
        if_scalar = tik_instance.Scalar(dtype="int16", init_value=3)  # if_scalar is accessible within the if scope. Here, g_scalar, func_scalar, and if_scalar are accessible.
        with tik_instance.for_range(0, 4) as i:
            for_scalar = tik_instance.Scalar(dtype="int16", init_value=4)  # for_scalar is accessible within the for scope. Here, g_scalar, func_scalar, if_scalar, and for_scalar are accessible.
Scope of a Tensor
The scope or lifetime of a Tensor complies with the following rules:
- The lifetime of a Tensor variable begins from its allocation and ends with its destruction at the end of the code block.
- A Tensor variable is active and accessible only during its lifetime.
- At any time, the total buffer size of active tensors must be within the total size of the physical buffers.
Figure 2 shows a Tensor scope/lifetime diagram.
In the preceding figure, the code block spans five time segments, numbered from 1 to 5. Table 1 lists the active Tensor variables and the total UB usage in each time segment.
| Time Segment | Active Tensors | UB Usage |
|---|---|---|
| 1 | B0 | 256 x 2 bytes |
| 2 | B0, B1 | 256 x 2 x 2 bytes |
| 3 | B0, B1, B2 | 256 x 3 x 2 bytes |
| 4 | B0, B1, B3 | 256 x 3 x 2 bytes |
| 5 | B0, B4 | 256 x 2 x 2 bytes |
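The nesting implied by Table 1 can also be expressed in code. The following is a minimal sketch, not the original figure; it assumes a 2-byte data type (float16, so each 256-element Tensor occupies 512 bytes) and uses new_stmt_scope() to open the nested code blocks that end each lifetime.
B0 = tik_instance.Tensor("float16", (256,), name="B0", scope=tik.scope_ubuf)          # Segment 1: B0
with tik_instance.new_stmt_scope():
    B1 = tik_instance.Tensor("float16", (256,), name="B1", scope=tik.scope_ubuf)      # Segment 2: B0, B1
    with tik_instance.new_stmt_scope():
        B2 = tik_instance.Tensor("float16", (256,), name="B2", scope=tik.scope_ubuf)  # Segment 3: B0, B1, B2
    # B2 is destroyed when its code block ends.
    with tik_instance.new_stmt_scope():
        B3 = tik_instance.Tensor("float16", (256,), name="B3", scope=tik.scope_ubuf)  # Segment 4: B0, B1, B3
# B1 and B3 are destroyed here.
with tik_instance.new_stmt_scope():
    B4 = tik_instance.Tensor("float16", (256,), name="B4", scope=tik.scope_ubuf)      # Segment 5: B0, B4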
In actual development, the shape of the input tensor may exceed the capacity of the Unified Buffer. In this case, the data must be moved in and computed in multiple rounds. Therefore, to maximize Unified Buffer utilization, you are advised to define the tensor with the largest shape that the Unified Buffer allows. A code example is as follows.
# Obtain the Unified Buffer size in bytes.
ub_size_bytes = tbe.common.platform.get_soc_spec("UB_SIZE")
# Set the UB size, 128 bytes for example.
# In the Unified Buffer, data must be read and written in the unit of 32-byte blocks.
block_byte_size = 32
# Calculate the number of elements each block can hold based on the input data type dtype_x.
def get_bit_len(dtype):
    index = 0
    for i in dtype:
        if i.isdigit():
            break
        index += 1
    return int(dtype[index:])
dtype_bytes_size = get_bit_len(dtype_x) // 8  # Convert the bits into bytes. The input is of type int16. An int16 element is 2 bytes long.
data_each_block = block_byte_size // dtype_bytes_size  # A block can hold 16 (32/2) elements.
# Calculate the space to be allocated in the Unified Buffer and perform 32-byte alignment.
ub_tensor_size = (ub_size_bytes // dtype_bytes_size //  # ub_tensor_size = 128 // 2 // 16 * 16 = 64
                  data_each_block * data_each_block)    # A Tensor can contain up to 64 int16 elements.
# Create a tensor input_x_ub in the Unified Buffer.
input_x_ub = tik_instance.Tensor(dtype_x, (ub_tensor_size,), name="input_x_ub", scope=tik.scope_ubuf)
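When the input exceeds this size, the data is processed in UB-sized chunks as mentioned above. The following tiling loop is a minimal sketch only; input_x_gm, output_gm, and total_elements are hypothetical names assumed to be defined elsewhere, and total_elements is assumed to divide evenly into ub_tensor_size chunks.
loop_times = total_elements // ub_tensor_size  # Number of UB-sized chunks, assuming exact division.
with tik_instance.for_range(0, loop_times) as loop_i:
    offset = loop_i * ub_tensor_size
    # Move one UB-sized chunk from Global Memory to the Unified Buffer (burst length in 32-byte blocks).
    tik_instance.data_move(input_x_ub, input_x_gm[offset], 0, 1,
                           ub_tensor_size // data_each_block, 0, 0)
    # ... compute on input_x_ub here ...
    # Move the result of this chunk back to Global Memory.
    tik_instance.data_move(output_gm[offset], input_x_ub, 0, 1,
                           ub_tensor_size // data_each_block, 0, 0)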
Another technique that maximizes Unified Buffer usage is address overlapping. Take single-input vector computation as an example: when the instruction allows address reuse, you can define a single Tensor that serves as both the source and destination operand, which halves the memory allocation.
'''
Example:
tensor_a is the source operand and destination operand.
The following statement computes the absolute values of tensor_a:
before: tensor_a = [-3, -2, -1, 0, 1, 2, 3, ...]
after: tensor_a = [3, 2, 1, 0, 1, 2, 3, ...]
'''
tensor_a = tik_instance.Tensor("int16", (128,), name="tensor_a", scope=tik.scope_ubuf)
tik_instance.vec_abs(128, tensor_a, tensor_a, 1, 8, 8)
Note that a compilation error is reported if, after address alignment, the required allocation exceeds the total capacity of the corresponding memory type.
See the following figure. Assume that the UB capacity is 1024 bytes and an instruction requires a tensor_a Tensor of size 1008 bytes and a tensor_b Tensor of size 16 bytes within the UB scope. According to UB's 32-byte address alignment requirement, the 1008-byte Tensor is padded to 1024 bytes and the 16-byte Tensor is padded to 32 bytes. In this case, the required allocation adds up to 1056 bytes, which exceeds the UB capacity (1024 bytes).
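The padding in this example can be reproduced with a simple round-up calculation. The helper below is illustrative only and is not part of the TIK API.
# Illustrative only: round a tensor size up to the UB's 32-byte alignment boundary.
def align_up(size_bytes, alignment=32):
    return (size_bytes + alignment - 1) // alignment * alignment

tensor_a_padded = align_up(1008)            # 1024 bytes
tensor_b_padded = align_up(16)              # 32 bytes
total = tensor_a_padded + tensor_b_padded   # 1056 bytes > 1024-byte UB capacity, so compilation fails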
The start address of a user-defined Tensor is aligned during memory allocation. The following table describes the alignment requirements of different scopes.
| Scope | Alignment Requirement |
|---|---|
| Unified Buffer | 32-byte aligned |
| L1 Buffer | 512-byte aligned |
| L1OUT Buffer | 512-byte aligned for float16; 1024-byte aligned for float32, int32, and uint32 |
| Global Memory | No alignment requirement |
These general memory alignment requirements apply to the destination and source operands passed to the TIK data computation and data movement APIs, unless otherwise specified in the description of a particular TIK API.

