TIK Function Debugging

Overview

After developing a TIK operator, you are advised to debug it with the TIK debugger. By simulating operator execution, the TIK debugger helps you locate most functional errors of the operator (for example, data overflow).

The TIK debugger debugs the execution behavior of TIK DSL programs. The debugging function is implemented by the tik.tikdb object of the TIK module. You obtain this object from a tik.Tik instance through tik_instance.tikdb, and it debugs that instance.

tikdb provides a debug command line similar to the Python Debugger (PDB). You start debugging by calling tikdb.start_debug(). (The input data of the TIK DSL program to be debugged must be specified. For details, see start_debug.) tik.tikdb starts a local simulator based on the Dprofile of the TIK instance and simulates the execution of the program. If a breakpoint is hit during the execution, tikdb enters the debug command line. For details, see the Procedure section.

Availability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Procedure

Before function debugging, set disable_debug=False when creating the TIK instance (the default is disable_debug=True). For details about the arguments, see TIK Constructor.
```
tik_instance = tik.Tik(disable_debug=False)
```
Note: Enabling the debugging function affects operator performance. After debugging is complete, delete the disable_debug parameter or explicitly set disable_debug to True. This reduces compile time and improves performance.
The major debug APIs to use include:
- start_debug: starts debugging and returns the debugging result.
- debug_print: (optional) prints data generated at operator run time. The call inserts a statement in the TIK DSL program to evaluate the expression and prints the result. When the debugger reaches this code line, it evaluates the expression and prints the result on the screen.
For details about more APIs, see Debugging.

Prepare data to debug.

You can use NumPy to generate random numbers or read them from files.
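
As a minimal sketch of both options, the snippet below generates reproducible random float32 inputs with NumPy and round-trips one of them through a .npy file. The tensor names src0_gm and src1_gm and the shape (128,) are taken from the simple_add.py example on this page; the seed, the value range, and the file name are illustrative assumptions.

```python
import os
import tempfile

import numpy as np

# Fixed seed so that a failing debug session can be reproduced (assumed value).
rng = np.random.default_rng(seed=0)

# Option 1: generate random inputs with the dtype and shape the operator expects.
data_x = rng.uniform(-1.0, 1.0, size=(128,)).astype("float32")
data_y = rng.uniform(-1.0, 1.0, size=(128,)).astype("float32")

# Option 2: read an input from a file. The file is written first so that the
# sketch is self-contained; in practice it would already exist.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "src0_gm.npy")  # hypothetical file name
    np.save(path, data_x)
    loaded_x = np.load(path)

# As in simple_add.py, the keys are the names of the global-memory input Tensors.
feed_dict = {"src0_gm": loaded_x, "src1_gm": data_y}
```

The resulting feed_dict has the same structure as the one passed to start_debug in the example that follows.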

The following is a complete code example. The simple_add.py file is used as an example.

```
import numpy as np

from tbe import tik
from tbe.common.platform import set_current_compile_soc_info

def simple_add():
    tik_instance = tik.Tik(disable_debug=False)
    kernel_name = "tik_vec_add_128_float32"
    dst_ub = tik_instance.Tensor("float32", [128], tik.scope_ubuf, "dst_ub")
    dst_gm = tik_instance.Tensor("float32", (128,), tik.scope_gm, "dst_gm")
    src0_gm = tik_instance.Tensor("float32", (128,), tik.scope_gm, "src0_gm")
    src0_ub = tik_instance.Tensor("float32", (128,), tik.scope_ubuf, "src0_ub")
    src1_gm = tik_instance.Tensor("float32", (128,), tik.scope_gm, "src1_gm")
    src1_ub = tik_instance.Tensor("float32", (128,), tik.scope_ubuf, "src1_ub")

    tik_instance.data_move(src0_ub, src0_gm, 0, 1, 16, 0, 0)
    tik_instance.data_move(src1_ub, src1_gm, 0, 1, 16, 0, 0)
    tik_instance.vec_add(64, dst_ub, src0_ub, src1_ub, 2, 8, 8, 8)
    tik_instance.data_move(dst_gm, dst_ub, 0, 1, 16, 0, 0)
    tik_instance.BuildCCE(kernel_name, [src0_gm, src1_gm], [dst_gm])

    return tik_instance

if __name__ == '__main__':
    # Set this parameter based on the Ascend AI Processor version.
    soc_version = "xxx"
    set_current_compile_soc_info(soc_version)   # Set the model according to the AI Processor in use. For details, see the API description of set_current_compile_soc_info.
    tik_instance = simple_add()
    data_x = np.ones((128,)).astype("float32")
    data_y = np.ones((128,)).astype("float32")
    feed_dict = {'src0_gm': data_x, 'src1_gm': data_y}
    model_data, = tik_instance.tikdb.start_debug(feed_dict=feed_dict, interactive=True)
    print(model_data)
```
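
The numeric arguments in simple_add.py follow from the tensor size. The sketch below reproduces that arithmetic in plain Python. It assumes that data_move counts the burst length in 32-byte units and that one vector repeat processes 256 bytes; both constants and both helper names are assumptions made for illustration, so check them against the data_move and vec_add API descriptions for your processor.

```python
BLOCK_BYTES = 32     # assumed unit of the data_move burst length
REPEAT_BYTES = 256   # assumed amount of data handled by one vector repeat
DTYPE_BYTES = {"float16": 2, "float32": 4}


def burst_length(num_elements, dtype):
    """Number of 32-byte units needed to move num_elements of dtype."""
    total_bytes = num_elements * DTYPE_BYTES[dtype]
    return -(-total_bytes // BLOCK_BYTES)  # ceiling division


def mask_and_repeats(num_elements, dtype):
    """Full-width mask and repeat count for an element-wise vector operation."""
    mask = REPEAT_BYTES // DTYPE_BYTES[dtype]
    return mask, -(-num_elements // mask)


print(burst_length(128, "float32"))      # 16, as in data_move(..., 0, 1, 16, 0, 0)
print(mask_and_repeats(128, "float32"))  # (64, 2), as in vec_add(64, ..., 2, 8, 8, 8)
```

A mismatch between these values and the arguments passed to the instructions is a typical source of the out-of-range accesses that the debugger is meant to expose.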

Run the operator file to enter the TIK debugger command line.
1. Configure the environment variables by referring to Environment Setup.
2. Run the operator file.
  Set interactive to True in the tikdb.start_debug() call and run the operator file. The TIK debugger command line opens, and the debugger stops at the first TIK DSL statement. See the following figure.
In the interactive command line of the TIK debugger, enter debug commands. For details about available debug commands, see Debug Command Reference. Commands use the following format:
com(mand) param1 [param2]
- com(mand) is the command name. Enter either the full form (command) or the short form (com).
- An option not enclosed in square brackets is required, for example, param1.
- An option enclosed in square brackets is optional, for example, [param2].
- Considerations:
  - If nothing is input to the command prompt, the previous command is executed again.
  - In debug mode, you can perform single-step debugging, or continue the operation until the next breakpoint is reached or the program ends. If the program functions properly, the difference between the data generated by debugging and the expected data is 0 after the program ends.
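
One way to apply the last point is to compare the output returned by start_debug with a reference computed on the host. The sketch below is plain Python and uses a stand-in list in place of the real debugger output, so it runs without the TIK toolchain. max_abs_diff is a hypothetical helper written for this page, not part of tikdb.

```python
def max_abs_diff(actual, expected):
    """Largest element-wise absolute difference between two 1-D sequences."""
    if len(actual) != len(expected):
        raise ValueError("length mismatch: %d vs %d" % (len(actual), len(expected)))
    return max(abs(a - e) for a, e in zip(actual, expected))


# Inputs used by simple_add.py: two vectors of 128 ones.
data_x = [1.0] * 128
data_y = [1.0] * 128
expected = [x + y for x, y in zip(data_x, data_y)]

# Stand-in for the array returned by start_debug (assumption for this sketch).
model_data = [2.0] * 128

print(max_abs_diff(model_data, expected))  # 0.0 when the operator is correct
```

With real output, model_data can be passed as is when it is a one-dimensional array, because the helper only needs len() and iteration.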

Debug Command Reference

The following commands are available in the debugger CLI mode:

block [block_idx1] [block_idx2] ... [block_idxn]
- This command queries the block state and switches between blocks in multi-block scenarios.
- Each block_idx must be a valid block index, that is, a value in the range [0, block_num).
- To query the block state, run the block command without arguments while the with tik_instance.for_range() statement or any statement after it is being executed. The output contains the following fields:
  - Block indicates the block index.
  - Status indicates the block state: Stepping, Running, or Finished.
  - Current indicates whether the block is the one currently being debugged (True) or not (False).
  Figure 1 Block info query
- To switch between blocks, run the block block_idx command while the with tik_instance.for_range() statement or any statement after it is being executed.
  Figure 2 Switching blocks for debugging
b(reak) [tag] [block block_idx1 [block_idx2] ... [block_idxn]]
- This command sets and queries a tag breakpoint in the TIK DSL program.
- Option description:
  - tag indicates a breakpoint in the TIK DSL program, formatted as:
```
<name of the file that defines the TIK DSL function>:<line number in that file>
```
    If no parameter is set, the information about all configured breakpoints is displayed, including the breakpoint index, breakpoint enabling status, and the tag corresponding to the breakpoint. Breakpoints are numbered from 0 in ascending order.
    
    Note that if the breakpoint format is incorrect or the breakpoint statement is not a TIK DSL statement, the system displays a message indicating that the breakpoint fails to be set. Statements that support breakpoint setting include the Tensor definition, if-else statement, for statement, and all instructions (such as data_move and vadd).
  - block is valid only in multi-block scenarios. If the block option is omitted, the command takes effect on all blocks (in states other than Running and Finished). If block block_idx1 block_idx2 ... is specified, the command takes effect only on the listed blocks.
  - Take the simple_add.py file as an example:
    - b or break (displays all breakpoints)
      Num indicates the breakpoint index. Type indicates the breakpoint type. Enb indicates whether the breakpoint is enabled. Where indicates the file name and the code line number of the breakpoint.
    - b simple_add.py (The argument is invalid.)
    - b simple_add.py:16 (The argument is valid.)
    - b simple_add.py:100 (The line number exceeds the line number range in the file or the code in the line is not to be executed.)
    - b simple_add.py:1 (The code in the line is not a TIK primitive.)
    - b others.py:18 (Sets breakpoints for other files.)
    - b others.py:19 block 1 2 (The block option specifies that the breakpoint takes effect only on block 1 and block 2.)
    Figure 3 Breakpoint setting example
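
The tag format can be illustrated with a small parser. This is a hypothetical helper written for this page, not the parser used by tikdb. It checks only the file:line shape, whereas the debugger additionally requires the line to hold a TIK DSL statement.

```python
def parse_breakpoint_tag(tag):
    """Split 'file.py:line' into (file_name, line_number); raise on a bad shape."""
    file_name, sep, line_text = tag.rpartition(":")
    if not sep or not file_name or not line_text.isdigit():
        raise ValueError("expected <file name>:<line number>, got %r" % tag)
    line_number = int(line_text)
    if line_number < 1:
        raise ValueError("line numbers start at 1")
    return file_name, line_number


print(parse_breakpoint_tag("simple_add.py:16"))  # ('simple_add.py', 16)

try:
    parse_breakpoint_tag("simple_add.py")  # no line number
except ValueError as err:
    print("rejected:", err)
```

The two calls mirror the valid and invalid arguments in the list above: a tag without a line number is rejected before any line content is examined.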
clear [bpnumber] [block block_idx1 [block_idx2] ... [block_idxn]]
- This command clears all breakpoints or specific breakpoints.
- Option description:
  - bpnumber indicates a breakpoint number, that is, Num returned by the b(reak) command.
  - block is valid only in multi-block scenarios. If the block option is omitted, the command takes effect on all blocks (in states other than Running and Finished). If block block_idx1 block_idx2 ... is specified, the command takes effect only on the listed blocks.
- Take the simple_add.py file as an example:
  - clear (Clears all breakpoints.)
  - clear 1 (The breakpoint exists.)
  - clear 10 (The breakpoint does not exist.)
  - clear cc (The argument is invalid.)
  - clear 1 block 2 3 (Clears breakpoint 1 on block 2 and block 3.)
  Figure 4 Breakpoint clearing example
disable [bpnumber] [block block_idx1 [block_idx2] ... [block_idxn]]
- This command disables all breakpoints or specific breakpoints.
- Option description:
  - bpnumber indicates a breakpoint number, that is, Num returned by the b(reak) command.
  - block is valid only in multi-block scenarios. If the block option is omitted, the command takes effect on all blocks (in states other than Running and Finished). If block block_idx1 block_idx2 ... is specified, the command takes effect only on the listed blocks.
- Take the simple_add.py file as an example:
  - disable 2 (The breakpoint exists.)
  - disable 11 (The breakpoint does not exist.)
  - disable aa (The argument is invalid.)
  - disable 1 block 2 3 (Disables breakpoint 1 on block 2 and block 3.)
  Figure 5 Example of disabling a specified breakpoint
enable [bpnumber] [block block_idx1 [block_idx2] ... [block_idxn]]
- This command enables all breakpoints or specific breakpoints.
- Option description:
  - bpnumber indicates a breakpoint number, that is, Num returned by the b(reak) command.
  - block is valid only in multi-block scenarios. If the block option is omitted, the command takes effect on all blocks (in states other than Running and Finished). If block block_idx1 block_idx2 ... is specified, the command takes effect only on the listed blocks.
- Take the simple_add.py file as an example:
  - enable 2 (The breakpoint exists.)
  - enable 11 (The breakpoint does not exist.)
  - enable bb (The argument is invalid.)
  - enable 1 block 2 3 (Enables breakpoint 1 on block 2 and block 3.)
  Figure 6 Example of enabling a specified breakpoint
n(ext)
- This command proceeds to the next TIK DSL statement.
- This command has no options.
c(ontinue) [-a(ll)]
- This command continues debugging until the program ends. If a breakpoint is reached or an exception occurs, the system returns to the interactive mode again.
- The -a or -all option takes effect only in multi-block scenarios. c(ontinue) takes effect only on the current debug process. c(ontinue) -a or c(ontinue) -all takes effect on all processes.
l(ist) or w(here)
- This command prints the Python code and context of the TIK DSL code to be executed.
- This command has no options.
- In Figure 7, a total of seven lines of code are printed.
  Figure 7 Command outputs
p(rint) expression
- This command evaluates the expression and prints the result.
- expression can be any Python expression. It can use the Tensors and Scalars in the current scope of the TIK DSL program. Each Tensor is replaced with an equivalent numpy.ndarray whose shape, type, and data are the same as those of the Tensor. Each Scalar is evaluated and replaced with a Python float or int value. expression can also be a character string or a combination of a character string and an expression.
- Figure 8 shows an example.
  Figure 8 print example
q(uit)
- This command quits the debugger and terminates the current program.
- This command has no options.
- Figure 9 shows an example.
  Figure 9 quit example
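
The command headings in this reference use notation such as b(reak) and c(ontinue): the part outside the parentheses is the short form, and the whole word is the long form. The snippet below expands that notation as a small illustration of the convention. accepted_names is a hypothetical helper and is not part of tikdb.

```python
import re


def accepted_names(notation):
    """Expand 'b(reak)' into the short and long command names it stands for."""
    match = re.fullmatch(r"(\w+)\((\w+)\)", notation)
    if match is None:
        return (notation,)  # no optional part, for example 'clear'
    short, rest = match.groups()
    return (short, short + rest)


for notation in ("b(reak)", "n(ext)", "c(ontinue)", "p(rint)", "q(uit)", "clear"):
    print(notation, "->", accepted_names(notation))
```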

Multi-block Debugging Example

The following is an example of the complete code. The tik_multi_core_debug.py file is used as an example.

```
import numpy as np

from tbe.common.platform import set_current_compile_soc_info
from tbe import tik


def simple_add_multi_core():
    tik_instance = tik.Tik(disable_debug=False)
    kernel_name = "tik_multi_core_debug"
    dtype = "float16"
    block_nums = 3


    dst_gm = tik_instance.Tensor(dtype, (block_nums*128,), tik.scope_gm, "dst_gm")
    src0_gm = tik_instance.Tensor(dtype, (block_nums*128,), tik.scope_gm, "src0_gm")
    src1_gm = tik_instance.Tensor(dtype, (block_nums*128,), tik.scope_gm, "src1_gm")

    with tik_instance.for_range(0, block_nums, block_num=block_nums) as blk_idx:
        dst_ub = tik_instance.Tensor(dtype, (128,), tik.scope_ubuf, "dst_ub")
        src0_ub = tik_instance.Tensor(dtype, (128,), tik.scope_ubuf, "src0_ub")
        src1_ub = tik_instance.Tensor(dtype, (128,), tik.scope_ubuf, "src1_ub")

        tik_instance.data_move(src0_ub, src0_gm[blk_idx*128], 0, 1, 8, 0, 0)
        tik_instance.data_move(src1_ub, src1_gm[blk_idx*128], 0, 1, 8, 0, 0)
        tik_instance.vec_add(128, dst_ub, src0_ub, src1_ub, 1, 8, 8, 8)
        tik_instance.data_move(dst_gm[blk_idx*128], dst_ub, 0, 8, 1, 0, 0)

    tik_instance.BuildCCE(kernel_name, [src0_gm, src1_gm], [dst_gm])
    return tik_instance


if __name__ == '__main__':
    # Set this parameter based on the Ascend AI Processor version.
    soc_version = "xxx"
    set_current_compile_soc_info(soc_version)   # Set the model according to the AI Processor in use. For details, see the API description of set_current_compile_soc_info.
    tik_instance = simple_add_multi_core()
    data_x = np.ones((3*128,)).astype("float16")
    data_y = np.ones((3*128,)).astype("float16")
    feed_dict = {'src0_gm': data_x, 'src1_gm': data_y}
    model_data, = tik_instance.tikdb.start_debug(feed_dict=feed_dict, interactive=True)
    print(model_data)
```
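
In this example each block works on its own 128-element slice of the global-memory tensors, starting at blk_idx*128. The plain-Python sketch below lists those slices to make the partitioning explicit. block_slices is a hypothetical helper and does not call TIK.

```python
def block_slices(block_nums, elements_per_block):
    """Return the (start, stop) element range handled by each block."""
    return [
        (blk_idx * elements_per_block, (blk_idx + 1) * elements_per_block)
        for blk_idx in range(block_nums)
    ]


slices = block_slices(block_nums=3, elements_per_block=128)
for blk_idx, (start, stop) in enumerate(slices):
    print("block %d: elements [%d, %d)" % (blk_idx, start, stop))

# The slices cover the whole (3*128,) tensor without gaps or overlap.
assert slices[0][0] == 0 and slices[-1][1] == 3 * 128
assert all(a[1] == b[0] for a, b in zip(slices, slices[1:]))
```

When the output of one block looks wrong in the debugger, these ranges indicate which part of model_data to inspect.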

In multi-block scenarios, before the for_range statement (tik_instance.for_range(0, block_nums, block_num=block_nums)) is executed, specify which blocks enter the interactive mode by setting breakpoints or by stepping through code. When you step through code, all blocks enter the interactive mode by default. Blocks that do not enter the interactive mode run to completion automatically.
Figure 10 Putting all blocks in interactive mode by stepping through code

Figure 11 Specifying blocks to interact by setting breakpoints

If no block is specified for a breakpoint, breakpoints are set for all blocks by default.

Figure 12 Stepping through code and executing particular blocks

When you step through code, only the code of the current block is executed.
Test blocks by setting breakpoints.
Before the for_range loop (tik_instance.for_range(0, block_nums, block_num=block_nums)) is entered, setting a breakpoint puts all blocks in the interactive mode.

In the for_range loop, the breakpoint setting takes effect for all interactive blocks by default, that is, those in states other than Running and Finished.

If you want to set a breakpoint on a specific block, you can use the block index1 index2 ... option to specify the block.

Figure 13 Setting a breakpoint before entering for_range

Breakpoint 0 takes effect on all blocks. Breakpoint 1 takes effect only on blocks 0 and 1. Breakpoint 2 takes effect only on blocks 1 and 2.

Figure 14 Setting a breakpoint before entering for_range in the multi-block scenario

The breakpoint takes effect only on block 1 and block 2. The execution of c stops at the breakpoint, and the status of block 0 is Finished, indicating that the execution is complete and block 0 does not enter the interactive mode.

Figure 15 Setting a breakpoint after entering the for_range loop in the multi-block scenario

The breakpoint is successfully set for block 1 and block 2. However, the breakpoint setting on block 0 fails because block 0 is in the Finished state.
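
The rules described for Figures 13 to 15 can be summarized in a small model. The function below is a hypothetical sketch of the documented behavior, not tikdb code: without a block option, a breakpoint applies to every block that is not Running or Finished; with a block option, it applies to the listed blocks that are still in such a state and fails for the others.

```python
INACTIVE_STATES = {"Running", "Finished"}


def breakpoint_targets(block_status, requested=None):
    """Return (applied, rejected) block indexes for a breakpoint request.

    block_status maps block index -> 'Stepping', 'Running' or 'Finished'.
    requested is the list given after the block option, or None if omitted.
    """
    candidates = sorted(block_status) if requested is None else list(requested)
    applied, rejected = [], []
    for idx in candidates:
        unavailable = idx not in block_status or block_status[idx] in INACTIVE_STATES
        if not unavailable:
            applied.append(idx)
        elif requested is not None:
            rejected.append(idx)  # explicitly requested but cannot take the breakpoint
    return applied, rejected


# State described for Figure 15: block 0 has finished, blocks 1 and 2 are stopped.
status = {0: "Finished", 1: "Stepping", 2: "Stepping"}
print(breakpoint_targets(status))             # ([1, 2], [])
print(breakpoint_targets(status, [0, 1, 2]))  # ([1, 2], [0])
```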

The commands for disabling, enabling, and clearing breakpoints are similar to the breakpoint setting command. Details are not described herein.

Figure 16 Disabling and enabling breakpoints on blocks

Figure 17 Testing by using the clear command
Test by using the continue and quit commands.
In multi-block interactive mode, the continue command takes effect only on the current block. To take effect on all blocks, run the continue -a, continue -all, c -a, or c -all command.

Run the quit command to exit.

Figure 18 Testing with continue command

Figure 19 Testing with quit command
