Overview

msDebug is an operator debugging tool for Ascend devices. Operator developers use it to debug operator programs running on NPUs: it can read the memory and registers of an Ascend device, and pause and resume a running program. After you test an operator's functions on real hardware, either by launching the operator directly or by using the msOpST tool, decide from the test results whether to debug it further with msDebug.

  • To enable msDebug, install the NPU driver and firmware using either of the following methods (method 1 is recommended for CANN 8.1.RC1 and later, and driver 25.0.RC1 and later):
    • Method 1: Specify the --full parameter during driver installation, and then, as the root user, run echo 1 > /proc/debug_switch to enable the debugging channel. msDebug is then ready for use.
      ./Ascend-hdk-<chip_type>-npu-driver_<version>_linux-<arch>.run --full
    • Method 2: Specify the --debug parameter during driver installation. For details, see "NPU Driver and Firmware Installation".
      ./Ascend-hdk-<chip_type>-npu-driver_<version>_linux-<arch>.run --debug
  • The debugging channel has high permissions, which causes security risks. Exercise caution when using this tool. This tool is not recommended in the production environment. If you use this tool, you implicitly accept the risks involved.
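
As a rough illustration of the two installation methods above, the following Python sketch assembles the commands for each method as strings and does not execute them. The function name driver_install_steps and the sample values for chip type, version, and architecture are hypothetical placeholders. Only the package name pattern, the --full and --debug parameters, and the echo command come from the text above.

```python
def driver_install_steps(chip_type, version, arch, method=1):
    """Return the shell commands for enabling msDebug, as a list of strings.

    Nothing is executed here; the caller decides how (and whether) to run them.
    """
    # Package name pattern copied from the installation commands above.
    package = f"./Ascend-hdk-{chip_type}-npu-driver_{version}_linux-{arch}.run"
    if method == 1:
        # Method 1: full installation, then enable the debugging channel as root.
        return [f"{package} --full", "echo 1 > /proc/debug_switch"]
    if method == 2:
        # Method 2: install the driver with the --debug parameter.
        return [f"{package} --debug"]
    raise ValueError("method must be 1 or 2")


# Hypothetical sample values, for illustration only.
for command in driver_install_steps("910b", "25.0.rc1", "aarch64", method=1):
    print(command)
```

Returning the commands as text keeps the sketch free of side effects, which seems prudent given that the debugging channel has high permissions.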

Functions

The msDebug tool can debug all Ascend operators, including Ascend C operators (Vector, Cube, and mix fused operators). Select the debugging function you need from Table 1.

Table 1 msDebug functions

Function                         Link
-------------------------------  ------------------------------------------------
Breakpoint settings              Setting Breakpoints
Variable and memory printing     Printing Memory and Variables
Step-by-step debugging           Executing Step-by-Step Debugging
Running interruption             Interrupting Execution
Core switchover                  Switching Cores
Program status check             Reading Register Values
Debugging information display    Displaying the Debugging Information
Core dump file parsing           Analyzing the Dump File of an Exception Operator

  • When you press Ctrl+C, operator execution stops and the tool generates a profile data file based on the information collected so far. If you do not need the file, press Ctrl+C again.
  • If the --output parameter is not specified, the current tool execution path is used by default. Ensure that neither the owning group nor other users have write permission on the parent directory of the current path.
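
The permission requirement on the parent directory can be checked before starting the tool. The following is a minimal Python sketch that uses only the standard os and stat modules. The function name parent_writable_by_group_or_others is hypothetical and is not part of msDebug.

```python
import os
import stat


def parent_writable_by_group_or_others(path="."):
    """Return True if the parent directory of `path` grants write
    permission to the group or to other users."""
    parent = os.path.dirname(os.path.abspath(path))
    mode = os.stat(parent).st_mode
    # S_IWGRP and S_IWOTH are the group-write and other-write permission bits.
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


if parent_writable_by_group_or_others(os.getcwd()):
    print("Warning: the parent directory is writable by group or other users.")
else:
    print("Parent directory permissions look safe.")
```

If the check reports a writable parent directory, tightening it with a command such as chmod go-w on that directory is one way to meet the requirement.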

Commands

  • You are responsible for the execution security of the executable files or applications that you debug.
    • Restrict the operation permissions on executable files or applications to avoid privilege escalation risks.
    • Do not perform high-risk operations (such as deleting files, deleting directories, changing passwords, or running privilege escalation commands) to avoid security risks.
  • Run the help command to view all commands supported by msDebug. Commands not listed in Table 2 are implemented by the open-source debugger LLDB. Pay attention to the related risks when using LLDB. For details about how to use LLDB, see its official documentation at https://lldb.llvm.org/.
Table 2 Command reference

Command

Abbreviation

Description

Example

breakpoint set -f filename -l linenum

b

Adds a breakpoint. filename indicates the operator implementation code file *.cpp, and linenum is the line number of the code file.

b add_custom.cpp:85

run

r

Runs a program.

r

continue

c

Resumes execution of the program.

c

print variable

p

Prints the value of a variable.

p zLocal

frame variable

var

Displays all local variables in the current scope.

var

memory read

x

Reads the memory.

x -m GM -f float16[] 0x00001240c0037000 -c 2 -s 128
  • -m specifies the memory space to read. GM, UB, L0A, L0B, L0C, L1, FB, STACK, DCACHE, and ICACHE are supported.
    NOTE:

    STACK, DCACHE, and ICACHE are available only when analyzing the dump file of an exception operator (see Analyzing the Dump File of an Exception Operator).

  • -s specifies the number of bytes printed on each line.
  • -c specifies the number of lines to print.
  • -f specifies the data type used to print the values.
  • 0x00001240c0037000 is the memory address to read. Replace it with the actual address.

ascend info devices

-

Queries device information.

ascend info devices

ascend info cores

-

Queries information about the AI Cores on which the operator is running.

ascend info cores

ascend info tasks

-

Queries information about the tasks of the running operator.

ascend info tasks

ascend info stream

-

Queries information about the stream on which the operator is running.

ascend info stream

ascend info blocks

-

Queries information about the blocks of the running operator.

To print information about running blocks:
ascend info blocks 
To print the code of the running blocks at the current breakpoint:
ascend info blocks -d

ascend aic id

-

Switches the target cube core of the debugger.

ascend aic 1

ascend aiv id

-

Switches the target vector core of the debugger.

ascend aiv 5

CTRL+C

-

Manually interrupts the running operator program and displays the location where execution was interrupted.

Press Ctrl+C on the keyboard.

register read

re r

Reads register values. -a reads all register values. $REG_NAME reads the value of a specified register.

register read -a
re r $PC

thread step-over

next or n

Moves to the next executable line of code in the same call stack.

n

thread step-in

step or s

Steps into the function called on the current line so that it can be debugged.

s

thread step-out

finish

Executes the rest of the current function and returns to its caller.

finish

thread backtrace

bt

Displays the code call stack information.

NOTE:
  • The bt command applies only to the core dump scenario. The call stack information is guaranteed to be accurate only when stop_reason is CUBE_ERROR, CCU_ERROR, MTE_ERROR, VEC_ERROR, or FIXP_ERROR.
  • If the displayed function names are too long, you can shorten the output by customizing the frame format, for example:
    settings set frame-format "frame #${frame.index}: ${frame.pc}{ ${module.file.basename}{{${frame.no-debug}${function.pc-offset}}}}{ at ${line.file.basename}:${line.number}{:${line.column}}}{${function.is-optimized} [opt]}{${frame.is-artificial} [artificial]}\n"
bt

target modules add <kernel.o>

image add [kernel.o]

Imports operator debugging information when the PyTorch framework calls operators.

NOTE:

After the run command is executed, run the image add command to import the debugging information. Then, run the image load command for the imported debugging information to take effect.

image add xx.o

target modules load --file <kernel.o> --slide <address>

image load -f <kernel.o> -s <address>

Loads the imported operator debugging information so that it takes effect when the PyTorch framework calls an operator.

image load -f xx.o -s 0

msdebug --core corefile [kernel.o|fatbin]

-

  • Loads the core dump file.
  • The second parameter is optional. To display source code lines in the call stack, specify the path of an executable binary file in kernel.o or fatbin format that was compiled with -g.
msdebug --core corefile xx.o
msdebug --core corefile

ascend info summary

-

Displays the core dump file information.

ascend info summary

help msdebug_command

-

Displays the help information. The command output displays the function, syntax, and options of a command.

help run
Help information for core switchover:
(msdebug) help ascend aic
change the id of the focused ascend aicore.
Syntax: ascend aic <id>
Help information for ascend info blocks:
(msdebug) help ascend info blocks
show blocks overall info.
Syntax: ascend info blocks
Command Options Usage:
  ascend info blocks [-d]
       -d ( --details )
            Show stopped states for all blocks.
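
To make the memory read options in Table 2 more concrete, the following Python sketch rebuilds the example command from its parts and works out how much memory it covers, taking -c as the number of lines and -s as the number of bytes per line, as described in the table. The helper names memory_read_command and memory_read_extent are hypothetical. The 2-byte element size is the standard size of a float16 value.

```python
def memory_read_command(space, data_type, address, lines, bytes_per_line):
    """Format an msDebug memory read command in the short 'x' form."""
    # '#018x' renders the address as 0x followed by 16 zero-padded hex digits.
    return (f"x -m {space} -f {data_type} {address:#018x} "
            f"-c {lines} -s {bytes_per_line}")


def memory_read_extent(lines, bytes_per_line, element_size):
    """Return (total bytes, number of elements) covered by the read."""
    total_bytes = lines * bytes_per_line
    return total_bytes, total_bytes // element_size


# Values taken from the example in Table 2; float16 occupies 2 bytes.
command = memory_read_command("GM", "float16[]", 0x00001240C0037000,
                              lines=2, bytes_per_line=128)
total_bytes, elements = memory_read_extent(2, 128, element_size=2)
print(command)
print(total_bytes, elements)
```

With the values from the table, the command prints 2 lines of 128 bytes each, that is, 256 bytes or 128 float16 values starting at the given address.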

Call Scenarios

The following operator call scenarios are supported:

Supplementary Notes

msDebug also provides the following extension program. For details, see Table 3.

Table 3 Description of the extension program

Program Name                             Description
---------------------------------------  -------------------------------------------------------------
msdebug-mi (msDebug Machine Interface)   Provides a machine-to-machine interface for data parsing. Users can ignore it.