Overview
The kernel source code of an Ascend C operator is written by calling the Ascend C class library. For CPU-side debugging, compile the kernel source code with GCC to generate an ordinary CPU binary, then debug it with GDB. For NPU-side debugging, compile the same kernel source code with the BiSheng compiler to generate an NPU binary, then use simulation-based instrumentation or the Profiling tool to collect on-board data.

The following table lists the debugging and tuning methods and tools.
| Category | Subcategory | Method |
|---|---|---|
| Function debugging | CPU twin debugging | Twin debugging: The same operator code can be used to debug accuracy on the CPU and performance on the NPU. In the CPU domain, you can perform GDB debugging and use the printf command to print information. |
| | Board debugging on the NPU | printf/assert: printf is used to print scalar and string information. assert is used to set checkpoints in the code. When a condition is not met, the program terminates immediately and reports an error. |
| | | DumpTensor: Use the DumpTensor API to print the data of a specified tensor. |
| | | Board debugging tool: Use the msDebug tool to debug the operator program running on the NPU. In a real hardware environment, test the input and output of the operator to verify whether the operator functions properly. The specific functions include setting breakpoints, printing variables and memory, performing single-step debugging, and interrupting execution. |
| | | Memory check tool: Use the msSanitizer tool to check the memory. It can detect and report memory access exceptions such as out-of-bounds and misalignment of the external storage (global memory) and internal storage (local memory) during operator running. |
| Performance tuning | - | msProf tool: The msProf performance analysis tool is used to collect and analyze key performance metrics of operators running on Ascend AI Processors. You can efficiently locate software and hardware performance bottlenecks of operators based on the output performance data, thereby enhancing the overall efficiency of operator performance analysis. Performance data can currently be collected and automatically parsed based on various running modes (onboard or simulation) and file formats (executable files or operator binary .o files). |
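The CPU-side part of the table (printing values, setting assert checkpoints, dumping tensor data, and checking accuracy) can be illustrated with a minimal sketch in plain C++. It deliberately does not use the Ascend C class library: `AddKernelCpu`, `DumpBuffer`, and `RunCpuTwinCheck` are hypothetical names chosen for illustration, and the element-wise add is only an assumed stand-in for a real operator kernel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for an operator kernel: element-wise add.
// Plain C++ only; this is not Ascend C API code.
void AddKernelCpu(const float* x, const float* y, float* z, std::size_t n) {
    // Checkpoint: terminate immediately if a buffer pointer is invalid.
    assert(x != nullptr && y != nullptr && z != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        z[i] = x[i] + y[i];
    }
}

// Hypothetical helper in the spirit of a tensor dump:
// prints at most `count` leading elements of a buffer of length `n`.
void DumpBuffer(const char* tag, const float* data, std::size_t n, std::size_t count) {
    const std::size_t limit = (count < n) ? count : n;
    std::printf("[%s]", tag);
    for (std::size_t i = 0; i < limit; ++i) {
        std::printf(" %.2f", data[i]);
    }
    std::printf("\n");
}

// Runs the kernel on the CPU and compares the result with a golden
// reference. Returns the number of mismatching elements.
std::size_t RunCpuTwinCheck() {
    const std::size_t n = 8;
    std::vector<float> x(n), y(n), z(n, 0.0f), golden(n);
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = static_cast<float>(i);
        y[i] = 2.0f * static_cast<float>(i);
        golden[i] = 3.0f * static_cast<float>(i);  // expected x[i] + y[i]
    }

    AddKernelCpu(x.data(), y.data(), z.data(), n);
    DumpBuffer("z", z.data(), n, 4);  // inspect the first few outputs

    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fabs(z[i] - golden[i]) > 1e-6f) {
            std::printf("mismatch at %zu: got %f, expected %f\n", i, z[i], golden[i]);
            ++mismatches;
        }
    }
    return mismatches;
}
```

Calling `RunCpuTwinCheck()` from `main` and building with `g++ -g` produces a binary that can be stepped through in GDB, for example with a breakpoint on `AddKernelCpu`. A real operator would use the printing, assert, and DumpTensor facilities of the Ascend C class library rather than these stand-ins.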