Board Debugging on the NPU
Debugging with DumpTensor and printf
Two APIs print data from the board on the NPU: DumpTensor and printf. DumpTensor prints the data of a specified tensor, and printf prints scalar and string information.
Usage:
- In the following DumpTensor example, srcLocal is the tensor to print, 5 is custom additional information (such as the current code line number), and dataLen is the number of elements to dump. For details about the usage and restrictions of the DumpTensor API, see DumpTensor.
DumpTensor(srcLocal, 5, dataLen);
During a dump, a DumpHead (32 bytes) is added before the dump information of each core to record the core ID and resource usage. A DumpTensorHead (32 bytes) is also added before the tensor data of each dump to record tensor information. An example of the printed output is as follows:
DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32
[19.000000, 4.000000, 38.000000, 50.000000, 39.000000, 67.000000, 84.000000, 98.000000, 21.000000, 36.000000, 18.000000, 46.000000, 10.000000, 92.000000, 26.000000, 38.000000, 39.000000, 9.000000, 82.000000, 37.000000, 35.000000, 65.000000, 97.000000, 59.000000, 89.000000, 63.000000, 70.000000, 57.000000, 35.000000, 3.000000, 16.000000, 42.000000]
DumpTensor: desc=5, addr=100, data_type=float16, position=UB, dump_size=32
[6.000000, 34.000000, 52.000000, 38.000000, 73.000000, 38.000000, 35.000000, 14.000000, 67.000000, 62.000000, 30.000000, 49.000000, 86.000000, 37.000000, 84.000000, 18.000000, 38.000000, 18.000000, 44.000000, 21.000000, 86.000000, 99.000000, 13.000000, 79.000000, 84.000000, 9.000000, 48.000000, 74.000000, 52.000000, 99.000000, 80.000000, 53.000000]
...
DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32
[35.000000, 41.000000, 41.000000, 22.000000, 84.000000, 49.000000, 60.000000, 0.000000, 90.000000, 14.000000, 67.000000, 80.000000, 16.000000, 46.000000, 16.000000, 83.000000, 6.000000, 70.000000, 97.000000, 28.000000, 97.000000, 62.000000, 80.000000, 22.000000, 53.000000, 37.000000, 23.000000, 58.000000, 65.000000, 28.000000, 4.000000, 29.000000]
- The following is an example of printf. For details about the usage and restrictions of the printf API, see printf.
printf("fmt string %d", 0x123);
Using msSanitizer to Detect Exceptions
msSanitizer is an exception detection tool for Ascend AI Processors. In single-operator development scenarios, it provides memory check, contention check, uninitialization check, and synchronization check.
- Memory check: During operator development, the tool can locate memory problems such as illegal read/write, multi-core corruption, non-aligned access, memory leak, and illegal release. The tool can also check memory used by the CANN software stack, helping users locate the module in which a memory exception occurs.
- Contention check: The tool helps users locate data contention problems that may be caused by contention risks, including intra-core contention and inter-core contention. Intra-core contention includes inter-pipeline contention and intra-pipeline contention.
- Uninitialization check: The tool helps users locate reads of dirty data that may be caused by uninitialized memory.
- Synchronization check: The tool helps users locate synchronization failures in subsequent operators due to unpaired synchronization instructions in the preceding operators.
For details, see msSanitizer (Anomaly Detection).
This function is supported only in the following scenarios:
- Call operators through the method in Kernel Launch Based on a Sample Project.
- Call operators through single-operator APIs.
- Call a single-operator API (aclnnxxx) indirectly (single-operator calling in the PyTorch framework).
Using msDebug for Operator Debugging
msDebug is an operator debugging tool for Ascend devices. Operator developers use it to debug operator programs running on NPUs. msDebug can debug all Ascend operators, including Ascend C operators (Vector, Cube, and fused operators). Its functions include breakpoint setting, variable and memory printing, single-step debugging, running interruption, core switching, program status check, debugging information display, and core dump file parsing. You can select the functions as needed. For details, see msDebug (Operator Debugging).
This function is supported only in the following scenarios:
- Call operators through the method in Kernel Launch Based on a Sample Project.
- Call operators through single-operator APIs.
- Call a single-operator API (aclnnxxx) indirectly (single-operator calling in the PyTorch framework).