Debugging a Vector Operator on the Board
This section describes how to use msDebug to debug a vector operator on the board. The sample operator adds two vectors and outputs the result.
Prerequisites
- Click Link to obtain a sample project for operator debugging.
- Configure environment variables by referring to Before You Start.
Procedure
- Compile the operator based on the sample project and obtain the executable file add.fatbin.
- Modify the COMPILER_FLAG compilation option in sample/normal_sample/vec_only/Makefile. Change -O2 to -O0 -g --cce-ignore-always-inline=true to enable the compiler debugging function.
# Makefile
...
COMPILER := $(ASCEND_HOME_PATH)/compiler/ccec_compiler/bin/ccec
COMPILER_FLAG := -xcce -O0 -g --cce-ignore-always-inline=true -std=c++17  # Enable compiler debugging.
- Compile the operator.
If the project has been built before, run make clean && make instead of make.
cd ./sample/normal_sample/vec_only/
make
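The flag change described in this step can also be scripted, for example when several sample Makefiles need the same edit. The following is a minimal Python sketch and is not part of the sample project. The helper name enable_debug_flags is hypothetical, and the sketch assumes that -O2 appears as a standalone token on the COMPILER_FLAG line.

```python
import re

# Debug flags named in this procedure; they replace -O2.
DEBUG_FLAGS = "-O0 -g --cce-ignore-always-inline=true"


def enable_debug_flags(makefile_text):
    """Return makefile_text with -O2 replaced on COMPILER_FLAG lines only."""
    patched = []
    for line in makefile_text.splitlines(keepends=True):
        if line.lstrip().startswith("COMPILER_FLAG"):
            # Match -O2 only as a whole token, not as part of a longer flag.
            line = re.sub(r"(?<!\S)-O2(?!\S)", DEBUG_FLAGS, line)
        patched.append(line)
    return "".join(patched)


if __name__ == "__main__":
    before = "COMPILER_FLAG := -xcce -O2 -std=c++17\n"
    print(enable_debug_flags(before), end="")
```

Lines that do not start with COMPILER_FLAG are returned unchanged, so other optimization settings in the same Makefile are not touched.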
- Set a breakpoint.
- Launch msDebug with the operator program to enter the debugging interface.
msdebug add.fatbin
(msdebug) target create "add.fatbin"
Current executable set to '/home/mindstudio/projects/mstt/sample/build/add.fatbin' (aarch64).
(msdebug)
- In this sample, the kernel function is implemented in add_kernel.cpp. Set NPU breakpoints on the required lines of this file.
(msdebug) b add_kernel.cpp:69
Breakpoint 1: where = device_debugdata`::add_custom(uint8_t *, uint8_t *, uint8_t *) + 18804 [inlined] KernelAdd::Compute(int) + 5144 at add_kernel.cpp:69:9, address = 0x0000000000004974
(msdebug)
- Run the operator program. The program runs until it hits the first breakpoint (add_kernel.cpp:69). msDebug reports that the NPU kernel function add_custom has been launched on device 0.
(msdebug) run
Process 730254 launched
[Launch of Kernel add_custom on Device 0]
Process 730254 stopped
[Switching to focus on Kernel add_custom, CoreId 13, Type aiv]
* thread #1, name = 'add.fatbin', stop reason = breakpoint 2.1
    frame #0: 0x0000000000004974 device_debugdata`::add_custom(uint8_t *, uint8_t *, uint8_t *) [inlined] KernelAdd::Compute(this=0x000000000019a930, progress=0) at add_kernel.cpp:69:9
   66           // call Add instr for computation
   67           Add(zLocal, xLocal, yLocal, TILE_LENGTH);
   68           // enque the output tensor to VECOUT queue
-> 69           outQueueZ.EnQue<int16_t>(zLocal);    # Breakpoint position
   70           // free input tensors for reuse
   71           inQueueX.FreeTensor(xLocal);
   72           inQueueY.FreeTensor(yLocal);
(msdebug)
- Review information.
- Run the ascend info cores command to query NPU core information.
(msdebug) ascend info cores
  CoreId  Type  Device  Stream  Task  Block  PC              Exception
* 13      aiv   0       3       0     0      0x1240c0034974  f0000000
  14      aiv   0       3       0     1      0x1240c0034974  f0000000
  15      aiv   0       3       0     2      0x1240c0034974  f0000000
  20      aiv   0       3       0     3      0x1240c0034974  f0000000
  21      aiv   0       3       0     4      0x1240c0034974  f0000000
  22      aiv   0       3       0     5      0x1240c0034974  f0000000
  23      aiv   0       3       0     6      0x1240c0034974  f0000000
  24      aiv   0       3       0     7      0x1240c0034974  f0000000
(msdebug)
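A later step switches cores by core ID, while the kernel code reasons in block indices, so it helps to map one to the other. The following Python sketch parses the table text printed by ascend info cores and looks up the core ID for a given block. The function names parse_cores and core_for_block are hypothetical, and the column layout is assumed from the sample output in this section only.

```python
SAMPLE_OUTPUT = """\
  CoreId  Type  Device  Stream  Task  Block  PC              Exception
* 13      aiv   0       3       0     0      0x1240c0034974  f0000000
  14      aiv   0       3       0     1      0x1240c0034974  f0000000
  15      aiv   0       3       0     2      0x1240c0034974  f0000000
  20      aiv   0       3       0     3      0x1240c0034974  f0000000
  21      aiv   0       3       0     4      0x1240c0034974  f0000000
  22      aiv   0       3       0     5      0x1240c0034974  f0000000
  23      aiv   0       3       0     6      0x1240c0034974  f0000000
  24      aiv   0       3       0     7      0x1240c0034974  f0000000
"""


def parse_cores(text):
    """Parse table rows into dicts; the header and prompt lines are skipped."""
    rows = []
    for line in text.splitlines():
        # The leading '*' only marks the currently focused core.
        fields = line.replace("*", " ").split()
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        rows.append({
            "core_id": int(fields[0]),
            "type": fields[1],
            "block": int(fields[5]),
            "pc": int(fields[6], 16),
        })
    return rows


def core_for_block(rows, block):
    """Return the core ID that runs the given block, or None if absent."""
    for row in rows:
        if row["block"] == block:
            return row["core_id"]
    return None


if __name__ == "__main__":
    print(core_for_block(parse_cores(SAMPLE_OUTPUT), 7))
```

With the sample table, block 7 maps to core ID 24. Core IDs are not contiguous (15 is followed by 20), so deriving the ID from the block index by arithmetic would not work here.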
- Run the print command to print variable information.
(msdebug) print progress
(int32_t) $0 = 0
- Run the print and memory read commands together to print values stored in the tensor variable.
- Print the data stored in LocalTensor in the UB memory.
The start address for reading UB memory is the bufferAddr value in the address_ field of the LocalTensor variable. The following example uses the xLocal variable, whose start address is 0.
(msdebug) print xLocal
(AscendC::LocalTensor<short>) $0 = {
  address_ = (dataLen = 256, bufferAddr = 0, bufferHandle = "", logicPos = '\t')
  shapeInfo_ = {
    shapeDim = '\0'
    originalShapeDim = '\0'
    shape = ([0] = 0, [1] = 0, [2] = 0, [3] = 0, [4] = 0, [5] = 0, [6] = 0, [7] = 0)
    originalShape = ([0] = 0, [1] = 0, [2] = 0, [3] = 0, [4] = 0, [5] = 0, [6] = 0, [7] = 0)
    dataFormat = ND
  }
}
(msdebug) memory read -m UB -f int16_t[] 0 -s 256 -c 1
0x00000000: {0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127}
(msdebug)
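In the sample output, dataLen = 256 and -s 256 yield exactly 128 int16_t values, which is consistent with both numbers being byte counts (256 / 2 = 128). The following Python sketch shows that arithmetic and how a raw 256-byte buffer decodes into the values printed above. It is an illustration only: the helper names are hypothetical, and little-endian byte order is an assumption of the sketch, not something stated in this section.

```python
import struct

ELEM_FORMAT = "<h"  # little-endian int16_t (assumption for this sketch)
ELEM_SIZE = struct.calcsize(ELEM_FORMAT)  # 2 bytes


def element_count(byte_length):
    """Number of whole int16_t elements that fit in byte_length bytes."""
    return byte_length // ELEM_SIZE


def decode_int16(raw):
    """Decode a bytes object into a list of int16 values."""
    count = element_count(len(raw))
    return list(struct.unpack("<%dh" % count, raw[:count * ELEM_SIZE]))


if __name__ == "__main__":
    # Build a stand-in for the 256 bytes read from the UB in the sample.
    raw = struct.pack("<128h", *range(128))
    print(len(raw), element_count(len(raw)))
    print(decode_int16(raw)[:8])
```

When reading a tensor with a different element type, the size passed to -s would scale with the element size in the same way.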
- Print the data stored in GlobalTensor in the GM.
The start address for reading GM memory is the value of the address_ field of the GlobalTensor variable. The following example uses the xGm variable, whose start address is 0x00001240c0015000.
(msdebug) print xGm
(AscendC::GlobalTensor<short>) $0 = {
  bufferSize_ = 2048
  shapeInfo_ = {
    shapeDim = '\0'
    originalShapeDim = '\0'
    shape = ([0] = 0, [1] = 0, [2] = 0, [3] = 0, [4] = 0, [5] = 0, [6] = 0, [7] = 0)
    originalShape = ([0] = 0, [1] = 0, [2] = 0, [3] = 0, [4] = 0, [5] = 0, [6] = 0, [7] = 0)
    dataFormat = ND
  }
  address_ = 0x00001240c0015000
}
(msdebug) memory read -m GM -f int16_t[] 0x00001240c0015000 -s 256 -c 1
0x1240c0015000: {0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127}
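The read above covers only the first 256 bytes of xGm, while bufferSize_ reports 2048. To inspect a later part of the tensor, the start address has to be advanced by hand. The following Python sketch computes such an address and formats a memory read command using only the options that appear in this section. The helper names are hypothetical, and the sketch assumes that the tensor is stored contiguously in 256-byte tiles of int16_t values.

```python
XGM_BASE = 0x00001240C0015000  # address_ of xGm in the sample output
TILE_BYTES = 256               # bytes per read in the sample (128 x int16_t)


def gm_tile_address(base_address, tile_index, tile_bytes=TILE_BYTES):
    """Start address of the tile_index-th tile, assuming contiguous storage."""
    return base_address + tile_index * tile_bytes


def memory_read_command(address, size_bytes=TILE_BYTES):
    """Format a GM read in the same form as the command shown above."""
    return "memory read -m GM -f int16_t[] 0x%016x -s %d -c 1" % (address, size_bytes)


if __name__ == "__main__":
    for tile in range(2):
        print(memory_read_command(gm_tile_address(XGM_BASE, tile)))
```

For tile 0 this reproduces the command used in the sample. For other tiles the printed command is only a starting point and should be checked against the kernel's actual tiling.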
- Switch to another AIV core and print the required information.
(msdebug) ascend aiv 24    // Select the core ID corresponding to block 7 in ascend info cores. In this example, the core ID is 24.
[Switching to focus on Kernel add_custom, CoreId 24, Type aiv]
* thread #1, name = 'add.fatbin', stop reason = breakpoint 2.1
    frame #0: 0x0000000000004974 device_debugdata`::add_custom(uint8_t *, uint8_t *, uint8_t *) [inlined] KernelAdd::Compute(this=0x00000000001c6930, progress=0) at add_kernel.cpp:69:9
   66           // call Add instr for computation
   67           Add(zLocal, xLocal, yLocal, TILE_LENGTH);
   68           // enque the output tensor to VECOUT queue
-> 69           outQueueZ.EnQue<int16_t>(zLocal);
                ^
   70           // free input tensors for reuse
   71           inQueueX.FreeTensor(xLocal);
   72           inQueueY.FreeTensor(yLocal);
(msdebug) p xLocal
(AscendC::LocalTensor<short>) $0 = {
  address_ = (dataLen = 256, bufferAddr = 0, bufferHandle = "", logicPos = '\t')
  shapeInfo_ = {
    shapeDim = '\0'
    originalShapeDim = '\0'
    shape = ([0] = 0, [1] = 0, [2] = 0, [3] = 0, [4] = 0, [5] = 0, [6] = 0, [7] = 0)
    originalShape = ([0] = 0, [1] = 0, [2] = 0, [3] = 0, [4] = 0, [5] = 0, [6] = 0, [7] = 0)
    dataFormat = ND
  }
}
(msdebug) memory read -m UB -f int16_t[] 0 -s 256 -c 1
0x00000000: {14336 14337 14338 14339 14340 14341 14342 14343 14344 14345 14346 14347 14348 14349 14350 14351 14352 14353 14354 14355 14356 14357 14358 14359 14360 14361 14362 14363 14364 14365 14366 14367 14368 14369 14370 14371 14372 14373 14374 14375 14376 14377 14378 14379 14380 14381 14382 14383 14384 14385 14386 14387 14388 14389 14390 14391 14392 14393 14394 14395 14396 14397 14398 14399 14400 14401 14402 14403 14404 14405 14406 14407 14408 14409 14410 14411 14412 14413 14414 14415 14416 14417 14418 14419 14420 14421 14422 14423 14424 14425 14426 14427 14428 14429 14430 14431 14432 14433 14434 14435 14436 14437 14438 14439 14440 14441 14442 14443 14444 14445 14446 14447 14448 14449 14450 14451 14452 14453 14454 14455 14456 14457 14458 14459 14460 14461 14462 14463}
(msdebug)
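The values seen on core 24 start at 14336, which equals 7 × 2048. This matches block 7 working on its own 2048-element slice of the input, with 2048 being the bufferSize_ reported for xGm. The following Python sketch reproduces the expected first tile for any block so that a UB dump can be checked quickly. It assumes, based only on the dumps in this section, that the input holds x[i] = i, that each block owns 2048 consecutive elements, and that a tile holds 128 elements. The function name is hypothetical.

```python
BLOCK_LENGTH = 2048  # elements per block; matches bufferSize_ in the xGm dump
TILE_LENGTH = 128    # elements per tile; 256 bytes of int16_t in the dumps


def expected_first_tile(block_index):
    """Expected xLocal contents for the first tile of a block, if x[i] = i."""
    start = block_index * BLOCK_LENGTH
    return list(range(start, start + TILE_LENGTH))


if __name__ == "__main__":
    tile = expected_first_tile(7)
    print(tile[0], tile[-1])
```

For block 7 the sketch gives the range 14336 to 14463, the same range shown in the memory read output above. For block 0 it gives 0 to 127.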
- List the breakpoints, delete the NPU breakpoint, and resume program execution.
(msdebug) breakpoint list
Current breakpoints:
1: name = 'main', locations = 1, resolved = 1, hit count = 1
  1.1: where = add.fatbin`main + 36 at main.cpp:39:12, address = 0x0000aaaaaab0f568, resolved, hit count = 1
2: file = 'add_kernel.cpp', line = 69, exact_match = 0, locations = 1, resolved = 1, hit count = 1
  2.1: where = device_debugdata`::add_custom(uint8_t *, uint8_t *, uint8_t *) + 18804 [inlined] KernelAdd::Compute(int) + 5144 at add_kernel.cpp:69:9, address = 0x0000000000004974, resolved, hit count = 1
(msdebug) breakpoint delete 2
1 breakpoints deleted; 0 breakpoint locations disabled.
(msdebug) continue
Process 730254 resuming
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
Process 730254 exited with status = 0 (0x00000000)
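The values printed before the process exits (0 2 4 ... 30) can be cross-checked against a host-side reference of the operator. The following Python sketch is such a reference, not the kernel implementation. It assumes that both inputs hold x[i] = y[i] = i: the x values match the memory dumps in this section, while the y values are an assumption that is consistent with the printed sums. The function name add_reference is hypothetical.

```python
def add_reference(x, y):
    """Element-wise sum of two equally sized vectors (no overflow handling)."""
    if len(x) != len(y):
        raise ValueError("input vectors must have the same length")
    return [a + b for a, b in zip(x, y)]


if __name__ == "__main__":
    x = list(range(16))  # matches the first values dumped from xGm
    y = list(range(16))  # assumed; y is not dumped in this section
    print(" ".join(str(v) for v in add_reference(x, y)))
```

Under these assumptions the reference prints the same 16 values as the operator program.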
- After debugging is complete, run the q command and enter Y or y to exit msDebug.
(msdebug) q
Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y