Single-Step Debugging
After you enter n, the run mode switches to single-core mode: only the core in focus runs, and all other cores remain stopped.
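To make single-core mode concrete, here is a toy Python model of the behavior. It is not part of msdebug and does not drive the tool. The class and method names (`Core`, `SingleCoreSession`, `step`, `switch`, `info_cores`) are hypothetical, and each new PC value is supplied by the caller rather than computed. The model encodes only two behaviors described in this section: `n` advances the focused core alone, and a core that stopped for both a step and a breakpoint is reported as breakpoint.

```python
from dataclasses import dataclass, field

BREAKPOINT = "breakpoint 1.1"
STEP_OVER = "step over"


@dataclass
class Core:
    core_id: int
    pc: int
    # In this toy model every core starts stopped at the breakpoint.
    reasons: set = field(default_factory=lambda: {BREAKPOINT})

    def shown_reason(self) -> str:
        # If both a step and a breakpoint apply, "breakpoint" is shown.
        return BREAKPOINT if BREAKPOINT in self.reasons else STEP_OVER


class SingleCoreSession:
    """Hypothetical toy model of single-core run mode (not msdebug itself)."""

    def __init__(self, core_ids, bp_pc):
        self.bp_pcs = {bp_pc}
        self.cores = {cid: Core(cid, bp_pc) for cid in core_ids}
        self.focus = core_ids[0]  # focus starts on the first core listed

    def switch(self, core_id):
        # Analogue of "ascend aiv <id>": changes focus; no core executes.
        self.focus = core_id

    def step(self, next_pc):
        # Analogue of "n": only the focused core advances; others stay frozen.
        core = self.cores[self.focus]
        core.pc = next_pc
        core.reasons = {STEP_OVER}
        if next_pc in self.bp_pcs:
            core.reasons.add(BREAKPOINT)

    def info_cores(self):
        # Rows of (core_id, pc, shown stop reason, is_focused).
        return [(cid, c.pc, c.shown_reason(), cid == self.focus)
                for cid, c in sorted(self.cores.items())]
```

For example, creating a session for cores 0-5, 48 and 49 stopped at `0x1240c001c390` and calling `step(0x1240c001c83c)` leaves core 0 at the new PC with stop reason step over, while every other core keeps breakpoint 1.1, matching the shape of the `ascend info cores` output shown later.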
Prerequisites
When compiling the operator, use the compile option --cce-ignore-always-inline=true.
Procedure
- Set a breakpoint at the location you want to debug, then run the program. For details on setting breakpoints, see Line Breakpoints.
(msdebug) r    // run the program
Process 2695700 launched: '${INSTALL_DIR}/projects/reduce_sum/add_tik2_npu' (aarch64)
[Launch of Kernel _Z17reduce_sum_customPhS_S_S_ on Device 0]
Process 2695700 stopped
[Switching to focus on Kernel _Z17reduce_sum_customPhS_S_S_, CoreId 0, Type aiv]
* thread #1, name = 'add_tik2_npu', stop reason = breakpoint 1.1
    frame #0: 0x0000000000001390 device_debugdata`reduce_sum_custom(unsigned char*, unsigned char*, unsigned char*, unsigned char*) [inlined] KernelReduceSum::Compute(this=0x0000000000167258) at reduce_sum_custom.cpp:45:36
   42       }
   43       __aicore__ inline void Compute()
   44       {
-> 45           LocalTensor<half> xLocal = inQueueX.DeQue<half>();    // breakpoint location
   46           LocalTensor<half> yTmpLocal = yTmp.Get<half>();
   47           LocalTensor<half> workTmpLocal = workTmp.Get<half>();
   48           LocalTensor<int32_t> syncTmpLocal = syncTmp.Get<int32_t>();
- Enter n. msdebug steps over one line on the focused core and switches the run mode to single-core mode.
(msdebug) n
Process 2695700 stopped
[Switching to focus on Kernel _Z17reduce_sum_customPhS_S_S_, CoreId 0, Type aiv]
* thread #1, name = 'add_tik2_npu', stop reason = step over    // the echoed PC location shows that the step succeeded
    frame #0: 0x000000000000183c device_debugdata`reduce_sum_custom(unsigned char*, unsigned char*, unsigned char*, unsigned char*) [inlined] KernelReduceSum::Compute(this=0x0000000000167258) at reduce_sum_custom.cpp:46:44
   43       __aicore__ inline void Compute()
   44       {
   45           LocalTensor<half> xLocal = inQueueX.DeQue<half>();
-> 46           LocalTensor<half> yTmpLocal = yTmp.Get<half>();
   47           LocalTensor<half> workTmpLocal = workTmp.Get<half>();
   48           LocalTensor<int32_t> syncTmpLocal = syncTmp.Get<int32_t>();
   49           LocalTensor<half> secondTmpLocal = secondTmp.Get<half>();
- Run the ascend info cores command to view the PC and stop reason of every core.
(msdebug) ascend info cores
  CoreId  Type  Device  Stream  Task  Block  PC              stop reason
* 0       aiv   0       47      0     2      0x1240c001c83c  step over        // * marks the core that is currently running (the focused core)
  1       aiv   0       47      0     3      0x1240c001c390  breakpoint 1.1
  2       aiv   0       47      0     4      0x1240c001c390  breakpoint 1.1
  3       aiv   0       47      0     5      0x1240c001c390  breakpoint 1.1
  4       aiv   0       47      0     6      0x1240c001c390  breakpoint 1.1
  5       aiv   0       47      0     7      0x1240c001c390  breakpoint 1.1
  48      aiv   0       47      0     0      0x1240c001c390  breakpoint 1.1
  49      aiv   0       47      0     1      0x1240c001c390  breakpoint 1.1
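  - If the current core stopped for both a single step and a breakpoint, its stop reason is displayed as breakpoint.
  - If the program appears to hang, press Ctrl+C to interrupt it. A hang may have either of the following causes:
    - The user program itself contains an infinite loop. Fix the program to resolve this.
    - The operator uses a synchronization instruction listed in Table 1. In single-core mode the other cores are stopped, so the focused core may wait on them indefinitely.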
- Use core switching to debug another core.
(msdebug) ascend aiv 2
[Switching to focus on Kernel _Z17reduce_sum_customPhS_S_S_, CoreId 2, Type aiv]
* thread #1, name = 'add_tik2_npu', stop reason = step over
    frame #0: 0x0000000000001390 device_debugdata`reduce_sum_custom(unsigned char*, unsigned char*, unsigned char*, unsigned char*) [inlined] KernelReduceSum::Compute(this=0x000000000016f258) at reduce_sum_custom.cpp:45:36
   42       }
   43       __aicore__ inline void Compute()
   44       {
-> 45           LocalTensor<half> xLocal = inQueueX.DeQue<half>();
   46           LocalTensor<half> yTmpLocal = yTmp.Get<half>();
   47           LocalTensor<half> workTmpLocal = workTmp.Get<half>();
   48           LocalTensor<int32_t> syncTmpLocal = syncTmp.Get<int32_t>();
(msdebug) n
Process 2695700 stopped
[Switching to focus on Kernel _Z17reduce_sum_customPhS_S_S_, CoreId 2, Type aiv]
* thread #1, name = 'add_tik2_npu', stop reason = step over
    frame #0: 0x000000000000183c device_debugdata`reduce_sum_custom(unsigned char*, unsigned char*, unsigned char*, unsigned char*) [inlined] KernelReduceSum::Compute(this=0x000000000016f258) at reduce_sum_custom.cpp:46:44
   43       __aicore__ inline void Compute()
   44       {
   45           LocalTensor<half> xLocal = inQueueX.DeQue<half>();
-> 46           LocalTensor<half> yTmpLocal = yTmp.Get<half>();
   47           LocalTensor<half> workTmpLocal = workTmp.Get<half>();
   48           LocalTensor<int32_t> syncTmpLocal = syncTmp.Get<int32_t>();
   49           LocalTensor<half> secondTmpLocal = secondTmp.Get<half>();
(msdebug) ascend info cores
  CoreId  Type  Device  Stream  Task  Block  PC              stop reason
  0       aiv   0       47      0     2      0x1240c001c83c  step over
  1       aiv   0       47      0     3      0x1240c001c390  breakpoint 1.1
* 2       aiv   0       47      0     4      0x1240c001c83c  step over        // the stop reason becomes step over only after you enter n
  3       aiv   0       47      0     5      0x1240c001c390  breakpoint 1.1
  4       aiv   0       47      0     6      0x1240c001c390  breakpoint 1.1
  5       aiv   0       47      0     7      0x1240c001c390  breakpoint 1.1
  48      aiv   0       47      0     0      0x1240c001c390  breakpoint 1.1
  49      aiv   0       47      0     1      0x1240c001c390  breakpoint 1.1
- When you have finished debugging, run the q command and enter Y or y to end the session.
(msdebug) q
Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y
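When many cores are listed, it can help to post-process a copied `ascend info cores` table to see which cores have moved away from the common breakpoint PC. The following Python helper is a sketch under stated assumptions: it is not an msdebug feature, the names `CoreRow`, `parse_info_cores`, and `group_by_pc` are hypothetical, and it assumes whitespace-separated columns in the order shown in the examples in this section, with a leading `*` marking the focused core. Real output may differ in spacing or columns.

```python
from typing import NamedTuple


class CoreRow(NamedTuple):
    core_id: int
    core_type: str
    device: int
    stream: int
    task: int
    block: int
    pc: int
    stop_reason: str
    focused: bool


def parse_info_cores(text):
    """Parse copied 'ascend info cores' text (column layout is an assumption)."""
    rows = []
    for raw in text.splitlines():
        # Drop "//" annotations like the ones used in this guide.
        line = raw.split("//")[0].strip()
        if not line or line.startswith(("CoreId", "(msdebug)")):
            continue  # skip blanks, the header, and the command echo
        focused = line.startswith("*")
        if focused:
            line = line[1:].strip()
        parts = line.split(maxsplit=7)  # stop reason may contain spaces
        if len(parts) < 8:
            continue  # not a data row
        cid, ctype, dev, stream, task, block, pc, reason = parts
        rows.append(CoreRow(int(cid), ctype, int(dev), int(stream), int(task),
                            int(block), int(pc, 16), reason.strip(), focused))
    return rows


def group_by_pc(rows):
    """Map each PC to the IDs of the cores stopped at it."""
    groups = {}
    for row in rows:
        groups.setdefault(row.pc, []).append(row.core_id)
    return groups
```

Grouping by PC rather than by stop reason is deliberate: since a core that hits both a step and a breakpoint is reported as breakpoint, the PC is the more direct signal of which cores have actually advanced.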