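Debugging a Template Library Operator on Board
This section shows how to use the msDebug tool to debug a template library operator (matmul) on board. The operator multiplies two matrices and outputs the result.
Procedure
- Build the operator from the sample project described in the prerequisites to obtain the executable 00_basic_matmul.
  - In example/CMakeLists.txt, edit the BISHENG_COMPILER_OPTIONS compile options: change -O2 to -O0 -g and comment out the optimization option -DL2_CACHE_HINT. This enables compiler debugging.
      # example/CMakeLists.txt
      ...
      set(BISHENG_COMPILER_OPTIONS -O0 -g -std=c++17 -xcce
      ...
      #-DL2_CACHE_HINT  # enable compiler debugging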
  - Run the following command to build the operator. When the build finishes, the executable 00_basic_matmul is generated in the build/bin directory.
      bash ./scripts/build.sh 00_basic_matmul
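The flag change in the first sub-step above can be viewed as a small transformation of the option list. The sketch below is illustrative only: `release_flags` is a hypothetical subset of BISHENG_COMPILER_OPTIONS (the real file contains further options that are elided in this guide), and `to_debug_flags` is a helper name introduced for this example.

```python
def to_debug_flags(flags):
    """Return a copy of `flags` adjusted for debugging.

    -O2 becomes -O0 -g, and -DL2_CACHE_HINT is dropped, which mirrors
    commenting that option out in CMakeLists.txt.
    """
    debug = []
    for flag in flags:
        if flag == "-O2":
            debug.extend(["-O0", "-g"])
        elif flag == "-DL2_CACHE_HINT":
            continue  # optimization hint is disabled for debugging
        else:
            debug.append(flag)
    return debug


# Hypothetical subset of the original options, for illustration only.
release_flags = ["-O2", "-std=c++17", "-xcce", "-DL2_CACHE_HINT"]
print(" ".join(to_debug_flags(release_flags)))
```

The resulting list begins with the same options that appear in the edited CMakeLists.txt fragment shown in this step.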
- Launch msDebug to start the operator program and enter the debugging interface.
    msdebug ./build/bin/00_basic_matmul 256 512 1024 0
    (msdebug) target create "./build/bin/00_basic_matmul"
    Current executable set to '/home/mindstudio/projects/ascendc-templates/build/bin/00_basic_matmul' (aarch64).
    (msdebug)
- Set a breakpoint. In this example the kernel function is implemented in basic_matmul.hpp; set NPU breakpoints on the lines of interest in that file.
    (msdebug) b basic_matmul.hpp:121
    Breakpoint 1: 2 locations.
    (msdebug)
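- Run the operator program and wait until a breakpoint is hit.
  The program runs until it hits the first breakpoint (basic_matmul.hpp:121) and then stops. msDebug detects that the NPU kernel function has started and is running on Device 0.
  _ZN7Catlass13KernelAdapterINS_4Gemm6Kernel11BasicMatmulINS1_5Blo is the kernel name of the template library operator; the example shows only its first 64 characters.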
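As a quick check of the truncation described above, the sketch below cuts a mangled name to its first 64 characters. `FULL_NAME_PREFIX` holds only the leading part of the full symbol that appears in the debugger output (the remainder is left out here), and `display_name` is a helper name made up for this example.

```python
KERNEL_NAME_LIMIT = 64  # the example output shows only this many leading characters


def display_name(mangled: str, limit: int = KERNEL_NAME_LIMIT) -> str:
    """Return the leading `limit` characters of a mangled kernel name."""
    return mangled[:limit]


# Leading part of the full mangled symbol; the rest is omitted for brevity.
FULL_NAME_PREFIX = (
    "_ZN7Catlass13KernelAdapterINS_4Gemm6Kernel11BasicMatmul"
    "INS1_5Block9BlockMmadINS1_19MmadAtlasA2PingpongILb1EEE"
)

shown = display_name(FULL_NAME_PREFIX)
print(shown, len(shown))
```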
    (msdebug) run
    Process 3344307 launched: '/home/mindstudio/projects/ascendc-templates/build/bin/00_basic_matmul' (aarch64)
    [Launch of Kernel _ZN7Catlass13KernelAdapterINS_4Gemm6Kernel11BasicMatmulINS1_5Blo on Device 0]
    Process 3344307 stopped
    [Switching to focus on Kernel _ZN7Catlass13KernelAdapterINS_4Gemm6Kernel11BasicMatmulINS1_5Blo, CoreId 21, Type aic]
    * thread #1, name = '00_basic_matmul', stop reason = breakpoint 1.1
        frame #0: 0x0000000000001c38 device_debugdata`_ZN7Catlass13KernelAdapterINS_4Gemm6Kernel11BasicMatmulINS1_5Block9BlockMmadINS1_19MmadAtlasA2PingpongILb1EEENS_9GemmShapeILj128ELj256ELj256EEENS8_ILj128ELj256ELj64EEENS1_8GemmTypeIDhNS_6layout8RowMajorELN7AscendC9TPositionE0EEESG_SG_vNS1_4Tile8TileCopyINS_4Arch7AtlasA2ESG_SG_SG_vEENSH_8TileMmadISK_SG_SG_vEEEEvNS4_24GemmIdentityBlockSwizzleILj3ELj0EEEEEEEvNT_6ParamsE_mix_aic at basic_matmul.hpp:121:71
       118
       119      for (uint32_t loopIdx = AscendC::GetBlockIdx(); loopIdx < coreLoops; loopIdx += AscendC::GetBlockNum()) {
       120          // Compute block location
    -> 121          GemmCoord blockCoord = matmulBlockScheduler.GetBlockCoord(loopIdx);
       122          GemmCoord actualBlockShape = matmulBlockScheduler.GetActualBlockShape(blockCoord);
       123
       124          // Compute initial location in logical coordinates
    (msdebug)
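The loop header visible in the source listing above starts `loopIdx` at `GetBlockIdx()` and advances it by `GetBlockNum()`, which spreads the loop iterations over the blocks in a strided pattern. The host-side Python sketch below mimics only that loop header. The values of `core_loops` and `block_num` are made-up inputs rather than values read from this run, and nothing else in the kernel is modeled.

```python
def loop_indices(block_idx: int, block_num: int, core_loops: int) -> list:
    """Iterations one block executes: block_idx, block_idx + block_num, ..."""
    return list(range(block_idx, core_loops, block_num))


# Hypothetical sizes chosen for illustration only.
core_loops, block_num = 10, 4
schedule = {b: loop_indices(b, block_num, core_loops) for b in range(block_num)}

for block, indices in schedule.items():
    print(f"block {block}: {indices}")
```

Every iteration index is handled by exactly one block, so the per-block lists together cover the whole range without overlap.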
- Inspect information.
  - Use the ascend info cores command to query NPU core information.
      (msdebug) ascend info cores
        CoreId  Type  Device  Stream  Task  Block  PC              stop reason
      * 21      aic   0       48      0     0      0x12c0c00d6c38  breakpoint 1.1
        22      aic   0       48      0     1      0x12c0c00d6c38  breakpoint 1.1
        23      aic   0       48      0     2      0x12c0c00d6c38  breakpoint 1.1
        24      aic   0       48      0     3      0x12c0c00d6c38  breakpoint 1.1
      (msdebug)
  - Use the print command to print the gmA variable directly.
      (msdebug) print gmA
      (AscendC::GlobalTensor<__fp16>) $0 = {
        AscendC::BaseGlobalTensor<__fp16> = {
          address_ = 0x000012c0c0013000
          oriAddress_ = 0x000012c0c0013000
        }
        bufferSize_ = 0
        cacheMode_ = CACHE_MODE_NORMAL
      }
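The address used by the memory read command in this walkthrough is the `address_` field of this output. The sketch below pulls that field out of the printed text and forms the command string. `PRINT_OUTPUT` is copied from the session above, `gm_address` and `memory_read_cmd` are hypothetical helper names, and the command reuses only options that appear in this guide.

```python
import re

PRINT_OUTPUT = """\
(AscendC::GlobalTensor<__fp16>) $0 = {
  AscendC::BaseGlobalTensor<__fp16> = {
    address_ = 0x000012c0c0013000
    oriAddress_ = 0x000012c0c0013000
  }
  bufferSize_ = 0
  cacheMode_ = CACHE_MODE_NORMAL
}
"""


def gm_address(print_output: str) -> int:
    """Extract the address_ field from the printed GlobalTensor."""
    match = re.search(r"\baddress_\s*=\s*(0x[0-9a-fA-F]+)", print_output)
    if match is None:
        raise ValueError("no address_ field found")
    return int(match.group(1), 16)


def memory_read_cmd(address: int, num_bytes: int) -> str:
    """Format a GM read of num_bytes bytes displayed as float16 values."""
    return f"memory read -m GM {address:#x} -f float16[] -s {num_bytes} -c 1"


print(memory_read_cmd(gm_address(PRINT_OUTPUT), 256))
```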
  - Then use the memory read command to print the values stored in gmA.
    - Print the data stored in gmA, which resides in GM memory.
        (msdebug) memory read -m GM 0x12c0c0013000 -f float16[] -s 256 -c 1
        0x12c0c0013000: {3.40234 -1.05664 2.83008 2.98438 4.11719 -3.02539 -1.64746 2.68164 -2.22266 0.539551 -0.226074 1.28906 -1.35254 0.134033 4.52344 4.16016 1.35742 2.17383 -3.58398 1.06934 -4.83594 -2.57031 -3.62695 3.04102 -3.43359 -0.990723 -3.70117 -3.91211 4.98828 -2.81836 0.129272 3.39062 1.12598 -2.03906 1.37598 0.24292 -0.0641479 4.72656 -2.07422 2.71289 0.267334 2.69922 -0.997559 3.91602 -2.16602 -1.47559 3.07812 4.19141 -4.30078 4.49219 0.26001 -4.14062 -3.07812 1.63184 3.90234 -1.51074 -4.35938 -4.80078 -0.423096 -4.36719 -2.61719 4.70703 4.02344 3.50977 -2.33398 0.397705 -1.24805 2.60156 0.125366 1.67676 0.316162 -4.60547 -0.623535 4.31641 4.30859 2.20898 -2.15625 2.38477 1.39941 -1.45996 1.87891 -3.33984 -0.599121 3.80078 3.29297 -1.69629 -2.71094 3.93359 -1.49609 1.86621 4.56641 0.88623 1.57324 3.58594 -0.604492 4.23828 -1.01562 3.14844 1.8418 4.10938 -0.175049 -2.8418 4.50391 4.20312 -3.52344 3.81055 1.41113 -0.680664 1.19629 -2.18945 2.85938 -1.92578 -0.529785 -2.73828 -3.125 -2.23828 0.564453 -0.834961 -3.30469 4.06641 -3.96875 -3.73828 -0.0455627 2.60547 4.84766 4.35156 1.84473 -1.16797}
        (msdebug)
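For reference, 256 bytes of float16 data correspond to 128 values, which matches the number of values printed above. The sketch below shows how such a raw buffer could be decoded offline with Python's `struct` module. Little-endian byte order is an assumption here, and the buffer is synthesized from the first three printed values and padded with zeros rather than read from a device.

```python
import struct

FLOAT16_BYTES = 2  # size of one half-precision value


def decode_float16(raw: bytes) -> list:
    """Decode a byte buffer as consecutive little-endian float16 values."""
    if len(raw) % FLOAT16_BYTES:
        raise ValueError("buffer length must be a multiple of 2 bytes")
    count = len(raw) // FLOAT16_BYTES
    return list(struct.unpack(f"<{count}e", raw))


# Synthetic 256-byte buffer: three values from the dump, then zero padding.
head = struct.pack("<3e", 3.40234, -1.05664, 2.83008)
raw = head + bytes(256 - len(head))

values = decode_float16(raw)
print(len(values), values[:3])
```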
  - Switch cores: switch to another aic core and print the information you need.
      (msdebug) ascend aic 24 // In the ascend info cores output, pick the CoreId that corresponds to block 3; here it is 24
      [Switching to focus on Kernel _ZN7Catlass13KernelAdapterINS_4Gemm6Kernel11BasicMatmulINS1_5Blo, CoreId 24, Type aic]
      * thread #1, name = '00_basic_matmul', stop reason = breakpoint 1.1
          frame #0: 0x0000000000001c38 device_debugdata`_ZN7Catlass13KernelAdapterINS_4Gemm6Kernel11BasicMatmulINS1_5Block9BlockMmadINS1_19MmadAtlasA2PingpongILb1EEENS_9GemmShapeILj128ELj256ELj256EEENS8_ILj128ELj256ELj64EEENS1_8GemmTypeIDhNS_6layout8RowMajorELN7AscendC9TPositionE0EEESG_SG_vNS1_4Tile8TileCopyINS_4Arch7AtlasA2ESG_SG_SG_vEENSH_8TileMmadISK_SG_SG_vEEEEvNS4_24GemmIdentityBlockSwizzleILj3ELj0EEEEEEEvNT_6ParamsE_mix_aic at basic_matmul.hpp:121:71
         118
         119      for (uint32_t loopIdx = AscendC::GetBlockIdx(); loopIdx < coreLoops; loopIdx += AscendC::GetBlockNum()) {
         120          // Compute block location
      -> 121          GemmCoord blockCoord = matmulBlockScheduler.GetBlockCoord(loopIdx);
         122          GemmCoord actualBlockShape = matmulBlockScheduler.GetActualBlockShape(blockCoord);
         123
         124          // Compute initial location in logical coordinates
      (msdebug) p loopIdx
      (uint32_t) $1 = 0
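Choosing the CoreId for a given block, as done in the core-switching step above, amounts to a lookup in the ascend info cores table. The sketch below performs that lookup over the table text from this session. The parsing assumes whitespace-separated columns in the order shown, and `parse_cores` is a hypothetical helper name.

```python
CORES_OUTPUT = """\
  CoreId  Type  Device  Stream  Task  Block  PC              stop reason
* 21      aic   0       48      0     0      0x12c0c00d6c38  breakpoint 1.1
  22      aic   0       48      0     1      0x12c0c00d6c38  breakpoint 1.1
  23      aic   0       48      0     2      0x12c0c00d6c38  breakpoint 1.1
  24      aic   0       48      0     3      0x12c0c00d6c38  breakpoint 1.1
"""


def parse_cores(text: str) -> list:
    """Parse the core table into dicts; '*' marks the focused core."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header row
        stripped = line.strip()
        if not stripped:
            continue
        focused = stripped.startswith("*")
        fields = stripped.lstrip("*").split()
        rows.append({
            "core_id": int(fields[0]),
            "type": fields[1],
            "block": int(fields[5]),
            "pc": int(fields[6], 16),
            "stop_reason": " ".join(fields[7:]),
            "focused": focused,
        })
    return rows


rows = parse_cores(CORES_OUTPUT)
block_to_core = {row["block"]: row["core_id"] for row in rows}
print(f"ascend aic {block_to_core[3]}")
```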
- List and delete the breakpoint, then resume the program.
    (msdebug) breakpoint list
    Current breakpoints:
    1: file = 'basic_matmul.hpp', line = 121, exact_match = 0, locations = 2, resolved = 2, hit count = 1
      1.1: where = device_debugdata`_ZN7Catlass13KernelAdapterINS_4Gemm6Kernel11BasicMatmulINS1_5Block9BlockMmadINS1_19MmadAtlasA2PingpongILb1EEENS_9GemmShapeILj128ELj256ELj256EEENS8_ILj128ELj256ELj64EEENS1_8GemmTypeIDhNS_6layout8RowMajorELN7AscendC9TPositionE0EEESG_SG_vNS1_4Tile8TileCopyINS_4Arch7AtlasA2ESG_SG_SG_vEENSH_8TileMmadISK_SG_SG_vEEEEvNS4_24GemmIdentityBlockSwizzleILj3ELj0EEEEEEEvNT_6ParamsE_mix_aic + 4748 [inlined] _ZN7Catlass4Gemm6Kernel11BasicMatmulINS0_5Block9BlockMmadINS0_19MmadAtlasA2PingpongILb1EEENS_9GemmShapeILj128ELj256ELj256EEENS7_ILj128ELj256ELj64EEENS0_8GemmTypeIDhNS_6layout8RowMajorELN7AscendC9TPositionE0EEESF_SF_vNS0_4Tile8TileCopyINS_4Arch7AtlasA2ESF_SF_SF_vEENSG_8TileMmadISJ_SF_SF_vEEEEvNS3_24GemmIdentityBlockSwizzleILj3ELj0EEEEclILi1EEEvRKNSQ_6ParamsE_mix_aic + 4632 at basic_matmul.hpp:121:71, address = 0x0000000000001c38, resolved, hit count = 1
      1.2: where = device_debugdata`_ZN7Catlass13KernelAdapterINS_4Gemm6Kernel11BasicMatmulINS1_5Block9BlockMmadINS1_19MmadAtlasA2PingpongILb1EEENS_9GemmShapeILj128ELj256ELj256EEENS8_ILj128ELj256ELj64EEENS1_8GemmTypeIDhNS_6layout8RowMajorELN7AscendC9TPositionE0EEESG_SG_vNS1_4Tile8TileCopyINS_4Arch7AtlasA2ESG_SG_SG_vEENSH_8TileMmadISK_SG_SG_vEEEEvNS4_24GemmIdentityBlockSwizzleILj3ELj0EEEEEEEvNT_6ParamsEm_mix_aic + 4772 [inlined] _ZN7Catlass4Gemm6Kernel11BasicMatmulINS0_5Block9BlockMmadINS0_19MmadAtlasA2PingpongILb1EEENS_9GemmShapeILj128ELj256ELj256EEENS7_ILj128ELj256ELj64EEENS0_8GemmTypeIDhNS_6layout8RowMajorELN7AscendC9TPositionE0EEESF_SF_vNS0_4Tile8TileCopyINS_4Arch7AtlasA2ESF_SF_SF_vEENSG_8TileMmadISJ_SF_SF_vEEEEvNS3_24GemmIdentityBlockSwizzleILj3ELj0EEEEclILi1EEEvRKNSQ_6ParamsE_mix_aic + 4632 at basic_matmul.hpp:121:71, address = 0x000000000000dd54, resolved, hit count = 0
    (msdebug) breakpoint delete 1
    1 breakpoints deleted; 0 breakpoint locations disabled.
    (msdebug) continue
    Process 3344307 resuming
    Compare success.
    Process 3344307 exited with status = 0 (0x00000000)
- When debugging is finished, run the q command and enter Y or y to end the debugging session.
    (msdebug) q