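This example shows how to use the msDebug tool to debug, on the board, an add operator that is invoked through the PyTorch interface. The add operator adds two vectors and outputs the result.
Procedure
- Run the following command to generate the custom operator project, which provides the host-side and kernel-side operator implementations.
| bash install.sh -v Ascendxxxyy # xxxyy is the specific chip type actually in use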
|
- In the ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/CustomOp directory, edit the cacheVariables section of the CMakePresets.json file and change "Release" to "Debug".
| "cacheVariables": {
"CMAKE_BUILD_TYPE": {
"type": "STRING",
"value": "Debug"
},
|
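The manual edit above can also be scripted. The following is a minimal sketch, assuming the sample's CMakePresets.json follows the standard CMake layout in which cacheVariables sits under each entry of configurePresets (the full file is not shown here, so this is an assumption). The function names set_build_type and patch_presets_file are hypothetical helpers, not part of the sample.

```python
import json
from pathlib import Path


def set_build_type(presets: dict, build_type: str = "Debug") -> int:
    """Set CMAKE_BUILD_TYPE in every configure preset; return the number of changes."""
    changed = 0
    # Assumption: cacheVariables lives under "configurePresets" (standard CMake layout).
    for preset in presets.get("configurePresets", []):
        cache = preset.get("cacheVariables", {})
        entry = cache.get("CMAKE_BUILD_TYPE")
        if isinstance(entry, dict):
            # Object form, as in the snippet above: {"type": "STRING", "value": "..."}
            if entry.get("value") != build_type:
                entry["value"] = build_type
                changed += 1
        elif isinstance(entry, str) and entry != build_type:
            # CMake presets also allow a plain string value for a cache variable.
            cache["CMAKE_BUILD_TYPE"] = build_type
            changed += 1
    return changed


def patch_presets_file(path: str, build_type: str = "Debug") -> int:
    """Rewrite the presets file in place, but only if something changed."""
    file = Path(path)
    presets = json.loads(file.read_text(encoding="utf-8"))
    changed = set_build_type(presets, build_type)
    if changed:
        text = json.dumps(presets, indent=4, ensure_ascii=False) + "\n"
        file.write_text(text, encoding="utf-8")
    return changed
```

Calling patch_presets_file("CMakePresets.json") from the CustomOp directory would apply the change. Note that the file is re-serialized, so its original indentation may not be preserved.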
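- Build and deploy the operator as described in "Operator Compilation and Deployment".
- Go to the sample directory (the sample code is downloaded from the command line). Following the README, call the AddCustom operator project through the PyTorch invocation path and complete the build as instructed.
The directory structure of the PyTorch integration sample project is as follows:
| PytorchInvocation
├── op_plugin_patch
├── run_op_plugin.sh // used in step 5 when running the sample
├── test_ops_custom.py // used in step 7 when launching the tool
└── test_ops_custom_register_in_graph.py // test script for torch.compile mode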
|
cd ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/PytorchInvocation
- Run the sample. During execution, the sample automatically generates test data, runs the PyTorch sample, and then verifies the result.
bash run_op_plugin.sh
-- CMAKE_CCE_COMPILER: ${INSTALL_DIR}/toolkit/tools/ccec_compiler/bin/ccec
-- CMAKE_CURRENT_LIST_DIR: ${INSTALL_DIR}/AddKernelInvocation/cmake/Modules
-- ASCEND_PRODUCT_TYPE:
Ascendxxxyy
-- ASCEND_CORE_TYPE:
VectorCore
-- ASCEND_INSTALL_PATH:
/usr/local/Ascend/ascend-toolkit/latest
-- The CXX compiler identification is GNU 10.3.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: ${INSTALL_DIR}/AddKernelInvocation/build
Scanning dependencies of target add_npu
...
[100%] Built target add_npu
INFO: Ascend C Add Custom SUCCESS
...
INFO: Ascend C Add Custom in torch.compile graph SUCCESS
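To make the final verification step concrete, here is a minimal sketch of the kind of check such a sample can perform: compute the expected element-wise sum on the host and compare it with the operator output within a tolerance. This is not the sample's actual test code; the function names and the tolerance values are assumptions chosen for illustration.

```python
import math


def reference_add(x, y):
    """Host-side reference: element-wise sum of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("input vectors must have the same length")
    return [a + b for a, b in zip(x, y)]


def matches_reference(x, y, z, rel_tol=1e-3, abs_tol=1e-3):
    """Return True if z equals x + y element-wise within the given tolerances.

    The tolerances are illustrative assumptions, not values taken from the sample.
    """
    expected = reference_add(x, y)
    if len(z) != len(expected):
        return False
    return all(
        math.isclose(got, want, rel_tol=rel_tol, abs_tol=abs_tol)
        for got, want in zip(z, expected)
    )
```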
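- Manually import the operator debug information.
- Replace ${INSTALL_DIR} with the path where files are stored after the CANN software is installed. For example, if the Ascend-cann-toolkit package is installed as the root user, the path is /usr/local/Ascend/ascend-toolkit/latest.
- For products other than the Atlas A3 training series/Atlas A3 inference series: run the npu-smi info command on the server where the Ascend AI processor is installed to obtain the Chip Name. The value to configure is Ascend followed by the Chip Name. For example, if Chip Name is xxxyy, the value is Ascendxxxyy. When Ascendxxxyy is used as part of a code sample path, configure it in lowercase as ascendxxxyy.
- For the Atlas A3 training series/Atlas A3 inference series: run the npu-smi info -t board -i id -c chip_id command on the server where the Ascend AI processor is installed to obtain the Chip Name and NPU Name. The value to configure is Chip Name_NPU Name. For example, if Chip Name is Ascendxxx and NPU Name is 1234, the value is Ascendxxx_1234. When Ascendxxx_1234 is used as part of a code sample path, configure it in lowercase as ascendxxx_1234.
Where:
- id: the device ID, which is the NPU ID returned by the npu-smi info -l command.
- chip_id: the chip ID, which is the Chip ID returned by the npu-smi info -m command.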
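The two naming rules above can be summarized in a small helper. This is only a sketch of the string rules as stated in the notes; the function names are hypothetical, and the Chip Name and NPU Name still have to be read from the npu-smi output by hand.

```python
from typing import Optional


def config_value(chip_name: str, npu_name: Optional[str] = None) -> str:
    """Build the configuration value from the names reported by npu-smi.

    Without an NPU Name (products other than Atlas A3): "Ascend" + Chip Name.
    With an NPU Name (Atlas A3): Chip Name + "_" + NPU Name.
    """
    if npu_name is None:
        return f"Ascend{chip_name}"
    return f"{chip_name}_{npu_name}"


def path_form(value: str) -> str:
    """Form to use when the value appears in a code sample path (lowercase)."""
    return value.lower()
```

For example, config_value("xxxyy") gives "Ascendxxxyy", and path_form applied to that result gives "ascendxxxyy".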
export LAUNCH_KERNEL_PATH=${INSTALL_DIR}/opp/vendors/customize/op_impl/ai_core/tbe/kernel/SOC_VERSION/add_custom/AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b.o
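Because the kernel object file name carries a hash suffix, it can be convenient to look the file up rather than type it. The sketch below lists the .o files in the kernel directory used by the export command above and formats the corresponding export line. The helper names are hypothetical, and the caller must supply the installation path and the SOC_VERSION directory name.

```python
import shlex
from pathlib import Path

# Relative location of deployed kernels, taken from the export command above.
KERNEL_SUBDIR = Path("opp/vendors/customize/op_impl/ai_core/tbe/kernel")


def find_kernel_objects(install_dir, soc_dir, op_dir="add_custom"):
    """Return the sorted list of .o files for one operator under one SoC directory."""
    kernel_dir = Path(install_dir) / KERNEL_SUBDIR / soc_dir / op_dir
    return sorted(kernel_dir.glob("*.o"))


def export_line(kernel_object):
    """Format the shell command that sets LAUNCH_KERNEL_PATH to the given file."""
    return f"export LAUNCH_KERNEL_PATH={shlex.quote(str(kernel_object))}"
```

If find_kernel_objects returns more than one file, the sketch does not decide which one matches the operator being debugged; that choice is left to the user.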
- Launch the msDebug tool to start the Python program and enter the debugging interface.
| msdebug python3 test_ops_custom.py
(msdebug) target create "python3"
Current executable set to '/home/mindstudio/miniconda3/envs/py39/bin/python3' (aarch64).
(msdebug) settings set -- target.run-args "test_ops_custom.py"
(msdebug)
|
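Before launching the tool, it can help to confirm that the environment from the previous step is in place. The following is a hedged preflight sketch: it only checks that LAUNCH_KERNEL_PATH is set and points to an existing file and that the test script exists. The function names are hypothetical, and the check is an illustration rather than something msDebug itself requires.

```python
from pathlib import Path


def preflight_problems(env, script="test_ops_custom.py"):
    """Return a list of human-readable problems; an empty list means ready to launch."""
    problems = []
    kernel = env.get("LAUNCH_KERNEL_PATH")
    if not kernel:
        problems.append("LAUNCH_KERNEL_PATH is not set")
    elif not Path(kernel).is_file():
        problems.append(f"LAUNCH_KERNEL_PATH does not point to a file: {kernel}")
    if not Path(script).is_file():
        problems.append(f"script not found: {script}")
    return problems


def msdebug_command(script="test_ops_custom.py"):
    """Argument list matching the launch command shown above."""
    return ["msdebug", "python3", script]
```

A typical use would be to call preflight_problems(os.environ) from the PytorchInvocation directory and launch the tool only when the returned list is empty.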
- Set a breakpoint.
Set an NPU breakpoint in the kernel function by specifying the source file and the line number.
| (msdebug) b add_custom.cpp:60
Breakpoint 1: where = AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b.o`::AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b_1(uint8_t *, uint8_t *, uint8_t *, uint8_t *, uint8_t *) + 9912 [inlined] KernelAdd::Compute(int) + 3400 at add_custom.cpp:60:9, address = 0x00000000000026b8
|
- Run the program and wait until the breakpoint is hit.
| (msdebug) r
Process 197189 launched: '/home/miniconda3/envs/py39/bin/python3' (aarch64)
Process 197189 stopped and restarted: thread 1 received signal: SIGCHLD
...
[Launch of Kernel anonymous on Device 0]
Process 197189 stopped
[Switching to focus on Kernel anonymous, CoreId 8, Type aiv]
* thread #1, name = 'python3', stop reason = breakpoint 2.1
frame #0: 0x00000000000026b8 AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b.o`::AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b_1(uint8_t *, uint8_t *, uint8_t *, uint8_t *, uint8_t *) [inlined] KernelAdd::Compute(this=0x000000000020efb8, progress=1) at add_custom.cpp:60:9
57 LocalTensor<DTYPE_Y> yLocal = inQueueY.DeQue<DTYPE_Y>();
58 LocalTensor<DTYPE_Z> zLocal = outQueueZ.AllocTensor<DTYPE_Z>();
59 Add(zLocal, xLocal, yLocal, this->tileLength);
-> 60 outQueueZ.EnQue<DTYPE_Z>(zLocal);
61 inQueueX.FreeTensor(xLocal);
62 inQueueY.FreeTensor(yLocal);
63 }
(msdebug)
|
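When debug sessions are captured to a log, the stop information above can be extracted with a small parser. The sketch below is based only on the output format shown in this example (the "Switching to focus" banner and the "frame #" line); the format may differ in other tool versions, and the function name parse_stop is hypothetical.

```python
import re

# Matches e.g. "[Switching to focus on Kernel anonymous, CoreId 8, Type aiv]"
FOCUS_RE = re.compile(
    r"\[Switching to focus on Kernel (?P<kernel>[^,\]]+), "
    r"CoreId (?P<core>\d+), Type (?P<type>\w+)\]"
)
# Matches the trailing " at file:line:column" of a frame line.
FRAME_RE = re.compile(r" at (?P<file>[^\s:]+):(?P<line>\d+):(?P<col>\d+)\s*$")


def parse_stop(output: str) -> dict:
    """Extract kernel name, core id, core type and source position from stop output."""
    info = {}
    for line in output.splitlines():
        focus = FOCUS_RE.search(line)
        if focus:
            info["kernel"] = focus["kernel"]
            info["core_id"] = int(focus["core"])
            info["core_type"] = focus["type"]
        # Only frame lines carry the source position; skip the source listing lines.
        if line.lstrip().startswith("frame #"):
            frame = FRAME_RE.search(line)
            if frame:
                info["file"] = frame["file"]
                info["line"] = int(frame["line"])
                info["column"] = int(frame["col"])
    return info
```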
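- Delete the breakpoint. For details, see "Deleting Breakpoints".
- When debugging is complete, run the q command and enter Y or y to end the debugging session.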
| (msdebug) q
Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y
|