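When a hardware exception occurs at a customer site, the problem has to be reproduced through repeated stress testing, which makes fault localization inefficient. To address this, the system automatically triggers a dump operation when it detects a potential hardware exception and captures the current state information. By parsing the resulting coredump file, the msDebug tool can collect enough data for problem analysis even when no stress test is actively run. This improves the efficiency of localizing hardware exceptions and reduces the inconvenience that repeated stress testing causes users.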
Once the acl.json file is configured, the other functions of msDebug can no longer be used.
By default, the coredump file is named core or core.pid (where pid is the process ID).
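As a rough illustration of the default naming rule above, the following Python sketch scans a directory for files named `core` or `core.<pid>`. The helper name `find_default_core_files` is hypothetical and is not part of msDebug; the sketch also assumes the default naming is in effect, which may not hold if the site uses a different pattern.

```python
import re
from pathlib import Path

# Matches the default names described above: "core" or "core.<pid>",
# where <pid> is assumed to be a decimal process ID.
_DEFAULT_CORE_NAME = re.compile(r"core(\.\d+)?")


def find_default_core_files(directory):
    """Return files in `directory` named `core` or `core.<pid>`, sorted by path."""
    root = Path(directory)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and _DEFAULT_CORE_NAME.fullmatch(p.name)
    )
```

A path returned by this helper could then be passed to `msdebug --core`, as in the session shown in this section.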
    (py38) root@ubuntu:~/CLionProjects/untitled/build$ msdebug --core corefile    // corefile is the path to the user's coredump file
    msdebug(MindStudio Debugger) is part of MindStudio Operator-dev Tools.
    The tool provides developers with a mechanism for debugging Ascend kernels running on actual hardware.
    This enables developers to debug Ascend kernels without being affected by potential changes brought by simulation and emulation environments.
    (msdebug) target create --core "/home/xx/coredump_file/GatherV3_9e31943a1a48bf81ddff1fc6379e0be3_high_performance_10330.2.1.20250217233735574.core"
    Core file '/home/xx/coredump_file/GatherV3_9e31943a1a48bf81ddff1fc6379e0be3_high_performance_10330.2.1.20250217233735574.core' (hiipu64) was loaded.
    [Switching to focus on CoreId 30, Type aiv]
    (msdebug) ascend info summary
    CoreId  CoreType  DeviceId  ChipType
    30      AIV       0         A2/A3

    Id  DataType              MemType    Addr            Size      dim
    0   DEVICE_KERNEL_OBJECT  GM         0x12c0c0014000  402984
    1   STACK                 GM/DCACHE  0x12c1400f0000  32768
    2   WORKSPACE_TENSOR      GM         0x12c048e04400  76832
    3   TILING_DATA           GM/DCACHE  0x12c180000438  200
    4   OUTPUT_TENSOR         GM         0x12c048e00200  16384     [2, 2048]
    5   INPUT_TENSOR          GM         0x12c041200000  83886080  [10240, 2048]
    6   INPUT_TENSOR          GM         0x12c048e00000  32        [2]
    7   INPUT_TENSOR          GM         0x12c180000518  32        [1]
    8   ARGS                  GM/DCACHE  0x12c180000400  312
    (msdebug) re r PC
    PC = 0x000012c0c00157c4
    (msdebug) x -m GM -f uint8_t 0x12c0c0014000 -s 32 -c 1
    0x12c0c0014000: {0x80 0x7f 0x3a 0x07 0x10 0x00 0x7b 0x07 0x80 0x38 0x9e 0x02 0x81 0xd7 0x3b 0x00 0x80 0x08 0x1f 0x02 0xff 0x7f 0x20 0x07 0x01 0x00 0x00 0x07 0x0a 0xf8 0xde 0x00}
    (msdebug) x -m GM -f uint8_t 0x12c1400f0000 -s 32 -c 1
    0x12c1400f0000: {0x00 0x70 0xe1 0x48 0xc0 0x12 0x00 0x00 0x00 0x70 0xe1 0x48 0xc0 0x12 0x00 0x00 0xc8 0x7a 0x1f 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00}
    (msdebug) x -m DCACHE -f uint8_t 0x12c1400f0000 -s 32 -c 1
    0x12c1400f0000: {0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00}
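To make the summary output easier to reason about, the Python sketch below checks which listed memory regions contain a given address, using the Addr and Size columns from the `ascend info summary` example. The rows are copied by hand from that output; the helper name `find_region` is hypothetical and not an msDebug API. The sketch assumes that Size is in bytes, that each region covers the half-open range [Addr, Addr + Size), and that the PC value can be compared directly against the Addr column.

```python
# (Id, DataType, Addr, Size) rows copied from the example summary output.
REGIONS = [
    (0, "DEVICE_KERNEL_OBJECT", 0x12C0C0014000, 402984),
    (1, "STACK", 0x12C1400F0000, 32768),
    (2, "WORKSPACE_TENSOR", 0x12C048E04400, 76832),
    (3, "TILING_DATA", 0x12C180000438, 200),
    (4, "OUTPUT_TENSOR", 0x12C048E00200, 16384),
    (5, "INPUT_TENSOR", 0x12C041200000, 83886080),
    (6, "INPUT_TENSOR", 0x12C048E00000, 32),
    (7, "INPUT_TENSOR", 0x12C180000518, 32),
    (8, "ARGS", 0x12C180000400, 312),
]


def find_region(addr, regions=REGIONS):
    """Return (id, data_type, offset) for every region containing addr.

    Regions may overlap (for example, TILING_DATA lies inside ARGS in the
    example), so all matches are returned in table order.
    """
    matches = []
    for region_id, data_type, base, size in regions:
        if base <= addr < base + size:  # assumed half-open range, Size in bytes
            matches.append((region_id, data_type, addr - base))
    return matches


if __name__ == "__main__":
    pc = 0x000012C0C00157C4  # PC value from the example session
    for region_id, data_type, offset in find_region(pc):
        print(f"PC is in region {region_id} ({data_type}) at offset {offset:#x}")
```

Under these assumptions, the example PC falls only inside region 0 (DEVICE_KERNEL_OBJECT), at offset 0x17c4 from its start address.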
    (msdebug) q
    Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y