Specifying a Device ID (MC2 Fused Operator Scenario)

When debugging a single-process multi-thread MC2 fused operator, you can run the ascend device ID command (ID indicates the device ID) to specify the device ID to debug the operator on a specific device. This debugging mode has the following advantages:

  • Higher debugging efficiency: By selecting a specific device, you can use hardware resources more efficiently and accelerate the debugging process.
  • Well targeted: You can debug a specific device to detect and resolve performance bottlenecks or compatibility issues related to the device.
  • Issue isolation: If a performance or function issue occurs, you can specify different device IDs to check whether the issue is caused by a specific device, thereby making it easier to locate the issue.
  • If no device ID is specified, only the device ID set for the first time during program running is debugged.
  • The HCCL APIs do not support step-by-step debugging. For details about the APIs, see HCCL Kernel APIs"High-level APIs" > "HCCL" > "HCCL Kernel APIs" in .
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
py38) [root@localhost MC2-master]# msdebug /home/xxx/MC2-master/bin/alltoall_custom_aarch64
msdebug(MindStudio Debugger) is part of MindStudio Operator-dev Tools.
The tool provides developers with a mechanism for debugging Ascend kernels running on actual hardware.
This enables developers to debug Ascend kernels without being affected by potential changes brought by simulation and emulation environments.
(msdebug) target create "/home/xxx/MC2-master/bin/alltoall_custom_aarch64"
Current executable set to '/home/xxx/MC2-master/bin/alltoall_custom_aarch64' (aarch64).
(msdebug) b all_to_all_custom_v3.cpp:58
Breakpoint 1: 2 locations.
(msdebug) ascend device 1
(msdebug) run --x1_shape 72,17 --input_tensor_format ND --input_tensor_dtype fp16 --output_shape 72,17 --output_dtype fp16 --output_format ND --n_dev 2 --bin_path feature/aclnn/AllToAllCustom_fp16_ND_fuzz_000010 --loop_cnt 1 --platform 1971 --version 3 --tileM 128 | tee /home/shelltest/MC2-master/feature/aclnn/AllToAllCustom_fp16_ND_fuzz_000010/mc2_memory.log
Process 2625643 launched: '/home/xxx/MC2-master/bin/alltoall_custom_aarch64' (aarch64)
[INFO] rank 0 hcom: xx.xx.xx.xxx%enp189s0f0_60000_0_1747739573633567 stream: 0xaaaac9e14610, context : 0xaaaac9daeda0
[INFO] rank 1 hcom: xx.xx.xx.xxx%enp189s0f0_60000_0_1747739573633567 stream: 0xaaaaca8c8380, context : 0xaaaaca88f280
 before RunGraph : free :29837 M,  total:30196 M, used :358 M, ret :0
 before RunGraph : free :29835 M,  total:30196 M, used :360 M, ret :0
Process 2625643 stopped and restarted: thread 19 received signal: SIGCHLD
[INFO]  M is 72, K is 17, tileM is 128, tileNum is 0, tailM is 36, tailNum is 1, useBufferType is 0
[INFO]  M is 72, K is 17, tileM is 128, tileNum is 0, tailM is 36, tailNum is 1, useBufferType is 0
[Launch of Kernel AllToAllCustomV3_f1974b24a4ace3957d571b2712b3eadf_1000 on Device 1]
[Launch of Kernel AllToAllCustomV3_f1974b24a4ace3957d571b2712b3eadf_1000 on Device 1]
Process 2625643 stopped
[Switching to focus on Kernel AllToAllCustomV3_f1974b24a4ace3957d571b2712b3eadf_1000, CoreId 0, Type aiv]
* thread #1, name = 'alltoall_custom', stop reason = breakpoint 1.2
    frame #0: 0x0000000000004e0c AllToAllCustomV3_f1974b24a4ace3957d571b2712b3eadf.o`all_to_all_custom_v3_1000_tilingkey.vector(aGM="\x8b2d3+\xb5Ӫ\xbe\xb7\xa94\x87\xba;\xb6\xf68\U0000000e9\xc1\xa9", cGM="", workspaceGM="", tilingGM="d") at all_to_all_custom_v3.cpp:58:28
   55       auto &&cfg = tilingData.param;
   56       const uint8_t tileNum = cfg.tileNum;
   57       const uint8_t tailNum = cfg.tailNum;
-> 58       const uint64_t tileM = cfg.tileM;
   59       const uint64_t tailM = cfg.tailM;
   60       const uint64_t M = cfg.M;
   61       const uint64_t K = cfg.K;