Debugging the Operators Called by a PyTorch Interface

This section shows how to use msDebug to debug, on the board, the add operator invoked through a PyTorch interface. The add operator adds two vectors element-wise and outputs the result.
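As a point of comparison while debugging, the expected result of the add operator can be computed on the host. The following is a minimal sketch in plain Python. The function names add_reference and matches_reference, and the tolerance values, are illustrative assumptions and are not part of the sample project.

```python
import math


def add_reference(x, y):
    """Host-side reference: element-wise sum of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("input vectors must have the same length")
    return [a + b for a, b in zip(x, y)]


def matches_reference(actual, x, y, rel_tol=1e-3, abs_tol=1e-3):
    """Return True if `actual` equals x + y within the given tolerances.

    The tolerances are assumed values chosen for illustration only.
    """
    expected = add_reference(x, y)
    return len(actual) == len(expected) and all(
        math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, e in zip(actual, expected)
    )
```

Values read from the device during debugging can be passed as `actual` to see whether the kernel has produced the expected sums.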

Prerequisites

Obtain the sample project for operator debugging.
- This sample project supports only Python 3.9. To run it on other Python versions, change the Python version in the run_op_plugin.sh file in the ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/PytorchInvocation directory.
- This sample project does not support Atlas A3 training series products.
- When downloading the code sample, run the following command to specify the branch version:
```
git clone https://gitee.com/ascend/samples.git -b master
```
Install the PyTorch framework and the torch_npu plugin by referring to the Ascend Extension for PyTorch Software Installation Guide.
Configure environment variables by referring to Before You Start.
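The prerequisites above can be sanity-checked before starting the procedure. The following is a minimal sketch that checks for Python 3.9 and for importable torch and torch_npu modules. The helper names are illustrative assumptions and are not part of the sample project or of msDebug.

```python
import importlib.util
import sys

# The sample project states that it supports only Python 3.9.
SUPPORTED_PYTHON = (3, 9)


def python_version_ok(version_info=None):
    """Return True if the (major, minor) version matches SUPPORTED_PYTHON."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == SUPPORTED_PYTHON


def missing_modules(names=("torch", "torch_npu")):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `python_version_ok()` and `missing_modules()` in the target Python environment reports whether the interpreter version matches and which of the two modules, if any, still need to be installed.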

Procedure

Run the following command to generate a custom operator project and implement the operator on the host and kernel:
bash install.sh -v Ascendxxxyy # xxxyy indicates the processor type.

In the CMakePresets.json file under the ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/CustomOp directory, change the value of CMAKE_BUILD_TYPE in the cacheVariables configuration item from "Release" to "Debug".

"cacheVariables": {
    "CMAKE_BUILD_TYPE": {
        "type": "STRING",
        "value": "Debug"
    },
    ...
}
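The same change can be scripted instead of edited by hand. The following is a minimal sketch, assuming that cacheVariables sits inside the entries of a top-level configurePresets array, as in the standard CMake presets layout. The function names are illustrative. Rewriting the file with json.dumps also normalizes its formatting.

```python
import json
from pathlib import Path


def set_build_type(presets, build_type="Debug"):
    """Set CMAKE_BUILD_TYPE in every configure preset that defines it.

    Returns the number of presets that were updated.
    """
    changed = 0
    for preset in presets.get("configurePresets", []):
        cache = preset.get("cacheVariables", {})
        entry = cache.get("CMAKE_BUILD_TYPE")
        if isinstance(entry, dict):
            # Object form: {"type": "STRING", "value": "..."}
            entry["value"] = build_type
            changed += 1
        elif isinstance(entry, str):
            # Plain string form: "CMAKE_BUILD_TYPE": "..."
            cache["CMAKE_BUILD_TYPE"] = build_type
            changed += 1
    return changed


def patch_presets_file(path, build_type="Debug"):
    """Load a presets file, update the build type, and write it back."""
    presets_path = Path(path)
    presets = json.loads(presets_path.read_text(encoding="utf-8"))
    changed = set_build_type(presets, build_type)
    if changed:
        presets_path.write_text(json.dumps(presets, indent=4) + "\n", encoding="utf-8")
    return changed
```

For example, `patch_presets_file("CMakePresets.json")` run from the CustomOp directory returns the number of presets that were switched to Debug. A return value of 0 means the file does not match the assumed layout and should be edited manually.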

Compile and deploy the operator by referring to Compiling and Deploying Operators.

Download the sample code from the command line (if you have not already done so) and go to the sample directory. Follow the README to call the AddCustom operator project from PyTorch, and complete the build as instructed.

The sample project directory is as follows:

PytorchInvocation
├── op_plugin_patch  
├── README.md        // Registration sample of calling the AddCustom operator project in PyTorch mode
├── run_op_plugin.sh      // Required for sample execution
├── test_ops_custom.py    // Required when starting msDebug later in this procedure
└── test_ops_custom_register_in_graph.py    // Test case script that runs in torch.compile mode

cd ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/PytorchInvocation

Run the PyTorch sample and verify the result. Test data is generated automatically during execution.

bash run_op_plugin.sh
-- CMAKE_CCE_COMPILER: ${INSTALL_DIR}/toolkit/tools/ccec_compiler/bin/ccec
-- CMAKE_CURRENT_LIST_DIR: ${INSTALL_DIR}/AddKernelInvocation/cmake/Modules
-- ASCEND_PRODUCT_TYPE:
  Ascendxxxyy
-- ASCEND_CORE_TYPE:
  VectorCore
-- ASCEND_INSTALL_PATH:
  /usr/local/Ascend/cann
-- The CXX compiler identification is GNU 10.3.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: ${INSTALL_DIR}/AddKernelInvocation/build
Scanning dependencies of target add_npu
...
[100%] Built target add_npu
INFO: Ascend C Add Custom SUCCESS
...
INFO: Ascend C Add Custom  in torch.compile graph SUCCESS
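If the output of run_op_plugin.sh is saved to a file, the two success lines shown above can be checked automatically. The following is a minimal sketch. The marker strings are taken from the sample output above, and the function name missing_markers is an illustrative assumption.

```python
# Substrings copied from the sample output; the second one is shortened so
# that the check does not depend on the exact spacing in the log line.
EXPECTED_MARKERS = (
    "INFO: Ascend C Add Custom SUCCESS",
    "in torch.compile graph SUCCESS",
)


def missing_markers(log_text, markers=EXPECTED_MARKERS):
    """Return the success markers that do not appear in the log text."""
    return [marker for marker in markers if marker not in log_text]
```

An empty list from `missing_markers(log_text)` indicates that both the eager-mode test and the torch.compile test reported success.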

Manually import the operator debugging information by setting the LAUNCH_KERNEL_PATH environment variable, as in the example below. In the example path:
- Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default path is /usr/local/Ascend/cann.
- Replace SOC_VERSION with the value for your product, obtained as follows:
  - Products other than Atlas A3 training products/Atlas A3 inference products: Run the npu-smi info command on the server where the Ascend AI Processor is installed to obtain the Chip Name. The actual value is Ascend followed by the Chip Name. For example, if Chip Name is xxxyy, the actual value is Ascendxxxyy. When the value is used in the code sample path, write it in lowercase: ascendxxxyy.
  - Atlas A3 training products/Atlas A3 inference products: Run the npu-smi info -t board -i id -c chip_id command on the server where the Ascend AI Processor is installed to obtain the Chip Name and NPU Name. The actual value is Chip Name_NPU Name. For example, if Chip Name is Ascendxxx and NPU Name is 1234, the actual value is Ascendxxx_1234. When the value is used in the code sample path, write it in lowercase: ascendxxx_1234.
    In this command:
    id: device ID, which is the NPU ID obtained by running the npu-smi info -l command.
    chip_id: chip ID, which is obtained by running the npu-smi info -m command.
```
export LAUNCH_KERNEL_PATH=${INSTALL_DIR}/opp/vendors/customize/op_impl/ai_core/tbe/kernel/SOC_VERSION/add_custom/AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b.o
```
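The naming rules above and the directory layout in the example path can be combined to locate the kernel object file instead of typing its hashed name. The following is a minimal sketch. The function names, the prefix check, and the AddCustom_*.o file pattern are illustrative assumptions derived from the example path.

```python
from pathlib import Path

# Relative directory taken from the example LAUNCH_KERNEL_PATH above.
KERNEL_SUBDIR = "opp/vendors/customize/op_impl/ai_core/tbe/kernel"


def soc_version_dir(chip_name, npu_name=None):
    """Build the lowercase directory name from the npu-smi values.

    Assumption: "Ascend" is prepended only when the chip name does not
    already start with it; an NPU name, if given, is appended after "_".
    """
    name = chip_name if chip_name.lower().startswith("ascend") else "Ascend" + chip_name
    if npu_name:
        name = f"{name}_{npu_name}"
    return name.lower()


def find_kernel_objects(install_dir, soc_dir, op_dir="add_custom", pattern="AddCustom_*.o"):
    """Return the sorted kernel object files found for the operator, if any."""
    base = Path(install_dir) / KERNEL_SUBDIR / soc_dir / op_dir
    if not base.is_dir():
        return []
    return sorted(base.glob(pattern))
```

For example, `find_kernel_objects("/usr/local/Ascend/cann", soc_version_dir("xxxyy"))` lists the candidate .o files, and the path of the one to debug can then be exported as LAUNCH_KERNEL_PATH.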

Start msDebug to load the Python program and enter the debugging interface.

msdebug python3 test_ops_custom.py
(msdebug) target create "python3"
Current executable set to '/home/mindstudio/miniconda3/envs/py39/bin/python3' (aarch64).
(msdebug) settings set -- target.run-args  "test_ops_custom.py"
(msdebug)

Set a breakpoint.

Set an NPU breakpoint in the kernel function by specifying the source code file and line number.

(msdebug) b add_custom.cpp:60
Breakpoint 1: where = AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b.o`::AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b_1(uint8_t *, uint8_t *, uint8_t *, uint8_t *, uint8_t *) + 9912 [inlined] KernelAdd::Compute(int) + 3400 at add_custom.cpp:60:9, address = 0x00000000000026b8

Run the program and wait until the breakpoint is hit.

(msdebug) r
Process 197189 launched: '/home/miniconda3/envs/py39/bin/python3' (aarch64)
Process 197189 stopped and restarted: thread 1 received signal: SIGCHLD
...
[Launch of Kernel anonymous on Device 0]
Process 197189 stopped
[Switching to focus on Kernel anonymous, CoreId 8, Type aiv]
* thread #1, name = 'python3', stop reason = breakpoint 2.1
    frame #0: 0x00000000000026b8 AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b.o`::AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b_1(uint8_t *, uint8_t *, uint8_t *, uint8_t *, uint8_t *) [inlined] KernelAdd::Compute(this=0x000000000020efb8, progress=1) at add_custom.cpp:60:9
   57              LocalTensor<DTYPE_Y> yLocal = inQueueY.DeQue<DTYPE_Y>();
   58              LocalTensor<DTYPE_Z> zLocal = outQueueZ.AllocTensor<DTYPE_Z>();
   59              Add(zLocal, xLocal, yLocal, this->tileLength);
-> 60              outQueueZ.EnQue<DTYPE_Z>(zLocal);
   61              inQueueX.FreeTensor(xLocal);
   62              inQueueY.FreeTensor(yLocal);
   63          }
(msdebug)

For details about other debugging operations, see Importing Debugging Information, Printing Memory and Variables, Displaying the Debugging Information, and Switching Cores.

Delete a breakpoint. For details, see Deleting Breakpoints.

After the debugging is complete, run the q command and enter Y or y to end the debugging.

(msdebug) q
Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y
