Debugging the Operators Called by a PyTorch Interface

This section shows how to use msDebug to debug, on the board, the add operator called through a PyTorch interface. The add operator adds two vectors element-wise and outputs the result.

Prerequisites

  • Obtain the sample project for operator debugging.
    • This sample project supports only Python 3.9. To run it on other Python versions, change the Python version in the run_op_plugin.sh file in the ${install_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/PytorchInvocation directory.
    • When downloading the code sample, run the following command to specify the branch version:
      git clone https://gitee.com/ascend/samples.git -b v0.2-8.0.0.beta1
  • Install the PyTorch framework and the torch_npu plugin by referring to Ascend Extension for PyTorch Configuration and Installation.
  • Configure environment variables by referring to Before You Start.

Procedure

  1. Run the following command to generate a custom operator project and provide the operator implementation on the host side and the kernel side:
    bash install.sh -v Ascendxxxyy    # xxxyy indicates the processor type.
  2. In the CMakePresets.json file under the ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/CustomOp directory, change the value of CMAKE_BUILD_TYPE in the cacheVariables configuration item from "Release" to "Debug".
    "cacheVariables": {
        "CMAKE_BUILD_TYPE": {
            "type": "STRING",
            "value": "Debug"
        },
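If you prefer to script this change rather than edit the file by hand, the following is a minimal sketch. It assumes the file follows the standard CMake presets layout, in which cacheVariables sits inside each entry of configurePresets. The function name set_build_type is hypothetical and is not part of the sample project.

```python
import json


def set_build_type(presets: dict, build_type: str = "Debug") -> int:
    """Set CMAKE_BUILD_TYPE in every configure preset that defines it.

    Returns the number of presets whose value was changed.
    Assumes the standard CMake presets layout (configurePresets -> cacheVariables).
    """
    changed = 0
    for preset in presets.get("configurePresets", []):
        cache_vars = preset.get("cacheVariables", {})
        entry = cache_vars.get("CMAKE_BUILD_TYPE")
        if isinstance(entry, dict):
            # Object form: {"type": "STRING", "value": "Release"}
            if entry.get("value") != build_type:
                entry["value"] = build_type
                changed += 1
        elif isinstance(entry, str):
            # Plain string form: "CMAKE_BUILD_TYPE": "Release"
            if entry != build_type:
                cache_vars["CMAKE_BUILD_TYPE"] = build_type
                changed += 1
    return changed


if __name__ == "__main__":
    # Hypothetical in-memory example; for a real file, load it with json.load(),
    # call set_build_type(), and write it back with json.dump().
    example = {
        "configurePresets": [
            {
                "name": "default",
                "cacheVariables": {
                    "CMAKE_BUILD_TYPE": {"type": "STRING", "value": "Release"}
                },
            }
        ]
    }
    set_build_type(example)
    print(json.dumps(example, indent=4))
```

Keep a backup of CMakePresets.json before rewriting it, because json.dump does not preserve the original formatting.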
    
  3. Compile and deploy the operator by referring to Compiling and Deploying Operators.
  4. Download the sample code in CLI mode and go to the sample directory. Then, by referring to the README, call the AddCustom operator project in PyTorch mode and complete the compilation as instructed.

    The sample project directory is as follows:

    PytorchInvocation
    ├── op_plugin_patch         
    ├── run_op_plugin.sh      // Required for sample execution.
    ├── test_ops_custom_register_in_graph.py    // Executes the test case script in torch.compile mode.
    └── test_ops_custom.py    // Required for tool startup in step 7.
    
    cd ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/PytorchInvocation
  5. Run the PyTorch sample and verify the result. Test data is generated automatically during execution.
    bash run_op_plugin.sh
    -- CMAKE_CCE_COMPILER: ${INSTALL_DIR}/toolkit/tools/ccec_compiler/bin/ccec
    -- CMAKE_CURRENT_LIST_DIR: ${INSTALL_DIR}/AddKernelInvocation/cmake/Modules
    -- ASCEND_PRODUCT_TYPE:
      ascendxxxyy
    -- ASCEND_CORE_TYPE:
      VectorCore
    -- ASCEND_INSTALL_PATH:
      /usr/local/Ascend/ascend-toolkit/latest
    -- The CXX compiler identification is GNU 10.3.1
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Check for working CXX compiler: /usr/bin/c++ - skipped
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Configuring done
    -- Generating done
    -- Build files have been written to: ${INSTALL_DIR}/AddKernelInvocation/build
    Scanning dependencies of target add_npu
    ...
    [100%] Built target add_npu
    INFO: Ascend C Add Custom SUCCESS
    ...
    INFO: Ascend C Add Custom  in torch.compile graph SUCCESS
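The result check in this step amounts to comparing the operator output with an element-wise sum computed on the host. The following is a minimal sketch of that idea in plain Python. It is not the code used by test_ops_custom.py; the function names and the tolerance values are assumptions chosen for illustration.

```python
import math


def reference_add(x, y):
    """Host-side reference: element-wise sum of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("input vectors must have the same length")
    return [a + b for a, b in zip(x, y)]


def matches_reference(x, y, z, rel_tol=1e-3, abs_tol=1e-3):
    """Return True if z equals x + y element-wise within the given tolerances.

    The tolerances are illustrative assumptions, loosened for low-precision
    floating-point data; tighten them for float32 results.
    """
    expected = reference_add(x, y)
    if len(z) != len(expected):
        return False
    return all(
        math.isclose(actual, want, rel_tol=rel_tol, abs_tol=abs_tol)
        for actual, want in zip(z, expected)
    )


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [0.5, 0.5, 0.5]
    z = [1.5, 2.5, 3.5]  # stands in for the operator output
    print(matches_reference(x, y, z))
```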
  6. Manually import the operator debugging information.
    • Replace ${INSTALL_DIR} with the actual CANN component directory. If the Ascend-CANN-Toolkit package is installed as the root user, the CANN component directory is /usr/local/Ascend/ascend-toolkit/latest.
    • Run the npu-smi info command on the server where the Ascend AI Processor is installed to obtain the Chip Name. Replace SOC_VERSION with Ascend followed by the Chip Name. For example, if Chip Name is xxxyy, the value is Ascendxxxyy. Where the value is used in the code sample path, use the lowercase form ascendxxxyy.
    • Note that:

      • id: device ID, which is the NPU ID obtained by running the npu-smi info -l command.
      • chip_id: chip ID, which is obtained by running the npu-smi info -m command.
    export LAUNCH_KERNEL_PATH=${INSTALL_DIR}/opp/vendors/customize/op_impl/ai_core/tbe/kernel/SOC_VERSION/add_custom/AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b.o
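The hash in the .o file name can differ between builds, so it may help to list the kernel objects in the deployment directory before setting LAUNCH_KERNEL_PATH. The following is a minimal sketch that rebuilds the directory layout shown in the command above. The helper names are hypothetical, and the vendor directory customize and operator directory add_custom are taken from this example only.

```python
from pathlib import Path


def kernel_dir(install_dir, soc_version, op_name="add_custom", vendor="customize"):
    """Build the kernel directory path, following the layout used in this example."""
    return (
        Path(install_dir) / "opp" / "vendors" / vendor / "op_impl"
        / "ai_core" / "tbe" / "kernel" / soc_version / op_name
    )


def find_kernel_objects(install_dir, soc_version, op_name="add_custom", vendor="customize"):
    """Return the sorted list of .o files in the kernel directory (empty if none)."""
    directory = kernel_dir(install_dir, soc_version, op_name, vendor)
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*.o"))


if __name__ == "__main__":
    # Both arguments are placeholders; substitute the actual CANN directory
    # and the SOC_VERSION value for your processor.
    for obj in find_kernel_objects("/usr/local/Ascend/ascend-toolkit/latest", "ascendxxxyy"):
        print(obj)
```

If the list contains exactly one file, its path is the value to assign to LAUNCH_KERNEL_PATH.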
  7. Start msDebug with the Python program as the target to enter the debugging session.
    msdebug python3 test_ops_custom.py
    (msdebug) target create "python3"
    Current executable set to '/home/mindstudio/miniconda3/envs/py37/bin/python3' (aarch64).
    (msdebug) settings set -- target.run-args  "test_ops_custom.py"
    (msdebug)
    
  8. Set a breakpoint.
    Set an NPU breakpoint in the kernel function based on the specified source code file and corresponding line number.
    (msdebug) b add_custom.cpp:60
    Breakpoint 1: where = AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b.o`::AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b_1(uint8_t *, uint8_t *, uint8_t *, uint8_t *, uint8_t *) + 9912 [inlined] KernelAdd::Compute(int) + 3400 at add_custom.cpp:60:9, address = 0x00000000000026b8
    
  9. Run the program and wait until the breakpoint is hit.
    (msdebug) r
    Process 197189 launched: '/home/miniconda3/envs/py38/bin/python3' (aarch64)
    Process 197189 stopped and restarted: thread 1 received signal: SIGCHLD
    ...
    [Launch of Kernel anonymous on Device 0]
    Process 197189 stopped
    [Switching to focus on Kernel anonymous, CoreId 8, Type aiv]
    * thread #1, name = 'python3', stop reason = breakpoint 2.1
        frame #0: 0x00000000000026b8 AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b.o`::AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b_1(uint8_t *, uint8_t *, uint8_t *, uint8_t *, uint8_t *) [inlined] KernelAdd::Compute(this=0x000000000020efb8, progress=1) at add_custom.cpp:60:9
       57              LocalTensor<DTYPE_Y> yLocal = inQueueY.DeQue<DTYPE_Y>();
       58              LocalTensor<DTYPE_Z> zLocal = outQueueZ.AllocTensor<DTYPE_Z>();
       59              Add(zLocal, xLocal, yLocal, this->tileLength);
    -> 60              outQueueZ.EnQue<DTYPE_Z>(zLocal);
       61              inQueueX.FreeTensor(xLocal);
       62              inQueueY.FreeTensor(yLocal);
       63          }
    (msdebug)
    
  10. Delete a breakpoint. For details, see Deleting Breakpoints.
  11. After the debugging is complete, run the q command and enter Y or y to end the debugging.
    (msdebug) q
    Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y