Checking Ascend C Operators in the Kernel Launch Symbol Scenario

Procedure

  1. Prepare for the installation by referring to Kernel Launch Symbol.
  2. Configure environment variables by referring to Before You Start.
  3. Build a single-operator executable file.

    The following is an example of the command for building an Add operator executable file:

    bash run.sh -r npu -v <soc_version> 

    After the script finishes building and running, the NPU-side executable file <kernel_name>_npu is generated in the project directory.

  4. Use msSanitizer to launch the single-operator executable file (add_npu is used as an example).
    • Run the following command to perform a memory check. For details about the parameters, see Table 2 and Table 3. For a worked example, see Memory Check Example.
      mssanitizer --tool=memcheck ./add_npu  # Specify --tool=memcheck for memory check.
    • Run the following command to perform contention detection. For details about the parameters, see Table 2. For a worked example, see Contention Detection Example.
      mssanitizer --tool=racecheck ./add_npu # Specify --tool=racecheck for contention detection.

    The single-operator executable file can be specified by either an absolute path or a relative path.

Memory Check Example

  • Before Step 1, construct an invalid read/write scenario in the Add operator by changing the DataCopy length from TILE_LENGTH to 2 × TILE_LENGTH. The last copy then writes beyond the end of the destination memory.
    __aicore__ inline void CopyOut(int32_t progress)
    {
        // deque output tensor from VECOUT queue
        LocalTensor<half> zLocal = outQueueZ.DeQue<half>();
        // copy progress_th tile from local tensor to global tensor
        // Construct an invalid read/write scenario.
        DataCopy(zGm[progress * TILE_LENGTH], zLocal, 2 * TILE_LENGTH);
        // free output tensor for reuse
        outQueueZ.FreeTensor(zLocal);
    }
    
  • The report generated by the tool shows an illegal write of 224 bytes to the GM, originating from line 65 of add_custom.cpp. This matches the constructed scenario.
    $ mssanitizer --tool=memcheck ./add_npu
    ====== ERROR: illegal write of size 224
    ======    at 0x12c0c002ef00 on GM
    ======    in block aiv(7)
    ======    code in pc current 0x1644 (serialNo:2342)
    ======    #0 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/impl/dav_c220/kernel_operator_data_copy_impl.h:107:9
    ======    #1 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:155:9
    ======    #2 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:459:5
    ======    #3 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:65:9
    ======    #4 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:38:13
    ======    #5 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:82:8
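
The out-of-bounds write in this example can be reasoned about on the host with a small model. The sketch below is not part of the sample or of msSanitizer. It assumes that each block writes consecutive tiles into one contiguous output buffer, and the constants kTileLength, kTilesPerBlock, and kBlockCount are hypothetical values chosen for illustration. The helper BytesPastBuffer is likewise an illustrative name. Because the constants are assumed, the byte count it produces is not expected to reproduce the 224 bytes in the report above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tiling constants, for illustration only. The real values are
// defined by the operator's own tiling code.
constexpr std::size_t kTileLength = 128;    // elements per tile (assumed)
constexpr std::size_t kTilesPerBlock = 16;  // tiles written by one block (assumed)
constexpr std::size_t kBlockCount = 8;      // number of blocks (assumed)
constexpr std::size_t kElemSize = 2;        // size of one half element in bytes

// Returns how many bytes a copy of copyLength elements, starting at tile
// `progress` of block `block`, writes beyond the end of the output buffer.
inline std::size_t BytesPastBuffer(std::size_t block, std::size_t progress,
                                   std::size_t copyLength)
{
    assert(block < kBlockCount && progress < kTilesPerBlock);
    const std::size_t totalElems = kBlockCount * kTilesPerBlock * kTileLength;
    const std::size_t begin = (block * kTilesPerBlock + progress) * kTileLength;
    const std::size_t end = begin + copyLength;
    return end > totalElems ? (end - totalElems) * kElemSize : 0;
}
```

Under these assumptions, only the last tile of the last block runs past the buffer when the length is doubled, which is consistent with the report flagging a single block, aiv(7).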
    

Contention Detection Example

  • Before Step 1, construct an inter-core contention scenario in the Add operator by changing the DataCopy length from TILE_LENGTH to 2 × TILE_LENGTH. Multiple cores then write to the same GM region.
    __aicore__ inline void CopyOut(int32_t progress)
    {
        // deque output tensor from VECOUT queue
        LocalTensor<half> zLocal = outQueueZ.DeQue<half>();
        // copy progress_th tile from local tensor to global tensor
        // Construct an inter-core contention scenario.
        DataCopy(zGm[progress * TILE_LENGTH], zLocal, 2 * TILE_LENGTH);
        // free output tensor for reuse
        outQueueZ.FreeTensor(zLocal);
    }
    
  • The report generated by the tool shows a potential write-after-write (WAW) hazard on the GM between AIV blocks 0 and 1, originating from line 65 of add_custom.cpp. This matches the constructed scenario.
    $ mssanitizer --tool=racecheck ./add_npu
    ====== ERROR: Potential WAW hazard detected at GM :
    ======    PIPE_MTE3 Write at WAW()+0x12c0c0025f00 in block 0 (aiv) at pc current 0x1644 (serialNo:305)
    ======    #0 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/impl/dav_c220/kernel_operator_data_copy_impl.h:107:9
    ======    #1 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:155:9
    ======    #2 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:459:5
    ======    #3 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:65:9
    ======    #4 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:38:13
    ======    #5 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:82:8
    ======    PIPE_MTE3 Write at WAW()+0x12c0c0026000 in block 1 (aiv) at pc current 0x1644 (serialNo:329)
    ======    #0 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/impl/dav_c220/kernel_operator_data_copy_impl.h:107:9
    ======    #1 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:155:9
    ======    #2 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:459:5
    ======    #3 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:65:9
    ======    #4 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:38:13
    ======    #5 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:82:8
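
The overlap between neighboring blocks in this example can be illustrated with a host-side model. The sketch below is not part of the sample or of msSanitizer. It assumes that block k writes its tiles directly before the region of block k + 1 in one contiguous output buffer. The constants kTileLength and kTilesPerBlock are hypothetical, and Range, WriteRange, and Overlaps are illustrative names.

```cpp
#include <cassert>
#include <cstddef>

// Half-open element range [begin, end) in the output buffer.
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Hypothetical tiling constants, for illustration only.
constexpr std::size_t kTileLength = 128;    // elements per tile (assumed)
constexpr std::size_t kTilesPerBlock = 16;  // tiles written by one block (assumed)

// Range written by one copy of copyLength elements that starts at tile
// `progress` of block `block`.
inline Range WriteRange(std::size_t block, std::size_t progress,
                        std::size_t copyLength)
{
    assert(progress < kTilesPerBlock);
    const std::size_t begin = (block * kTilesPerBlock + progress) * kTileLength;
    return {begin, begin + copyLength};
}

// Two half-open ranges overlap when each one starts before the other ends.
inline bool Overlaps(const Range& a, const Range& b)
{
    return a.begin < b.end && b.begin < a.end;
}
```

With the original length of kTileLength, the last tile of block 0 ends exactly where the first tile of block 1 begins, so the ranges do not overlap. With 2 * kTileLength, the last copy of block 0 extends into the first tile of block 1. Two blocks writing the same region without synchronization is the pattern that the report describes as a WAW hazard.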