Checking the Memory of the CANN Software Stack
When a user program that calls CANN software stack APIs may encounter memory exceptions, the msSanitizer tool can check the memory used by device-related APIs and AscendCL APIs to help locate those exceptions.
Memory Leak Detection Principles
When the device memory reported by the npu-smi info command keeps increasing, you can use this tool to locate the memory leak. If the leak comes from AscendCL API calls, the tool can also locate the code line that causes it.
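The principle behind leak checking can be illustrated with a generic allocation tracker: record every allocation, forget it when it is freed, and report whatever is still live at the end. The sketch below is a hypothetical illustration of that general technique in plain C++. It assumes nothing about how msSanitizer is implemented. The names `LeakTracker`, `onAlloc`, `onFree`, and `report` are invented, and the report string only imitates the tool's `SUMMARY` line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical tracker illustrating generic leak detection.
// This is not msSanitizer's implementation.
class LeakTracker {
public:
    // Record a live allocation of `bytes` at address `addr`.
    void onAlloc(std::uintptr_t addr, std::size_t bytes) { live_[addr] = bytes; }

    // Forget an allocation once it is freed; unknown addresses are ignored.
    void onFree(std::uintptr_t addr) { live_.erase(addr); }

    // Number of allocations that were never freed.
    std::size_t leakedAllocations() const { return live_.size(); }

    // Total size of all allocations that were never freed.
    std::size_t leakedBytes() const {
        std::size_t total = 0;
        for (const auto &entry : live_) total += entry.second;
        return total;
    }

    // Summary text modeled on the tool's SUMMARY line.
    std::string report() const {
        std::ostringstream out;
        out << "SUMMARY: " << leakedBytes() << " byte(s) leaked in "
            << leakedAllocations() << " allocation(s)";
        return out.str();
    }

private:
    std::map<std::uintptr_t, std::size_t> live_;  // address -> size of each live allocation
};
```

The map is keyed by address, so freeing an address that was never recorded is silently ignored. A real checker would more likely flag that case as an invalid free.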
To locate a memory leak, perform the following steps:
- Enable the leak check for device-related APIs to determine whether a memory leak occurs on the host. If not, the leak occurs on the device. If so, go to the next step to check whether the leak is caused by AscendCL API calls.
- Enable the leak check for AscendCL APIs to determine whether the leak occurs when user code calls AscendCL APIs. If not, the problem is not caused by AscendCL API calls. If so, go to the next step to locate the specific code line.
- Replace the AscendCL header file in the user code with the API header file provided by msSanitizer, recompile the program, and run it under the tool to obtain the file name and line number of each allocation whose memory is never freed. For details about the new APIs, see External API Usage Description.
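The three narrowing steps above amount to a small decision procedure. The following hypothetical sketch encodes those branches. `NextStep` and `nextStep` are invented names. The two inputs are assumed to be the results of the leak checks run with `--check-device-heap=yes` and `--check-cann-heap=yes`, as in the troubleshooting procedure below.

```cpp
#include <cassert>

// Hypothetical encoding of the three-step triage (names are invented).
enum class NextStep {
    LeakOnDevice,     // step 1 finds no host-side leak: the leak is on the device
    NotFromAscendCL,  // step 2 finds no leak in AscendCL API calls
    LocateCodeLine    // step 3: rebuild with the tool's acl.h and rerun
};

// deviceHeapLeak: a leak is reported with --check-device-heap=yes --leak-check=yes
// cannHeapLeak:   a leak is reported with --check-cann-heap=yes --leak-check=yes
NextStep nextStep(bool deviceHeapLeak, bool cannHeapLeak) {
    if (!deviceHeapLeak) {
        return NextStep::LeakOnDevice;  // cannHeapLeak is not consulted in this branch
    }
    if (!cannHeapLeak) {
        return NextStep::NotFromAscendCL;
    }
    return NextStep::LocateCodeLine;
}
```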
Troubleshooting Procedure
- Configure related environment variables by referring to Before You Start.
- Check whether memory leak occurs on the host.
- Use the msSanitizer tool to start the program to be checked. An example command is as follows:
mssanitizer --check-device-heap=yes --leak-check=yes ./add_npu
The path of the program to be checked (add_npu in this example) can be either an absolute path or a relative path, depending on your environment.
- If no error information is displayed, the program ran normally and no memory leak occurs on the host. If error information like the following is displayed, a memory leak occurs on the host. In this output, one memory allocation on the host is never freed, leaking 32800 bytes.
====== ERROR: LeakCheck: detected memory leaks
====== Direct leak of 32800 byte(s)
====== at 0x124080024000 on GM allocated in <unknown>:0 (serialNo:0)
====== SUMMARY: 32800 byte(s) leaked in 1 allocation(s)
- Check whether the leak is caused by AscendCL API calls.
- Use the msSanitizer tool to start the program to be checked. An example command is as follows:
mssanitizer --check-cann-heap=yes --leak-check=yes ./add_npu
- If no error information is displayed, the program ran normally and no memory leak occurs during AscendCL API calls. If error information like the following is displayed, a memory leak occurs during AscendCL API calls. In this output, one memory allocation made through an AscendCL API call is never freed, leaking 32768 bytes.
====== ERROR: LeakCheck: detected memory leaks
====== Direct leak of 32768 byte(s)
====== at 0x124080024000 on GM allocated in <unknown>:0 (serialNo:0)
====== SUMMARY: 32768 byte(s) leaked in 1 allocation(s)
- If a memory leak occurs, use the acl.h API header file and the corresponding dynamic library provided by msSanitizer to locate the code file and code line where the leak occurs.
To do so, replace the original acl/acl.h header file included in your code with the acl.h header file provided by the tool, link your application against libascend_acl_hook.so, and recompile the application. For details, see Importing API Header Files and Linking Dynamic Libraries.
- Use the msSanitizer tool to restart the program. The following is a command example:
mssanitizer --check-cann-heap=yes --leak-check=yes ./add_npu
The following output indicates that the memory allocated at line 55 of the application's main.cpp file is never freed. You can start investigating the cause of the leak from that line.
====== ERROR: LeakCheck: detected memory leaks
====== Direct leak of 32768 byte(s)
====== at 0x124080024000 on GM allocated in main.cpp:55 (serialNo:0)
====== SUMMARY: 32768 byte(s) leaked in 1 allocation(s)
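When many runs need to be checked, you can scan the leak reports programmatically. The sketch below assumes only the line formats shown in the example outputs of this section: the `allocated in <file>:<line>` fragment and the `SUMMARY:` line. `parseSummary`, `parseLocation`, `LeakSummary`, and `LeakLocation` are hypothetical helpers. The real output may contain additional fields or lines that this sketch does not handle.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <regex>
#include <string>

struct LeakSummary {
    std::size_t bytes;        // total leaked bytes
    std::size_t allocations;  // number of allocations never freed
};

struct LeakLocation {
    std::string file;  // "<unknown>" when the source location is not available
    int line;
};

// Extract totals from a line such as
// "====== SUMMARY: 32768 byte(s) leaked in 1 allocation(s)".
std::optional<LeakSummary> parseSummary(const std::string &text) {
    static const std::regex re(R"(SUMMARY: (\d+) byte\(s\) leaked in (\d+) allocation\(s\))");
    std::smatch m;
    if (!std::regex_search(text, m, re)) return std::nullopt;
    return LeakSummary{static_cast<std::size_t>(std::stoull(m[1].str())),
                       static_cast<std::size_t>(std::stoull(m[2].str()))};
}

// Extract the allocation site from "... allocated in main.cpp:55 (serialNo:0)".
std::optional<LeakLocation> parseLocation(const std::string &text) {
    static const std::regex re(R"(allocated in (\S+):(\d+))");
    std::smatch m;
    if (!std::regex_search(text, m, re)) return std::nullopt;
    return LeakLocation{m[1].str(), std::stoi(m[2].str())};
}
```

For the first two reports, `parseLocation` returns `<unknown>` and 0, which indicates that the program still needs to be rebuilt with the tool's acl.h. For the last report, it returns `main.cpp` and 55.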
Importing API Header Files and Linking Dynamic Libraries
The following uses the kernel launch symbol scenario of the
- Obtain the sample project used to verify the code.
When downloading the code sample, run the following command to specify the branch:
git clone https://gitee.com/ascend/samples.git -b master
- In the ${git_clone_path}/samples/operator/ascendc/0_introduction/3_add_kernellaunch/AddKernelInvocationNeo directory, replace the acl/acl.h header file included by the main.cpp file with the acl.h header file provided by msSanitizer.
In the template library scenario, replace #include <acl/acl.h> in the /examples/common/helper.hpp file of the Ascend C template library with #include "acl.h". The change in main.cpp is as follows:
#include "data_utils.h"
#ifndef ASCENDC_CPU_DEBUG
// #include "acl/acl.h"  // Replace acl/acl.h with acl.h.
#include "acl.h"
extern void add_custom_do(uint32_t blockDim, void *stream, uint8_t *x, uint8_t *y, uint8_t *z);
#else
- Edit the CMakeLists.txt file in the ${git_clone_path}/samples/operator/ascendc/0_introduction/3_add_kernellaunch/AddKernelInvocationNeo directory to add the API header file path ${INSTALL_DIR}/tools/mssanitizer/include/acl and the dynamic library path ${INSTALL_DIR}/tools/mssanitizer/lib64/libascend_acl_hook.so.
- The template library scenario applies only to the Atlas A2 training products/Atlas A2 inference products.
- In the template library scenario, add the following compilation options:
-I$ENV{ASCEND_HOME_PATH}/tools/mssanitizer/include/acl -L$ENV{ASCEND_HOME_PATH}/tools/mssanitizer/lib64 -lascend_acl_hook
add_executable(ascendc_kernels_bbit ${CMAKE_CURRENT_SOURCE_DIR}/main.cpp)
target_compile_options(ascendc_kernels_bbit PRIVATE
    $<BUILD_INTERFACE:$<$<STREQUAL:${RUN_MODE},cpu>:-g>>
    -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -Wall -Werror
)
# Import the path of the API header file during the compilation of the operator executable file.
target_include_directories(ascendc_kernels_bbit PUBLIC $ENV{ASCEND_HOME_PATH}/tools/mssanitizer/include/acl)
# Import the path of the libascend_acl_hook.so dynamic library when the operator executable file is linked.
target_link_directories(ascendc_kernels_bbit PRIVATE $ENV{ASCEND_HOME_PATH}/tools/mssanitizer/lib64)
target_link_libraries(ascendc_kernels_bbit PRIVATE
    $<BUILD_INTERFACE:$<$<OR:$<STREQUAL:${RUN_MODE},npu>,$<STREQUAL:${RUN_MODE},sim>>:host_intf_pub>>
    $<BUILD_INTERFACE:$<$<STREQUAL:${RUN_MODE},cpu>:ascendcl>>
    ascendc_kernels_${RUN_MODE}
    # Link the operator executable file to the libascend_acl_hook.so dynamic library.
    ascend_acl_hook
)
- Set the environment variables and recompile the operator.
Run the npu-smi info command on the server where the Ascend AI Processor is installed to obtain the Chip Name. The value to pass to -v is Ascend followed by the Chip Name. For example, if Chip Name is xxxyy, use Ascendxxxyy.
export LD_LIBRARY_PATH=${ASCEND_HOME_PATH}/tools/mssanitizer/lib64:$LD_LIBRARY_PATH
mssanitizer --check-cann-heap=yes --leak-check=yes -- bash run.sh -r npu -v Ascendxxxyy
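Replacing acl/acl.h with a wrapper header is what turns `<unknown>:0` into `main.cpp:55` in the report. A common way a wrapper header can capture call sites is to define the allocation call as a macro that forwards `__FILE__` and `__LINE__` to a recording function. The sketch below illustrates only that general technique, using standard `malloc` and invented names (`tracked_malloc`, `tracked_free`, `TRACKED_MALLOC`, `TRACKED_FREE`, `g_sites`). It is an assumption about the approach, not a description of the msSanitizer acl.h or libascend_acl_hook.so.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical record of where each live allocation was made ("file:line").
std::map<void *, std::string> g_sites;

// Wrapper that allocates memory and remembers the caller's file and line.
void *tracked_malloc(std::size_t size, const char *file, int line) {
    void *p = std::malloc(size);
    if (p != nullptr) g_sites[p] = std::string(file) + ":" + std::to_string(line);
    return p;
}

// Wrapper that forgets the allocation site and releases the memory.
void tracked_free(void *p) {
    g_sites.erase(p);
    std::free(p);
}

// Macro layer: call sites keep writing TRACKED_MALLOC(n), and the
// preprocessor injects the file and line of each call site.
#define TRACKED_MALLOC(size) tracked_malloc((size), __FILE__, __LINE__)
#define TRACKED_FREE(ptr) tracked_free(ptr)
```

Because the macro expands at every call site, each call records its own location without any change to the calling code. This would also explain why such a header requires recompiling the application rather than only relinking it.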
