Overview
MindStudio Sanitizer (msSanitizer) is an exception detection tool for the Ascend AI Processor. It provides memory check, contention check, uninitialization check, and synchronization check in single-operator development scenarios. After testing the operator functions on real hardware with the msOpST tool, you can decide, based on the test results, whether to run msSanitizer for exception detection.
- Memory check: During operator development, the tool can locate memory problems such as illegal read/write, multi-core corruption, non-aligned access, memory leak, and illegal release. The tool can also check the memory of the CANN software stack, helping users locate the module in which a memory exception occurs.
- Contention check: The tool helps users locate data contention problems, including intra-core contention and inter-core contention. Intra-core contention includes inter-pipeline contention and intra-pipeline contention.
- Uninitialization check: The tool helps users locate dirty data read problems that may be caused by uninitialized memory.
- Synchronization check: The tool helps users locate synchronization failures in subsequent operators due to unpaired synchronization instructions in the preceding operators.
msSanitizer does not support detection of multi-thread operators and vector instructions that use masks.
Features
msSanitizer provides different error detection functions. The following functions are supported:
| Use Case | Example | Sample |
|---|---|---|
| Operator memory check | msSanitizer supports memory and contention check of Ascend C operators (including Vector, Cube, and Mix fusion operators) called by kernel launch symbols. For details, see Memory Check. | |
| Operator contention check | msSanitizer supports memory and contention check of Ascend C operators (including Vector, Cube, and Mix fusion operators) called by single-operator APIs. For details, see Contention Check. | |
| Operator uninitialization check | msSanitizer supports uninitialization check of Ascend C operators (including Vector, Cube, and Mix fusion operators) called by AscendCL. For details, see Uninitialization Check. | |
| Synchronization check | msSanitizer supports the synchronization instruction pairing check of Ascend C operators (including Vector, Cube, and Mix fusion operators) called by kernel launch symbols or single-operator APIs. For details, see Synchronization Check. | |
| Memory check of the CANN software stack | For details, see Checking the Memory of the CANN Software Stack. | |

Commands
You can run the following command to call msSanitizer:
mssanitizer <options> -- <user_program> <user_options>
- options indicates the command-line options of the detection tool. For details about the available options and their default values, see Table 2 and Table 3. user_program indicates the user operator program, and user_options indicates the command-line options of the user program.
- If the executable file or user-defined program to be loaded takes its own command-line options, place -- before the executable file or application to separate the options of the detection tool from the user command.
mssanitizer -- application parameter1 parameter2 ...
- You need to ensure the security of executable files and user-defined applications.
- You need to ensure the execution security of executable files or applications.
- You are advised to restrict the operation permission on executable files or applications to avoid privilege escalation risks.
- You are not advised to perform high-risk operations (such as deleting files, deleting directories, changing passwords, and running privilege escalation commands) to avoid security risks.
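The invocation form described above can be sketched as a small helper that assembles the argument list. This is a minimal sketch, not part of msSanitizer: the function name `build_sanitizer_command` is hypothetical, and the only facts taken from this document are the `mssanitizer` executable name, the `-t memcheck` option, and the `--` separator.

```python
import shlex


def build_sanitizer_command(options, user_program, user_options=()):
    """Return argv for: mssanitizer <options> -- <user_program> <user_options>."""
    if not user_program:
        raise ValueError("user_program must not be empty")
    # Everything before "--" is parsed by the detection tool; everything
    # after it belongs to the program under test.
    return ["mssanitizer", *options, "--", user_program, *user_options]


argv = build_sanitizer_command(
    ["-t", "memcheck"], "./application", ["parameter1", "parameter2"]
)
# Print the command as a single shell-quoted line for inspection.
print(shlex.join(argv))
```

On a machine where msSanitizer is installed, the resulting list could be handed to `subprocess.run(argv, check=True)`; passing a list rather than a shell string avoids quoting problems in the user options.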
| Parameter | Description | Value | Mandatory (Yes/No) |
|---|---|---|---|
| -v, --version | Queries the msSanitizer version. | - | No |
| -t, --tool | Specifies the sub-tool for exception check. | | No |
| --log-file | Specifies that the check report is exported to a file. | {file_name}, for example, test_log | No |
| --log-level | Specifies the output level of the check report. | | No |
| --max-debuglog-size | Specifies the maximum size of a single file in the debugging logs output by the check tool. | The value is an integer ranging from 1 to 10240, in MB. Defaults to 1024. NOTE: --max-debuglog-size=100 indicates that the maximum size of a debug log file is 100 MB. | No |
| --block-id | Enables the single-block check function or not. | The value is an integer ranging from 0 to 200. Disabled / Enabled | No |
| --cache-size | Specifies the GM size of a single block. | The value for a single block is an integer ranging from 1 to 8192, in MB. The default value for a single block is 100 MB, indicating that a single block can be allocated with 100 MB of memory. | No |
| --kernel-name | Specifies the name of the operator to be checked. | Partial strings within operator names can be used for fuzzy matching. If this parameter is not specified, the system checks all operators scheduled during program execution by default. For example, to check operators named abcd and bcd at the same time, you can configure --kernel-name="bc". The system automatically identifies and detects all operators that contain the string bc. | No |
| -h, --help | Outputs the help information. | - | No |

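
The value ranges and the fuzzy matching rule in the table above can be illustrated with a short sketch. The ranges are copied from the table; everything else is an assumption made for illustration: the helper names are hypothetical, the `--option=value` form is extrapolated from the `--max-debuglog-size=100` example, and the matching is modeled as a plain case-sensitive substring test, which may differ from what the tool does internally.

```python
# Integer ranges copied from the option table (sizes are in MB).
INTEGER_RANGES = {
    "--max-debuglog-size": (1, 10240),
    "--block-id": (0, 200),
    "--cache-size": (1, 8192),
}


def validate_option(name, value):
    """Check value against the documented range and return '--name=value'."""
    low, high = INTEGER_RANGES[name]
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{name} expects an integer")
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}")
    return f"{name}={value}"


def kernels_selected(kernel_name, scheduled):
    """Model of --kernel-name: substring match, or all kernels if unset."""
    if kernel_name is None:
        return list(scheduled)
    return [kernel for kernel in scheduled if kernel_name in kernel]


print(validate_option("--max-debuglog-size", 100))
print(kernels_selected("bc", ["abcd", "bcd", "xyz"]))
```

With `--kernel-name="bc"`, the model selects `abcd` and `bcd` and skips `xyz`, matching the example in the table.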
| Parameter | Description | Value | Required |
|---|---|---|---|
| --check-unused-memory | Whether to detect unused allocated memory | | No |
| --leak-check | Whether to enable memory leak detection | | No |
| --check-device-heap | Whether to enable device memory check | | No |
| --check-cann-heap | Whether to enable memory check of the CANN software stack | | No |

- Enabling --check-device-heap or --check-cann-heap disables checks within the kernel.
- The device memory check and CANN software stack memory check cannot be enabled at the same time. If they are enabled at the same time, the message "CANNOT enable both --check-cann-heap and --check-device-heap" is displayed.
- A program under test that has been recompiled with the API header file provided by msSanitizer supports leak check only for AscendCL series APIs, not for device APIs.
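The first two constraints above can be expressed as a small pre-flight check. This is only a sketch of the documented rules, not the tool's own logic: the function name and the returned dictionary are hypothetical, and the two options are modeled as booleans because their accepted values are not listed here. The error text is the message quoted above.

```python
def check_heap_options(check_device_heap=False, check_cann_heap=False):
    """Apply the documented constraints on the two heap-check options."""
    if check_device_heap and check_cann_heap:
        # Same wording as the message displayed by the tool.
        raise ValueError(
            "CANNOT enable both --check-cann-heap and --check-device-heap"
        )
    # Enabling either heap check disables the checks within the kernel.
    kernel_checks = not (check_device_heap or check_cann_heap)
    return {"kernel_checks_enabled": kernel_checks}


print(check_heap_options())
print(check_heap_options(check_cann_heap=True))
```

Running such a check before launching a long job gives early feedback on an option combination the tool would reject anyway.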
Principles for Enabling the Exception Check Function
The exception check tool provides memory check (memcheck), contention check (racecheck), uninitialization check (initcheck), and synchronization check (synccheck). Multiple check functions can be combined and enabled based on the following principles:
- You can specify the --tool parameter multiple times to enable multiple detection functions. For example, you can run the following command to enable memory check and contention check at the same time:
mssanitizer -t memcheck -t racecheck ./application
- If a sub-option of a detection function is enabled, that detection function is enabled automatically. For example, enabling the leak detection sub-option of memory check also enables memory check:
mssanitizer -t racecheck --leak-check=yes ./application
The preceding command is equivalent to:
mssanitizer -t racecheck -t memcheck --leak-check=yes ./application
- If no detection function is specified, memory detection is enabled by default.
mssanitizer ./application
The preceding command is equivalent to:
mssanitizer -t memcheck ./application
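The three principles above can be modeled in a few lines. This is a sketch under stated assumptions: `effective_tools` and `SUBOPTION_TOOL` are hypothetical names, and the mapping lists only `--leak-check`, because that is the only sub-option whose parent function this section names explicitly.

```python
# Sub-option to parent detection function. Only --leak-check is listed,
# because it is the only pairing stated in the text above.
SUBOPTION_TOOL = {"--leak-check": "memcheck"}


def effective_tools(tools=(), suboptions=()):
    """Return the detection functions that end up enabled, in order."""
    enabled = list(dict.fromkeys(tools))  # drop duplicates, keep order
    for option in suboptions:
        tool = SUBOPTION_TOOL.get(option)
        if tool is not None and tool not in enabled:
            enabled.append(tool)  # a sub-option enables its own function
    if not enabled:
        enabled.append("memcheck")  # default when nothing is specified
    return enabled


print(effective_tools(["memcheck", "racecheck"]))
print(effective_tools(["racecheck"], ["--leak-check"]))
print(effective_tools())
```

The three calls correspond to the three command pairs shown above: an explicit combination, a sub-option that pulls in memcheck, and the default.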
Call Scenarios
The following operator call scenarios are supported:
- Kernel launch operator development: kernel launch
- For details about kernel launch, see "Kernel Launch". For details about the procedure, see Checking the Ascend C Operator of the Kernel Launch Symbol.
- When custom operators are launched with the <<<>>> syntax and integrated with PyTorch, the GM is managed in memory pool mode by default, which may cause inaccurate out-of-bounds check results. Therefore, set the following environment variable to disable the memory pool before the check to obtain more accurate results:
export PYTORCH_NO_NPU_MEMORY_CACHING=1
- Project-based operator development: single-operator API calling
- For details about single-operator API calling, see Single-Operator API Calling. For details about the procedure, see Checking the Single-Operator via APIs.
- When calling an API with the aclnn prefix, pass the acl.json file through the aclInit API as follows to ensure the accuracy of memory check:
auto ret = aclInit("./acl.json"); // The content of acl.json file is {"dump":{"dump_scene":"lite_exception"}}.
- AI framework operator adaptation: PyTorch framework
- In PyTorch graph mode (TorchAir), the check can be performed only when no compilation option is added to msSanitizer. For details, see (Optional) Configuring a Compilation Option.
- In PyTorch graph mode (TorchAir), the Ascend IR and aclgraph graph execution modes are supported. For details, see "reduce-overhead Mode" > "Configuring the reduce-overhead Mode" in Ascend Extension for PyTorch Supported Suites and Third-Party Libraries.
- For details about the calling scenarios of the PyTorch framework, see "OpPlugin-based Operator Adaptation" in Ascend Extension for PyTorch Feature Guide. For details, see Checking the Operators Called by a PyTorch API.
- Triton operator development: Triton operator calling
- For details about the calling scenarios of the Triton operator, see Checking the Triton Operator.
- Ensure that Triton and the Triton-Ascend plug-in have been installed and configured. For details, see link.
- The Triton operator calling scenario does not apply to Atlas inference products.
- To prevent the impact of operators that are not recompiled, you are advised to enable the following environment variable:
export TRITON_ALWAYS_COMPILE=1
- In a Triton scenario, PyTorch is used to create tensors. In the PyTorch framework, the GM is managed in memory pool mode by default, which interferes with memory check. Therefore, you need to set the following environment variable to disable the memory pool before the check to ensure that the check result is accurate:
export PYTORCH_NO_NPU_MEMORY_CACHING=1
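For scripted runs, the environment variables and the acl.json content mentioned in the scenarios above can be prepared from Python before the check is launched. This is a minimal sketch: the helper names `sanitizer_env` and `acl_config_text` are hypothetical, while the variable names, their values, and the JSON content are taken from this section.

```python
import json
import os


def acl_config_text():
    """Return the acl.json content used with aclInit for aclnn APIs."""
    return json.dumps({"dump": {"dump_scene": "lite_exception"}})


def sanitizer_env(base=None, pytorch=False, triton=False):
    """Return a copy of the environment with the documented variables set."""
    env = dict(os.environ if base is None else base)
    if pytorch or triton:
        # Disable the PyTorch memory pool so that GM checks are accurate.
        env["PYTORCH_NO_NPU_MEMORY_CACHING"] = "1"
    if triton:
        # Force recompilation so that every Triton operator is checked.
        env["TRITON_ALWAYS_COMPILE"] = "1"
    return env


print(acl_config_text())
print(sorted(set(sanitizer_env({}, triton=True))))
```

The returned dictionary could be passed as the `env` argument of `subprocess.run` when starting mssanitizer, and the text from `acl_config_text()` could be written to `./acl.json` before the program calls aclInit. Working on a copy leaves the environment of the calling process unchanged.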
Result File Description
| Result File | Description |
|---|---|
| mssanitizer_{TIMESTAMP}_{PID}.log | msSanitizer logs generated in the mindstudio_sanitizer_log directory while msSanitizer is running. TIMESTAMP indicates the current timestamp, and PID indicates the PID of the current check tool. |
| kernel.{PID}.o | Operator cache files generated in the current path while msSanitizer is running. PID indicates the PID of the current check tool. The operator cache files are used to parse the abnormal call stack. |
| tmp_{PID}_{TIMESTAMP} | Temporary folder generated in the current path while msSanitizer is running. PID indicates the PID of the current check tool, and TIMESTAMP indicates the current timestamp. This folder is used to generate the operator kernel binary file. |
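
When several checks run in the same directory, the naming patterns above can be used to group result files by the PID of the check tool. The sketch below is illustrative only: `classify` and `PATTERNS` are hypothetical names, the sample file names are made up, and the patterns assume that PID is a run of decimal digits; the exact TIMESTAMP format is not specified here, so it is matched loosely.

```python
import re

# One pattern per result-file kind from the table. PID is assumed to be
# decimal digits; TIMESTAMP is matched loosely because its format is not
# documented in this section.
PATTERNS = {
    "log": re.compile(r"^mssanitizer_(?P<timestamp>.+)_(?P<pid>\d+)\.log$"),
    "kernel_cache": re.compile(r"^kernel\.(?P<pid>\d+)\.o$"),
    "tmp_dir": re.compile(r"^tmp_(?P<pid>\d+)_(?P<timestamp>.+)$"),
}


def classify(name):
    """Return (kind, pid) for a result-file name, or None if unrecognized."""
    for kind, pattern in PATTERNS.items():
        match = pattern.match(name)
        if match:
            return kind, int(match.group("pid"))
    return None


# Made-up names that follow the documented patterns.
for sample in ["mssanitizer_20240101120000_4321.log", "kernel.4321.o",
               "tmp_4321_20240101120000", "notes.txt"]:
    print(sample, classify(sample))
```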