Overview

MindStudio Sanitizer (msSanitizer) is an exception detection tool based on Ascend AI Processors. It provides memory check, contention check, uninitialization check, and synchronization check for single-operator development scenarios. After you test the operator functions on real hardware with the msOpST tool, you can decide, based on the test results, whether to run msSanitizer for exception detection.

  • Memory check: During operator development, the tool locates memory problems such as illegal reads and writes, multi-core corruption, non-aligned access, memory leaks, and illegal releases. The tool can also check the memory of the CANN software stack, helping you locate the module in which a memory exception occurs.
  • Contention check: The tool helps you locate potential data contention, including intra-core contention and inter-core contention. Intra-core contention covers both inter-pipeline contention and intra-pipeline contention.
  • Uninitialization check: The tool helps you locate dirty reads that may be caused by uninitialized memory.
  • Synchronization check: The tool helps you locate synchronization failures in subsequent operators that are caused by unpaired synchronization instructions in preceding operators.

msSanitizer does not support checking multi-threaded operators or vector instructions that use masks.

Features

msSanitizer supports the following check functions:

Table 1 msSanitizer functions

| Use Case | Example | Description |
| --- | --- | --- |
| Operator memory check | Memory Check | msSanitizer supports memory and contention check of Ascend C operators (including Vector, Cube, and Mix fusion operators) called by kernel launch symbols. For details, see Memory Check. |
| Operator contention check | Contention Check | msSanitizer supports memory and contention check of Ascend C operators (including Vector, Cube, and Mix fusion operators) called by single-operator APIs. For details, see Contention Check. |
| Operator uninitialization check | Uninitialization Check | msSanitizer supports uninitialization check of Ascend C operators (including Vector, Cube, and Mix fusion operators) called by AscendCL. For details, see Uninitialization Check. |
| Synchronization check | Synchronization Check | msSanitizer supports the synchronization instruction pairing check of Ascend C operators (including Vector, Cube, and Mix fusion operators) called by kernel launch symbols or single-operator APIs. For details, see Synchronization Check. |
| Memory check of the CANN software stack | Memory Check | For details, see Checking the Memory of the CANN Software Stack. |

Commands

You can run the following command to call msSanitizer:

mssanitizer <options> -- <user_program> <user_options>   
  • options: command-line options of the check tool. For details about the available options and their default values, see Table 2 and Table 3.
  • user_program: the user operator program. user_options: command-line options of the user program.
  • If the executable file or user program takes command-line options, place -- before it to separate the msSanitizer options from the user command:
    mssanitizer -- application parameter1 parameter2 ...
  • You are responsible for the security of the executable files and user-defined applications that you run with the tool, including their safe execution:
    • You are advised to restrict the operation permissions on executable files or applications to avoid privilege escalation risks.
    • Avoid high-risk operations (such as deleting files, deleting directories, changing passwords, and running privilege escalation commands) to prevent security risks.
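
The command form above can be sketched as a small dry-run helper. This is an illustration only: build_sanitizer_cmd is a hypothetical function, not part of msSanitizer, and it prints the command line instead of running it so that the position of the -- separator is easy to see.

```shell
# build_sanitizer_cmd is a hypothetical helper, not part of msSanitizer.
# It prints a command line in the documented form:
#   mssanitizer <options> -- <user_program> <user_options>
# The first argument is the sub-tool; the rest is the user command.
build_sanitizer_cmd() {
    tool="$1"
    shift
    # Tool options come first, then the -- separator.
    printf 'mssanitizer -t %s --' "$tool"
    # Everything after -- belongs to the user program.
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

build_sanitizer_cmd memcheck ./application parameter1 parameter2
# prints: mssanitizer -t memcheck -- ./application parameter1 parameter2
```

Printing the command first is a cheap way to confirm that user options such as parameter1 end up after --, where msSanitizer does not interpret them.
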
Table 2 Common parameters

| Parameter | Description | Value | Required |
| --- | --- | --- | --- |
| -v, --version | Queries the msSanitizer version. | - | No |
| -t, --tool | Specifies the sub-tool used for exception check. | memcheck: memory check (default). racecheck: contention check. initcheck: uninitialization check. synccheck: synchronization check. | No |
| --log-file | Exports the check report to the specified file. | {file_name}, for example, test_log. NOTE: (1) Only digits, uppercase letters, lowercase letters, hyphens (-), periods (.), slashes (/), and underscores (_) are supported. (2) To prevent log leakage, you are advised to restrict the file permission so that only authorized personnel can access the file. (3) The tool writes the report in overwrite mode: if the file (test_log in this example) already contains content, the content is cleared. You are advised to specify an empty file for exporting the report. | No |
| --log-level | Specifies the output level of the check report. | info: outputs running information at the info, warn, and error levels. warn: outputs running information at the warn and error levels (default). error: outputs running information at the error level. | No |
| --max-debuglog-size | Specifies the maximum size of a single debug log file output by the check tool. | Integer from 1 to 10240, in MB. Defaults to 1024. NOTE: --max-debuglog-size=100 sets the maximum size of a debug log file to 100 MB. | No |
| --block-id | Specifies whether to enable single-block check. | Integer from 0 to 200. Disabled: (1) Memory check, uninitialization check, and synchronization check: all blocks are checked by default. (2) Contention check: by default, inter-core check covers all blocks, while intra-core check covers contention within and between the pipelines of block 0. Enabled: (1) Memory check, uninitialization check, and synchronization check: the specified block is checked. (2) Contention check: inter-core check is not performed; contention within and between the pipelines of the specified block is checked. | No |
| --cache-size | Specifies the global memory (GM) size of a single block. | Integer from 1 to 8192, in MB. Defaults to 100, meaning that a single block can be allocated 100 MB of memory. NOTE: (1) When single-block check is enabled, the maximum value of --cache-size is 8192 MB. When single-block check is disabled, the maximum value is (24 × 1024/Number of blocks). (2) If the value of --cache-size does not meet the requirement, the exception check tool prints a message prompting you to reset --cache-size. For details, see mssanitizer Error: --cache-size Exception. | No |
| --kernel-name | Specifies the name of the operator to be checked. | A substring of the operator name can be used for fuzzy matching. If this parameter is not specified, all operators scheduled during program execution are checked. For example, to check operators named abcd and bcd at the same time, set --kernel-name="bc". The tool then checks all operators whose names contain the string bc. | No |
| -h, --help | Outputs the help information. | - | No |
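
Two of the value constraints in Table 2 can be checked before the tool is launched. The sketch below is an illustration under stated assumptions: is_valid_log_name and max_cache_size_mb are hypothetical helpers, not part of msSanitizer. The cache-size helper assumes the result of 24 × 1024/Number of blocks is in MB, rounds it down to an integer, and caps it at the documented upper bound of 8192.

```shell
# Hypothetical pre-flight helpers; they are not part of msSanitizer.

# Succeeds if the name uses only the characters that --log-file accepts:
# digits, letters, hyphens, periods, slashes, and underscores.
is_valid_log_name() {
    case "$1" in
        ''|*[!0-9A-Za-z._/-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Prints the largest --cache-size value when single-block check is
# disabled: 24 x 1024 / number of blocks.
# Assumptions: the unit is MB, the result is rounded down, and it is
# capped at 8192 (the top of the documented range).
max_cache_size_mb() {
    limit=$(( 24 * 1024 / $1 ))
    if [ "$limit" -gt 8192 ]; then
        limit=8192
    fi
    echo "$limit"
}

is_valid_log_name "test_log" && echo "test_log: ok"
max_cache_size_mb 8    # prints: 3072
```

For example, with 20 blocks the bound is 24576/20 = 1228.8, so the largest integer value under these assumptions is 1228.
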

Table 3 Memory check options

| Parameter | Description | Value | Required |
| --- | --- | --- | --- |
| --check-unused-memory | Whether to detect unused allocated memory | yes; no (default) | No |
| --leak-check | Whether to enable memory leak detection | yes; no (default) | No |
| --check-device-heap | Whether to enable device memory check | yes; no (default) | No |
| --check-cann-heap | Whether to enable memory check of the CANN software stack | yes; no (default) | No |

  • Enabling --check-device-heap or --check-cann-heap disables checks within the kernel.
  • The device memory check and CANN software stack memory check cannot be enabled at the same time. If they are enabled at the same time, the message "CANNOT enable both --check-cann-heap and --check-device-heap" is displayed.
  • A program that is recompiled with the API header file provided by msSanitizer supports leak check only for AscendCL series APIs, not for device APIs.
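
The mutual exclusion between the two heap checks can be caught before the tool is launched. The following is a minimal sketch: check_heap_options is a hypothetical helper, not part of msSanitizer, and the --option=yes form is assumed from the --leak-check=yes example used elsewhere in this document.

```shell
# check_heap_options is a hypothetical pre-flight helper, not part of
# msSanitizer. It fails when both heap checks are requested, mirroring
# the documented rule that they cannot be enabled at the same time.
check_heap_options() {
    device=no
    cann=no
    for opt in "$@"; do
        case "$opt" in
            --check-device-heap=yes) device=yes ;;
            --check-cann-heap=yes)   cann=yes ;;
        esac
    done
    if [ "$device" = yes ] && [ "$cann" = yes ]; then
        echo "enable only one of --check-device-heap and --check-cann-heap" >&2
        return 1
    fi
    return 0
}

# Usage: validate the options, then print (not run) the command.
if check_heap_options --check-device-heap=yes; then
    echo "mssanitizer --check-device-heap=yes -- ./application"
fi
```
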

Principles for Enabling the Exception Check Function

The exception check tool provides memory check (memcheck), contention check (racecheck), uninitialization check (initcheck), and synchronization check (synccheck). Multiple check functions can be combined and enabled based on the following principles:

  • You can specify the --tool parameter multiple times to enable multiple check functions. For example, the following command enables memory check and contention check at the same time:
    mssanitizer -t memcheck -t racecheck ./application
  • Enabling a sub-option of a check function also enables that check function. For example, enabling the leak detection sub-option of memory check automatically enables memory check:
    mssanitizer -t racecheck --leak-check=yes ./application
    The preceding command is equivalent to:
    mssanitizer -t racecheck -t memcheck --leak-check=yes ./application
  • If no check function is specified, memory check is enabled by default.
    mssanitizer ./application

    The preceding command is equivalent to:

    mssanitizer -t memcheck ./application
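
The three principles above can be summarized as a small resolution routine. This is a sketch for illustration, not the tool's actual option parser: effective_tools is a hypothetical helper, and it handles only the --leak-check=yes sub-option because that is the one used in the example above.

```shell
# effective_tools is a hypothetical helper, not part of msSanitizer.
# It prints the check functions that an option list would enable,
# following the principles described above.
effective_tools() {
    tools=""
    leak=no
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --)
                # Everything after -- belongs to the user program.
                break
                ;;
            -t|--tool)
                # The next argument names the sub-tool.
                if [ "$#" -ge 2 ]; then
                    tools="$tools $2"
                    shift
                fi
                ;;
            --leak-check=yes)
                leak=yes
                ;;
        esac
        shift
    done
    # No check function specified: memory check is the default.
    if [ -z "$tools" ]; then
        tools=" memcheck"
    fi
    # A memory check sub-option implies memcheck.
    if [ "$leak" = yes ]; then
        case "$tools " in
            *" memcheck "*) ;;
            *) tools="$tools memcheck" ;;
        esac
    fi
    echo "${tools# }"
}

effective_tools -t racecheck --leak-check=yes   # prints: racecheck memcheck
effective_tools                                 # prints: memcheck
```
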

Call Scenarios

The following operator call scenarios are supported:

  • Kernel launch operator development: kernel launch
    • For details about kernel launch, see "Kernel Launch". For details about the procedure, see Checking the Ascend C Operator of the Kernel Launch Symbol.
    • When custom operators are launched with the <<<>>> syntax and integrated with PyTorch, the GM is managed in memory pool mode by default, which may make out-of-bounds check results inaccurate. To obtain more accurate results, set the following environment variable to disable the memory pool before the check:
      export PYTORCH_NO_NPU_MEMORY_CACHING=1
  • Project-based operator development: single-operator API calling
    • For details about single-operator API calling, see Single-Operator API Calling. For details about the procedure, see Checking the Single-Operator via APIs.
    • When calling an API with the aclnn prefix, pass the acl.json file through the aclInit API as follows to ensure the accuracy of memory check:
      auto ret = aclInit("./acl.json"); // The content of acl.json file is {"dump":{"dump_scene":"lite_exception"}}.
  • AI framework operator adaptation: PyTorch framework
    • In PyTorch graph mode (TorchAir), the check can be performed only when no compilation option is added to msSanitizer. For details, see (Optional) Configuring a Compilation Option.
    • In PyTorch graph mode (TorchAir), the Ascend IR and aclgraph graph execution modes are supported. For details, see "reduce-overhead Mode" > "Configuring the reduce-overhead Mode" in Ascend Extension for PyTorch Supported Suites and Third-Party Libraries.
    • For details about the calling scenarios of the PyTorch framework, see "OpPlugin-based Operator Adaptation" in Ascend Extension for PyTorch Feature Guide. For details about the procedure, see Checking the Operators Called by a PyTorch API.
  • Triton operator development: Triton operator calling
    • For details about the calling scenarios of the Triton operator, see Checking the Triton Operator.
    • Ensure that Triton and the Triton-Ascend plug-in have been installed and configured. For details, see link.
    • The Triton operator calling scenario does not apply to Atlas inference products.
    • To prevent the impact of operators that are not recompiled, you are advised to enable the following environment variable:
      export TRITON_ALWAYS_COMPILE=1
    • In a Triton scenario, PyTorch is used to create tensors. In the PyTorch framework, the GM is managed in memory pool mode by default, which interferes with memory check. Therefore, you need to set the following environment variable to disable the memory pool before the check to ensure that the check result is accurate:
      export PYTORCH_NO_NPU_MEMORY_CACHING=1
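
For the PyTorch and Triton scenarios, the environment settings above can be combined with the tool invocation in one script. The sketch below is an assumption-laden illustration: check_script is a hypothetical wrapper, test_op.py is a placeholder script name, and the script is assumed to be launched through the Python interpreter as the user program.

```shell
# Disable the PyTorch NPU memory pool so that the memory check sees
# individual GM allocations.
export PYTORCH_NO_NPU_MEMORY_CACHING=1
# Avoid reusing Triton operators that were not recompiled.
export TRITON_ALWAYS_COMPILE=1

# check_script is a hypothetical wrapper, not part of msSanitizer.
# It runs a Python script under memory check; the arguments after --
# are passed to the user program unchanged.
check_script() {
    mssanitizer -t memcheck -- python3 "$@"
}

# Usage (requires msSanitizer on PATH; test_op.py is a placeholder):
#   check_script test_op.py
```
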
      

Result File Description

| Result File | Description |
| --- | --- |
| mssanitizer_{TIMESTAMP}_{PID}.log | msSanitizer log file generated in the mindstudio_sanitizer_log directory while msSanitizer is running. TIMESTAMP indicates the current timestamp, and PID indicates the PID of the current check tool. |
| kernel.{PID}.o | Operator cache file generated in the current path while msSanitizer is running. PID indicates the PID of the current check tool. The operator cache files are used to parse the abnormal call stack. In normal cases, the operator cache file is automatically cleared when msSanitizer exits. When msSanitizer exits abnormally (for example, when it is terminated by pressing Ctrl+C), the operator cache files are retained in the file system. Because an operator cache file contains the debugging information of the operator, you are advised to restrict other users' access to the file and delete it as soon as possible after the check tool finishes. |
| tmp_{PID}_{TIMESTAMP} | Temporary folder generated in the current path while msSanitizer is running. PID indicates the PID of the current check tool, and TIMESTAMP indicates the current timestamp. This folder is used to generate the operator kernel binary file. In normal cases, the folder is automatically cleared when msSanitizer exits. When the debug log function is enabled by setting the environment variable INJ_LOG_LEVEL=0 (export INJ_LOG_LEVEL=0), or when the tool exits abnormally (for example, when it is stopped by pressing Ctrl+C), the folder is retained in the file system for debugging. Because this folder contains the operator debugging information, you are advised to restrict other users' access to the folder and delete it as soon as possible after debugging is complete. |
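
After an abnormal exit, the retained files can be located and locked down with a short script. This is a sketch under assumptions: both functions are hypothetical helpers, not part of msSanitizer, and the name patterns assume only that PID is numeric. Review the listed paths before deleting anything, because an unrelated folder could match the tmp_ pattern.

```shell
# Hypothetical helpers; they are not part of msSanitizer.

# Lists kernel.{PID}.o files and tmp_{PID}_{TIMESTAMP} folders located
# directly under the given directory (default: current directory).
list_sanitizer_leftovers() {
    find "${1:-.}" -mindepth 1 -maxdepth 1 \
        \( -type f -name 'kernel.[0-9]*.o' -o -type d -name 'tmp_[0-9]*_*' \)
}

# Removes group and other permissions from every leftover entry and
# reports each path that was restricted.
restrict_sanitizer_leftovers() {
    list_sanitizer_leftovers "${1:-.}" | while IFS= read -r path; do
        chmod -R go-rwx "$path"
        echo "restricted: $path"
    done
}

# Usage:
#   restrict_sanitizer_leftovers .    # lock down retained files
#   list_sanitizer_leftovers .        # review, then delete manually
```
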