Introduction

msLeaks is a memory leak detection tool for the Ascend AI Processor. It helps locate memory problems during model training and inference. msLeaks provides memory leak detection, memory comparison, memory block monitoring, memory decomposition, and inefficient memory identification, helping users locate and resolve these problems efficiently.

Functions

Table 1 lists the functions supported by msLeaks.

Table 1 msLeaks functions

| Function | Application Scenario and Description |
| --- | --- |
| Memory leak analysis | When memory is not deallocated for a long time or a memory leak occurs, msLeaks provides memory leak analysis and change analysis at the kernel launch level to help locate and analyze the related alarms. |
| Memory comparison analysis | If memory usage differs between two steps, the difference may lead to excessive memory usage or even out-of-memory (OOM) errors. In this case, use the memory comparison analysis function of msLeaks to locate and analyze the problem. |
| Memory block monitoring | In foundation model scenarios where memory corruption is difficult to locate, msLeaks can monitor specified memory blocks before and after operator execution through Python interfaces and command-line interfaces (CLIs). Based on changes in the memory block data, the scope or exact location of the memory corruption between operators can be quickly determined. |
| Memory decomposition | msLeaks provides memory decomposition for the CANN layer and the Ascend Extension for PyTorch framework, reporting the memory used by components such as model weights, activations, gradients, and optimizers. |
| Identification of inefficient memory | During model training and inference, some memory blocks may not be used immediately after allocation, or may not be deallocated promptly after their last use. msLeaks identifies such inefficient memory usage so that model training and inference can be optimized. |
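
The following minimal sketch illustrates the idea behind identifying inefficient memory. It is not the msLeaks implementation or API: the BlockLifetime record, the find_inefficient function, the tick-based timestamps, and the threshold value are all hypothetical and are used only for illustration. A block is flagged when it sits idle for too long between allocation and first use, or between last use and deallocation.

```python
from dataclasses import dataclass


# Hypothetical record of one memory block's lifetime. This is NOT an msLeaks
# data structure. Timestamps are arbitrary ticks (for example, kernel launch
# indices).
@dataclass
class BlockLifetime:
    name: str
    alloc: int
    first_use: int
    last_use: int
    free: int


def find_inefficient(blocks, threshold):
    """Return (name, reason, idle_ticks) for blocks idle longer than threshold."""
    findings = []
    for b in blocks:
        early = b.first_use - b.alloc  # idle time before the first access
        late = b.free - b.last_use     # idle time after the last access
        if early > threshold:
            findings.append((b.name, "allocated too early", early))
        if late > threshold:
            findings.append((b.name, "freed too late", late))
    return findings


# Illustrative trace; the values are made up for this example.
trace = [
    BlockLifetime("weights", alloc=0, first_use=1, last_use=90, free=91),
    BlockLifetime("workspace", alloc=0, first_use=40, last_use=45, free=46),
    BlockLifetime("activation", alloc=10, first_use=11, last_use=20, free=80),
]

for name, reason, idle in find_inefficient(trace, threshold=10):
    print(f"{name}: {reason} (idle for {idle} ticks)")
```

With this made-up trace, the workspace block is reported as allocated too early and the activation block as freed too late, while the weights block is not flagged because it is in use for almost its entire lifetime.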

Supported Frameworks

Currently, msLeaks supports memory leak detection for the following frameworks:

  • Ascend Extension for PyTorch 7.0.0 and later versions
  • MindSpore 2.7.0 and later versions
  • ATB operators of CANN 8.2.RC1 and later versions