Identification of Inefficient Memory

During model training and inference, some memory blocks are not used immediately after they are allocated, or are not deallocated promptly after their last use. Such blocks occupy device memory without doing useful work, which increases memory usage. This is referred to as inefficient memory.
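The effect on memory usage can be illustrated with a toy calculation of peak live bytes over an ordered list of allocation and deallocation events. The event tuples, tensor names, and sizes below are hypothetical and do not reflect any msLeaks data format; the sketch only shows how allocating one tensor before another is released raises the peak.

```python
from typing import List, Tuple

# Hypothetical event format: (kind, tensor, size_in_bytes); not an msLeaks format.
Event = Tuple[str, str, int]


def peak_memory(events: List[Event]) -> int:
    """Return the peak number of live bytes over an ordered event list."""
    live = 0
    peak = 0
    for kind, _tensor, size in events:
        if kind == "alloc":
            live += size
            peak = max(peak, live)
        elif kind == "free":
            live -= size
        # "access" events do not change the amount of live memory.
    return peak


# B is allocated while A is still held, although A is no longer used.
early = [("alloc", "A", 100), ("access", "A", 0), ("alloc", "B", 100),
         ("free", "A", 100), ("access", "B", 0), ("free", "B", 100)]
# A is released right after its last use, and B is allocated just before its first use.
timely = [("alloc", "A", 100), ("access", "A", 0), ("free", "A", 100),
          ("alloc", "B", 100), ("access", "B", 0), ("free", "B", 100)]

print(peak_memory(early), peak_memory(timely))  # 200 100
```

In the first schedule, A is freed only after B has been allocated, and B is allocated before A is freed and before B is first accessed, so the same trace shows both late deallocation (A) and early allocation (B).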

Inefficient memory refers to the memory of a tensor that is allocated, deallocated, or accessed on the device at an inappropriate time during model runtime.

The msLeaks tool identifies three kinds of inefficient memory at the operator level: early allocation, late deallocation, and temporary idleness. Table 1 describes each kind.

Table 1 Inefficient memory description

Category           | Description
Early allocation   | A tensor object is considered allocated early if other tensor objects are deallocated between its memory allocation operator and its first access operator.
Late deallocation  | A tensor object is considered deallocated late if other tensor objects are allocated between its last access operator and its memory deallocation operator.
Temporary idleness | A tensor object is considered temporarily idle if the number of operators between two consecutive accesses to its memory exceeds a given threshold.
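The three rules in Table 1 can be sketched as a classifier over an ordered trace of memory events. This is a hypothetical illustration, not how msLeaks implements the check: the Event type, the classify function, and the idle_threshold parameter are assumptions, each tensor is assumed to be allocated and freed exactly once, "between" is taken to mean strictly between two events in trace order, and operator launches are represented only by the op index attached to each event.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Event:
    op: int      # index of the operator during which the event occurs
    kind: str    # "alloc", "free", or "access"
    tensor: str  # tensor identifier


def classify(trace: List[Event], idle_threshold: int) -> Dict[str, Set[str]]:
    """Hypothetical sketch of the Table 1 rules; not the msLeaks implementation.

    Assumes each tensor is allocated once and freed once, and that
    "between" means strictly between two events in trace order.
    """
    result: Dict[str, Set[str]] = {
        "early_allocation": set(),
        "late_deallocation": set(),
        "temporary_idleness": set(),
    }
    # Record the trace positions of every event, grouped by tensor and kind.
    positions: Dict[str, Dict[str, List[int]]] = {}
    for pos, ev in enumerate(trace):
        kinds = positions.setdefault(ev.tensor, {"alloc": [], "free": [], "access": []})
        kinds[ev.kind].append(pos)

    for tensor, p in positions.items():
        accesses = p["access"]
        if not accesses:
            continue  # never accessed: this sketch skips the tensor
        first, last = accesses[0], accesses[-1]

        # Early allocation: another tensor is freed between alloc and first access.
        if p["alloc"]:
            window = trace[p["alloc"][0] + 1:first]
            if any(ev.kind == "free" and ev.tensor != tensor for ev in window):
                result["early_allocation"].add(tensor)

        # Late deallocation: another tensor is allocated between last access and free.
        if p["free"]:
            window = trace[last + 1:p["free"][-1]]
            if any(ev.kind == "alloc" and ev.tensor != tensor for ev in window):
                result["late_deallocation"].add(tensor)

        # Temporary idleness: more than idle_threshold operators lie strictly
        # between two consecutive accesses.
        ops = [trace[i].op for i in accesses]
        if any(b - a - 1 > idle_threshold for a, b in zip(ops, ops[1:])):
            result["temporary_idleness"].add(tensor)
    return result


trace = [
    Event(0, "alloc", "A"), Event(0, "access", "A"),
    Event(1, "alloc", "B"),   # B is allocated before it is needed...
    Event(2, "free", "A"),    # ...and A is freed only after B's allocation
    Event(3, "access", "B"), Event(3, "free", "B"),
    Event(4, "alloc", "C"), Event(4, "access", "C"),
    Event(10, "access", "C"), Event(10, "free", "C"),  # 5 operators in between
]
print(classify(trace, idle_threshold=3))
# {'early_allocation': {'B'}, 'late_deallocation': {'A'}, 'temporary_idleness': {'C'}}
```

Working on trace positions rather than operator indices keeps the "between" checks unambiguous when several events occur during the same operator; the operator index is used only for the idleness count, where the rule is stated in operators.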

Online Mode

Run the following command to enable inefficient memory identification, where ${Application} is the user script to be analyzed.

msleaks ${Application} --analysis=inefficient --events=alloc,free,access,launch
  • When using inefficient memory identification, --events must include alloc, free, access, and launch.
  • msLeaks can identify inefficient memory only in ATB LLM and Ascend Extension for PyTorch single-operator scenarios.

Offline Mode

You can also customize inefficient memory identification by calling the msLeaks APIs. For details, see API Reference.

Result Description

The results of inefficient memory identification are saved in the leaks_dump_{timestamp}.csv file. For details, see Output Description.