Collection with Command Line Interfaces
- Method 1 (recommended): user.sh is the user script.
msleaks [options] bash user.sh
- Method 2
msleaks [options] -- <prog_name> [prog_options]
- You can configure the environment variable TASK_QUEUE_ENABLE as required. For details, see "TASK_QUEUE_ENABLE" in Ascend Extension for PyTorch Environment Variable Reference.
When TASK_QUEUE_ENABLE is set to 2, level 2 optimization of the task_queue operator dispatch queue is enabled. In this case, workspace memory is also collected.
- msLeaks can collect memory data in open-form environments. It collects memory allocation and deallocation events of the HAL type. For details about how to install msLeaks in the open-form environment, see CANN Software Installation Guide (Open-Form).
- If you run msLeaks as the root user, the tool prints a message and skips file permission verification, which poses security risks. You are advised to run msLeaks as a common (non-root) user.
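
The notes above can be combined into a small pre-flight wrapper. This is only a sketch: warn_if_root and run are hypothetical helpers (not part of msLeaks), and the script defaults to a dry run that prints the Method 1 command instead of executing it.

```shell
#!/bin/sh
# Hypothetical helper: warn when the given numeric user ID is root (0).
warn_if_root() {
    if [ "$1" -eq 0 ]; then
        echo "warning: running as root; run msleaks as a common user instead"
    else
        echo "ok: running as a common user"
    fi
}

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

warn_if_root "$(id -u)"

# Level 2 task_queue optimization; workspace memory is then collected as well.
export TASK_QUEUE_ENABLE=2

# Method 1: wrap the user script (user.sh) with msleaks.
run msleaks bash user.sh
```

Set DRY_RUN=0 to actually launch the tool once msleaks and user.sh are available.
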
| Parameter | Description |
|---|---|
| options | CLI parameters. For details, see Table 2. |
| prog_name | User script name. Ensure the security of the custom script. This parameter is not required when the memory comparison function is enabled. |
| prog_options | User script parameter. Ensure the security of the custom script parameter. This parameter is not required when the memory comparison function is enabled. |
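
As an illustration of how the three parts fit together in Method 2, the sketch below assembles a command line and prints it. method2_cmd is a hypothetical helper, and python3, train.py, and --epochs are placeholder values that do not come from the msLeaks documentation.

```shell
#!/bin/sh
# Hypothetical helper: assemble a Method 2 command line.
#   $1: msleaks options, $2: prog_name, $3: prog_options
# The "--" separates the msleaks options from the user program.
method2_cmd() {
    echo "msleaks $1 -- $2 $3"
}

# Placeholder program and parameters, for illustration only.
method2_cmd "--level=0 --log-level=info" "python3" "train.py --epochs 1"
```
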
| Category | Parameter | Description | Required (Yes/No) |
|---|---|---|---|
| Common options | --help, -h | Print the msLeaks help information. | No |
| | --version, -v | Print the msLeaks version information. | No |
| | --steps | IDs of the steps whose memory information is collected. The values must be integers within the actual step range. You can specify one or more step IDs (currently up to 5), separated by full-width or half-width commas (,). If this parameter is not set, the memory information of all steps is collected. Example: --steps=1,2,3. | No |
| | --device | Devices from which information is collected. The options are npu and npu:{id}, with npu by default. The value cannot be empty. You can specify multiple values, separated by full-width or half-width commas (,). Example: --device=npu. If the value contains both npu and npu:{id}, the memory information of all NPUs is collected and npu:{id} does not take effect. | No |
| | --level | Collect operator information. The options are 0 and 1, with 0 by default. Example: --level=0. | No |
| | --events | Events to collect. The options are alloc, free, launch, and access, with alloc, free, and launch by default. Multiple values are separated by full-width or half-width commas (,). Example: --events=alloc,free,launch. Note: when --events=alloc is set, free is added automatically, and when --events=free is set, alloc is added automatically; in both cases the events actually collected are alloc and free. When --events=access is set, alloc and free are added automatically, so the events actually collected are access, alloc, and free. | No |
| | --call-stack | Collect call stacks. The options are python and c. You can select both, separated by a full-width or half-width comma (,). To set the collection depth, append a colon (:) and a number to the option. The depth range is [0, 1000], and the default depth is 50. Examples: --call-stack=python, --call-stack=c:20,python:10. | No |
| | --collect-mode | Memory collection mode. The options are immediate and deferred, with immediate by default. Only one value can be selected. Example: --collect-mode=immediate. | No |
| | --analysis | Enable memory analysis functions. The default value is leaks. If the value of --analysis is empty, no analysis function is enabled. You can select multiple values, separated by full-width or half-width commas (,). Example: --analysis=leaks,decompose. Note: when --analysis contains leaks or decompose, the alloc and free events are enabled automatically, which is equivalent to --events=alloc,free. | No |
| | --data-format | Output file format. The options are db and csv, with csv by default. The value cannot be empty. Example: --data-format=db. Output files in db format can be displayed with the MindStudio Insight tool. For details, see "Memory Tuning" in MindStudio Insight User Guide. | No |
| | --watch | Monitor memory blocks. The options are start, out{id}, end, and full-content; end is mandatory. You can select multiple options, separated by full-width or half-width commas (,). The parameter format is --watch=start:out{id},end,full-content. Example: --watch=op0,op1,full-content. | No |
| | --output | Dump path of the output. The maximum path length is 4,096 characters. The default dump directory is leaksDumpResults. Example: --output=/home/projects/output. | No |
| | --log-level | Level of the output log. The value can be info, warn, or error, with warn by default. | No |
| Memory comparison parameters | --compare | Enable the memory data comparison function between steps. | No |
| | --input | Absolute paths of the directories that contain the files to compare. Enter the directory of the baseline file and the directory of the comparison file, separated by a full-width or half-width comma (,). This parameter is valid only when the compare function is enabled. The maximum path length is 4,096 characters. Example: --input=/home/projects/input1,/home/projects/input2. | No |
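
The value rules for --steps and --call-stack can be checked before launching a long run. The following is a minimal sketch, not the tool's own validation: check_steps and check_call_stack are hypothetical helpers, they accept only half-width commas, and they cannot verify that a step ID lies within the actual step range.

```shell
#!/bin/sh
# Hypothetical helper: accept 1 to 5 comma-separated non-negative integers.
check_steps() {
    case "$1" in
        ''|*[!0-9,]*|,*|*,|*,,*) echo invalid; return 0 ;;
    esac
    # Number of IDs = number of commas + 1, so at most 4 commas are allowed.
    cs_commas=$(( $(printf '%s' "$1" | tr -cd ',' | wc -c) ))
    if [ "$cs_commas" -lt 5 ]; then echo valid; else echo invalid; fi
}

# Hypothetical helper: accept python and/or c, each with an optional
# ":depth" suffix where depth is an integer in [0, 1000].
check_call_stack() {
    case "$1" in
        ''|,*|*,|*,,*) echo invalid; return 0 ;;
    esac
    ccs_rest="$1,"
    while [ -n "$ccs_rest" ]; do
        ccs_item=${ccs_rest%%,*}   # first entry
        ccs_rest=${ccs_rest#*,}    # remaining entries
        case "${ccs_item%%:*}" in
            python|c) ;;
            *) echo invalid; return 0 ;;
        esac
        case "$ccs_item" in
            *:*)
                ccs_depth=${ccs_item#*:}
                case "$ccs_depth" in
                    ''|*[!0-9]*) echo invalid; return 0 ;;
                esac
                if [ "$ccs_depth" -gt 1000 ]; then echo invalid; return 0; fi
                ;;
        esac
    done
    echo valid
}

check_steps "1,2,3"
check_call_stack "c:20,python:10"
```
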
- When --events=launch is specified to collect Aten operator dispatch and access events, PyTorch 2.3.1 or later is required in the Ascend Extension for PyTorch framework.
- When the value of --analysis contains decompose, the parameter Attr in the file leaks_dump_{timestamp}.csv contains the GPU memory type and component name.
- When the value of --analysis contains decompose, the memory decomposition function is enabled. Currently, the memory pools of the Ascend Extension for PyTorch, MindSpore, and ATB operator frameworks can be classified, but fine-grained classification within the memory pools of the MindSpore framework and the ATB operator framework is not yet supported. For the Ascend Extension for PyTorch framework, fine-grained classification into aten, weight, gradient, and optimizer_state is supported. weight, gradient, and optimizer_state apply only to PyTorch training scenarios (that is, scenarios where the optimizer.step() interface is called). aten is the memory allocated by aten operators; classifying it requires PyTorch 2.3.1 or later, --level must contain 0, and --events must contain alloc, free, and access.
- When --level=1 is specified and the Hugging Face tokenizers library is used, the warning "The current process just got forked. Parallelism is disabled." may be reported. This warning does not affect functionality and can be ignored. To suppress it, run export TOKENIZERS_PARALLELISM=false to disable parallelism.
- When --collect-mode is set to deferred and the Python custom collection interface is used to collect data, the memory analysis function in a step is unavailable. The functions of memory block monitoring, decomposition, and inefficient memory identification are available only for the data within the collection scope.
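
The implicit event additions described for --events and --analysis, and the prerequisites for the aten classification, can be sketched as follows. effective_events and aten_ready are hypothetical helpers that only restate the documented rules; they are not the tool's implementation. Pass the --analysis value explicitly (leaks when the option is not set, since that is the documented default).

```shell
#!/bin/sh
# Hypothetical helper: compute the events actually collected.
#   $1: value of --events, $2: value of --analysis (may be empty)
effective_events() {
    ee_need=0
    # alloc, free, or access in --events pulls in both alloc and free.
    case ",$1," in *,alloc,*|*,free,*|*,access,*) ee_need=1 ;; esac
    # leaks or decompose in --analysis also enables alloc and free.
    case ",$2," in *,leaks,*|*,decompose,*) ee_need=1 ;; esac
    ee_out=""
    for ee_name in alloc free launch access; do
        case ",$1," in
            *,"$ee_name",*) ee_out="$ee_out,$ee_name" ;;
            *)
                if [ "$ee_need" = 1 ] && { [ "$ee_name" = alloc ] || [ "$ee_name" = free ]; }; then
                    ee_out="$ee_out,$ee_name"
                fi
                ;;
        esac
    done
    echo "${ee_out#,}"
}

# Hypothetical helper: check the option prerequisites for aten classification.
#   $1: value of --level, $2: effective events
aten_ready() {
    case ",$1," in *,0,*) ;; *) echo no; return 0 ;; esac
    for ar_name in alloc free access; do
        case ",$2," in *,"$ar_name",*) ;; *) echo no; return 0 ;; esac
    done
    echo yes
}

effective_events "access" "decompose"
aten_ready "0" "$(effective_events "access" "decompose")"
```

The PyTorch version requirement (2.3.1 or later) is not checked by this sketch.
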