Memory Block Monitoring

In foundation model scenarios, the computing task of a single ID is highly complex, so memory corruption is very difficult to locate. msLeaks uses Python interfaces to monitor a specified memory block before and after each operator is executed. From the changes in the memory block data, it identifies the range of operators, or the specific location, where the memory corruption occurs.

Procedure

  1. Run the following command to disable multi-task dispatch.
    export ASCEND_LAUNCH_BLOCKING=1
  2. Run the following command to enable memory block monitoring.
    msleaks ${Application} --watch=start:outid,end,full-content
    • Memory block monitoring supports only single ATen operators and ATB operators. You can set --level to monitor memory blocks at the op and kernel levels.
    • In the Ascend Extension for PyTorch scenario, kernel operators can be monitored only through the Python interfaces, not through the --watch command-line option. For details about monitoring through the Python interfaces, see step 3.
    • Limit the range of monitored operators and the memory block size. Overly large settings lengthen dump time and consume excessive disk space.
    Table 1 Parameter description

    Parameter      Description
    -----------    -----------------------------------------------------------
    Application    Executable script of the user.
                   To specify the tensor to be monitored through the Python
                   interfaces, see step 3.

    --watch        Enables memory block monitoring.
                   • start: optional, string. Operator at which monitoring starts.
                   • outid: optional. Output ID of the operator. When the output
                     is a list of tensors, outid selects the tensor to be dumped.
                     The value is the index of that tensor in the list.
                   • end: mandatory, string. Operator at which monitoring ends.
                   • full-content: optional. Dumps all memory data, that is, the
                     binary file of each tensor, to the specified path. If it is
                     omitted, a lightweight dump is performed and only the hash
                     value of each tensor is dumped.
                   Example:
                   --watch=token0/layer0/module0/op0,token0/layer0/module0/op1,full-content
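The layout of the --watch value can be sketched in Python as follows. build_watch_arg is a hypothetical helper, not part of msLeaks. It follows the start:outid,end,full-content layout shown in step 2 and assumes that optional fields that are not set are simply left out of the string, which the source does not state explicitly.

```python
from typing import Optional


def build_watch_arg(end: str,
                    start: Optional[str] = None,
                    outid: Optional[int] = None,
                    full_content: bool = False) -> str:
    """Assemble a --watch value in the start:outid,end,full-content layout.

    Hypothetical helper, not part of msLeaks. Assumption: optional fields
    that are not set are left out of the string entirely.
    """
    if not end:
        raise ValueError("end is mandatory")
    if outid is not None and start is None:
        # outid is written after start (start:outid), so it needs a start.
        raise ValueError("outid requires start")
    fields = []
    if start is not None:
        fields.append(start if outid is None else f"{start}:{outid}")
    fields.append(end)
    if full_content:
        fields.append("full-content")
    return "--watch=" + ",".join(fields)


# Reproduces the example given in Table 1.
print(build_watch_arg(end="token0/layer0/module0/op1",
                      start="token0/layer0/module0/op0",
                      full_content=True))
```

Building the string in one place makes it harder to forget the mandatory end field or to attach outid without a start operator.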

  3. In the executable script of the user, invoke the Python interfaces to specify the tensor to be monitored.

    The Python watcher module provides two interfaces: watch starts monitoring a memory block, and remove cancels the monitoring. Memory block monitoring can be enabled in either of two ways. For details about the parameters in the sample code, see Table 2.

    • Method 1: Input the tensor directly.

      The example script is as follows:

      import torch
      import torch_npu
      import msleaks
      
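      torch.npu.synchronize()                           # Wait for pending NPU work before monitoring starts.
      test_tensor = torch.randn(2,3).to('npu:0')        # Create or select the tensor to be monitored as required.
      msleaks.watcher.watch(test_tensor, name="test", dump_nums=2)   # Start monitoring the tensor.
      ...
      torch.npu.synchronize()                           # Wait for pending NPU work before monitoring is canceled.
      msleaks.watcher.remove(test_tensor)               # Cancel monitoring of the tensor.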
      
    • Method 2: Input the address and length of the memory block.
      The example script is as follows:
      import torch
      import torch_npu
      import msleaks
      
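      torch.npu.synchronize()                           # Wait for pending NPU work before monitoring starts.
      test_tensor = torch.randn(2,3).to('npu:0')
      # Start monitoring a memory block identified by its address and length.
      msleaks.watcher.watch(test_tensor.data_ptr(), length=1000, name="test", dump_nums=2)
      ...
      torch.npu.synchronize()                           # Wait for pending NPU work before monitoring is canceled.
      # Cancel monitoring with the same address and length.
      msleaks.watcher.remove(test_tensor.data_ptr(), length=1000)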
      

      You are advised to use method 1 to specify the tensor to be monitored. If method 2 is used, confirm the validity of the memory block address and length.

    Table 2 Parameters for enabling memory block monitoring

    Parameter                 Description
    ----------------------    ------------------------------------------------
    name                      Mandatory. Identifies the monitored tensor to be
                              dumped.

    dump_nums                 Optional. Number of dumps. If no value is
                              specified, the number of dumps is unlimited.

    test_tensor.data_ptr()    Mandatory. Address of the monitored tensor.
                              Required only when method 2 is used to enable
                              memory block monitoring.

    length                    Mandatory. Length of the monitored memory block.
                              When length is specified, the non-keyword
                              (positional) argument must be an integer that
                              holds the address. It is recommended that length
                              be less than or equal to the size of the known
                              monitored tensor memory block.
                              Required only when method 2 is used to enable
                              memory block monitoring.
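To keep length within the monitored tensor's memory block, as Table 2 recommends, the size of the tensor data can be computed from its shape and element size. The sketch below is pure Python and uses a hypothetical helper, tensor_nbytes. It assumes that length is counted in bytes, which the source does not state. On a real tensor, the same value is test_tensor.numel() * test_tensor.element_size().

```python
import math
from typing import Sequence


def tensor_nbytes(shape: Sequence[int], element_size: int) -> int:
    """Return the number of bytes occupied by the elements of a dense tensor.

    Hypothetical helper for illustration. Assumption: the length argument
    of the watch interface is counted in bytes.
    """
    return math.prod(shape) * element_size


# torch.randn(2, 3) produces float32 elements, which are 4 bytes each.
max_length = tensor_nbytes((2, 3), 4)
print(max_length)  # 24
```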

  4. After the command is executed, the result directory generated by memory block monitoring is as follows:
    ├── leaksDumpResults
    │    └── watch_dump
    │         ├── {deviceid}_{tid}_{opName}_{Call count}-{watchedOpName}_{outid}_{before/after}.bin    # Dumped when full-content is specified.
    │         └── watch_dump_data_check_sum_{deviceid}_{timestamp}.csv                                 # Dumped when full-content is not specified.

Result Description

The output of the memory block monitoring function is a .bin or .csv file.

  • The .bin file records the detailed dump result of the tensor.
  • The .csv file records only the hash value of the tensor.
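When full-content is specified, one way to narrow down corruption is to compare the before and after .bin files of the same operator call. The following is a minimal sketch, not an msLeaks feature. It assumes that each .bin file holds the raw bytes of the monitored block and that paired files differ only in the trailing _before.bin or _after.bin part of the name. The diff_ranges and compare_dumps helpers are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple

BEFORE_SUFFIX = "_before.bin"
AFTER_SUFFIX = "_after.bin"


def diff_ranges(before: bytes, after: bytes) -> List[Tuple[int, int]]:
    """Return half-open [start, end) byte ranges where the two buffers differ."""
    ranges = []
    start = None
    common = min(len(before), len(after))
    for i in range(common):
        if before[i] != after[i]:
            if start is None:
                start = i          # A new differing run begins here.
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(before) != len(after):
        # A size mismatch is reported as one trailing range.
        ranges.append((common, max(len(before), len(after))))
    return ranges


def compare_dumps(dump_dir: str) -> Dict[str, List[Tuple[int, int]]]:
    """Pair *_before.bin with *_after.bin and list the changed byte ranges."""
    changed = {}
    for before_path in sorted(Path(dump_dir).glob("*" + BEFORE_SUFFIX)):
        stem = before_path.name[: -len(BEFORE_SUFFIX)]
        after_path = before_path.with_name(stem + AFTER_SUFFIX)
        if not after_path.exists():
            continue               # No matching "after" dump for this call.
        ranges = diff_ranges(before_path.read_bytes(), after_path.read_bytes())
        if ranges:
            changed[stem] = ranges
    return changed


print(diff_ranges(bytes([0, 1, 2, 3]), bytes([0, 9, 9, 3])))  # [(1, 3)]
```

Calling compare_dumps on the watch_dump directory returns, for each operator call whose monitored block changed, the byte offsets that were modified. An operator that should not write to the block but appears in the result is a candidate for the source of the corruption.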