Secondary Memory Allocation

If you need secondary memory allocation, that is, dividing memory allocated by an AscendCL memory management API into smaller blocks, comply with the constraints of each API to prevent memory overwriting.

Memory can be allocated in either of the following ways:
  • Allocate each memory block independently when needed, without splitting it or performing secondary allocation.
  • Allocate a memory pool in advance and then allocate memory blocks from the pool as needed.

When you use the following APIs together with secondary allocation, observe each API's restrictions on memory address and memory size. Otherwise, memory overwriting may occur.

For details about memory management, see Overview.

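Each entry below gives the API name, its description, and the restrictions that apply when the memory is used as an input/output buffer.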

aclrtMemcpyAsync

Copies memory. This API is asynchronous.

  • The source address and destination address passed to this call must be 64-byte aligned.
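A caller can verify the 64-byte alignment requirement before issuing the copy. The following is a minimal C11 sketch; `is_aligned` and `memcpy_async_args_ok` are hypothetical helpers, not AscendCL APIs, and the actual aclrtMemcpyAsync call is omitted because it requires the CANN runtime.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an AscendCL API): returns true if ptr is a
 * multiple of `alignment`. `alignment` must be a power of two. */
static bool is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr & (uintptr_t)(alignment - 1)) == 0;
}

/* Hypothetical pre-check for the aclrtMemcpyAsync restriction:
 * both the source and the destination must be 64-byte aligned. */
static bool memcpy_async_args_ok(const void *dst, const void *src)
{
    return is_aligned(dst, 64) && is_aligned(src, 64);
}
```

A bitmask test is used instead of `%` because the required alignments in this section (32, 64, 128) are all powers of two.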

aclrtMalloc

Allocates size bytes of linear memory on the device and returns in *devPtr a pointer to the allocated memory. The allocation size is the input size rounded up to the nearest multiple of 32 bytes, plus 32 bytes.

  • If you want to allocate a large memory block and divide it into segments that you manage yourself, you are advised to use aclrtMallocAlign32. Unlike aclrtMalloc, aclrtMallocAlign32 only rounds the input size up to the nearest multiple of 32 bytes and does not add an extra 32 bytes.

    If you allocate a large memory block using either aclrtMalloc or aclrtMallocAlign32 and divide it into segments that you manage yourself, each segment must meet the following requirements:

    • The memory size is rounded up to the nearest multiple of 32 plus 32 bytes (m = ALIGN_UP[len,32] + 32 bytes).
    • The memory start address must be 64-byte aligned (ALIGN_UP[m,64]).
    NOTE:

    len indicates the size of a memory segment. ALIGN_UP[len,k] rounds len up to a multiple of k bytes: ALIGN_UP[len,k] = ((len - 1)/k + 1) × k, where / denotes integer division.
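The size rules above can be checked with plain integer arithmetic. The following C sketch uses hypothetical helper names (`align_up`, `malloc_size`, `malloc_align32_size`, `segment_footprint`), none of which are AscendCL APIs. It assumes that "(ALIGN_UP[m,64])" means the number of bytes each segment reserves inside the large block, so that the next segment also starts on a 64-byte boundary when the block itself starts 64-byte aligned. That reading is an interpretation, not a statement from the API reference.

```c
#include <stddef.h>

/* ALIGN_UP[len,k] = ((len - 1) / k + 1) * k with integer division.
 * Only valid for len >= 1 (len = 0 would underflow). */
static size_t align_up(size_t len, size_t k)
{
    return ((len - 1) / k + 1) * k;
}

/* Bytes actually allocated by aclrtMalloc for a requested size,
 * per the description above: rounded up to 32, plus 32. */
static size_t malloc_size(size_t size)
{
    return align_up(size, 32) + 32;
}

/* Bytes actually allocated by aclrtMallocAlign32: rounded up to 32 only. */
static size_t malloc_align32_size(size_t size)
{
    return align_up(size, 32);
}

/* Assumed footprint of one len-byte segment inside a large block:
 * m = ALIGN_UP[len,32] + 32, footprint = ALIGN_UP[m,64]. */
static size_t segment_footprint(size_t len)
{
    size_t m = align_up(len, 32) + 32;
    return align_up(m, 64);
}
```

For example, a 100-byte request makes aclrtMalloc allocate 160 bytes and aclrtMallocAlign32 allocate 128 bytes. A 100-byte segment carved from a large block gives m = 160 and, under the assumption above, reserves 192 bytes.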

acldvppMalloc

Allocates device memory for media data processing. The allocated memory is huge-page memory that meets the requirements of media data processing (for example, its start address is 128-byte aligned). This API is synchronous.

For details, see the acldvppMalloc API reference.

When the output of media data processing is used as the input of model inference, and you use this API to allocate a large memory block and divide it into segments that you manage yourself, each segment must meet the following requirements:

  • The memory size is rounded up to the nearest multiple of 32 plus 32 bytes (m = ALIGN_UP[len,32] + 32 bytes).
  • The memory start address must be 128-byte aligned (ALIGN_UP[m,128]).
NOTE:

len indicates the size of a memory segment. ALIGN_UP[len,k] rounds len up to a multiple of k bytes: ALIGN_UP[len,k] = ((len - 1)/k + 1) × k, where / denotes integer division.
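The following C sketch computes where each segment would start inside one large acldvppMalloc block. `dvpp_segment_offsets` is a hypothetical helper, not part of AscendCL. It assumes that the block start is 128-byte aligned, as the description above states, and that each segment reserves ALIGN_UP[m,128] bytes so that every segment start stays 128-byte aligned.

```c
#include <stddef.h>

/* ALIGN_UP[len,k] = ((len - 1) / k + 1) * k with integer division (len >= 1). */
static size_t align_up(size_t len, size_t k)
{
    return ((len - 1) / k + 1) * k;
}

/* Hypothetical helper: writes the byte offset of each segment into
 * offsets[] and returns the total number of bytes the block must provide.
 * For each segment, m = ALIGN_UP[len,32] + 32, and the segment reserves
 * ALIGN_UP[m,128] bytes, so every offset is a multiple of 128. */
static size_t dvpp_segment_offsets(const size_t *lens, size_t count,
                                   size_t *offsets)
{
    size_t offset = 0;
    for (size_t i = 0; i < count; ++i) {
        size_t m = align_up(lens[i], 32) + 32;
        offsets[i] = offset;
        offset += align_up(m, 128);
    }
    return offset;
}
```

For segment lengths of 100, 300, and 1 bytes, the offsets are 0, 256, and 640, and the block must be at least 768 bytes. The returned total can be used as the size argument when allocating the block.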

aclrtMallocHost

Allocates page-locked host memory. The system ensures that the start address of the allocated memory is 64-byte aligned.

  • If you use this API to allocate a large memory block and divide it into segments that you manage yourself, each segment must meet the following requirements:
    • The memory size is rounded up to the nearest multiple of 32 plus 32 bytes (m = ALIGN_UP[len,32] + 32 bytes).
    • The memory start address must be 64-byte aligned (ALIGN_UP[m,64]).
    NOTE:

    len indicates the size of a memory segment. ALIGN_UP[len,k] rounds len up to a multiple of k bytes: ALIGN_UP[len,k] = ((len - 1)/k + 1) × k, where / denotes integer division.
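A simple way to divide one large host block is a bump allocator that advances by each segment's aligned footprint. The following C11 sketch is hypothetical: `HostPool`, `host_pool_init`, `host_pool_take`, and `host_pool_destroy` are illustrative names. So that the sketch runs without CANN, it uses the standard `aligned_alloc` with 64-byte alignment as a stand-in for aclrtMallocHost. In real code, the block would come from aclrtMallocHost and be released with aclrtFreeHost.

```c
#include <stddef.h>
#include <stdlib.h>

/* ALIGN_UP[len,k] = ((len - 1) / k + 1) * k with integer division (len >= 1). */
static size_t align_up(size_t len, size_t k)
{
    return ((len - 1) / k + 1) * k;
}

/* Hypothetical pool over one large, 64-byte-aligned host block. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} HostPool;

/* capacity must be >= 1. aligned_alloc requires the size to be a multiple
 * of the alignment, so it is rounded up to 64 first.
 * Stand-in for aclrtMallocHost. */
static int host_pool_init(HostPool *pool, size_t capacity)
{
    capacity = align_up(capacity, 64);
    pool->base = aligned_alloc(64, capacity);
    pool->capacity = capacity;
    pool->used = 0;
    return pool->base != NULL ? 0 : -1;
}

/* Carves a len-byte segment (len >= 1). It reserves ALIGN_UP[m,64] bytes,
 * where m = ALIGN_UP[len,32] + 32, so the next segment also starts
 * 64-byte aligned. Returns NULL if the pool is exhausted. */
static void *host_pool_take(HostPool *pool, size_t len)
{
    size_t footprint = align_up(align_up(len, 32) + 32, 64);
    if (footprint > pool->capacity - pool->used) {
        return NULL;
    }
    void *segment = pool->base + pool->used;
    pool->used += footprint;
    return segment;
}

static void host_pool_destroy(HostPool *pool)
{
    free(pool->base); /* aclrtFreeHost in real code */
    pool->base = NULL;
    pool->capacity = 0;
    pool->used = 0;
}
```

With a 1024-byte pool, taking 100 bytes and then 10 bytes places the second segment 192 bytes after the first, which leaves 256 bytes used. The sketch never frees individual segments. It only releases the whole block, which keeps the offset arithmetic trivial.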

aclrtMallocCached

Allocates size bytes of linear memory on the device and returns in *devPtr a pointer to the allocated memory. The allocated memory is cacheable.

Other restrictions are the same as those of aclrtMalloc.

Computer vision applications frequently use media data processing, so they typically involve several of these memory allocation APIs, whose start-address requirements differ (64-byte or 128-byte alignment). To simplify unified management, you are advised to use the larger alignment value, 128 bytes. Because 128 is a multiple of 64, any 128-byte-aligned address also meets the 64-byte requirement.

The following figures show typical scenarios in which the user manages memory during media data processing. For the available media data processing features, see DVPP Image/video Processing (Media Data Processing V1).
Figure 1 VDEC scenario
Figure 2 JPEGD scenario