aclrtMalloc

Applicability

Product

Supported

Atlas A3 training products / Atlas A3 inference products

Atlas A2 training products / Atlas A2 inference products

Atlas 200I/500 A2 inference products

Atlas inference products

Atlas training products

Description

Allocates size bytes of linear memory on the device and returns a pointer to the allocated memory through *devPtr. The start address of the memory is 64-byte aligned.

The allocated size is padded for alignment: the requested size is rounded up to the nearest multiple of 32 bytes, and an additional 32 bytes are then added. For huge page memory with an allocation granularity of 1 GB, however, this API only rounds the requested size up to the nearest multiple of 32 bytes and does not add the extra 32 bytes, in order to save huge page memory.
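The padding rule above can be worked through with a small calculation. The following is a minimal sketch in plain C, not part of the API: the helper names align_up and padded_alloc_size are hypothetical, and the one_gb_huge_page flag is only an assumed way of modeling the 1 GB huge page case described above.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the nearest multiple of k (integer division).
 * Hypothetical helper; requires len > 0 and k > 0. */
static size_t align_up(size_t len, size_t k)
{
    assert(len > 0 && k > 0);
    return ((len - 1) / k + 1) * k;
}

/* Hypothetical helper: number of bytes reserved for a request of `size` bytes.
 * one_gb_huge_page != 0 models huge page memory with 1 GB granularity,
 * where the extra 32 bytes are not added. */
static size_t padded_alloc_size(size_t size, int one_gb_huge_page)
{
    size_t rounded = align_up(size, 32);
    return one_gb_huge_page ? rounded : rounded + 32;
}
```

For example, under these assumptions a 100-byte request is rounded up to 128 bytes and padded to 160 bytes, while the same request in the 1 GB huge page case occupies 128 bytes.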

Prototype

aclError aclrtMalloc(void **devPtr, size_t size, aclrtMemMallocPolicy policy)

Parameters

| Parameter | Input/Output | Description |
| --- | --- | --- |
| devPtr | Output | Pointer to the pointer to the allocated device memory. |
| size | Input | Requested allocation size, in bytes. Must not be 0. |
| policy | Input | Memory allocation policy. If the value is outside the range defined by aclrtMemMallocPolicy, huge page memory is allocated when size is greater than or equal to 2 MB, and common page memory is allocated otherwise. |

Returns

Returns 0 on success; any other value indicates failure. For details, see aclError.

Restrictions

  • The memory allocated by this API is not initialized and may contain arbitrary data. Before using the memory, call aclrtMemset to initialize it.
  • This API does not perform implicit device synchronization or stream synchronization. The result is returned immediately, whether the memory allocation succeeds or fails.
  • Memory allocated by aclrtMalloc must be freed by calling aclrtFree.
  • Frequent calls to aclrtMalloc and aclrtFree degrade performance. You are advised to allocate memory in advance or manage it yourself to avoid unnecessary allocation and deallocation.
  • If you want to allocate one large memory block and then divide and manage it yourself, you are advised to use aclrtMallocAlign32. Unlike aclrtMalloc, aclrtMallocAlign32 only rounds the input size up to the nearest multiple of 32 bytes and does not add an extra 32 bytes.

    If you use either aclrtMalloc or aclrtMallocAlign32 to allocate a large memory block and then divide and manage it yourself, each memory segment must meet the requirements listed below. Here, len is the size of a memory segment, and ALIGN_UP[len,k] rounds len up to a multiple of k bytes, computed with integer division as ((len - 1) / k + 1) * k.

    • The segment size is len rounded up to the nearest multiple of 32 bytes, plus 32 bytes: m = ALIGN_UP[len,32] + 32.
    • The memory start address must be 64-byte aligned (ALIGN_UP[m,64]).
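The segment requirements above can be sketched as an offset calculation over one large block. This is a minimal illustration in plain C under stated assumptions, not the documented implementation: the helper names align_up, segment_stride, and layout_segments are hypothetical, the block's base address is assumed to be 64-byte aligned (as stated for aclrtMalloc), and ALIGN_UP[m,64] is read here as the space each segment occupies so that the next segment also starts on a 64-byte boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the nearest multiple of k (integer division).
 * Hypothetical helper; requires len > 0 and k > 0. */
static size_t align_up(size_t len, size_t k)
{
    assert(len > 0 && k > 0);
    return ((len - 1) / k + 1) * k;
}

/* Space occupied by one segment of len bytes:
 * m = ALIGN_UP[len,32] + 32, then rounded up to a multiple of 64 so that
 * the following segment starts at a 64-byte-aligned offset (assumed reading). */
static size_t segment_stride(size_t len)
{
    size_t m = align_up(len, 32) + 32;
    return align_up(m, 64);
}

/* Fill offsets[i] with the start offset of segment i inside the block and
 * return the total number of bytes the block must provide. */
static size_t layout_segments(const size_t *lens, size_t count, size_t *offsets)
{
    size_t offset = 0;
    for (size_t i = 0; i < count; ++i) {
        offsets[i] = offset;
        offset += segment_stride(lens[i]);
    }
    return offset;
}
```

With segment lengths of 100, 64, and 1 bytes, this layout places the segments at offsets 0, 192, and 320 and needs 384 bytes in total. The total could then be requested in a single allocation, with each segment addressed as the block's base pointer plus its offset, which avoids repeated allocation and free calls.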

See Also

For an example of calling this API, see Data Transfer.