API Call Sequence

The key APIs for data transfer are called in the following sequence:

  1. Allocate memory.
    • Call acl.rt.malloc_host provided by pyACL to allocate host memory.

      acl.rt.malloc_host attempts to allocate physically contiguous memory, which improves performance when the host exchanges data with the device.

      After calling acl.rt.malloc_host and before using the memory, call acl.rt.memset to initialize the memory and clear any residual data in it.

    • Call acl.rt.malloc provided by pyACL to allocate device memory. If data preprocessing (such as image decoding and resizing) is required, call acl.media.dvpp_malloc or acl.himpi.dvpp_malloc to allocate the memory.
  2. Load data to the memory.

    You implement the logic for reading data into the memory yourself.

  3. Implement data transfer using memory copy.
    In the Ascend RC scenario, host memory allocation, intra-host data transfer, and data transfer between the host and device are not involved.

    If the current version supports multiple run modes and you want your app to run in more than one of them, the data transfer API to use depends on how the memory is allocated. Data transfer can be implemented in either of the following two ways:
    • Assume that different APIs are used to allocate the host memory and the device memory. For example, acl.rt.malloc_host is called to allocate the host memory and acl.rt.malloc is called to allocate the device memory.

      Call acl.rt.get_run_mode to obtain the run mode of the software stack.

      • If the query result is ACL_HOST = 1, host memory needs to be allocated for data transfer.
      • If the query result is ACL_DEVICE = 0, only device memory needs to be allocated for data transfer.

      This method involves more complex code logic, but you do not have to handle device memory alignment. When your app runs on the device, this method requires fewer memory copies, which improves performance.

    • Assume that the APIs for allocating the host memory and device memory are the same, that is, acl.rt.malloc_host is called to allocate memory and pyACL determines whether the host memory or device memory is allocated based on the run mode of the software stack.

      You do not need to call acl.rt.get_run_mode to obtain the run mode of the software stack. This method features simpler code logic. However, you need to handle device memory alignment.
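The first method above (separate allocation APIs, selected by run mode) can be sketched as follows. This is a minimal sketch under stated assumptions, not the reference implementation. The helper names check, allocate_for_transfer, and copy_to_device are hypothetical. The acl module is passed in as a parameter so that the branching logic can be exercised without an Ascend device. The numeric values of ACL_MEM_MALLOC_HUGE_FIRST and ACL_MEMCPY_HOST_TO_DEVICE, and the argument order of acl.rt.malloc, acl.rt.memset, and acl.rt.memcpy, are assumptions that should be verified against the pyACL API reference for your version.

```python
# Run-mode values as stated in the text above.
ACL_DEVICE = 0
ACL_HOST = 1
# Assumed enum values; verify against your pyACL version.
ACL_MEM_MALLOC_HUGE_FIRST = 0
ACL_MEMCPY_HOST_TO_DEVICE = 1


def check(ret, what):
    """Raise if a pyACL-style return code signals failure (0 means success)."""
    if ret != 0:
        raise RuntimeError(f"{what} failed with error code {ret}")


def allocate_for_transfer(acl, size):
    """Allocate buffers according to the run mode (hypothetical helper).

    Returns (host_ptr, dev_ptr). host_ptr is None when the app runs on the
    device, because only device memory is needed in that case.
    """
    run_mode, ret = acl.rt.get_run_mode()
    check(ret, "acl.rt.get_run_mode")

    # Device memory is needed in both run modes.
    dev_ptr, ret = acl.rt.malloc(size, ACL_MEM_MALLOC_HUGE_FIRST)
    check(ret, "acl.rt.malloc")

    if run_mode != ACL_HOST:
        return None, dev_ptr

    # On the host, also allocate a host buffer and clear residual data.
    host_ptr, ret = acl.rt.malloc_host(size)
    check(ret, "acl.rt.malloc_host")
    check(acl.rt.memset(host_ptr, size, 0, size), "acl.rt.memset")
    return host_ptr, dev_ptr


def copy_to_device(acl, dev_ptr, host_ptr, size):
    """Copy size bytes from the host buffer to the device buffer."""
    ret = acl.rt.memcpy(dev_ptr, size, host_ptr, size,
                        ACL_MEMCPY_HOST_TO_DEVICE)
    check(ret, "acl.rt.memcpy")
```

In a real app you would pass the imported acl module, load your data into the host buffer when host_ptr is not None (or directly into the device buffer otherwise), and call copy_to_device only in the host case. Freeing the buffers is omitted here for brevity.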