Data Transfer
This section describes the data transfer APIs, the related precautions, and sample code.
API Call Sequence
The API call sequence for data transfer is as follows:
- Allocate memory.
  - Allocate host memory by calling either the AscendCL API aclrtMallocHost or the C++ standard library equivalents (new or malloc).
    - aclrtMallocHost attempts to allocate physically contiguous memory, which improves performance when the host exchanges data with the device. After calling aclrtMallocHost and before using the memory, you are advised to call aclrtMemset to initialize the memory and clear any residual data.
    - After calling malloc and before using the memory, call memset to initialize the memory and clear any residual data.
  - Allocate device memory by calling the AscendCL API aclrtMalloc. If media data processing (such as image decoding and resizing) is required, call acldvppMalloc or hi_mpi_dvpp_malloc to allocate memory.
- Load data to the memory.
  You implement the logic for loading data to the memory yourself.
- Implement data transfer using memory copy.
  Data transfer can be implemented in either of the following memory copy modes:
  - Synchronous memory copy: Call the aclrtMemcpy API.
  - Asynchronous memory copy: Call the aclrtMemcpyAsync or aclrtMemcpyAsyncWithCondition API, and then call the aclrtSynchronizeStream API to wait until the tasks in the stream are complete.
- Data transfer within the host, within the device, or between the host and device can be implemented by using the memory copy API calls or by using pointers.
- Memory copy supports the following types. Sample code and restrictions for each type are given in the section of the same name below: intra-host, host-to-device, device-to-host, intra-device, and inter-device data transfer.
In the Ascend RC scenario, host memory allocation and data transfer within the host or between the host and device are not involved.
- If the APIs for allocating the host memory and device memory are different, for example, the C++ standard library API or the aclrtMallocHost API is called to allocate the host memory and the aclrtMalloc API is called to allocate the device memory:
Call aclrtGetRunMode to obtain the run mode of the software stack. If ACL_HOST is returned, you only need to allocate host memory. If ACL_DEVICE is returned, you only need to allocate device memory. This adds branching logic to the code, but you do not need to handle the address alignment of device memory yourself. When your app runs on the device, this approach requires no memory copy and therefore performs better.
- If the APIs for allocating the host memory and device memory are the same, aclrtMallocHost is called to allocate memory, and AscendCL determines whether the host memory or device memory is allocated based on the run mode of the software stack.
You do not need to call aclrtGetRunMode to obtain the run mode of the software stack, so the code logic is simpler. However, you must guarantee the address alignment of device memory yourself.
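As an illustration of the first approach, the following sketch allocates a host buffer only in host mode and clears it with memset before use, as advised above for memory obtained from malloc. RunMode is a hypothetical stand-in for the value reported by aclrtGetRunMode, and AllocHostBuffer is an illustrative name rather than an AscendCL API. Device memory allocation is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the run mode reported by aclrtGetRunMode.
enum class RunMode { Host, Device };

// Allocates a host buffer with malloc and clears it with memset before use.
// Returns nullptr in device mode (no host buffer is needed) or on failure.
void* AllocHostBuffer(RunMode mode, std::size_t size) {
    if (mode != RunMode::Host || size == 0) {
        return nullptr;
    }
    void* buffer = std::malloc(size);
    if (buffer != nullptr) {
        std::memset(buffer, 0, size);  // clear residual data left in the block
    }
    return buffer;
}
```

The caller releases the returned buffer with free once the data has been transferred.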
Intra-Host Data Transfer
Currently, the aclrtMemcpy API can be called to perform synchronous memory copy within the host (the ACL_MEMCPY_HOST_TO_HOST type), but the aclrtMemcpyAsync API cannot be used for asynchronous memory copy within the host. If it is, the API returns the error ACL_ERROR_RT_FEATURE_NOT_SUPPORT.
After APIs are called, you need to add exception handling branches and record error logs and info logs. The following is a code snippet of key steps only, which is not ready to be built or run.
    // 1. Allocate memory.
    uint64_t size = 1 * 1024 * 1024;
    void* hostPtrA = NULL;
    void* hostPtrB = NULL;
    aclrtMallocHost(&hostPtrA, size);
    aclrtMallocHost(&hostPtrB, size);

    // 2. After the memory is allocated, load data into the memory and implement the user-defined function ReadFile.
    ReadFile(fileName, hostPtrA, size);

    // 3. Perform synchronous memory copy.
    // Copy memory synchronously. hostPtrA indicates the pointer to the source memory address on the host. hostPtrB indicates the pointer to the destination memory address on the host. size indicates the memory size.
    aclrtMemcpy(hostPtrB, size, hostPtrA, size, ACL_MEMCPY_HOST_TO_HOST);

    // 4. Destroy allocations in a timely manner.
    aclrtFreeHost(hostPtrA);
    aclrtFreeHost(hostPtrB);
    // ......
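The exception handling branches and logs that the snippet leaves out could look roughly like the following sketch. It assumes that status codes are integers and that 0 means success. Status, CHECK_STATUS, FakeApiCall, and RunSteps are hypothetical names, and FakeApiCall only stands in for a real call such as aclrtMallocHost or aclrtMemcpy.

```cpp
#include <cassert>
#include <cstdio>

// Assumption: status codes are integers and 0 means success.
using Status = int;
constexpr Status kSuccess = 0;

// Logs the failing expression and returns its status from the enclosing function.
#define CHECK_STATUS(expr)                                                  \
    do {                                                                    \
        Status status_ = (expr);                                            \
        if (status_ != kSuccess) {                                          \
            std::fprintf(stderr, "%s failed with error code %d at %s:%d\n", \
                         #expr, status_, __FILE__, __LINE__);               \
            return status_;                                                 \
        }                                                                   \
    } while (0)

// Hypothetical stand-in for an API call; 1 is an arbitrary failure code.
Status FakeApiCall(bool succeed) { return succeed ? kSuccess : 1; }

// Runs two steps in order and stops at the first failure.
Status RunSteps(bool secondStepSucceeds) {
    CHECK_STATUS(FakeApiCall(true));
    CHECK_STATUS(FakeApiCall(secondStepSucceeds));
    std::printf("all steps succeeded\n");  // info log
    return kSuccess;
}
```

Because the macro returns early, any memory allocated before the failing call still has to be released on that path, for example by a cleanup branch or a scope guard.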
Host-to-Device Data Transfer
After APIs are called, you need to add exception handling branches and record error logs and info logs. The following is a code snippet of key steps only, which is not ready to be built or run.
- Copy memory synchronously.
    // 1. Allocate memory.
    uint64_t size = 1 * 1024 * 1024;
    void* hostPtrA = NULL;
    void* devPtrB = NULL;
    aclrtMallocHost(&hostPtrA, size);
    aclrtMalloc(&devPtrB, size, ACL_MEM_MALLOC_HUGE_FIRST);

    // 2. After the memory is allocated, load data into the memory and implement the user-defined function ReadFile.
    ReadFile(fileName, hostPtrA, size);

    // 3. Perform synchronous memory copy.
    // Copy memory synchronously. hostPtrA indicates the pointer to the source memory address on the host. devPtrB indicates the pointer to the destination memory address on the device. size indicates the memory size.
    aclrtMemcpy(devPtrB, size, hostPtrA, size, ACL_MEMCPY_HOST_TO_DEVICE);

    // 4. Destroy allocations in a timely manner.
    aclrtFreeHost(hostPtrA);
    aclrtFree(devPtrB);
    // ......
- Copy memory asynchronously.
The host memory needs to be allocated by using aclrtMallocHost. Otherwise, the asynchronous memory copy API does not report an error, but an unpredictable error may occur when related services are executed.
    // 1. Allocate memory.
    uint64_t size = 1 * 1024 * 1024;
    void* hostAddr = NULL;
    void* devAddr = NULL;
    aclrtMallocHost(&hostAddr, size);
    aclrtMalloc(&devAddr, size, ACL_MEM_MALLOC_HUGE_FIRST);

    // 2. Copy memory asynchronously.
    aclrtStream stream = NULL;
    aclrtCreateStream(&stream);
    // After the memory is allocated, load data into the memory and implement the user-defined function ReadFile.
    ReadFile(fileName, hostAddr, size);
    aclrtMemcpyAsync(devAddr, size, hostAddr, size, ACL_MEMCPY_HOST_TO_DEVICE, stream);
    aclrtSynchronizeStream(stream);

    // 3. Destroy allocations.
    aclrtDestroyStream(stream);
    aclrtFreeHost(hostAddr);
    aclrtFree(devAddr);
    // ......
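To show why the synchronization call follows the asynchronous copy, here is a toy model in plain C++. It is only an analogy under simplifying assumptions: ToyStream and CopyAsync are hypothetical names, and the queued work runs on the calling thread inside Synchronize, whereas a real stream executes tasks independently of the caller. The point it illustrates is that the destination is guaranteed to hold the data only after synchronization returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy model of a stream: tasks are queued in order and are only guaranteed
// to have run once Synchronize() returns.
class ToyStream {
public:
    void Enqueue(std::function<void()> task) { tasks_.push(std::move(task)); }

    // Runs all queued tasks in submission order.
    void Synchronize() {
        while (!tasks_.empty()) {
            tasks_.front()();
            tasks_.pop();
        }
    }

private:
    std::queue<std::function<void()>> tasks_;
};

// Hypothetical analogue of an asynchronous copy: it queues the copy and
// returns immediately without touching dst.
void CopyAsync(void* dst, const void* src, std::size_t size, ToyStream& stream) {
    stream.Enqueue([=] { std::memcpy(dst, src, size); });
}
```

In this model, reading the destination or freeing the source before Synchronize is called would observe or destroy data that the copy has not yet consumed.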
Device-to-Host Data Transfer
After APIs are called, you need to add exception handling branches and record error logs and info logs. The following is a code snippet of key steps only, which is not ready to be built or run.
- Copy memory synchronously.
    // 1. Allocate memory.
    uint64_t size = 1 * 1024 * 1024;
    void* devPtrA = NULL;
    void* hostPtrB = NULL;
    aclrtMalloc(&devPtrA, size, ACL_MEM_MALLOC_HUGE_FIRST);
    aclrtMallocHost(&hostPtrB, size);

    // 2. After the memory is allocated, load data into the memory and implement the user-defined function ReadFile.
    ReadFile(fileName, devPtrA, size);

    // 3. Perform synchronous memory copy.
    // Copy memory synchronously. devPtrA indicates the pointer to the source memory address on the device. hostPtrB indicates the pointer to the destination memory address on the host. size indicates the memory size.
    aclrtMemcpy(hostPtrB, size, devPtrA, size, ACL_MEMCPY_DEVICE_TO_HOST);

    // 4. Destroy allocations in a timely manner.
    aclrtFree(devPtrA);
    aclrtFreeHost(hostPtrB);
    // ......
- Copy memory asynchronously.
The host memory needs to be allocated by using aclrtMallocHost. Otherwise, the asynchronous memory copy API does not report an error, but an unpredictable error may occur when related services are executed.
    // 1. Allocate memory.
    uint64_t size = 1 * 1024 * 1024;
    void* hostAddr = NULL;
    void* devAddr = NULL;
    aclrtMallocHost(&hostAddr, size);
    aclrtMalloc(&devAddr, size, ACL_MEM_MALLOC_HUGE_FIRST);

    // 2. After the memory is allocated, load data into the memory and implement the user-defined function ReadFile.
    ReadFile(fileName, devAddr, size);

    // 3. Copy memory asynchronously.
    aclrtStream stream = NULL;
    aclrtCreateStream(&stream);
    aclrtMemcpyAsync(hostAddr, size, devAddr, size, ACL_MEMCPY_DEVICE_TO_HOST, stream);
    aclrtSynchronizeStream(stream);

    // 4. Destroy allocations.
    aclrtDestroyStream(stream);
    aclrtFreeHost(hostAddr);
    aclrtFree(devAddr);
    // ......
Intra-Device Data Transfer
After APIs are called, you need to add exception handling branches and record error logs and info logs. The following is a code snippet of key steps only, which is not ready to be built or run.
    // 1. Allocate memory.
    uint64_t size = 1 * 1024 * 1024;
    void* devPtrA = NULL;
    void* devPtrB = NULL;
    aclrtMalloc(&devPtrA, size, ACL_MEM_MALLOC_HUGE_FIRST);
    aclrtMalloc(&devPtrB, size, ACL_MEM_MALLOC_HUGE_FIRST);

    // 2. After the memory is allocated, load data into the memory and implement the user-defined function ReadFile.
    ReadFile(fileName, devPtrA, size);

    // 3. Perform synchronous or asynchronous memory copy.
    // Copy memory synchronously. devPtrA indicates the pointer to the source memory address on the device. devPtrB indicates the pointer to the destination memory address on the device. size indicates the memory size.
    aclrtMemcpy(devPtrB, size, devPtrA, size, ACL_MEMCPY_DEVICE_TO_DEVICE);

    // Copy memory asynchronously.
    // Explicitly create a stream.
    aclrtStream stream;
    aclrtCreateStream(&stream);
    aclrtMemcpyAsync(devPtrB, size, devPtrA, size, ACL_MEMCPY_DEVICE_TO_DEVICE, stream);
    aclrtSynchronizeStream(stream);

    // 4. Destroy allocations in a timely manner.
    aclrtDestroyStream(stream);
    aclrtFree(devPtrA);
    aclrtFree(devPtrB);
    // ......
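One possible way to make sure allocations are destroyed in a timely manner, even when a later step fails, is to tie each buffer to a scope guard. The sketch below uses std::unique_ptr with a custom deleter. It is an assumption-laden illustration: BufferDeleter, BufferPtr, CopyWithCleanup, and g_freeCount are hypothetical names, and malloc and free stand in for the allocation and release calls (in real code the deleter would call a release API such as aclrtFree or aclrtFreeHost).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Counts releases so the behaviour can be observed; illustrative only.
static int g_freeCount = 0;

// Deleter that stands in for a release call. unique_ptr invokes it only
// for non-null pointers.
struct BufferDeleter {
    void operator()(void* ptr) const {
        std::free(ptr);
        ++g_freeCount;
    }
};
using BufferPtr = std::unique_ptr<void, BufferDeleter>;

// Returns false early when the second allocation fails (simulated by the flag).
// The first buffer is still released when it goes out of scope.
bool CopyWithCleanup(bool failSecondAlloc) {
    const std::size_t size = 1024;
    BufferPtr src(std::malloc(size));
    if (!src) {
        return false;
    }
    BufferPtr dst(failSecondAlloc ? nullptr : std::malloc(size));
    if (!dst) {
        return false;
    }
    // ... copy from src.get() to dst.get() here ...
    return true;
}
```

With this pattern, no explicit release call is needed on the error paths, so an early return cannot leak a buffer.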
Inter-Device Data Transfer
For the
Note the following restrictions:
- To perform memory copy between two devices, call aclrtDeviceCanAccessPeer to query whether memory copy between the devices is supported. If memory copy is supported, use two aclrtDeviceEnablePeerAccess calls to enable memory copy: one for enabling memory copy from device 0 to device 1, and the other for enabling memory copy from device 1 to device 0. Then, call aclrtMemcpy (synchronous mode) or aclrtMemcpyAsync (asynchronous mode) to transfer data via memory copy.
- Only memory copy between devices in the same PCIe Switch is supported.
- Only memory copy between devices from the same thread or different threads in the same process is supported.
After APIs are called, you need to add exception handling branches and record error logs and info logs. The following is a code snippet of key steps only, which is not ready to be built or run.
    int main(int argc, const char *argv[])
    {
        // Initialize AscendCL.
        auto ret = aclInit(NULL);

        int32_t canAccessPeer = 0;
        // Query whether memory copy is supported between device 0 and device 1.
        ret = aclrtDeviceCanAccessPeer(&canAccessPeer, 0, 1);

        // 1 indicates that memory copy is supported.
        if (canAccessPeer == 1) {
            // ************************************************************
            // Operations on device 0.
            ret = aclrtSetDevice(0);
            ret = aclrtDeviceEnablePeerAccess(1, 0);
            void *dev0;
            ret = aclrtMalloc(&dev0, 10, ACL_MEM_MALLOC_HUGE_FIRST_P2P);
            ret = aclrtMemset(dev0, 10, 1, 10);
            // ......
            // ************************************************************

            // Operations on device 1. Enable memory copy from device 1 to device 0: device 1 is set through aclrtSetDevice, and device 0 is specified by the first parameter of aclrtDeviceEnablePeerAccess.
            ret = aclrtSetDevice(1);
            ret = aclrtDeviceEnablePeerAccess(0, 0);
            void *dev1;
            ret = aclrtMalloc(&dev1, 10, ACL_MEM_MALLOC_HUGE_FIRST_P2P);
            ret = aclrtMemset(dev1, 10, 0, 10);

            // Perform memory copy to transfer data from device 0 to device 1.
            ret = aclrtMemcpy(dev1, 10, dev0, 10, ACL_MEMCPY_DEVICE_TO_DEVICE);
            ret = aclrtResetDevice(1);
            // ......
            // ************************************************************

            // ************************************************************
            // Call aclrtResetDevice to release the resources of device 0.
            ret = aclrtSetDevice(0);
            ret = aclrtResetDevice(0);
            // ......
            // ************************************************************
            printf("P2P copy success\n");
        } else {
            printf("current device doesn't support p2p feature\n");
        }

        // Deinitialize AscendCL.
        aclFinalize();
        return 0;
    }