Data Transfer
This section describes the APIs for data transfer, their precautions, and sample code.
API Call Sequence
The key data transfer APIs are called in the following sequence:
- Allocate memory.
  - Allocate host memory by calling aclrtMallocHost, or by using the C++ equivalents new or malloc.
    - aclrtMallocHost attempts to allocate physically contiguous memory, which improves performance when the host exchanges data with the device. After calling aclrtMallocHost and before using the memory, you are advised to call aclrtMemset to initialize the memory and clear any residual data.
    - After calling malloc and before using the memory, call memset to initialize the memory and clear any residual data.
  - Allocate device memory by calling aclrtMalloc. If media data processing (such as image decoding and resizing) is required, call acldvppMalloc or hi_mpi_dvpp_malloc instead.
- Load data into the memory.
  The logic for loading data into memory is implemented by the user.
- Transfer data by copying memory.
  Data transfer can be implemented in either of the following modes:
    - Synchronous memory copy: aclrtMemcpy
    - Asynchronous memory copy: aclrtMemcpyAsync, followed by aclrtSynchronizeStream to wait until the tasks in the stream have completed
  Data within the host, within the device, or between the host and the device can be transferred either by calling the memory copy APIs or by accessing the memory directly through pointers.
  The memory copy APIs support the following copy types, each of which has sample code later in this section: intra-host (synchronous API only), host-to-device, device-to-host, intra-device, and between devices.
In the Ascend RC scenario, host memory allocation and data transfer within the host or between the host and device are not involved.
- If host memory and device memory are allocated with different APIs (for example, host memory is allocated with a C++ standard library API or aclrtMallocHost, and device memory is allocated with aclrtMalloc):
Call aclrtGetRunMode in advance to obtain the run mode of the software stack. If ACL_HOST is returned, allocate host memory only; if ACL_DEVICE is returned, allocate device memory only. This adds some conditional logic to your code, but you do not need to handle address alignment of device memory yourself. When your app runs on the device, this approach requires no memory copy and delivers better performance.
- If host memory and device memory are allocated with the same API, call aclrtMallocHost, which allocates host memory or device memory depending on the run mode of the software stack.
You do not need to call aclrtGetRunMode, so the code logic is simpler. However, you must guarantee address alignment of device memory yourself.
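The call sequence leaves the loading step to the user, and the samples in this section call a user-defined function named ReadFile without showing it. The following C++ sketch shows one possible host-side version. It assumes ReadFile takes a file name, a destination buffer, and a byte count and returns a success flag; that signature and the helper AllocAndLoadHostBuffer are hypothetical, not AscendCL APIs. The helper follows the malloc path described above: allocate, clear with memset, then load.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <fstream>

// Hypothetical ReadFile: reads exactly `size` bytes from `fileName` into `buf`.
// Returns false if the file cannot be opened or holds fewer than `size` bytes.
bool ReadFile(const char* fileName, void* buf, std::size_t size) {
    std::ifstream in(fileName, std::ios::binary);
    if (!in) {
        std::fprintf(stderr, "[ERROR] cannot open %s\n", fileName);
        return false;
    }
    in.read(static_cast<char*>(buf), static_cast<std::streamsize>(size));
    if (static_cast<std::size_t>(in.gcount()) != size) {
        std::fprintf(stderr, "[ERROR] %s holds fewer than %zu bytes\n", fileName, size);
        return false;
    }
    return true;
}

// Hypothetical helper: allocates host memory with malloc, clears it with memset,
// and loads the file into it. Returns nullptr on failure; the caller calls free().
void* AllocAndLoadHostBuffer(const char* fileName, std::size_t size) {
    void* buf = std::malloc(size);
    if (buf == nullptr) {
        std::fprintf(stderr, "[ERROR] malloc of %zu bytes failed\n", size);
        return nullptr;
    }
    std::memset(buf, 0, size);  // clear residual data before use
    if (!ReadFile(fileName, buf, size)) {
        std::free(buf);
        return nullptr;
    }
    return buf;
}
```

This version writes through an ordinary host pointer, so it applies to host memory only.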
Intra-Host Data Transfer
You can call aclrtMemcpy to perform an intra-host memory copy (copy kind ACL_MEMCPY_HOST_TO_HOST), but you cannot call aclrtMemcpyAsync to perform it asynchronously. If you do, the API returns the error ACL_ERROR_RT_FEATURE_NOT_SUPPORT.
After each API call, add exception handling branches and print logs at the error and info levels. The following snippet shows only the key steps and cannot be built or run as-is.
```cpp
// 1. Allocate memory.
uint64_t size = 1 * 1024 * 1024;
void* hostPtrA = NULL;
void* hostPtrB = NULL;
aclrtMallocHost(&hostPtrA, size);
aclrtMallocHost(&hostPtrB, size);
// 2. After the memory is allocated, load data into it with the user-defined function ReadFile.
ReadFile(fileName, hostPtrA, size);
// 3. Perform synchronous memory copy.
// hostPtrA points to the source memory on the host, hostPtrB points to the destination memory
// on the host, and size is the memory size.
aclrtMemcpy(hostPtrB, size, hostPtrA, size, ACL_MEMCPY_HOST_TO_HOST);
// 4. Free the memory promptly.
aclrtFreeHost(hostPtrA);
aclrtFreeHost(hostPtrB);
// ......
```
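Data within the host can also be moved through pointers instead of aclrtMemcpy. The following C++ sketch illustrates that pointer-based alternative. HostToHostCopy is a hypothetical helper that takes its arguments in the same (dst, destMax, src, count) order as aclrtMemcpy and uses std::memcpy, so it assumes the two buffers do not overlap.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies `count` bytes between two host buffers through plain
// pointers. Like aclrtMemcpy, it refuses to write more than `destMax` bytes.
// Returns false for null pointers or when count exceeds destMax.
bool HostToHostCopy(void* dst, std::size_t destMax, const void* src, std::size_t count) {
    if (dst == nullptr || src == nullptr || count > destMax) {
        return false;
    }
    std::memcpy(dst, src, count);  // source and destination must not overlap
    return true;
}
```

With the buffers from the sample above, the equivalent call would be HostToHostCopy(hostPtrB, size, hostPtrA, size).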
Host-to-Device Data Transfer
After each API call, add exception handling branches and print logs at the error and info levels. The following snippets show only the key steps and cannot be built or run as-is.
- Copy memory synchronously.
```cpp
// 1. Allocate memory.
uint64_t size = 1 * 1024 * 1024;
void* hostPtrA = NULL;
void* devPtrB = NULL;
aclrtMallocHost(&hostPtrA, size);
aclrtMalloc(&devPtrB, size, ACL_MEM_MALLOC_HUGE_FIRST);
// 2. After the memory is allocated, load data into it with the user-defined function ReadFile.
ReadFile(fileName, hostPtrA, size);
// 3. Perform synchronous memory copy.
// hostPtrA points to the source memory on the host, devPtrB points to the destination memory
// on the device, and size is the memory size.
aclrtMemcpy(devPtrB, size, hostPtrA, size, ACL_MEMCPY_HOST_TO_DEVICE);
// 4. Free the memory promptly.
aclrtFreeHost(hostPtrA);
aclrtFree(devPtrB);
// ......
```
- Copy memory asynchronously.
Host memory must be allocated with aclrtMallocHost. If it is not, the asynchronous memory copy API does not report an error, but unpredictable errors may occur when related services run.
```cpp
// 1. Allocate memory.
uint64_t size = 1 * 1024 * 1024;
void* hostAddr = NULL;
void* devAddr = NULL;
aclrtMallocHost(&hostAddr, size);
aclrtMalloc(&devAddr, size, ACL_MEM_MALLOC_HUGE_FIRST);
// 2. Copy memory asynchronously.
aclrtStream stream = NULL;
aclrtCreateStream(&stream);
// After the memory is allocated, load data into it with the user-defined function ReadFile.
ReadFile(fileName, hostAddr, size);
aclrtMemcpyAsync(devAddr, size, hostAddr, size, ACL_MEMCPY_HOST_TO_DEVICE, stream);
// Block until the copy queued in the stream has completed.
aclrtSynchronizeStream(stream);
// 3. Free the resources.
aclrtDestroyStream(stream);
aclrtFreeHost(hostAddr);
aclrtFree(devAddr);
// ......
```
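The snippets omit the exception handling branches and logging they call for. One common way to add them is a checking macro wrapped around each call. The following sketch assumes that a return value of 0 means success, as with AscendCL's ACL_SUCCESS; CHECK_RET, LOG_INFO, FakeApiCall, and RunSteps are hypothetical names, and FakeApiCall only stands in for a real API so the macro can be exercised without the AscendCL runtime.

```cpp
#include <cstdio>

// Hypothetical macro: evaluates `expr` once, and on a nonzero status logs an
// error-level message and returns `retval` from the enclosing function.
#define CHECK_RET(expr, retval)                                                 \
    do {                                                                        \
        int check_ret_status_ = static_cast<int>(expr);                         \
        if (check_ret_status_ != 0) {                                           \
            std::fprintf(stderr, "[ERROR] %s failed, error code %d (%s:%d)\n",  \
                         #expr, check_ret_status_, __FILE__, __LINE__);         \
            return retval;                                                      \
        }                                                                       \
    } while (0)

// Hypothetical info-level logger.
#define LOG_INFO(msg) std::fprintf(stdout, "[INFO] %s\n", (msg))

// Hypothetical stand-in for an API call: simply returns the status it is given.
int FakeApiCall(int status) { return status; }

// Runs two steps; returns 0 on success, or 1/2 to identify the failing step.
int RunSteps(int firstStatus, int secondStatus) {
    CHECK_RET(FakeApiCall(firstStatus), 1);
    LOG_INFO("first step succeeded");
    CHECK_RET(FakeApiCall(secondStatus), 2);
    LOG_INFO("second step succeeded");
    return 0;
}
```

In the asynchronous sample above, a checked call would look like CHECK_RET(aclrtMemcpyAsync(devAddr, size, hostAddr, size, ACL_MEMCPY_HOST_TO_DEVICE, stream), -1);. Because the macro returns immediately, anything allocated before the failing call still has to be released on that path.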
Device-to-Host Data Transfer
After each API call, add exception handling branches and print logs at the error and info levels. The following snippets show only the key steps and cannot be built or run as-is.
- Copy memory synchronously.
```cpp
// 1. Allocate memory.
uint64_t size = 1 * 1024 * 1024;
void* devPtrA = NULL;
void* hostPtrB = NULL;
aclrtMalloc(&devPtrA, size, ACL_MEM_MALLOC_HUGE_FIRST);
aclrtMallocHost(&hostPtrB, size);
// 2. After the memory is allocated, load data into it with the user-defined function ReadFile.
ReadFile(fileName, devPtrA, size);
// 3. Perform synchronous memory copy.
// devPtrA points to the source memory on the device, hostPtrB points to the destination memory
// on the host, and size is the memory size.
aclrtMemcpy(hostPtrB, size, devPtrA, size, ACL_MEMCPY_DEVICE_TO_HOST);
// 4. Free the memory promptly.
aclrtFree(devPtrA);
aclrtFreeHost(hostPtrB);
// ......
```
- Copy memory asynchronously.
Host memory must be allocated with aclrtMallocHost. If it is not, the asynchronous memory copy API does not report an error, but unpredictable errors may occur when related services run.
```cpp
// 1. Allocate memory.
uint64_t size = 1 * 1024 * 1024;
void* hostAddr = NULL;
void* devAddr = NULL;
aclrtMallocHost(&hostAddr, size);
aclrtMalloc(&devAddr, size, ACL_MEM_MALLOC_HUGE_FIRST);
// 2. After the memory is allocated, load data into it with the user-defined function ReadFile.
ReadFile(fileName, devAddr, size);
// 3. Copy memory asynchronously.
aclrtStream stream = NULL;
aclrtCreateStream(&stream);
aclrtMemcpyAsync(hostAddr, size, devAddr, size, ACL_MEMCPY_DEVICE_TO_HOST, stream);
// Block until the copy completes; hostAddr must not be read before this returns.
aclrtSynchronizeStream(stream);
// 4. Free the resources.
aclrtDestroyStream(stream);
aclrtFreeHost(hostAddr);
aclrtFree(devAddr);
// ......
```
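Every snippet ends by releasing each allocation by hand, which is easy to skip on an early error return. One C++ option is to bind each buffer to an owner that frees it automatically. The following sketch uses std::unique_ptr with a custom deleter around malloc and free; FreeDeleter, HostBuffer, and MakeZeroedHostBuffer are hypothetical names. A deleter that calls aclrtFreeHost or aclrtFree could presumably be substituted for AscendCL allocations, but that variant is not shown here, and such owners would also need to be destroyed before aclrtResetDevice and aclFinalize are called.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Deleter that releases memory obtained from malloc.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Owning handle: the buffer is freed automatically when the handle goes out of scope.
using HostBuffer = std::unique_ptr<void, FreeDeleter>;

// Hypothetical helper: allocates `size` bytes and clears them with memset.
// The returned handle is empty if malloc fails.
HostBuffer MakeZeroedHostBuffer(std::size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr) {
        std::memset(p, 0, size);  // clear residual data before use
    }
    return HostBuffer(p);
}
```

Because the deleter runs on every exit path, an early return after a failed call no longer leaks the buffer.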
Intra-Device Data Transfer
After each API call, add exception handling branches and print logs at the error and info levels. The following snippet shows only the key steps and cannot be built or run as-is.
```cpp
// 1. Allocate memory.
uint64_t size = 1 * 1024 * 1024;
void* devPtrA = NULL;
void* devPtrB = NULL;
aclrtMalloc(&devPtrA, size, ACL_MEM_MALLOC_HUGE_FIRST);
aclrtMalloc(&devPtrB, size, ACL_MEM_MALLOC_HUGE_FIRST);
// 2. After the memory is allocated, load data into it with the user-defined function ReadFile.
ReadFile(fileName, devPtrA, size);
// 3. Perform synchronous or asynchronous memory copy.
// Synchronous copy: devPtrA points to the source memory on the device, devPtrB points to the
// destination memory on the device, and size is the memory size.
aclrtMemcpy(devPtrB, size, devPtrA, size, ACL_MEMCPY_DEVICE_TO_DEVICE);
// Asynchronous copy: explicitly create a stream, queue the copy, and wait for it to complete.
aclrtStream stream = NULL;
aclrtCreateStream(&stream);
aclrtMemcpyAsync(devPtrB, size, devPtrA, size, ACL_MEMCPY_DEVICE_TO_DEVICE, stream);
aclrtSynchronizeStream(stream);
// 4. Free the resources promptly.
aclrtDestroyStream(stream);
aclrtFree(devPtrA);
aclrtFree(devPtrB);
// ......
```
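The asynchronous snippets call aclrtSynchronizeStream right after aclrtMemcpyAsync because the asynchronous call only queues the copy in the stream; the destination is not guaranteed to hold the data, and the source must stay valid, until the stream has been synchronized. The following is a host-only analogy built on std::async rather than AscendCL: the returned future stands in for the stream and wait() stands in for aclrtSynchronizeStream. CopyAsync and CopyThenWait are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstring>
#include <future>
#include <vector>

// Hypothetical helper: starts the copy on another thread and returns immediately.
// Both buffers must stay valid until the returned future has been waited on.
std::future<void> CopyAsync(void* dst, const void* src, std::size_t count) {
    return std::async(std::launch::async, [dst, src, count] {
        if (count != 0) {
            std::memcpy(dst, src, count);
        }
    });
}

// Hypothetical helper: queues the copy, then waits before touching the result.
std::vector<int> CopyThenWait(const std::vector<int>& src) {
    std::vector<int> dst(src.size());
    std::future<void> pending = CopyAsync(dst.data(), src.data(), src.size() * sizeof(int));
    // Other host work could run here while the copy is in flight.
    pending.wait();  // plays the role of aclrtSynchronizeStream(stream)
    return dst;      // reading dst is safe only after the wait
}
```

On older Linux toolchains, std::async may require linking with -pthread.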
Data Transfer Between Devices
For the
Note the following restrictions:
- Call aclrtDeviceCanAccessPeer to query whether data exchange between two devices is supported. If data exchange is supported, use two aclrtDeviceEnablePeerAccess calls to enable data exchange: one for enabling data exchange from device 0 to device 1, and the other for enabling data exchange from device 1 to device 0. Then, call aclrtMemcpy (synchronous mode) or aclrtMemcpyAsync (asynchronous mode) to transfer data via memory copy.
- Only data exchange between devices in the same PCIe switch is supported.
After each API call, add exception handling branches and print logs at the error and info levels. The following snippet shows only the key steps and cannot be built or run as-is.
```cpp
#include <cstdio>
#include "acl/acl.h"

int main(int argc, const char *argv[])
{
    // Perform initialization.
    auto ret = aclInit(NULL);
    int32_t canAccessPeer = 0;
    // Query whether data exchange is supported between device 0 and device 1.
    ret = aclrtDeviceCanAccessPeer(&canAccessPeer, 0, 1);
    // 1 indicates that data exchange is supported.
    if (canAccessPeer == 1) {
        // ************************************************************
        // Operations on device 0: enable data exchange from device 0 to device 1.
        ret = aclrtSetDevice(0);
        ret = aclrtDeviceEnablePeerAccess(1, 0);
        void *dev0;
        ret = aclrtMalloc(&dev0, 10, ACL_MEM_MALLOC_HUGE_FIRST_P2P);
        ret = aclrtMemset(dev0, 10, 1, 10);
        // ......
        // ************************************************************
        // Enable data exchange from device 1 to device 0. Device 1 is set through aclrtSetDevice,
        // while device 0 is specified by the first parameter of aclrtDeviceEnablePeerAccess.
        ret = aclrtSetDevice(1);
        ret = aclrtDeviceEnablePeerAccess(0, 0);
        void *dev1;
        ret = aclrtMalloc(&dev1, 10, ACL_MEM_MALLOC_HUGE_FIRST_P2P);
        ret = aclrtMemset(dev1, 10, 0, 10);
        // Perform memory copy to transfer data from device 0 to device 1.
        ret = aclrtMemcpy(dev1, 10, dev0, 10, ACL_MEMCPY_DEVICE_TO_DEVICE);
        ret = aclrtResetDevice(1);
        // ......
        // ************************************************************
        // ************************************************************
        // Call aclrtResetDevice to release the resources on device 0.
        ret = aclrtSetDevice(0);
        ret = aclrtResetDevice(0);
        // ......
        // ************************************************************
        printf("P2P copy success\n");
    } else {
        printf("current device doesn't support p2p feature\n");
    }
    // Perform deinitialization.
    aclFinalize();
    return 0;
}
```