Data Transfer

This section describes the data transfer APIs, usage precautions, and sample code.

API Call Sequence

The API call sequence for data transfer is as follows:

  1. Allocate memory.
    • Allocate host memory by calling the AscendCL API aclrtMallocHost or a C++ standard library API such as new or malloc.
      • aclrtMallocHost attempts to allocate physically contiguous memory, which improves performance when the host exchanges data with the device. After calling aclrtMallocHost and before using the memory, you are advised to call aclrtMemset to initialize the memory and clear any residual data.
      • After calling malloc and before using the memory, call memset to initialize the memory and clear any residual data.
    • Allocate device memory by calling the AscendCL API aclrtMalloc. If media data processing (such as image decoding and resizing) is required, call acldvppMalloc or hi_mpi_dvpp_malloc to allocate memory.
  2. Load data into the memory.

    You implement the logic for loading data into the memory.

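  3. Implement data transfer using memory copy.
    Data transfer can be implemented in the following two memory copy modes:
      • Synchronous memory copy: call aclrtMemcpy.
      • Asynchronous memory copy: call aclrtMemcpyAsync, and then call aclrtSynchronizeStream to wait for the copy to complete.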

In the Ascend RC scenario, host memory allocation and data transfer within the host or between the host and device are not involved.

If the current version supports multiple running modes (such as the Ascend EP mode and Ascend RC mode) and you want the same application to run in multiple modes, the way you allocate memory determines which APIs you call during data transfer.
  • If the APIs for allocating the host memory and device memory are different, for example, a C++ standard library API or aclrtMallocHost is called to allocate the host memory and aclrtMalloc is called to allocate the device memory:

    Call aclrtGetRunMode to obtain the run mode of the software stack. If ACL_HOST is returned, you only need to allocate host memory. If ACL_DEVICE is returned, you only need to allocate device memory. This approach requires extra branching logic, but you do not need to handle address alignment of device memory. When your app runs on the device, this approach also requires no memory copy and therefore offers better performance.

  • If the API for allocating the host memory and device memory is the same, that is, aclrtMallocHost is called in both cases, AscendCL determines whether to allocate host memory or device memory based on the run mode of the software stack.

    You do not need to call aclrtGetRunMode to obtain the run mode of the software stack, so the code logic is simpler. However, you must guarantee the address alignment of device memory yourself.

Intra-Host Data Transfer

Currently, you can call aclrtMemcpy to perform synchronous memory copy within the host (the ACL_MEMCPY_HOST_TO_HOST type), but aclrtMemcpyAsync does not support asynchronous memory copy within the host. If you call aclrtMemcpyAsync with this type, the API returns the error ACL_ERROR_RT_FEATURE_NOT_SUPPORT.

After calling each API, add exception handling branches and record error logs and info logs. The following snippet shows key steps only and is not ready to be built or run.

// 1. Allocate memory.
uint64_t size = 1 * 1024 * 1024;
void* hostPtrA = NULL;
void* hostPtrB = NULL;
aclrtMallocHost(&hostPtrA, size);
aclrtMallocHost(&hostPtrB, size);

// 2. Load data into the memory. ReadFile is a user-defined function that you need to implement.
ReadFile(fileName, hostPtrA, size);

// 3. Perform synchronous memory copy.
// hostPtrA points to the source memory on the host, hostPtrB points to the destination memory on the host, and size is the memory size in bytes.
aclrtMemcpy(hostPtrB, size, hostPtrA, size, ACL_MEMCPY_HOST_TO_HOST);

// 4. Free the memory promptly when it is no longer needed.
aclrtFreeHost(hostPtrA);
aclrtFreeHost(hostPtrB);
// ......
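
ReadFile in the snippet above is user-defined, so its behavior is up to you. The following is a minimal sketch for a host buffer. It assumes a signature of (file name, destination buffer, buffer size) and a return value equal to the number of bytes read; neither the signature nor the return convention comes from AscendCL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>

// Hypothetical user-defined helper (not an AscendCL API).
// Reads at most `size` bytes from the file into `buffer`, which must point to
// host memory of at least `size` bytes.
// Returns the number of bytes read, or -1 if the file cannot be opened.
int64_t ReadFile(const char* fileName, void* buffer, std::size_t size) {
    assert(fileName != nullptr && buffer != nullptr);
    std::ifstream file(fileName, std::ios::binary);
    if (!file.is_open()) {
        std::fprintf(stderr, "[ERROR] failed to open %s\n", fileName);
        return -1;
    }
    file.read(static_cast<char*>(buffer), static_cast<std::streamsize>(size));
    // gcount() reports how many bytes were actually read, which is smaller
    // than `size` when the file is shorter than the buffer.
    return static_cast<int64_t>(file.gcount());
}
```

Returning the byte count lets the caller detect a file that is shorter than the buffer and pass the actual data size to the subsequent memory copy.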

Host-to-Device Data Transfer

After calling each API, add exception handling branches and record error logs and info logs. The following snippet shows key steps only and is not ready to be built or run.

  • Copy memory synchronously.
    // 1. Allocate memory.
    uint64_t size = 1 * 1024 * 1024;
    void* hostPtrA = NULL;
    void* devPtrB = NULL;
    aclrtMallocHost(&hostPtrA, size);
    aclrtMalloc(&devPtrB, size, ACL_MEM_MALLOC_HUGE_FIRST);

    // 2. Load data into the memory. ReadFile is a user-defined function that you need to implement.
    ReadFile(fileName, hostPtrA, size);

    // 3. Perform synchronous memory copy.
    // hostPtrA points to the source memory on the host, devPtrB points to the destination memory on the device, and size is the memory size in bytes.
    aclrtMemcpy(devPtrB, size, hostPtrA, size, ACL_MEMCPY_HOST_TO_DEVICE);

    // 4. Free the memory promptly when it is no longer needed.
    aclrtFreeHost(hostPtrA);
    aclrtFree(devPtrB);
    
    // ......
    
  • Copy memory asynchronously.
    For asynchronous memory copy, allocate the host memory by calling aclrtMallocHost. Otherwise, aclrtMemcpyAsync does not report an error, but unpredictable errors may occur when related services are executed.
    // 1. Allocate memory.
    uint64_t size = 1 * 1024 * 1024;
    void* hostAddr = NULL;
    void* devAddr = NULL;

    aclrtMallocHost(&hostAddr, size);
    aclrtMalloc(&devAddr, size, ACL_MEM_MALLOC_HUGE_FIRST);

    // 2. Copy memory asynchronously.
    aclrtStream stream = NULL;
    aclrtCreateStream(&stream);
    // Load data into the memory. ReadFile is a user-defined function that you need to implement.
    ReadFile(fileName, hostAddr, size);
    aclrtMemcpyAsync(devAddr, size, hostAddr, size, ACL_MEMCPY_HOST_TO_DEVICE, stream);
    // Wait until the copy on the stream is complete.
    aclrtSynchronizeStream(stream);

    // 3. Destroy the stream and free the memory.
    aclrtDestroyStream(stream);
    aclrtFreeHost(hostAddr);
    aclrtFree(devAddr);
    
    // ......
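
The snippets in this section leave out the exception handling branches that the note at the start of the section asks for. One common pattern is a macro that logs the failing call and returns its error code. The sketch below is host-only and uses stand-ins so that it builds without AscendCL: Status and kSuccess play the role of aclError and ACL_SUCCESS, and FakeStep is a hypothetical placeholder for a call such as aclrtMalloc or aclrtMemcpy. RETURN_IF_ERROR and RunSteps are likewise illustrative names, not AscendCL APIs.

```cpp
#include <cassert>
#include <cstdio>

// Stand-ins so the sketch builds without AscendCL. In real code, use aclError
// and ACL_SUCCESS from the AscendCL headers instead.
using Status = int;
constexpr Status kSuccess = 0;

// Logs the failing expression with its error code, then returns that code
// from the enclosing function.
#define RETURN_IF_ERROR(call)                                          \
    do {                                                               \
        const Status status_ = (call);                                 \
        if (status_ != kSuccess) {                                     \
            std::fprintf(stderr, "[ERROR] %s failed, code = %d\n",     \
                         #call, status_);                              \
            return status_;                                            \
        }                                                              \
    } while (0)

// Hypothetical placeholder for an API call; it returns the code it is given.
Status FakeStep(Status result) { return result; }

// Runs two steps in order and stops at the first failure.
// stepsCompleted reports how many steps succeeded.
Status RunSteps(Status first, Status second, int* stepsCompleted) {
    assert(stepsCompleted != nullptr);
    *stepsCompleted = 0;
    RETURN_IF_ERROR(FakeStep(first));
    ++*stepsCompleted;
    RETURN_IF_ERROR(FakeStep(second));
    ++*stepsCompleted;
    return kSuccess;
}
```

An early return like this skips any cleanup that follows, so a real application would also free memory and destroy streams that were created before the failing call.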
    

Device-to-Host Data Transfer

After calling each API, add exception handling branches and record error logs and info logs. The following snippet shows key steps only and is not ready to be built or run.

  • Copy memory synchronously.
    // 1. Allocate memory.
    uint64_t size = 1 * 1024 * 1024;
    void* devPtrA = NULL;
    void* hostPtrB = NULL;
    aclrtMalloc(&devPtrA, size, ACL_MEM_MALLOC_HUGE_FIRST);
    aclrtMallocHost(&hostPtrB, size);

    // 2. Load data into the memory. ReadFile is a user-defined function that you need to implement.
    ReadFile(fileName, devPtrA, size);

    // 3. Perform synchronous memory copy.
    // devPtrA points to the source memory on the device, hostPtrB points to the destination memory on the host, and size is the memory size in bytes.
    aclrtMemcpy(hostPtrB, size, devPtrA, size, ACL_MEMCPY_DEVICE_TO_HOST);

    // 4. Free the memory promptly when it is no longer needed.
    aclrtFree(devPtrA);
    aclrtFreeHost(hostPtrB);
    
    // ......
    
  • Copy memory asynchronously.
    For asynchronous memory copy, allocate the host memory by calling aclrtMallocHost. Otherwise, aclrtMemcpyAsync does not report an error, but unpredictable errors may occur when related services are executed.
    // 1. Allocate memory.
    uint64_t size = 1 * 1024 * 1024;
    void* hostAddr = NULL;
    void* devAddr = NULL;

    aclrtMallocHost(&hostAddr, size);
    aclrtMalloc(&devAddr, size, ACL_MEM_MALLOC_HUGE_FIRST);

    // 2. Load data into the memory. ReadFile is a user-defined function that you need to implement.
    ReadFile(fileName, devAddr, size);

    // 3. Copy memory asynchronously.
    aclrtStream stream = NULL;
    aclrtCreateStream(&stream);
    aclrtMemcpyAsync(hostAddr, size, devAddr, size, ACL_MEMCPY_DEVICE_TO_HOST, stream);
    // Wait until the copy on the stream is complete.
    aclrtSynchronizeStream(stream);

    // 4. Destroy the stream and free the memory.
    aclrtDestroyStream(stream);
    aclrtFreeHost(hostAddr);
    aclrtFree(devAddr);
    
    // ......
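
The asynchronous snippets above call aclrtSynchronizeStream after aclrtMemcpyAsync because the copy call returns before the data is guaranteed to be in the destination buffer. The following toy, host-only model may help illustrate why. It treats a stream as an ordered queue of deferred tasks; ToyStream, EnqueueCopy, and Synchronize are hypothetical names, and the model is an assumption for illustration, not a description of how AscendCL implements streams. On real hardware the copy may run at any time before synchronization; the model defers it fully to make the hazard visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <queue>

// Toy host-only model of a stream: an ordered queue of deferred tasks.
// This is NOT how AscendCL implements aclrtStream; it only illustrates why
// the destination buffer must not be read before synchronization.
class ToyStream {
public:
    // Models an asynchronous copy: the task is queued and the call returns
    // immediately without touching the destination buffer.
    void EnqueueCopy(void* dst, const void* src, std::size_t count) {
        tasks_.push([dst, src, count] { std::memcpy(dst, src, count); });
    }

    // Models stream synchronization: runs all queued tasks in order and
    // returns once the queue is empty.
    void Synchronize() {
        while (!tasks_.empty()) {
            tasks_.front()();
            tasks_.pop();
        }
    }

private:
    std::queue<std::function<void()>> tasks_;
};
```

In this model, reading the destination between EnqueueCopy and Synchronize returns the old contents, which mirrors the rule that the result of an asynchronous copy should be used only after the stream has been synchronized.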
    

Intra-Device Data Transfer

After calling each API, add exception handling branches and record error logs and info logs. The following snippet shows key steps only and is not ready to be built or run.

// 1. Allocate memory.
uint64_t size = 1 * 1024 * 1024;
void* devPtrA = NULL;
void* devPtrB = NULL;
aclrtMalloc(&devPtrA, size, ACL_MEM_MALLOC_HUGE_FIRST);
aclrtMalloc(&devPtrB, size, ACL_MEM_MALLOC_HUGE_FIRST);

// 2. Load data into the memory. ReadFile is a user-defined function that you need to implement.
ReadFile(fileName, devPtrA, size);

// 3. Perform synchronous or asynchronous memory copy.
// Copy memory synchronously. devPtrA points to the source memory on the device, devPtrB points to the destination memory on the device, and size is the memory size in bytes.
aclrtMemcpy(devPtrB, size, devPtrA, size, ACL_MEMCPY_DEVICE_TO_DEVICE);

// Copy memory asynchronously.
// Explicitly create a stream.
aclrtStream stream = NULL;
aclrtCreateStream(&stream);
aclrtMemcpyAsync(devPtrB, size, devPtrA, size, ACL_MEMCPY_DEVICE_TO_DEVICE, stream);
// Wait until the copy on the stream is complete.
aclrtSynchronizeStream(stream);

// 4. Destroy the stream and free the memory promptly when they are no longer needed.
aclrtDestroyStream(stream);
aclrtFree(devPtrA);
aclrtFree(devPtrB);

// ......
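
Every aclrtMemcpy call in the snippets above passes size twice. The second argument is the capacity of the destination buffer and the fourth is the number of bytes to copy; they are equal in the snippets only because the source and destination buffers have the same size. The host-only sketch below illustrates the distinction in the style of a bounds-checked memcpy. CheckedCopy and its return codes are hypothetical and are not meant to reproduce the validation or error codes of AscendCL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical host-only analogue of a copy call that takes both the
// destination capacity (destMax) and the number of bytes to copy (count).
// Returns 0 on success and -1 if the arguments are invalid.
int CheckedCopy(void* dst, std::size_t destMax, const void* src, std::size_t count) {
    if (dst == nullptr || src == nullptr) {
        return -1;  // Invalid pointer.
    }
    if (count > destMax) {
        return -1;  // The copy would overflow the destination buffer.
    }
    std::memcpy(dst, src, count);
    return 0;
}
```

When the source holds more data than the destination can take, pass the destination capacity as the second argument and the smaller byte count as the fourth, rather than reusing one size for both.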

Inter-Device Data Transfer

This function is not supported on the Atlas 200/300/500 Inference Product.

Note the following restrictions:

  • Before performing memory copy between two devices, call aclrtDeviceCanAccessPeer to query whether memory copy between the devices is supported. If it is supported, call aclrtDeviceEnablePeerAccess twice to enable memory copy: once for memory copy from device 0 to device 1, and once for memory copy from device 1 to device 0. Then, call aclrtMemcpy (synchronous mode) or aclrtMemcpyAsync (asynchronous mode) to transfer data via memory copy.
  • Memory copy is supported only between devices under the same PCIe switch.
  • Memory copy between devices is supported only within the same process, whether the calls come from the same thread or from different threads.

After calling each API, add exception handling branches and record error logs and info logs. The following snippet shows key steps only and is not ready to be built or run.

#include <cstdio>
#include "acl/acl.h"

int main(int argc, const char *argv[])
{
    // Initialize AscendCL.
    auto ret = aclInit(NULL);

    int32_t canAccessPeer = 0;
    // Query whether memory copy is supported between device 0 and device 1.
    ret = aclrtDeviceCanAccessPeer(&canAccessPeer, 0, 1);

    // 1 indicates that memory copy is supported.
    if (canAccessPeer == 1) {
        // ************************************************************
        // Operations on device 0.
        ret = aclrtSetDevice(0);
        ret = aclrtDeviceEnablePeerAccess(1, 0);
        void *dev0 = NULL;
        ret = aclrtMalloc(&dev0, 10, ACL_MEM_MALLOC_HUGE_FIRST_P2P);
        ret = aclrtMemset(dev0, 10, 1, 10);
        // ......
        // ************************************************************
        // Operations on device 1.
        // Enable memory copy from device 1 to device 0. Device 1 is the current device set through aclrtSetDevice, while device 0 is specified by the first parameter of aclrtDeviceEnablePeerAccess.
        ret = aclrtSetDevice(1);
        ret = aclrtDeviceEnablePeerAccess(0, 0);
        void *dev1 = NULL;
        ret = aclrtMalloc(&dev1, 10, ACL_MEM_MALLOC_HUGE_FIRST_P2P);
        ret = aclrtMemset(dev1, 10, 0, 10);

        // Perform memory copy to transfer data from device 0 to device 1.
        ret = aclrtMemcpy(dev1, 10, dev0, 10, ACL_MEMCPY_DEVICE_TO_DEVICE);
        // Call aclrtResetDevice to release the resources of device 1.
        ret = aclrtResetDevice(1);
        // ......
        // ************************************************************

        // ************************************************************
        // Call aclrtResetDevice to release the resources of device 0.
        ret = aclrtSetDevice(0);
        ret = aclrtResetDevice(0);
        // ......
        // ************************************************************

        printf("P2P copy success\n");
    } else {
        printf("current device doesn't support p2p feature\n");
    }

    // Deinitialize AscendCL.
    aclFinalize();
    return 0;
}