Running a Graph Asynchronously in the Single-Process and Single-Device Mode

This section provides an example of using an asynchronous API to run a graph whose input and output data reside in device memory.

Overview

This feature is not supported by the Atlas 200I/500 A2 inference products.

This section describes how to build a graph and run it asynchronously to obtain the execution result. The API call sequence is as follows:

1. GEInitializeV2: initializes the system and allocates resources. This API can also be called before graph construction.
2. aclInit: initializes acl resources.
3. Session constructor: creates a Session object and allocates Session resources.
4. aclrtSetDevice: specifies the running device; aclrtCreateStream: creates a stream; aclrtMallocHost: allocates host memory; aclrtMalloc: allocates device memory.
5. AddGraph: adds a graph to the Session object.
6. (Optional) CompileGraph: builds the graph.
7. (Optional) LoadGraph: loads the graph onto the stream created in step 4 for asynchronous execution.
8. RunGraphWithStreamAsync: runs the graph asynchronously.
   If LoadGraph has not been called before this API, RunGraphWithStreamAsync calls LoadGraph automatically to load the graph. Likewise, if CompileGraph has not been called before LoadGraph, LoadGraph calls CompileGraph automatically to build the graph.
9. aclrtSynchronizeStream: waits for the tasks on the stream to complete.
10. aclrtFree and aclrtFreeHost: free the memory; GEFinalizeV2: releases system resources; aclFinalize: releases acl resources.
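The fallback rules in step 8 can be pictured as a small state machine. The C++ sketch below is a hypothetical model (GraphLifecycleModel is not a GE class) that only records which steps would run; it makes no claim about how GE implements the fallback or how it treats repeated calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the documented fallback rules; this is not a GE class.
// It only records which steps would run. Repeated calls are ignored here,
// which says nothing about how GE itself handles them.
class GraphLifecycleModel {
 public:
  // Names of the steps that actually ran, in order.
  std::vector<std::string> steps;

  // Mirrors CompileGraph.
  void Compile() {
    if (compiled_) return;
    steps.push_back("Compile");
    compiled_ = true;
  }

  // Mirrors LoadGraph: builds the graph first if CompileGraph was not called.
  void Load() {
    if (loaded_) return;
    if (!compiled_) Compile();
    steps.push_back("Load");
    loaded_ = true;
  }

  // Mirrors RunGraphWithStreamAsync: loads the graph first if LoadGraph was not called.
  void RunAsync() {
    if (!loaded_) Load();
    steps.push_back("Run");
  }

 private:
  bool compiled_ = false;
  bool loaded_ = false;
};
```

For example, calling RunAsync() on a fresh model records Compile, Load, and Run in that order, which corresponds to a program that calls RunGraphWithStreamAsync without calling CompileGraph or LoadGraph first.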

Example

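Include the header files, including those of acl and the C/C++ standard library.

    #include <iostream>
    #include <map>
    #include <new>
    #include <string>
    #include <vector>

    #include "ge_api_v2.h"
    #include "acl.h"
    #include "acl_rt.h"

    // The fragments below use GE types and constants such as AscendString,
    // Status, SUCCESS, and FAILED without the ge:: prefix.
    using namespace ge;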

Allocate system resources.

Call GEInitializeV2 to initialize the system and allocate system resources. GEInitializeV2 can be called either before or after the graph is defined. The sample code is as follows:

    std::map<AscendString, AscendString> config = {{"ge.exec.deviceId", "0"},
                                                   {"ge.graphRunMode", "1"}};
    Status ret = ge::GEInitializeV2(config);
    if (ret != SUCCESS) {
      std::cout << "GEInitializeV2 failed." << std::endl;
      return FAILED;
    }

Use config to set the GE initialization options: ge.exec.deviceId specifies the device on which the GE instance runs, and ge.graphRunMode specifies the graph run mode (0 for online inference, 1 for training). For other options, see Command-Line Options.
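To keep these option values consistent across a program, the configuration can be assembled by a small helper. In the sketch below, MakeGeInitConfig is a hypothetical helper, and std::string stands in for AscendString so that the code compiles without the GE headers. The check on ge.graphRunMode assumes that only the two values described above are used.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper. std::string stands in for ge::AscendString so the
// sketch builds without the GE headers.
// runMode: "0" = online inference, "1" = training.
std::map<std::string, std::string> MakeGeInitConfig(int deviceId,
                                                    const std::string& runMode) {
  if (deviceId < 0) {
    throw std::invalid_argument("deviceId must be non-negative");
  }
  if (runMode != "0" && runMode != "1") {
    throw std::invalid_argument("ge.graphRunMode must be \"0\" or \"1\"");
  }
  return {{"ge.exec.deviceId", std::to_string(deviceId)},
          {"ge.graphRunMode", runMode}};
}
```

The resulting entries would then be copied into the std::map<AscendString, AscendString> that is passed to GEInitializeV2.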

Do not configure dump information both in the GE options and in the configuration passed to the acl initialization API (aclInit); configuring it in both places may cause exceptions. The same applies to any other parameter that can be set in both places.

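Initialize acl resources.

    // Path to the acl configuration file ("xx/xx/xx" is a placeholder).
    std::string aclConfigPath = "xx/xx/xx";
    // aclInit takes a const char*, so pass the C string.
    aclError retInit = aclInit(aclConfigPath.c_str());
    if (retInit != ACL_ERROR_NONE) {
      // ...
      // ...
      return FAILED;
    }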

Create a session.

To run a defined graph, create a Session object. Configuration parameters can be passed to the Session through options. For the supported parameters, see Command-Line Options.

    std::map<AscendString, AscendString> options;
    // Create a session object. std::nothrow makes new return nullptr on
    // allocation failure instead of throwing, so the check below is meaningful.
    ge::GeSession* session = new (std::nothrow) ge::GeSession(options);
    // Check whether the session is created successfully.
    if (session == nullptr) {
      std::cout << "Create session failed." << std::endl;
      // ...
      // ...
      // Release GE resources.
      ge::GEFinalizeV2();
      return FAILED;
    }
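Each error branch in this example repeats the same cleanup calls. One common C++ way to avoid that repetition is a scope guard that runs a cleanup callback when it leaves scope. ScopeGuard below is a hypothetical helper, not part of GE or acl, and the sketch does not call either library.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper: runs a cleanup callback when the guard goes out of
// scope, unless Dismiss() was called first.
class ScopeGuard {
 public:
  explicit ScopeGuard(std::function<void()> fn) : fn_(std::move(fn)) {}
  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;

  ~ScopeGuard() {
    if (active_ && fn_) fn_();
  }

  // Call on the success path to hand cleanup over to the normal teardown code.
  void Dismiss() { active_ = false; }

 private:
  std::function<void()> fn_;
  bool active_ = true;
};
```

For example, creating ScopeGuard geGuard([] { ge::GEFinalizeV2(); }); right after GEInitializeV2 succeeds would finalize GE on every early return, and calling geGuard.Dismiss() before the normal teardown avoids finalizing twice.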

Specify the running device, create a stream, and allocate memory.

    // Specify the compute device.
    int32_t deviceId = 0;
    retInit = aclrtSetDevice(deviceId);

    // Create a stream.
    aclrtStream stream = nullptr;
    aclError aclRet = aclrtCreateStream(&stream);

    // Allocate host memory.
    void* hostPtrA = nullptr;
    size_t size = 1024;
    aclRet = aclrtMallocHost(&hostPtrA, size);
    // Allocate device memory.
    void* devPtrB = nullptr;
    aclRet = aclrtMalloc(&devPtrB, size, ACL_MEM_MALLOC_HUGE_FIRST);

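Every aclrtMallocHost and aclrtMalloc call must be paired with aclrtFreeHost or aclrtFree. A std::unique_ptr with a custom deleter can tie that release to scope exit. In this sketch, std::malloc and std::free stand in for the acl allocation functions so that the code runs without the acl runtime; MakeBuffer, FreeStandIn, and g_freeCount are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Counts released buffers so the deleter's behavior is observable.
int g_freeCount = 0;

// Stand-in deleter. With the acl runtime, this would call aclrtFreeHost(p)
// for host buffers or aclrtFree(p) for device buffers instead of std::free.
void FreeStandIn(void* p) {
  std::free(p);
  ++g_freeCount;
}

// Owning handle: the deleter runs once when the handle is destroyed or reset,
// and is skipped when the stored pointer is null.
using BufferPtr = std::unique_ptr<void, void (*)(void*)>;

// Hypothetical helper: std::malloc stands in for aclrtMallocHost/aclrtMalloc.
// Returns an empty handle if allocation fails.
BufferPtr MakeBuffer(std::size_t size) {
  return BufferPtr(std::malloc(size), &FreeStandIn);
}
```

With the real runtime, the buffer would be allocated by the corresponding acl function and then wrapped, so that an early return cannot leak it.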

Add a graph object.

Call the AddGraph API to add a graph. The sample code is as follows:

    // Prepare the ID of the graph to add to the session. conv_graph stands for
    // the graph constructed earlier; an empty graph is declared here for brevity.
    uint32_t conv_graph_id = 0;
    ge::Graph conv_graph;
    // Add the graph to the session. ret was declared when GEInitializeV2 was called.
    ret = session->AddGraph(conv_graph_id, conv_graph);
    if (ret != SUCCESS) {
      // ...
      // ...
      // Destroy the session, then release GE resources.
      delete session;
      ge::GEFinalizeV2();
      return FAILED;
    }

The run configuration is set through the options passed to the Session constructor; for details, see the Session constructor. The graph execution result is returned in the output tensor vector passed to RunGraphWithStreamAsync.

(Optional) Build the graph.

If CompileGraph is not called, LoadGraph calls it automatically to build the graph.

    // ID of the graph added by AddGraph.
    uint32_t graph_id = conv_graph_id;
    ret = session->CompileGraph(graph_id);
    if (ret != SUCCESS) {
      // ...
      // ...
      // Destroy the session, then release GE resources.
      delete session;
      ge::GEFinalizeV2();
      return FAILED;
    }

(Optional) Load the graph to the created stream.

If LoadGraph is not called, RunGraphWithStreamAsync calls it automatically to load the graph. The options argument of LoadGraph can be used to pass configuration parameters. For example, ge.exec.frozenInputIndexes specifies the indexes of input tensors whose addresses are not refreshed, which can improve graph execution performance. For other supported parameters, see Command-Line Options.

    // Load-specific options; named loadOptions so that they do not clash with
    // the Session options declared earlier.
    std::map<AscendString, AscendString> loadOptions;
    ret = session->LoadGraph(conv_graph_id, loadOptions, stream);
    if (ret != SUCCESS) {
      // ...
      // ...
      // Destroy the session, then release GE resources.
      delete session;
      ge::GEFinalizeV2();
      return FAILED;
    }

Transfer data.

    // Copy data from the host to the device.
    // hostPtrA points to the source buffer on the host, devPtrB points to the
    // destination buffer on the device, and size is the number of bytes to copy.
    aclRet = aclrtMemcpy(devPtrB, size, hostPtrA, size, ACL_MEMCPY_HOST_TO_DEVICE);

Run the graph asynchronously, wait for it to finish, and copy the result back to the host.

    // Populate input with tensors that describe the graph inputs (for example,
    // the data copied to devPtrB); tensor construction is omitted here.
    std::vector<gert::Tensor> input;
    std::vector<gert::Tensor> output;
    ret = session->RunGraphWithStreamAsync(conv_graph_id, stream, input, output);
    if (ret != SUCCESS) {
      // ...
      return FAILED;
    }

    // The run is asynchronous: wait for the stream tasks to complete
    // before reading any results.
    aclRet = aclrtSynchronizeStream(stream);

    // Copy data from the device back to the host.
    // devPtrB points to the source buffer on the device, hostPtrA points to the
    // destination buffer on the host, and size is the number of bytes to copy.
    aclRet = aclrtMemcpy(hostPtrA, size, devPtrB, size, ACL_MEMCPY_DEVICE_TO_HOST);
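Because RunGraphWithStreamAsync is asynchronous, reading results before aclrtSynchronizeStream returns can observe stale data. The toy C++ model below illustrates only that ordering rule: ToyStream is hypothetical and simply defers tasks until Synchronize() is called, whereas a real stream executes its tasks on the device concurrently with the host.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical toy stream: work is queued, not run, until Synchronize().
class ToyStream {
 public:
  // Submits a task and returns immediately without running it.
  void Enqueue(std::function<void()> task) { tasks_.push(std::move(task)); }

  // Runs all pending tasks in submission order, then returns.
  void Synchronize() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop();
      task();
    }
  }

 private:
  std::queue<std::function<void()>> tasks_;
};
```

In this model, a value written by an enqueued task is visible only after Synchronize() returns, which mirrors why the example calls aclrtSynchronizeStream before copying the result back with aclrtMemcpy.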

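Release resources.

    // Free the device and host memory.
    aclRet = aclrtFree(devPtrB);
    aclRet = aclrtFreeHost(hostPtrA);

    // Destroy the stream and the session.
    aclRet = aclrtDestroyStream(stream);
    delete session;

    // Release GE resources.
    ret = ge::GEFinalizeV2();

    // Reset the device and deinitialize acl.
    aclRet = aclrtResetDevice(deviceId);
    aclRet = aclFinalize();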
