Graph Execution with Parameter Configuration

This section provides an example of using an asynchronous API to run a graph on the device memory.

Description

This feature is not supported by the Atlas 200/300/500 Inference Product.

This section describes how to build and run a graph asynchronously to generate a result. It differs from Graph Execution Without Parameter Configuration in one respect:

The LoadGraph API used in this section accepts the ge.exec.frozenInputIndexes configuration parameter (the indexes of the input tensors whose addresses are not refreshed), which can improve graph execution performance.

The following figure illustrates the API call sequence.

1. GEInitialize: initializes the system and allocates resources. (This API can be called before graph construction.)
2. Session constructor: creates a Session object and allocates Session resources.
3. AddGraph: adds a graph to the Session object.
4. aclInit: initializes AscendCL.
5. CompileGraph: builds the graph.
6. aclrtSetDevice: specifies the running device; aclrtCreateStream: creates a stream; aclrtMallocHost: allocates the host memory; aclrtMalloc: allocates the device memory.
7. LoadGraph: (asynchronous graph execution scenario) loads the graph model to the stream created in step 6.
8. ExecuteGraphWithStreamAsync: runs the graph asynchronously.
9. aclrtSynchronizeStream: waits for stream tasks to complete.
10. aclrtFree and aclrtFreeHost: free the memory; GEFinalize: releases system resources; aclFinalize: releases AscendCL-related resources.

Example

Include the required header files, including those of AscendCL and the C or C++ standard library.

    #include <iostream>
    #include <map>
    #include <vector>
    #include "ge_api.h"
    #include "acl.h"
    #include "acl_rt.h"

Allocate system resources.

After a graph is defined, call GEInitialize to initialize the system (or call it before defining a graph) and allocate system resources. The sample code is as follows:

        
    std::map<AscendString, AscendString> config = {{"ge.exec.deviceId", "0"},
                                                   {"ge.graphRunMode", "1"}};
    Status ret = ge::GEInitialize(config);

Set the GE initialization configuration by using config. Configure ge.exec.deviceId to specify the device where a GE instance runs, and ge.graphRunMode to specify the graph run mode (set to 0 for online inference and 1 for training). For more configurations, see Command-Line Options.
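
As a minimal sketch of how these two options fit together, the helper below builds the initialization map from a device ID and a run mode. MakeInitOptions and RunMode are hypothetical names, and std::string stands in for AscendString so that the snippet compiles without the GE headers. Only the keys ge.exec.deviceId and ge.graphRunMode and the values 0 and 1 come from the description above.

```cpp
#include <map>
#include <string>

// Hypothetical enum mirroring the documented ge.graphRunMode values.
enum class RunMode { kOnlineInference = 0, kTraining = 1 };

// Hypothetical helper: builds the option map that would be passed to
// GEInitialize. std::string is a stand-in for AscendString here.
std::map<std::string, std::string> MakeInitOptions(int deviceId, RunMode mode) {
  return {
      {"ge.exec.deviceId", std::to_string(deviceId)},
      {"ge.graphRunMode", std::to_string(static_cast<int>(mode))},
  };
}
```

For example, MakeInitOptions(0, RunMode::kTraining) yields the same two key-value pairs as the config map in the sample code above.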

You are advised not to configure dump information in both the GE options and the AscendCL initialization API call at the same time. Otherwise, exceptions may occur.

Create a session and add a graph object.

To run a graph, create a Session object and call the AddGraph API to add the graph. The sample code is as follows:

    std::map<AscendString, AscendString> options;
    // std::nothrow (from <new>) makes a failed allocation return nullptr instead of throwing.
    ge::Session *session = new (std::nothrow) ge::Session(options);
    if (session == nullptr) {
      std::cout << "Create session failed." << std::endl;
      ...
      ...
      // Destroy allocations.
      ge::GEFinalize();
      return FAILED;
    }
    uint32_t conv_graph_id = 0;
    ge::Graph conv_graph;
    Status ret = session->AddGraph(conv_graph_id, conv_graph);
    if (ret != SUCCESS) {
      ...
      ...
      // Destroy allocations.
      ge::GEFinalize();
      delete session;
      return FAILED;
    }

Set the runtime configuration by using options. For details, see the Session constructor. The graph execution result will be saved to the output_cov tensor.

Initialize AscendCL resources.

        
    std::string aclConfigPath = "xx/xx/xx";
    // aclInit takes a C string, so pass c_str().
    aclError retInit = aclInit(aclConfigPath.c_str());
    if (retInit != ACL_ERROR_NONE) {
        ...
        ...
        return FAILED;
    }

Build the graph.

        
    // The ID must match the one passed to AddGraph.
    uint32_t graph_id = 0;
    ret = session->CompileGraph(graph_id);
    if (ret != SUCCESS) {
      ...
      ...
      // Destroy allocations.
      ge::GEFinalize();
      delete session;
      return FAILED;
    }
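
The same cleanup sequence is repeated in every error branch of the samples. One way to avoid the repetition is a scope guard, sketched below under stated assumptions: ScopeExit and BuildGraph are hypothetical names, and the cleanup body only records the names of the calls in a log instead of invoking the GE APIs, so the snippet runs without the GE headers.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Runs a cleanup callable when the enclosing scope exits, unless dismissed.
class ScopeExit {
 public:
  explicit ScopeExit(std::function<void()> fn) : fn_(std::move(fn)) {}
  ~ScopeExit() {
    if (fn_) fn_();
  }
  ScopeExit(const ScopeExit&) = delete;
  ScopeExit& operator=(const ScopeExit&) = delete;
  // Cancels the cleanup, for example after every step has succeeded.
  void Dismiss() { fn_ = nullptr; }

 private:
  std::function<void()> fn_;
};

// Hypothetical stand-in for the AddGraph/CompileGraph flow. The log entries
// name the cleanup calls used in the samples; no GE API is invoked here.
bool BuildGraph(bool compileOk, std::vector<std::string>& log) {
  ScopeExit cleanup([&log] {
    log.push_back("GEFinalize");
    log.push_back("delete session");
  });
  if (!compileOk) {
    return false;  // The guard runs the cleanup on this early return.
  }
  cleanup.Dismiss();  // Success: keep the resources for the next step.
  return true;
}
```

In real code, the lambda would contain the actual cleanup calls in the same order as the error branches above.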

Specify the running device, create a stream, and allocate memory.

        
    // Specify the compute device.
    int32_t deviceId = 0;
    retInit = aclrtSetDevice(deviceId);

    // Create a stream.
    aclrtStream stream = nullptr;
    aclError aclRet = aclrtCreateStream(&stream);

    // Allocate the host memory.
    void *hostPtrA = nullptr;
    size_t size = 1024;
    aclRet = aclrtMallocHost(&hostPtrA, size);
    // Allocate the device memory.
    void *devPtrB = nullptr;
    aclRet = aclrtMalloc(&devPtrB, size, ACL_MEM_MALLOC_HUGE_FIRST);

    // Copy the data from the host to the device.
    // hostPtrA points to the source memory on the host, devPtrB points to the destination memory on the device, and size is the memory size in bytes.
    aclRet = aclrtMemcpy(devPtrB, size, hostPtrA, size, ACL_MEMCPY_HOST_TO_DEVICE);
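
The allocate, copy, and free pattern can be made less error-prone by tying each buffer to an owning object. The following is a minimal sketch, not AscendCL code: Buffer, AllocBuffer, and CopyBuffer are hypothetical names, and std::malloc, std::free, and std::memcpy stand in for the AscendCL allocation, free, and copy calls so that the snippet runs on the host alone. CopyBuffer follows the (destination, destination size, source, byte count) argument order of aclrtMemcpy.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Deleter for the stand-in buffers. With AscendCL, this is where
// aclrtFreeHost (host memory) or aclrtFree (device memory) would be called.
struct FreeDeleter {
  void operator()(void* p) const { std::free(p); }
};
using Buffer = std::unique_ptr<void, FreeDeleter>;

// Hypothetical helper: std::malloc stands in for aclrtMallocHost/aclrtMalloc.
Buffer AllocBuffer(std::size_t size) { return Buffer(std::malloc(size)); }

// Hypothetical helper: std::memcpy stands in for aclrtMemcpy.
// Returns false if a pointer is null or count exceeds the destination size.
bool CopyBuffer(void* dst, std::size_t destMax, const void* src,
                std::size_t count) {
  if (dst == nullptr || src == nullptr || count > destMax) {
    return false;
  }
  std::memcpy(dst, src, count);
  return true;
}
```

Because each Buffer frees its memory when it goes out of scope, early returns on error do not need explicit free calls.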

Load the graph to the created stream.

        
    // Load-time options such as ge.exec.frozenInputIndexes are passed through this map.
    std::map<AscendString, AscendString> options;
    uint32_t graph_id = 0;
    ret = session->LoadGraph(graph_id, options, stream);
    if (ret != SUCCESS) {
      ...
      ...
      // Destroy allocations.
      ge::GEFinalize();
      delete session;
      return FAILED;
    }

Run the graph asynchronously and return the execution result.

        
    std::vector<gert::Tensor> input;
    std::vector<gert::Tensor> output;
    ret = session->ExecuteGraphWithStreamAsync(graph_id, stream, input, output);
    // Call aclrtSynchronizeStream to wait for the stream tasks to complete.
    aclRet = aclrtSynchronizeStream(stream);
    // Copy the device data back to the host.
    // devPtrB points to the source memory on the device, hostPtrA points to the destination memory on the host, and size is the memory size in bytes.
    aclRet = aclrtMemcpy(hostPtrA, size, devPtrB, size, ACL_MEMCPY_DEVICE_TO_HOST);
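
The reason for synchronizing before reading the output can be illustrated with a small host-only model. This is a sketch under stated assumptions, not how AscendCL streams are implemented: FakeStream and ExecuteAsync are hypothetical, and the model is single-threaded, running queued tasks only when Synchronize is called. A real stream may start work earlier, but completion is only guaranteed after synchronization.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical single-threaded model of a stream: Enqueue only records a
// task, and Synchronize runs every recorded task in submission order.
class FakeStream {
 public:
  void Enqueue(std::function<void()> task) { tasks_.push(std::move(task)); }
  void Synchronize() {
    while (!tasks_.empty()) {
      tasks_.front()();
      tasks_.pop();
    }
  }
  std::size_t Pending() const { return tasks_.size(); }

 private:
  std::queue<std::function<void()>> tasks_;
};

// Stand-in for an asynchronous execute call: it returns immediately after
// queuing the work that fills `output`. The caller must keep `output` alive
// until the stream has been synchronized.
void ExecuteAsync(FakeStream& stream, int input, int& output) {
  stream.Enqueue([input, &output] { output = input * 2; });
}
```

In this model, output keeps its old value after ExecuteAsync returns and holds the result only once Synchronize has run, which mirrors why the device-to-host copy is placed after aclrtSynchronizeStream.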

Release resources.

        
    // Free memory. The AscendCL calls return aclError, so use aclRet rather than ret.
    aclRet = aclrtFree(devPtrB);
    aclRet = aclrtFreeHost(hostPtrA);
    // Destroy graph allocations.
    ret = ge::GEFinalize();
    // Deinitialize AscendCL.
    aclRet = aclFinalize();
