Building a Graph into an Offline Model and Running the Graph (Distributed Build and Partitioning of Foundation Models)

This section consists of two parts: building an OM offline model that can be used for distributed deployment, and then calling LoadGraph to load the model and RunGraph to run the loaded graph and obtain the graph execution result.

Building an Offline Model for Distributed Deployment

In addition to the method described in this section, you can use the ATC tool to convert a model into an offline model for distributed deployment. During model conversion, enable the --distributed_cluster_build option.
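For reference, a hedged ATC invocation might look as follows. The model file, framework type, output name, and soc_version are placeholders for your environment, and the exact value format of --distributed_cluster_build should be confirmed in the ATC option reference.

    atc --model=foundation_model.onnx --framework=5 --output=dist_model --soc_version=<soc_version> --distributed_cluster_build=1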

The following figure illustrates the API call sequence.

  1. aclgrphBuildInitialize: initializes the system and allocates resources after a graph is defined. During initialization, configure the CLUSTER_CONFIG parameter through the options of aclgrphBuildInitialize to specify the logical topology of the target deployment environment when building and partitioning foundation models. (A minimal code sketch of the full sequence follows this list.)
  2. aclgrphBuildModel: builds a graph into an offline model adapted to the Ascend AI Processor. The TBE built-in OPP and custom OPP are loaded at build time. The model is stored in the memory buffer.
    Configure the DISTRIBUTED_CLUSTER_BUILD parameter through the options of aclgrphBuildModel when building and partitioning foundation models. After DISTRIBUTED_CLUSTER_BUILD is enabled, the generated offline model can be used for distributed deployment.
    • If the input is a complete foundation model and algorithm-based partitioning is enabled:

      Configure the ENABLE_GRAPH_PARALLEL and GRAPH_PARALLEL_OPTION_PATH parameters through the options of aclgrphBuildModel. ENABLE_GRAPH_PARALLEL enables algorithm-based partitioning, and GRAPH_PARALLEL_OPTION_PATH specifies the path of the partitioning policy configuration file.

      After algorithm-based partitioning is performed on the foundation model, the ID of the logical device on which each submodel is to be deployed is stored as an attribute of that submodel. After the submodels are reloaded and deployed, distributed deployment of the model is achieved.

      Figure 1 Schematic diagram
    • If the input is a slice model that contains communication operators, build the slice model into an .om offline model:

      Configure the MODEL_RELATION_CONFIG parameter through the options of aclgrphBuildModel to set the input and output relationships between the slice models.

      Figure 2 Schematic diagram
  3. aclgrphSaveModel: serializes the offline model in the memory buffer to an .om file.
  4. aclgrphBuildFinalize: ends the build process and releases resources.
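The following is a minimal sketch of this build sequence, assuming the graph has already been constructed. The option keys (CLUSTER_CONFIG, DISTRIBUTED_CLUSTER_BUILD, ENABLE_GRAPH_PARALLEL, GRAPH_PARALLEL_OPTION_PATH, MODEL_RELATION_CONFIG) are written here as literal placeholder strings; substitute the exact key strings from the option reference, and replace the file names with your own.

    #include <map>
    #include "graph/graph.h"
    #include "ge_ir_build.h"  // aclgrphBuildInitialize/BuildModel/SaveModel/BuildFinalize
    
    ge::graphStatus BuildDistributedOm(const ge::Graph &graph) {
      // Initialize the build system. CLUSTER_CONFIG (placeholder key) specifies
      // the logical topology of the target deployment environment.
      std::map<ge::AscendString, ge::AscendString> global_options = {
          {"CLUSTER_CONFIG", "cluster_config.json"}};  // placeholder key and value
      ge::graphStatus ret = aclgrphBuildInitialize(global_options);
      if (ret != ge::GRAPH_SUCCESS) {
        return ret;
      }
    
      // Build the graph into an offline model for distributed deployment.
      std::map<ge::AscendString, ge::AscendString> build_options = {
          {"DISTRIBUTED_CLUSTER_BUILD", "1"},                        // enable distributed build
          {"ENABLE_GRAPH_PARALLEL", "1"},                            // algorithm-based partitioning
          {"GRAPH_PARALLEL_OPTION_PATH", "parallel_option.json"}};   // partitioning policy file
      // For a slice-model input, configure MODEL_RELATION_CONFIG here instead of
      // the two graph-parallel options above.
      ge::ModelBufferData model;
      ret = aclgrphBuildModel(graph, build_options, model);
      if (ret != ge::GRAPH_SUCCESS) {
        aclgrphBuildFinalize();
        return ret;
      }
    
      // Serialize the in-memory model to an .om file, then release build resources.
      ret = aclgrphSaveModel("dist_model", model);
      aclgrphBuildFinalize();
      return ret;
    }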

Loading a Model and Running a Graph

The following figure illustrates the API call sequence.

  1. GEInitialize: initializes the system and allocates resources. (This API can be called before graph construction.)
  2. Session constructor: creates a Session object and allocates Session resources.
  3. LoadGraph: adds the OM offline model built in Building an Offline Model for Distributed Deployment, or the offline model converted by the ATC tool for distributed deployment, to the Session object.
  4. RunGraph: runs the graph.
  5. GEFinalize: releases system resources.

Example

  1. Include the header file.
    #include "ge_api.h"
    
  2. Allocate system resources.

    After a graph is defined, call GEInitialize to initialize the system and allocate system resources. (Alternatively, this API can be called before the graph is defined.) See the following code snippet.

    std::map<AscendString, AscendString> config = {{"ge.exec.deviceId", "0"},
                                                   {"ge.graphRunMode", "1"}};
    Status ret = ge::GEInitialize(config);
    

    Set the GE initialization configuration by using config. The sample configures ge.exec.deviceId and ge.graphRunMode: the former specifies the device on which the GE instance runs, and the latter specifies the graph run mode (0 for online inference, 1 for training). For more configurations, see Table 1.

  3. Add a graph object and load and execute an offline model.
    To run a defined graph, create a Session object, call the LoadGraph API to load the offline model, and then call the RunGraph API to execute it. See the following code snippet.
    std::map<AscendString, AscendString> options;
    ge::Session *session = new Session(options);
    if (session == nullptr) {
      std::cout << "Create session failed." << std::endl;
      ...
      ...
      // Destroy allocations.
      ge::GEFinalize();
      return FAILED;
    }
    
    uint32_t graph_id = 0;
    const std::string model_path = "dist_model.om";  // example path to the offline model built above
    std::map<std::string, std::string> load_option;
    // Only offline models that can be used for distributed deployment can be loaded, and the models must not contain variables.
    auto ret = session->LoadGraph(graph_id, load_option, model_path);
    if (ret != SUCCESS) {
      // Destroy allocations.
      ge::GEFinalize();
      delete session;
      return FAILED;
    }
    
    // Execute the model.
    std::vector<ge::Tensor> inputs;
    std::vector<ge::Tensor> outputs;
    ret = session->RunGraph(graph_id, inputs, outputs);
    if (ret != SUCCESS) {
      ...
      ...
      // Destroy allocations.
      ge::GEFinalize();
      delete session;
      return FAILED;
    }
    

    Set the run configuration by using options. For details, see the Session constructor. The graph execution result is returned in the outputs vector.
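    As a hedged illustration of consuming the result, and assuming ge::Tensor exposes GetData() and GetSize() accessors for the raw output buffer, the first output could be inspected as follows:

    // Sketch: inspect the first output tensor after a successful RunGraph call.
    if (!outputs.empty()) {
      const ge::Tensor &out = outputs[0];
      const uint8_t *data = out.GetData();  // raw output bytes, e.g. for post-processing
      size_t size = out.GetSize();          // byte size of the output buffer
      std::cout << "output[0] size: " << size << " bytes" << std::endl;
      (void)data;
    }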

  4. Call GEFinalize to destroy the allocations.
    ret = ge::GEFinalize();