Specifying the Deployment Location of the DataFlow Node

Description

You can use a configuration file to specify where each node of a DataFlow graph is deployed. This enables multi-device, multi-instance deployment, in which the computation workload is divided proportionally among the instances to improve performance.

When the UDF calls the NN, the system implements the following deployment policies:
  • When the UDF is deployed on the host, the NN is deployed on the first device.
  • When the UDF is deployed on the device, the NN is also deployed on the device.

Restrictions

  • You must specify the deployment locations of all nodes.
  • You must specify the deployment configuration file through ge.experiment.data_flow_deploy_info_path in the options of AddGraph. The file must exist and must be correctly formatted. For details about the format requirements, see Format of the Configuration File.

    For details about the options, see "Options".

  • The configured deployment location must be one where the node can actually be deployed. For example, if the UDF supports only x86, the node cannot be deployed on the device. Similarly, if the node can be deployed only on the device, it cannot be deployed on the host.
  • A node cannot be configured more than once.
  • A node cannot be deployed on both the host and device.
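
Two of the restrictions above (every node must have a deployment location, and no node may be configured more than once) can be checked before the file is handed to the framework. The following is a minimal sketch, not part of the GE API: DeployEntry and CheckNodeCoverage are hypothetical names, and the entries are assumed to have already been read from batch_deploy_info into memory.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory form of one batch_deploy_info entry.
struct DeployEntry {
  std::vector<std::string> flow_node_list;
  std::string logic_device_list;
};

// Returns human-readable problems; an empty result means that every graph node
// is configured and that no node name appears more than once.
std::vector<std::string> CheckNodeCoverage(const std::set<std::string> &graph_nodes,
                                           const std::vector<DeployEntry> &entries) {
  std::vector<std::string> problems;
  std::set<std::string> seen;
  for (const auto &entry : entries) {
    for (const auto &name : entry.flow_node_list) {
      // insert() reports whether the name was new; a repeat violates the rule.
      if (!seen.insert(name).second) {
        problems.push_back("node configured more than once: " + name);
      }
    }
  }
  for (const auto &name : graph_nodes) {
    if (seen.count(name) == 0) {
      problems.push_back("node has no deployment location: " + name);
    }
  }
  return problems;
}
```

The sketch does not check whether a location is actually valid for a node (for example, host-only versus device-only); that depends on information outside the configuration file.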

Usage

Example:

std::map<ge::AscendString, ge::AscendString> session_options = {};
std::shared_ptr<ge::Session> session = std::make_shared<ge::Session>(session_options);
const auto graph = CreateDataFlowGraph();
std::map<ge::AscendString, ge::AscendString> graph_options = {{"ge.experiment.data_flow_deploy_info_path", "./data_flow_deploy_info.json"}};
const auto ret = session->AddGraph(0, graph, graph_options);
...
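
Because the configuration file must exist and be correctly formatted, it can help to run a cheap sanity check on the path before calling AddGraph. The following is a minimal sketch under stated assumptions: DeployInfoFileLooksUsable is a hypothetical helper, not a GE API, and it only confirms that the file can be opened and that its first non-whitespace character is an opening brace. It does not validate the JSON or the fields.

```cpp
#include <fstream>
#include <string>

// Hypothetical pre-check: returns true if the file opens and its content
// starts with '{' after optional leading whitespace.
bool DeployInfoFileLooksUsable(const std::string &path) {
  std::ifstream in(path);
  if (!in.is_open()) {
    return false;  // Missing file or no read permission.
  }
  in >> std::ws;  // Skip leading whitespace, if any.
  char first = '\0';
  return in.get(first) && first == '{';
}
```

A caller could evaluate DeployInfoFileLooksUsable("./data_flow_deploy_info.json") and skip the AddGraph call with a clear error message when it returns false.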

Format of the Configuration File

The following is a typical example of the configuration file data_flow_deploy_info.json:
{
    "keep_logic_device_order": false,
    "batch_deploy_info": [
        {
            "flow_node_list": ["flowNode1", "flowNode2"],
            "logic_device_list": "0:0:-1:0"
        },
        {
            "flow_node_list": ["flowNode3"],
            "logic_device_list": "0:0:1:1"
        },
        {
            "flow_node_list": ["flowNode4"],
            "logic_device_list": "0:0:0:0,0:0:1:0",
            "invoke_list": [
                {
                    "invoke_name": "invoked_flow_graph_name",
                    "deploy_info_file": "./data_flow_invoke_flow_graph_deploy_info.json"
                },
                {
                    "invoke_name": "invoked_flow_graph_name1",
                    "logic_device_list": "0:0:0:0"
                }
            ]
        },
        {
            "flow_node_list": ["flowNode5", "flowNode6"],
            "logic_device_list": "0:0:2~3:0~1"
        },
        {
            "flow_node_list": ["flowNode7", "flowNode8"],
            "logic_device_list": "0:0:0~1:0,0:0:2:1"
        }
    ]
}
Table 1 Configuration options

Configuration Option

Description

keep_logic_device_order

Whether to deploy instances in the sequence specified by device_list.

The values are as follows:

  • true: Yes. For multiple instances, devices are arranged and set up according to device_list configured by the user.
  • false: No. For multiple instances, devices are arranged and set up according to the internal implementation logic of the framework.

Default value: false

batch_deploy_info

flow_node_list

List of FlowNode node names. One or more node names are supported. Use commas (,) to separate multiple node names.

logic_device_list

Deployment location of the DataFlow graph node.

The format is clusterid:serverid:deviceid:numaid(pgid/dieid). The fields are described as follows:

  • clusterid: cluster ID. Currently, the value is fixed to 0.
  • serverid: server node ID. Set this parameter to the value of node_id in the numa_config.json file. The following is an example:
    {
      "cluster":[
        {
          "cluster_nodes" : [
            {
              "node_id" : 0,
    ..........
    
  • deviceid: logical ID of a device. The value is the zero-based position of the device's entry in item_list of the corresponding node in the numa_config file, which is specified by the environment variable RESOURCE_CONFIG_PATH.
  • numaid (pgid/dieid): ID of multiple computing units on a single device.

Typical scenarios are as follows:

If heavy_load of FunctionPp is set to true, the graph node is deployed on the host node corresponding to the specified node.

When the graph node is deployed on multiple devices, configure it according to the following rules:
  • List the devices individually. For example, 0:0:0:0,0:0:0:1 indicates that the node is deployed on devices 0:0:0:0 and 0:0:0:1 as multiple instances.
  • Specify a device range. The tilde (~) indicates a range, and the number before ~ must be less than or equal to the number after it. For example, 0:0:0~1:0~1 indicates that the node is deployed as multiple instances on two computing units of each of two devices (0:0:0:0, 0:0:0:1, 0:0:1:0, and 0:0:1:1).
NOTE:

If a FlowNode is the parent node of a FlowGraph subgraph, multiple instances cannot be configured for it.

invoke_list (optional; if it is not set, the subgraph is deployed at the same location as its parent node)

invoke_name

Name of the InvokedClosure nested in the FlowNode, which corresponds to the name parameter of the AddInvokedClosure API.

deploy_info_file

Deployment location file of the DataFlow subgraph node. If the InvokedClosure corresponding to invoke_name is a DataFlow subgraph, you can specify a deployment configuration file for that subgraph, for example, ./data_flow_invoke_flow_graph_deploy_info.json. The fields and format requirements of this file are the same as those of data_flow_deploy_info.json.

logic_device_list

It specifies the deployment location of a DataFlow or AscendGraph subgraph node. The configuration method is the same as that of logic_device_list in batch_deploy_info.

For the same subgraph node, logic_device_list and deploy_info_file cannot be configured at the same time. Otherwise, an error is reported.

For an AscendGraph subgraph node, the number of instances in logic_device_list must be the same as that in logic_device_list of the parent node. Otherwise, an error is reported.
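
The list and range rules for logic_device_list, together with the instance-count rule for AscendGraph subgraph nodes, can be sketched as follows. This is an illustrative sketch, not the framework's parser: ExpandLogicDeviceList and SameInstanceCount are hypothetical names, the sketch assumes that a range may appear in any of the four fields (the examples in this section use ranges only in deviceid and numaid), and it assumes that the number of instances equals the number of expanded locations.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Splits text on a single-character delimiter.
static std::vector<std::string> Split(const std::string &text, char delim) {
  std::vector<std::string> parts;
  std::string part;
  std::istringstream in(text);
  while (std::getline(in, part, delim)) {
    parts.push_back(part);
  }
  return parts;
}

// Expands one field such as "2", "-1", or "0~1" into an inclusive list of IDs.
static std::vector<int> ExpandField(const std::string &field) {
  const std::string::size_type tilde = field.find('~');
  const int lo = std::stoi(field.substr(0, tilde));
  const int hi = (tilde == std::string::npos) ? lo : std::stoi(field.substr(tilde + 1));
  if (lo > hi) {
    throw std::invalid_argument("range start exceeds range end: " + field);
  }
  std::vector<int> ids;
  for (int id = lo; id <= hi; ++id) {
    ids.push_back(id);
  }
  return ids;
}

// Expands a logic_device_list string into individual
// "clusterid:serverid:deviceid:numaid" locations, in the order written.
std::vector<std::string> ExpandLogicDeviceList(const std::string &list) {
  std::vector<std::string> locations;
  for (const auto &item : Split(list, ',')) {
    const std::vector<std::string> fields = Split(item, ':');
    if (fields.size() != 4) {
      throw std::invalid_argument("expected 4 colon-separated fields in: " + item);
    }
    for (int c : ExpandField(fields[0]))
      for (int s : ExpandField(fields[1]))
        for (int d : ExpandField(fields[2]))
          for (int n : ExpandField(fields[3]))
            locations.push_back(std::to_string(c) + ":" + std::to_string(s) + ":" +
                                std::to_string(d) + ":" + std::to_string(n));
  }
  return locations;
}

// Assumed form of the AscendGraph rule: parent and invoked subgraph must
// expand to the same number of locations.
bool SameInstanceCount(const std::string &parent, const std::string &invoked) {
  return ExpandLogicDeviceList(parent).size() == ExpandLogicDeviceList(invoked).size();
}
```

With this sketch, expanding 0:0:0~1:0~1 yields the four locations listed in the range example above, and a reversed range such as 0:0:1~0:0 is rejected with std::invalid_argument.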