--model_relation_config

Description

Specifies the configuration file that describes the data associations and distributed communication group relationships among multiple sliced models. It applies when the original foundation model is supplied as sliced models that contain communication operators.

See Also

Argument

Argument: Path of the configuration file, including the file name.

Format: The path (including the file name) can contain letters, digits, underscores (_), hyphens (-), periods (.), and Chinese characters.

Restrictions: The content in the configuration file must be in JSON format.
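
One way to check this restriction before invoking ATC is to run the file through a standard JSON parser. The following is a minimal sketch using Python's json module; the helper name is_valid_json_file is hypothetical and is not part of ATC. Note that a strict JSON parser rejects // comments such as the explanatory ones in the sample configuration shown in the Example section; whether ATC itself tolerates them is not stated here, so this sketch assumes they are removed first.

```python
import json


def is_valid_json_file(path):
    """Return True if the file at `path` parses as strict JSON.

    Hypothetical helper for illustration only; it does not reproduce
    any validation that ATC performs internally.
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            json.load(f)
        return True
    except (OSError, ValueError):
        # OSError: file missing or unreadable.
        # ValueError: covers json.JSONDecodeError and decoding errors.
        return False
```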

Suggestions and Benefits

None

Example

Upload the configuration file to any directory (for example, $HOME/conf) on the server where ATC is located. The following is an example:

atc --distributed_cluster_build=1 --cluster_config=$HOME/conf/numa_config_4p.json --output=1_increase_4p --framework=1 --log=debug --shard_model_dir=../1_air --model_relation_config=$HOME/conf/model_relation_config.json --soc_version=<soc_version>

The following is an example of the configuration file. For a model that has undergone TP partitioning, the configuration file contains only the deploy_config node.

{
  "deploy_config" :[                    // (Required) mapping between the model to be deployed and the target deployment node.
    {
      "submodel_name":"submodel1.air",  // File name of the model produced by frontend partitioning. It must match the corresponding model name in --shard_model_dir.
      "deploy_device_id_list":"0:0:0"   // Target device for the model to be deployed: cluster:0 node:0 item:0
    },
    {
      "submodel_name":"submodel2.air",
      "deploy_device_id_list":"0:0:1"
    }
  ],
  "model_name_to_instance_id":[          // Required
    {
      "submodel_name":"submodel1.air",
      "model_instance_id":0              // Model ID, which is user-defined in this file. Different model files correspond to different IDs.
    },
    {
      "submodel_name":"submodel2.air",
      "model_instance_id":1
    }
  ],
  "comm_group":[{                      // (Optional) If the model partitioned at the frontend contains communication operators, this parameter indicates the communication domain information of the model communication operators after partitioning.
    "group_name":"tp_group_name_0",    // Sub-communication domain of communication operators of the model partitioned at the frontend.
    "group_rank_list":"[0,1]"          // Subrank list of communication operators of the model partitioned at the frontend.
  }],
  "rank_table":[
  {
    "rank_id":0,                      // Mapping between rank IDs and model IDs.
    "model_instance_id":0
  },
  {
    "rank_id":1,
    "model_instance_id":1
  }
  ]
}
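
The nodes in this file refer to one another by name and ID, so a mismatch is easy to introduce by hand. The following Python sketch checks a few cross-references in an already parsed configuration. The function name check_relation_config and the specific rules are assumptions inferred from the comments in the sample above; they are not ATC's documented validation logic. Checks are skipped for nodes that are absent.

```python
import json


def check_relation_config(cfg):
    """Return a list of problems found in a parsed configuration dict.

    Hypothetical helper: the rules below are inferred from the sample
    file's comments and are not taken from ATC itself.
    """
    problems = []

    deployed = {e["submodel_name"] for e in cfg.get("deploy_config", [])}
    name_to_id = {e["submodel_name"]: e["model_instance_id"]
                  for e in cfg.get("model_name_to_instance_id", [])}

    # Assumed rule: every deployed submodel has a model instance ID.
    if name_to_id:
        for name in sorted(deployed - name_to_id.keys()):
            problems.append(f"{name} is deployed but has no model_instance_id")

    # "Different files correspond to different IDs."
    ids = list(name_to_id.values())
    if len(ids) != len(set(ids)):
        problems.append("model_instance_id values are not unique")

    # rank_table maps rank IDs to model IDs, so each ID should be known.
    known_ids = set(ids)
    rank_table = cfg.get("rank_table", [])
    for entry in rank_table:
        if entry["model_instance_id"] not in known_ids:
            problems.append(
                f"rank {entry['rank_id']} refers to unknown "
                f"model_instance_id {entry['model_instance_id']}"
            )

    # Assumed rule: ranks listed in a comm_group appear in rank_table.
    known_ranks = {e["rank_id"] for e in rank_table}
    for group in cfg.get("comm_group", []):
        raw = group["group_rank_list"]
        # The sample stores the list as a string such as "[0,1]".
        ranks = json.loads(raw) if isinstance(raw, str) else raw
        for rank in ranks:
            if rank not in known_ranks:
                problems.append(
                    f"group {group['group_name']} lists rank {rank}, "
                    "which is missing from rank_table"
                )

    return problems
```

Applied to the sample configuration above (with the comments removed and the text parsed into a dict), this function is expected to return an empty list.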

Applicability

Atlas Training Series Product

Dependencies and Restrictions

None