--model_relation_config
Description
Specifies the configuration file that describes the data associations and distributed communication group relationships among multiple slice models. Use this option when the original model is a slice model that contains communication operators.
See Also
- This option takes effect only when distributed build is enabled with --distributed_cluster_build. It must be used together with --shard_model_dir, which specifies the path of the slice models.
- --cluster_config is required when the model contains communication operators.
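The dependencies above can be checked before ATC is invoked. The following Python sketch is illustrative only and is not part of ATC: the function name check_companion_options and the options dictionary are hypothetical, and it assumes that distributed build is enabled by the value 1, as in the example command on this page.

```python
def check_companion_options(options, has_comm_operators):
    """Return a list of problems for a dict mapping ATC option names to values.

    Hypothetical helper: it only mirrors the dependencies stated on this page.
    """
    problems = []
    if "--model_relation_config" not in options:
        return problems  # nothing to check when the option is not used

    # Assumption: "1" enables distributed build, as in the example command.
    if str(options.get("--distributed_cluster_build")) != "1":
        problems.append("--distributed_cluster_build must be enabled")

    if not options.get("--shard_model_dir"):
        problems.append("--shard_model_dir must point to the slice models")

    # --cluster_config is only required when communication operators exist.
    if has_comm_operators and not options.get("--cluster_config"):
        problems.append("--cluster_config is required for communication operators")

    return problems
```

An empty list means that the documented companion options are present. It does not mean that ATC will accept their values.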
Argument
Argument: Path of the configuration file, including the file name.
Format: The path (including the file name) can contain letters, digits, underscores (_), hyphens (-), periods (.), and Chinese characters.
Restrictions: The content in the configuration file must be in JSON format.
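The format and restriction above can be checked with a short script before the file is passed to ATC. The following Python sketch is an assumption-laden illustration: the helper names are hypothetical, the slash (/) is assumed to be accepted as a path separator, and Chinese characters are approximated by the CJK Unified Ideographs range U+4E00 to U+9FFF. Strict JSON parsing rejects // comments, so annotations would probably need to be removed from a real file if ATC parses strictly.

```python
import json
import re

# Assumption: "/" is allowed as a separator in addition to the documented
# characters; Chinese characters are approximated by U+4E00..U+9FFF.
_ALLOWED_PATH = re.compile(r"[A-Za-z0-9_\-./\u4e00-\u9fff]+")


def path_is_well_formed(path):
    """Return True if the path uses only the characters listed under Format."""
    return _ALLOWED_PATH.fullmatch(path) is not None


def load_relation_config(path):
    """Parse the configuration file as strict JSON.

    Raises json.JSONDecodeError if the content is not valid JSON,
    for example when it still contains // comments.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```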
Suggestions and Benefits
None
Example
Upload the configuration file to any directory (for example, $HOME/conf) on the server where ATC is located, and then run a command such as the following:
atc --distributed_cluster_build=1 --cluster_config=$HOME/conf/numa_config_4p.json --output=1_increase_4p --framework=1 --log=debug --shard_model_dir=../1_air --model_relation_config=$HOME/conf/model_relation_config.json --soc_version=<soc_version>
The following is an example of the configuration file. For a model after TP partitioning, its configuration file contains only the deploy_config node.
{
    "deploy_config": [    // (Required) Mapping between the model to be deployed and the target deployment node.
        {
            "submodel_name": "submodel1.air",    // File name after frontend partitioning, which must be the same as the model name after frontend partitioning in --shard_model_dir.
            "deploy_device_id_list": "0:0:0"     // Target device for the model to be deployed: cluster:0 node:0 item:0
        },
        {
            "submodel_name": "submodel2.air",
            "deploy_device_id_list": "0:0:1"
        }
    ],
    "model_name_to_instance_id": [    // Required
        {
            "submodel_name": "submodel1.air",    // Model ID, which is user-defined in the file. Different files correspond to different IDs.
            "model_instance_id": 0
        },
        {
            "submodel_name": "submodel2.air",
            "model_instance_id": 1
        }
    ],
    "comm_group": [{    // (Optional) If the model partitioned at the frontend contains communication operators, this parameter indicates the communication domain information of the model communication operators after partitioning.
        "group_name": "tp_group_name_0",    // Sub-communication domain of communication operators of the model partitioned at the frontend.
        "group_rank_list": "[0,1]"          // Subrank list of communication operators of the model partitioned at the frontend.
    }],
    "rank_table": [
        {
            "rank_id": 0,    // Mapping between rank IDs and model IDs.
            "model_instance_id": 0
        },
        {
            "rank_id": 1,
            "model_instance_id": 1
        }
    ]
}
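A configuration file with the structure shown above can also be generated by a script, which avoids hand-editing mistakes and produces strict JSON without comments. The following Python sketch is a minimal illustration under stated assumptions: the function names build_relation_config and find_inconsistencies are hypothetical, each slice model is assumed to map to one rank and one instance ID in list order (as in the example), and the cross-checks reflect what the example suggests rather than rules confirmed by ATC.

```python
import json


def build_relation_config(submodels, group_name=None):
    """Build the config dict from (file_name, device_id) pairs in rank order.

    Assumption: rank ID == model instance ID == position in the list.
    """
    config = {
        "deploy_config": [
            {"submodel_name": name, "deploy_device_id_list": device}
            for name, device in submodels
        ],
        "model_name_to_instance_id": [
            {"submodel_name": name, "model_instance_id": i}
            for i, (name, _) in enumerate(submodels)
        ],
        "rank_table": [
            {"rank_id": i, "model_instance_id": i}
            for i in range(len(submodels))
        ],
    }
    if group_name is not None:
        ranks = list(range(len(submodels)))
        # The example stores the rank list as a string such as "[0,1]".
        config["comm_group"] = [{
            "group_name": group_name,
            "group_rank_list": json.dumps(ranks, separators=(",", ":")),
        }]
    return config


def find_inconsistencies(config):
    """Return a list of cross-reference problems (assumed rules, see lead-in)."""
    problems = []
    deployed = {e["submodel_name"] for e in config.get("deploy_config", [])}
    ids = {e["submodel_name"]: e["model_instance_id"]
           for e in config.get("model_name_to_instance_id", [])}
    if deployed != set(ids):
        problems.append("deploy_config and model_name_to_instance_id name different models")
    if len(set(ids.values())) != len(ids):
        problems.append("model_instance_id values are not unique")
    known_ids = set(ids.values())
    ranks = set()
    for entry in config.get("rank_table", []):
        ranks.add(entry["rank_id"])
        if entry["model_instance_id"] not in known_ids:
            problems.append(f"rank {entry['rank_id']} refers to an unknown model_instance_id")
    for group in config.get("comm_group", []):
        for rank in json.loads(group["group_rank_list"]):
            if rank not in ranks:
                problems.append(f"{group['group_name']} lists unknown rank {rank}")
    return problems
```

For example, build_relation_config([("submodel1.air", "0:0:0"), ("submodel2.air", "0:0:1")], group_name="tp_group_name_0") yields a dictionary with the same values as the example above. It can be written to disk with json.dump.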
Applicability
Dependencies and Restrictions
None