--distributed_cluster_build
Description
Enables distributed build and partitioning of a foundation model. After this option is enabled, the generated offline model will be used for distributed deployment.
See Also
When this option is set to 1, the following options can be configured together, depending on the input:
- The input is a complete foundation model.
If algorithm-based partitioning is enabled for the input foundation model, the following options are mandatory: set --cluster_config, enable algorithm-based partitioning using --enable_graph_parallel, and set the path of the partitioning policy configuration file using --graph_parallel_option_path.
In the algorithm-based partitioning scenario, communication operators are automatically inserted during the model build phase.
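The exact schema of the file passed to --graph_parallel_option_path depends on the CANN release in use and is not specified here; as a purely illustrative, hypothetical sketch (the key names and values below are assumptions, not the documented schema), a partitioning policy file splitting a model across two devices might look like:

```json
{
  "graph_parallel_option": {
    "tensor_parallel_option": {
      "tensor_parallel_size": 2
    }
  }
}
```

Consult the partitioning policy configuration reference for your CANN version for the actual supported keys.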

- The input is a set of slice models that already contain communication operators, and the slice models are built into an .om offline model.
Set --cluster_config, set the directory containing the slice models using --shard_model_dir, and describe the input and output relationships between the slice models using --model_relation_config.
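The structure of the --model_relation_config file is likewise defined by the CANN release and is not given in this page; as a hypothetical illustration only (every key name below is an assumption), a file wiring the output of one slice model to the input of another might look like:

```json
{
  "submodels": [
    {
      "model_name": "slice_0.air",
      "outputs": [
        { "output_index": 0, "to_model": "slice_1.air", "to_input_index": 0 }
      ]
    },
    {
      "model_name": "slice_1.air",
      "outputs": []
    }
  ]
}
```

Refer to the model relation configuration reference for your CANN version for the real schema.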

Argument
- 1: Enables distributed build and partitioning.
- 0 (default): Disables distributed build and partitioning.
Suggestions and Benefits
None
Example
- Distributed model deployment, with a single foundation model as the input (no algorithm-based partitioning, no specified target device; the model is deployed to all devices in load-balancing mode).
atc --model=xxx.air --framework=1 --soc_version=<soc_version> --output=xxx --cluster_config=./numa_config.json --distributed_cluster_build=1
- Distributed model deployment, with a single foundation model as the input and algorithm-based partitioning enabled.
atc --model=./matmul2.pb --distributed_cluster_build=1 --cluster_config=./numa_config_2p.json --enable_graph_parallel="1" --graph_parallel_option_path=./parallel_option.json --soc_version=<soc_version> --output=test_parallel --framework=3 --log=debug
After algorithm-based partitioning of a foundation model, the IDs of the logical devices on which the submodels are to be deployed are stored in the submodels' attributes. During reloading and redeployment, the deployment module performs distributed deployment based on these attributes.
- Distributed model deployment, with slice models (containing communication operators) as the input (the directory specified by --shard_model_dir contains multiple slice models).
atc --distributed_cluster_build=1 --cluster_config=../numa_config_4p.json --shard_model_dir=../1_air --model_relation_config=./model_relation_config.json --output=1_increase_4p --framework=1 --log=debug --soc_version=<soc_version>
Applicability
Restrictions
When performing inference with an offline model built in the distributed build and partitioning scenario, the aclmdlLoadFromFile API in AscendCL cannot be used to load the model. Instead, load the model in session mode using the LoadGraph API of the Ascend IR graph interface, then call RunGraph to run the loaded graph and obtain the graph execution result. For details, see "Compiling a Graph to an Offline Model and Running the Graph (Distributed Compilation and Partitioning of Foundation Models)".