--distributed_cluster_build
Description
Enables distributed build and partitioning of a foundation model. After this option is enabled, the generated offline model will be used for distributed deployment.
See Also
When this option is set to 1, the following options can be configured together, depending on the input:
- The input is a complete foundation model.
If algorithm-based partitioning is enabled for the input foundation model, the following options are mandatory: set --cluster_config, enable algorithm-based partitioning using --enable_graph_parallel, and set the path of the partitioning policy configuration file using --graph_parallel_option_path.
In the algorithm-based partitioning scenario, communication operators are automatically inserted during the model build phase.
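The exact schema of the file passed to --graph_parallel_option_path depends on the CANN release in use and is not specified here; as a purely illustrative, hypothetical sketch (the key names and values below are assumptions, not the documented schema), a partitioning policy file splitting a model across two devices might look like:

```json
{
  "graph_parallel_option": {
    "tensor_parallel_option": {
      "tensor_parallel_size": 2
    }
  }
}
```

Consult the partitioning policy configuration reference for your CANN version for the actual supported keys.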

- The input is a set of slice models that already contain communication operators, and the slice models are built into an .om offline model.
Set --cluster_config, set the directory containing the slice models using --shard_model_dir, and describe the input and output relationships between the slice models using --model_relation_config.
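The structure of the --model_relation_config file is likewise defined by the CANN release and is not given in this page; as a hypothetical illustration only (every key name below is an assumption), a file wiring the output of one slice model to the input of another might look like:

```json
{
  "submodels": [
    {
      "model_name": "slice_0.air",
      "outputs": [
        { "output_index": 0, "to_model": "slice_1.air", "to_input_index": 0 }
      ]
    },
    {
      "model_name": "slice_1.air",
      "outputs": []
    }
  ]
}
```

Refer to the model relation configuration reference for your CANN version for the real schema.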

Argument
- 1: Enables distributed build and partitioning.
- 0 (default): Disables distributed build and partitioning.
Suggestions and Benefits
None
Example
- Distributed model deployment, with a single foundation model as the input (no algorithm-based partitioning, no specified target device; the model is deployed to all devices in load-balancing mode).
atc --model=xxx.air --framework=1 --soc_version=<soc_version> --output=xxx --cluster_config=./numa_config.json --distributed_cluster_build=1
- Distributed model deployment, with a single foundation model as the input and algorithm-based partitioning enabled.
atc --model=./matmul2.pb --distributed_cluster_build=1 --cluster_config=./numa_config_2p.json --enable_graph_parallel="1" --graph_parallel_option_path=./parallel_option.json --soc_version=<soc_version> --output=test_parallel --framework=3 --log=debug
After algorithm-based partitioning of a foundation model, the IDs of the logical devices on which the submodels are to be deployed are stored in the submodels' attributes. During reloading and redeployment, the deployment module performs distributed deployment based on these attributes.
- Distributed model deployment, with slice models (containing communication operators) as the input (the directory specified by --shard_model_dir contains multiple slice models).
atc --distributed_cluster_build=1 --cluster_config=../numa_config_4p.json --shard_model_dir=../1_air --model_relation_config=./model_relation_config.json --output=1_increase_4p --framework=1 --log=debug --soc_version=<soc_version>
Applicability
Restrictions
When performing inference with an offline model built in the distributed build and partitioning scenario, the aclmdlLoadFromFile API in AscendCL cannot be used to load the model. Instead, load the model in session mode using the LoadGraph API of the Ascend IR graph interface, then call RunGraph to run the loaded graph and obtain the graph execution result. For details, see "Compiling a Graph to an Offline Model and Running the Graph (Distributed Compilation and Partitioning of Foundation Models)".