--graph_parallel_option_path
Description
Specifies the path and file name of the configuration file for the algorithm-based partitioning policy used when the original foundation model is partitioned.
See Also
- The path of the partitioning policy configuration file can be configured only after distributed build is enabled by --distributed_cluster_build and the partitioning function is enabled by --enable_graph_parallel. Once both are enabled, the --graph_parallel_option_path parameter is mandatory.
- --cluster_config is required in the algorithm-based partitioning scenario.
Argument
Argument: Path (including the file name) of the partitioning policy configuration file.
Format: The path (including the file name) can contain letters, digits, underscores (_), hyphens (-), periods (.), and Chinese characters.
Restrictions: The content in the configuration file must be in JSON format.
Suggestions and Benefits
None
Example
atc --distributed_cluster_build=1 --cluster_config=./numa_config_2p.json --model=./matmul2.pb --enable_graph_parallel="1" --graph_parallel_option_path=./parallel_option.json --soc_version=<soc_version> --output=test_parallel --framework=3 --log=debug
The following is an example of the partitioning policy configuration file:
- Semi-automatic partitioning
{
  "graph_parallel_option": {
    "auto": false,
    "opt_level": "O1",
    "tensor_parallel_option": {
      "tensor_parallel_size": 2
    },
    "tensor_sharding": {
      "optimizer_state_sharding": true,
      "gradient_sharding": true,
      "model_weight_sharding": true,
      "model_weight_prefetch": true,
      "model_weight_prefetch_buffer_size": 50
    }
  }
}
- Automatic partitioning
{
  "graph_parallel_option": {
    "auto": true
  }
}
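A configuration file like the semi-automatic example above can also be generated programmatically, which avoids hand-editing errors such as missing commas. The following is a minimal Python sketch; the file name parallel_option.json matches the atc example, and the option values are illustrative:

```python
import json

# Semi-automatic partitioning policy, mirroring the example above.
parallel_option = {
    "graph_parallel_option": {
        "auto": False,
        "opt_level": "O1",
        "tensor_parallel_option": {
            "tensor_parallel_size": 2
        },
        "tensor_sharding": {
            "optimizer_state_sharding": True,
            "gradient_sharding": True,
            "model_weight_sharding": True,
            "model_weight_prefetch": True,
            "model_weight_prefetch_buffer_size": 50
        }
    }
}

# Write the policy to the file passed via --graph_parallel_option_path.
with open("parallel_option.json", "w") as f:
    json.dump(parallel_option, f, indent=2)

# Read it back to confirm the file is valid JSON, as the restriction requires.
with open("parallel_option.json") as f:
    loaded = json.load(f)
print(loaded["graph_parallel_option"]["tensor_parallel_option"]["tensor_parallel_size"])  # prints 2
```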
The parameters are described as follows:
- auto: true for automatic partitioning; false for semi-automatic partitioning.
- opt_level: Algorithm used to solve the Tensor Parallel partitioning plan. The value can be O2 (ILP algorithm) or O1 (DP algorithm). If this parameter is not set, O2 is used by default.
- tensor_parallel_option: Enables TP partitioning.
TP partitioning: Tensor Parallel, also called Intra-Op Parallel, partitions the tensors of each operator in the computational graph along one or more axes (batch or non-batch) and distributes the resulting slices across devices for computation.
- tensor_parallel_size: TP size, that is, the number of devices used for tensor parallelism. The value of this parameter must match the device count in the --cluster_config topology file.
- optimizer_state_sharding: Enables optimizer sharding. true: enabled; false: disabled.
- gradient_sharding: Enables gradient sharding. true: enabled; false: disabled.
- model_weight_sharding: Enables weight sharding. true: enabled; false: disabled.
- model_weight_prefetch: Enables weight prefetching. true: enabled; false: disabled.
- model_weight_prefetch_buffer_size: Specifies the cache size for weight prefetching.
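Because atc only requires the file to be valid JSON, a typo in a field name or value type may surface late in the build. The parameter rules above can be checked up front with a small pre-flight helper; validate_option below is a hypothetical sketch, not part of the atc tool, and only covers the fields documented here:

```python
def validate_option(cfg: dict) -> list[str]:
    """Pre-flight check of a partitioning policy dict against the
    documented fields. Hypothetical helper, not part of atc."""
    errors = []
    opt = cfg.get("graph_parallel_option")
    if not isinstance(opt, dict):
        return ["missing top-level 'graph_parallel_option' object"]
    # auto selects automatic (true) vs semi-automatic (false) partitioning.
    if not isinstance(opt.get("auto"), bool):
        errors.append("'auto' must be true or false")
    # opt_level, if present, must name one of the two solver algorithms.
    if "opt_level" in opt and opt["opt_level"] not in ("O1", "O2"):
        errors.append("'opt_level' must be 'O1' or 'O2'")
    # tensor_parallel_size must be a positive integer when supplied.
    size = opt.get("tensor_parallel_option", {}).get("tensor_parallel_size")
    if size is not None and (not isinstance(size, int) or size < 1):
        errors.append("'tensor_parallel_size' must be a positive integer")
    # The sharding switches are all booleans when supplied.
    for flag in ("optimizer_state_sharding", "gradient_sharding",
                 "model_weight_sharding", "model_weight_prefetch"):
        val = opt.get("tensor_sharding", {}).get(flag)
        if val is not None and not isinstance(val, bool):
            errors.append(f"'{flag}' must be a boolean")
    return errors

print(validate_option({"graph_parallel_option": {"auto": True}}))  # prints []
```

Running such a check before invoking atc turns a late build failure into an immediate, readable error list.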
Applicability
None
Dependencies and Restrictions
None