--graph_parallel_option_path
Description
Specifies the path and file name of the configuration file for the algorithm-based partitioning policy used when the original foundation model is partitioned.
See Also
- The path of the partitioning policy configuration file can be configured only after distributed build is enabled by --distributed_cluster_build and the partitioning function is enabled by --enable_graph_parallel. Once both are enabled, the --graph_parallel_option_path parameter is mandatory.
- --cluster_config is required in the algorithm-based partitioning scenario.
Argument
Argument: Path (including the file name) of the partitioning policy configuration file.
Format: The path (including the file name) can contain letters, digits, underscores (_), hyphens (-), periods (.), and Chinese characters.
Restrictions: The content in the configuration file must be in JSON format.
Suggestions and Benefits
None
Example
atc --distributed_cluster_build=1 --cluster_config=./numa_config_2p.json --model=./matmul2.pb --enable_graph_parallel="1" --graph_parallel_option_path=./parallel_option.json --soc_version=<soc_version> --output=test_parallel --framework=3 --log=debug
The following is an example of the partitioning policy configuration file:
- Semi-automatic partitioning
{
  "graph_parallel_option": {
    "auto": false,
    "opt_level": "O1",
    "tensor_parallel_option": {
      "tensor_parallel_size": 2
    },
    "tensor_sharding": {
      "optimizer_state_sharding": true,
      "gradient_sharding": true,
      "model_weight_sharding": true,
      "model_weight_prefetch": true,
      "model_weight_prefetch_buffer_size": 50
    }
  }
}
- Automatic partitioning
{
  "graph_parallel_option": {
    "auto": true
  }
}
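A configuration file like the semi-automatic example above can also be generated programmatically, which avoids hand-editing errors such as missing commas. The following is a minimal Python sketch; the file name parallel_option.json matches the atc example, and the option values are illustrative:

```python
import json

# Semi-automatic partitioning policy, mirroring the example above.
parallel_option = {
    "graph_parallel_option": {
        "auto": False,
        "opt_level": "O1",
        "tensor_parallel_option": {
            "tensor_parallel_size": 2
        },
        "tensor_sharding": {
            "optimizer_state_sharding": True,
            "gradient_sharding": True,
            "model_weight_sharding": True,
            "model_weight_prefetch": True,
            "model_weight_prefetch_buffer_size": 50
        }
    }
}

# Write the policy to the file passed via --graph_parallel_option_path.
with open("parallel_option.json", "w") as f:
    json.dump(parallel_option, f, indent=2)

# Read it back to confirm the file is valid JSON, as the restriction requires.
with open("parallel_option.json") as f:
    loaded = json.load(f)
print(loaded["graph_parallel_option"]["tensor_parallel_option"]["tensor_parallel_size"])  # prints 2
```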
The parameters are described as follows:
- auto: true for automatic partitioning; false for semi-automatic partitioning.
- opt_level: Algorithm used to solve the Tensor Parallel partitioning plan. The value can be O2 (ILP algorithm) or O1 (DP algorithm). If this parameter is not set, O2 is used by default.
- tensor_parallel_option: Enables TP partitioning.
TP partitioning: Tensor Parallel, also called Intra-Op Parallel, partitions the tensors of each operator in the computational graph along one or more axes (batch or non-batch) and distributes the resulting slices across devices for computation.
- tensor_parallel_size: TP size, that is, the number of devices used for tensor parallelism. The value of this parameter must match the device count in the --cluster_config topology file.
- optimizer_state_sharding: Enables optimizer sharding. true: enabled; false: disabled.
- gradient_sharding: Enables gradient sharding. true: enabled; false: disabled.
- model_weight_sharding: Enables weight sharding. true: enabled; false: disabled.
- model_weight_prefetch: Enables weight prefetching. true: enabled; false: disabled.
- model_weight_prefetch_buffer_size: Specifies the cache size for weight prefetching.
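Because atc only requires the file to be valid JSON, a typo in a field name or value type may surface late in the build. The parameter rules above can be checked up front with a small pre-flight helper; validate_option below is a hypothetical sketch, not part of the atc tool, and only covers the fields documented here:

```python
def validate_option(cfg: dict) -> list[str]:
    """Pre-flight check of a partitioning policy dict against the
    documented fields. Hypothetical helper, not part of atc."""
    errors = []
    opt = cfg.get("graph_parallel_option")
    if not isinstance(opt, dict):
        return ["missing top-level 'graph_parallel_option' object"]
    # auto selects automatic (true) vs semi-automatic (false) partitioning.
    if not isinstance(opt.get("auto"), bool):
        errors.append("'auto' must be true or false")
    # opt_level, if present, must name one of the two solver algorithms.
    if "opt_level" in opt and opt["opt_level"] not in ("O1", "O2"):
        errors.append("'opt_level' must be 'O1' or 'O2'")
    # tensor_parallel_size must be a positive integer when supplied.
    size = opt.get("tensor_parallel_option", {}).get("tensor_parallel_size")
    if size is not None and (not isinstance(size, int) or size < 1):
        errors.append("'tensor_parallel_size' must be a positive integer")
    # The sharding switches are all booleans when supplied.
    for flag in ("optimizer_state_sharding", "gradient_sharding",
                 "model_weight_sharding", "model_weight_prefetch"):
        val = opt.get("tensor_sharding", {}).get(flag)
        if val is not None and not isinstance(val, bool):
            errors.append(f"'{flag}' must be a boolean")
    return errors

print(validate_option({"graph_parallel_option": {"auto": True}}))  # prints []
```

Running such a check before invoking atc turns a late build failure into an immediate, readable error list.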
Applicability
None
Dependencies and Restrictions
None