--graph_parallel_option_path

Description

Specifies the path (including the file name) of the configuration file that defines the algorithm-based partitioning policy used when the original foundation model is partitioned.

See Also

  • The path of the partitioning policy configuration file can be configured only after distributed build is enabled by --distributed_cluster_build and the partitioning function is enabled by --enable_graph_parallel. When both are enabled, the --graph_parallel_option_path parameter is mandatory.
  • --cluster_config is required in the algorithm-based partitioning scenario.
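
The dependency rules above can be sketched as a small pre-flight check. This is an illustrative sketch only: the function name check_partitioning_options and the dictionary representation of the command-line options are hypothetical and not part of ATC, and the sketch assumes that a value of 1 means "enabled", as in the example command in this section.

```python
def check_partitioning_options(options):
    """Return a list of problems found in a dict mapping ATC option names to values.

    Hypothetical helper: it only mirrors the rules listed in this section
    and is not part of ATC.
    """
    problems = []
    path_set = bool(options.get("graph_parallel_option_path"))
    # Assumption: "1" means enabled, as in the example command.
    distributed = str(options.get("distributed_cluster_build", "")) == "1"
    parallel = str(options.get("enable_graph_parallel", "")) == "1"

    if path_set and not (distributed and parallel):
        problems.append(
            "--graph_parallel_option_path requires --distributed_cluster_build "
            "and --enable_graph_parallel to be enabled"
        )
    if distributed and parallel and not path_set:
        problems.append(
            "--graph_parallel_option_path is mandatory when distributed build "
            "and graph partitioning are both enabled"
        )
    if path_set and not options.get("cluster_config"):
        problems.append(
            "--cluster_config is required in the algorithm-based partitioning scenario"
        )
    return problems


if __name__ == "__main__":
    example = {
        "distributed_cluster_build": "1",
        "enable_graph_parallel": "1",
        "graph_parallel_option_path": "./parallel_option.json",
        "cluster_config": "./numa_config_2p.json",
    }
    for problem in check_partitioning_options(example):
        print(problem)
```

An empty list means that the combination of options satisfies the rules in this section.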

Argument

Argument: Path (including the file name) of the partitioning policy configuration file.

Format: The path (including the file name) can contain letters, digits, underscores (_), hyphens (-), periods (.), and Chinese characters.

Restrictions: The content in the configuration file must be in JSON format.
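
The format and restriction above can be checked before ATC is invoked. The following is a minimal sketch, not an ATC feature: the helper names are hypothetical, the slash (/) is allowed as a path separator by assumption, and the range U+4E00 to U+9FFF is only an approximation of "Chinese characters".

```python
import json
import re

# Letters, digits, underscores, hyphens, periods, and Chinese characters.
# Assumptions: "/" is accepted as the path separator, and Chinese characters
# are approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF).
_ALLOWED_PATH = re.compile(r"^[A-Za-z0-9_\-./\u4e00-\u9fff]+$")


def is_allowed_path(path):
    """Return True if the path uses only the characters listed under Format."""
    return bool(_ALLOWED_PATH.match(path))


def is_valid_json(text):
    """Return True if the text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def check_option_file(path):
    """Check the path characters and the JSON syntax of a configuration file."""
    if not is_allowed_path(path):
        raise ValueError(f"unsupported characters in path: {path}")
    with open(path, encoding="utf-8") as f:
        if not is_valid_json(f.read()):
            raise ValueError(f"configuration file is not valid JSON: {path}")
```

A check of this kind catches syntax errors such as a missing comma between two keys before the build starts.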

Suggestions and Benefits

None

Example

atc --distributed_cluster_build=1 --cluster_config=./numa_config_2p.json --model=./matmul2.pb --enable_graph_parallel="1" --graph_parallel_option_path=./parallel_option.json --soc_version=<soc_version>  --output=test_parallel --framework=3 --log=debug

The following is an example of the partitioning policy configuration file:

  • Semi-automatic partitioning
    {
        "graph_parallel_option": {
            "auto": false,
            "opt_level": "O1",
            "tensor_parallel_option": {
                "tensor_parallel_size": 2
            },
            "tensor_sharding": {
                "optimizer_state_sharding": true,
                "gradient_sharding": true,
                "model_weight_sharding": true,
                "model_weight_prefetch": true,
                "model_weight_prefetch_buffer_size": 50
            }
        }
    }
  • Automatic partitioning
    {
        "graph_parallel_option": {
            "auto": true
        }
    }

The parameters are described as follows:

  • auto: true for automatic partitioning; false for semi-automatic partitioning.
  • opt_level: Algorithm used to solve the Tensor Parallel partitioning. The value can be O2 (ILP algorithm) or O1 (DP algorithm). If this parameter is not set, O2 is used by default.
  • tensor_parallel_option: Enables TP partitioning.

    TP partitioning: Tensor Parallel, also called Intra-Op Parallel, partitions the tensors of each operator in a computational graph along one or more axes (batch/non-batch). The resulting partitions are distributed to the devices for computation.

  • tensor_parallel_size: TP size, that is, the number of device processors to be configured. The value of this parameter must match the number of device processors in the --cluster_config topology file.
  • optimizer_state_sharding: Enables optimizer state sharding. true: enabled; false: disabled.
  • gradient_sharding: Enables gradient sharding. true: enabled; false: disabled.
  • model_weight_sharding: Enables weight sharding. true: enabled; false: disabled.
  • model_weight_prefetch: Enables weight prefetching. true: enabled; false: disabled.
  • model_weight_prefetch_buffer_size: Specifies the buffer size for weight prefetching.
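
The parameters above can also be assembled programmatically. The following is a minimal sketch under stated assumptions: the function build_graph_parallel_option is hypothetical, and the key layout simply follows the two example files in this section. It does not check tensor_parallel_size against the --cluster_config topology file.

```python
import json


def build_graph_parallel_option(auto, tensor_parallel_size=None,
                                opt_level=None, tensor_sharding=None):
    """Build the content of a partitioning policy configuration file.

    Hypothetical helper: the structure follows the example files in this
    section. In automatic mode only the "auto" key is written.
    """
    option = {"auto": auto}
    if not auto:
        if opt_level is not None:
            if opt_level not in ("O1", "O2"):
                raise ValueError("opt_level must be O1 or O2")
            option["opt_level"] = opt_level
        if tensor_parallel_size is not None:
            option["tensor_parallel_option"] = {
                "tensor_parallel_size": tensor_parallel_size
            }
        if tensor_sharding:
            option["tensor_sharding"] = dict(tensor_sharding)
    return {"graph_parallel_option": option}


if __name__ == "__main__":
    config = build_graph_parallel_option(
        auto=False,
        tensor_parallel_size=2,
        opt_level="O1",
        tensor_sharding={
            "optimizer_state_sharding": True,
            "gradient_sharding": True,
            "model_weight_sharding": True,
            "model_weight_prefetch": True,
            "model_weight_prefetch_buffer_size": 50,
        },
    )
    # json.dumps guarantees syntactically valid JSON output.
    print(json.dumps(config, indent=4))
```

Generating the file with a JSON serializer avoids hand-editing errors and keeps the output in the JSON format that the configuration file requires.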

Applicability

Atlas Training Series Product

Dependencies and Restrictions

None