Description of Command-Line Options
This section describes the command-line options of the HCCL Performance Tester.
Command
- In the MPICH installation scenario:
mpirun [-f <hostfile>] -n <number> ./bin/<executable_file> [-p <npus>] [-b <minbytes>] [-e <maxbytes>] [-i <incsize>] [-f <incfactor>] [-o <operator>] [-r <root>] [-d <datatype>] [-z <0/1>] [-n <iters_count>] [-w <warmup_iters_count>] [-c <0/1>]
- In the Open MPI installation scenario:
mpirun [--prefix <mpi_install_path>] [-hostfile <hostfile>] -n <number> -x <env> [--allow-run-as-root] [--mca <key value>] ./bin/<executable_file> [-p <npus>] [-b <minbytes>] [-e <maxbytes>] [-i <incsize>] [-f <incfactor>] [-o <operator>] [-r <root>] [-d <datatype>] [-z <0/1>] [-n <iters_count>] [-w <warmup_iters_count>] [-c <0/1>]
- mpirun is followed by MPI command-line options. For details, see MPICH Command-Line Options and Open MPI Command-Line Options.
- ./bin/<executable_file> is followed by the command-line options of the HCCL Performance Tester. For details, see HCCL Performance Tester Options.
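Some option letters, such as -f and -n, appear on both sides of ./bin/<executable_file> with different meanings, so the position of an option determines which program reads it. The following Python sketch illustrates that split. The helper name split_command and all sample values (hostfile name, process count, step factor) are hypothetical, and the sketch assumes that the executable is the first token starting with ./bin/.

```python
from typing import List, Tuple


def split_command(argv: List[str]) -> Tuple[List[str], str, List[str]]:
    """Split an mpirun command line into launcher options, executable, and tester options.

    Assumption: the executable is the first token that starts with "./bin/".
    """
    for index, token in enumerate(argv):
        if token.startswith("./bin/"):
            # Everything between "mpirun" and the executable belongs to MPI;
            # everything after the executable belongs to the tester.
            return argv[1:index], token, argv[index + 1:]
    raise ValueError("no ./bin/<executable_file> token found")


# Hypothetical MPICH-style invocation; all values are illustrative only.
command = ("mpirun -f hostfile -n 16 ./bin/all_reduce_test "
           "-p 8 -f 2 -n 20 -w 10").split()

launcher_opts, executable, tester_opts = split_command(command)
print(launcher_opts)  # ['-f', 'hostfile', '-n', '16']
print(executable)     # ./bin/all_reduce_test
print(tester_opts)    # ['-p', '8', '-f', '2', '-n', '20', '-w', '10']
```

In this example the first -f and -n are read by mpirun as the hostfile and the process count, while the second -f and -n are read by the tester as the step factor and the iteration count.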
MPICH Command-Line Options
Only common MPICH options are listed below. For more options, see the official MPICH documentation.
| Option | Optional/Required | Description |
|---|---|---|
| -f <hostfile> | Optional | Hostfile that lists the nodes. Configure this file in multi-server scenarios. Set this option to the absolute path of the hostfile or to its path relative to the directory where the command is executed. For details about the configuration example, see 4. |
| -n <number> | Required | Total number of NPUs to be started, that is, Number of nodes × Number of NPUs participating in training on each node. |
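As a minimal sketch of the -n calculation described above, the following Python helper multiplies the node count by the per-node NPU count and assembles the mpirun part of an MPICH-style command. The function name mpich_launcher_args and the sample values (2 nodes, 8 NPUs per node, a file named hostfile) are assumptions made for illustration.

```python
from typing import List, Optional


def mpich_launcher_args(nodes: int, npus_per_node: int,
                        hostfile: Optional[str] = None) -> List[str]:
    """Build the mpirun part of an MPICH-style command line (hypothetical helper)."""
    if nodes < 1 or npus_per_node < 1:
        raise ValueError("nodes and npus_per_node must be positive")
    args = ["mpirun"]
    if hostfile is not None:
        # -f is optional; it is configured in multi-server scenarios.
        args += ["-f", hostfile]
    # -n is the total across all nodes, not the per-node count.
    args += ["-n", str(nodes * npus_per_node)]
    return args


# Illustrative values: 2 nodes with 8 participating NPUs each.
print(" ".join(mpich_launcher_args(2, 8, hostfile="hostfile")))
# mpirun -f hostfile -n 16
```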
Open MPI Command-Line Options
Only common Open MPI options are listed below. For more options, see the official Open MPI documentation.
| Option | Optional/Required | Description |
|---|---|---|
| --prefix <mpi_install_path> | Optional | Installation path of Open MPI. This option is generally not needed in single-server scenarios. In multi-server scenarios it is required; otherwise, the MPI library files may fail to be obtained. |
| -hostfile <hostfile> | Optional | Hostfile that lists the nodes. Configure this file in multi-server scenarios. Set this option to the absolute path of the hostfile or to its path relative to the directory where the command is executed. For details about the configuration example, see 4. |
| -n <number> | Required | Total number of NPUs to be started, that is, Number of nodes × Number of NPUs participating in training on each node. |
| -x <env> | Required | Name of the environment variable to be passed to the remote nodes. |
| --allow-run-as-root | Optional | Allows the root user to run the mpirun command. |
| --mca <key value> | Optional | Open MPI is built around the Modular Component Architecture (MCA). You can set MCA parameters when running mpirun to load various Open MPI components that implement certain features. Common commands: |
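The following Python sketch shows one way the Open MPI options above could be assembled into the mpirun part of a command. It is an illustration under stated assumptions, not the tool's own launcher: the function name openmpi_launcher_args, the installation path /usr/local/openmpi, and the choice of LD_LIBRARY_PATH as the forwarded variable are all hypothetical. It relies on the fact that mpirun accepts -x more than once, one variable per occurrence.

```python
from typing import List, Optional, Sequence


def openmpi_launcher_args(total_npus: int,
                          env_names: Sequence[str],
                          prefix: Optional[str] = None,
                          hostfile: Optional[str] = None,
                          allow_root: bool = False) -> List[str]:
    """Build the mpirun part of an Open MPI-style command line (hypothetical helper)."""
    if total_npus < 1:
        raise ValueError("total_npus must be positive")
    if not env_names:
        # The option table lists -x as required.
        raise ValueError("at least one environment variable name is needed")
    args = ["mpirun"]
    if prefix is not None:
        args += ["--prefix", prefix]
    if hostfile is not None:
        args += ["-hostfile", hostfile]
    args += ["-n", str(total_npus)]
    for name in env_names:
        # One -x per variable to forward to the remote nodes.
        args += ["-x", name]
    if allow_root:
        args.append("--allow-run-as-root")
    return args


# Illustrative values only; the path and variable name are assumptions.
print(" ".join(openmpi_launcher_args(
    16, ["LD_LIBRARY_PATH"],
    prefix="/usr/local/openmpi", hostfile="hostfile", allow_root=True)))
# mpirun --prefix /usr/local/openmpi -hostfile hostfile -n 16 -x LD_LIBRARY_PATH --allow-run-as-root
```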
HCCL Performance Tester Options
| Option | Optional/Required | Description |
|---|---|---|
| ./bin/<executable_file> | Required | Command of the HCCL Performance Tester. <executable_file> is the executable file of the HCCL Performance Tester, that is, one of the supported test commands. |
| Options supported by the collective communication performance test | | |
| -p <npus> or --npus <npus> | Optional | Number of NPUs participating in training on a single compute node. The default value is the total number of NPUs on the current node. If fewer NPUs participate in training on a single compute node than the total number of NPUs on that node, this option is required. Note: The HCCL Performance Tester launches the corresponding devices based on the configured number of NPUs used in training. For details about the configuration restrictions on this option, see Restrictions. |
| -b <minbytes> or --minbytes <minbytes> | Optional | Test data size used to perform the collective communication operation. Notes: Examples: |
| -e <maxbytes> or --maxbytes <maxbytes> | Optional | |
| -i <incsize> or --stepbytes <incsize> | Optional | |
| -f <incfactor> or --stepfactor <incfactor> | Optional | |
| -o <operator> or --op <operator> | Optional | Operation type of the Reduce command. The value can be sum, prod, max, or min. The default value is sum. Reduce-related commands include all_reduce_test, reduce_scatter_test, reduce_scatterv_test, and reduce_test. |
| -r <root> or --root <root> | Optional | Device ID of the root node when the broadcast_test, reduce_test, or scatter_test command is executed. Value range: [0, Actual number of devices - 1]. Default value: 0. |
| -d <datatype> or --datatype <datatype> | Optional | Data type supported by the HCCL command. The default type is fp32. |
| -z <0/1> or --zero_copy <0/1> | Optional | Whether to enable the zero-copy function. In single-operator mode, the input and output buffers change dynamically, and HCCL uses intermediate buffers for data transfer to complete collective communication, which introduces extra memory copy overhead. The zero-copy function reduces this overhead by operating directly on the memory passed in by the service, which improves performance. Note: The zero-copy function is for trial use and may change in later versions. Therefore, it cannot be used in commercial products. This option can be set to: The zero-copy function has the following restrictions: |
| Performance test options | | |
| -n <iters_count> or --iters <iters_count> | Optional | Number of iterations. The default value is 20. |
| -w <warmup_iters_count> or --warmup_iters <warmup_iters_count> | Optional | Number of warm-up iterations. Warm-up iterations affect only the execution duration of the HCCL Performance Tester and are not counted in performance statistics. The default value is 10. Note: Operations in the first few iterations, such as socket establishment in the first iteration, may distort the performance test. You are advised to run the first few iterations as warm-up iterations so that they are excluded from performance statistics. |
| Result check options | | |
| -c <0/1> or --check <0/1> | Optional | Whether to verify the correctness of the HCCL operation results. Default value: 1. Note: In large-scale cluster scenarios, enabling result check increases the execution duration of the HCCL Performance Tester. |