Description of Command-Line Options
This section describes the command-line options of the HCCL Performance Tester.
Command
- In the MPICH installation scenario:
mpirun [-f hostfile] [-n number] ./bin/<executable_file> [-p npus] [-b minbytes] [-e maxbytes] [-f stepfactor] [-o operator] [-r root] [-d datatype] [-n iters] [-w warmup_iters] [-c <0/1>]
- In the Open MPI installation scenario:
mpirun [-hostfile hostfile] [-n number] [-x environment_variable_name] [--allow-run-as-root] [--mca key value] ./bin/<executable_file> [-p npus] [-b minbytes] [-e maxbytes] [-f stepfactor] [-o operator] [-r root] [-d datatype] [-n iters] [-w warmup_iters] [-c <0/1>]
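- mpirun is followed by MPI options.
- ./bin/<executable_file> is followed by the options of the HCCL Performance Tester. Each option is interpreted by the program it follows, so -f and -n mean different things on the two sides. Before the executable they are the MPICH hostfile and the total number of NPUs. After it they are the step factor and the number of iterations.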
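The following Python sketch makes the split between the two option groups concrete by assembling the argument list that mpirun receives. The function name `build_command` and the example values are hypothetical. The sketch only reflects the ordering rule stated above: MPI options, then ./bin/<executable_file>, then tester options. It does not validate any option.

```python
import shlex


def build_command(executable, mpi_opts, tester_opts):
    """Return an argv list: MPI options, then the tester binary, then tester options.

    Each option is a (flag, value) pair. value may be None (flag only, e.g.
    --allow-run-as-root), a single value, or a tuple of values (e.g. --mca key value).
    """
    argv = ["mpirun"]

    def append(flag, value):
        argv.append(flag)
        if value is None:
            return
        if isinstance(value, (tuple, list)):
            argv.extend(str(v) for v in value)
        else:
            argv.append(str(value))

    for flag, value in mpi_opts:
        append(flag, value)  # interpreted by mpirun
    argv.append(f"./bin/{executable}")
    for flag, value in tester_opts:
        append(flag, value)  # interpreted by the HCCL Performance Tester
    return argv


# MPICH scenario: the first -n (16) is the total NPU count for mpirun, while the
# later -n (50) is the tester's iteration count.
mpich = build_command(
    "all_reduce_test",
    [("-f", "hostfile"), ("-n", 16)],
    [("-p", 8), ("-b", 8), ("-e", 64), ("-f", 2), ("-n", 50)],
)
print(shlex.join(mpich))
# mpirun -f hostfile -n 16 ./bin/all_reduce_test -p 8 -b 8 -e 64 -f 2 -n 50
```

The same function covers the Open MPI form. For example, pass ("--allow-run-as-root", None) or ("--mca", ("key", "value")) in mpi_opts.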
MPICH Command-Line Options
Only common MPICH options are listed below. For more options, run mpirun --help to view help information.
| Option | Optional/Required | Description |
|---|---|---|
| -f | Optional | Hostfile that lists the participating nodes. Configure this file in multi-server scenarios. For details about the configuration example, see 3. |
| -n | Required | Total number of NPUs to be started, that is, number of nodes × number of NPUs participating in training on each node. |
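The -n value is simple arithmetic: number of nodes × NPUs per node. The minimal Python sketch below derives it from a hostfile. It assumes, for illustration only, a hostfile with one host per line, where any ':count' suffix and lines starting with '#' are ignored. Check the configuration example (see 3) for the format your environment actually expects. The helper name is hypothetical.

```python
def mpirun_total_npus(hostfile_text, npus_per_node):
    """Compute the mpirun -n value: number of nodes x NPUs per node.

    Assumes one host per non-empty line; any ':count' suffix and lines
    starting with '#' are ignored (hypothetical format for illustration).
    """
    if npus_per_node <= 0:
        raise ValueError("npus_per_node must be positive")
    hosts = []
    for line in hostfile_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        hosts.append(line.split(":", 1)[0])  # keep only the host name
    return len(hosts) * npus_per_node


# Two servers with 8 NPUs each: mpirun -n 16, and the tester's -p is 8.
print(mpirun_total_npus("node0\nnode1\n", 8))  # 16
```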
Open MPI Command-Line Options
Only common Open MPI options are listed below. For more options, see the Open MPI documentation.
| Option | Optional/Required | Description |
|---|---|---|
| -hostfile | Optional | Hostfile that lists the participating nodes. Configure this file in multi-server scenarios. For details about the configuration example, see 3. |
| -n | Required | Total number of NPUs to be started, that is, number of nodes × number of NPUs participating in training on each node. |
| -x | Required | Names of the environment variables to pass to remote nodes, that is, the variables (except PATH) configured before the HCCL test command is executed. For details about the environment variable configuration, see Execution. |
| --allow-run-as-root | Optional | Allows the root user to run the mpirun command. |
| --mca | Optional | Open MPI is built on the Modular Component Architecture (MCA). You can set MCA parameters at mpirun runtime to load specific Open MPI components and enable the corresponding features. |
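The following minimal Python sketch builds the repeated -x arguments from a list of variable names. It skips PATH, as the -x description requires. The function name is hypothetical. LD_LIBRARY_PATH is a standard variable, and MY_HCCL_SETTING is a placeholder rather than a real HCCL variable. With Open MPI, "-x NAME" exports the value that NAME has in the environment where mpirun is started.

```python
def export_flags(var_names):
    """Build Open MPI '-x NAME' arguments for variables set before the test.

    PATH is skipped, following the -x description; duplicates are dropped
    while preserving the original order.
    """
    flags = []
    seen = set()
    for name in var_names:
        if name == "PATH" or name in seen:
            continue
        seen.add(name)
        flags += ["-x", name]
    return flags


print(export_flags(["LD_LIBRARY_PATH", "PATH", "MY_HCCL_SETTING"]))
# ['-x', 'LD_LIBRARY_PATH', '-x', 'MY_HCCL_SETTING']
```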
HCCL Performance Tester Options
| Option | Optional/Required | Description |
|---|---|---|
| ./bin/<executable_file> | Required | Command of the HCCL Performance Tester. <executable_file> is the executable file of the HCCL Performance Tester. Currently, the following files can be specified: all_gather_test, all_reduce_test, alltoallv_test, alltoall_test, broadcast_test, reduce_scatter_test, reduce_test, and scatter_test. For example, ./bin/all_gather_test. |
| Options supported by the collective communication performance test | | |
| -p | Optional | Number of NPUs participating in training on a single compute node. The default value is the total number of NPUs on the current node. Note: The HCCL Performance Tester starts the corresponding devices based on the configured number of NPUs. For details about the restrictions on the devices, see Restrictions. |
| -b | Optional | Minimum (start) test data size (minbytes) used to perform the collective communication operation. |
| -e | Optional | Maximum (end) test data size (maxbytes). |
| -i | Optional | |
| -f | Optional | Step factor (stepfactor) applied to the test data size between minbytes and maxbytes. |
| Collective communication operation options | | |
| -o | Optional | Reduction operation type for Reduce-related commands: all_reduce_test, reduce_scatter_test, and reduce_test. The value can be sum, prod, max, or min. The default value is sum. |
| -r | Optional | Device ID of the root node when broadcast_test, reduce_test, or scatter_test is executed. Value range: [0, actual number of devices - 1]. Default value: 0. |
| -d | Optional | Data type used by the HCCL operation. The default value is fp32. |
| Performance test options | | |
| -n | Optional | Number of iterations. The default value is 20. |
| -w | Optional | Number of warm-up iterations. Warm-up iterations only add to the execution time of the HCCL Performance Tester and are not counted in performance statistics. The default value is 5. Note: The first few iterations may include one-time operations that distort the measurements, such as socket establishment in the first iteration. You are therefore advised to run them as warm-up iterations so that they are excluded from performance statistics. |
| Result check options | | |
| -c | Optional | Whether to verify the correctness of the HCCL operation results (0: disabled; 1: enabled). Default value: 1. Note: In large-scale cluster scenarios, enabling result check increases the execution time of the HCCL Performance Tester. |
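The data-size, root, and iteration options above can be modelled with a short Python sketch. It is not the tester's implementation. The function names are hypothetical, and sizes are plain byte counts. The sweep assumes that -f multiplies the size at each step from -b up to -e. The -i option is not modelled because its behavior is not described here.

```python
import time


def data_sizes(minbytes, maxbytes, stepfactor):
    """Return the test data sizes from minbytes up to maxbytes (inclusive).

    Assumption: each step multiplies the size by stepfactor, as the option
    name suggests; the tester's exact rule may differ.
    """
    if minbytes <= 0 or maxbytes < minbytes:
        raise ValueError("require 0 < minbytes <= maxbytes")
    if stepfactor <= 1:
        raise ValueError("stepfactor must be > 1 or the sweep never ends")
    sizes = []
    size = minbytes
    while size <= maxbytes:
        sizes.append(size)
        size *= stepfactor
    return sizes


def check_root(root, num_devices):
    """-r must lie in [0, num_devices - 1]; return root if it does."""
    if not 0 <= root <= num_devices - 1:
        raise ValueError(f"root {root} out of range [0, {num_devices - 1}]")
    return root


def average_time(op, iters=20, warmup_iters=5):
    """Run op warmup_iters times untimed, then iters timed; return mean seconds.

    Defaults mirror -n and -w: warm-up runs add to the total run time but
    are excluded from the statistic.
    """
    if iters <= 0:
        raise ValueError("iters must be positive")
    for _ in range(warmup_iters):
        op()  # one-time costs (e.g. connection setup) land here
    start = time.perf_counter()
    for _ in range(iters):
        op()
    return (time.perf_counter() - start) / iters


print(data_sizes(8, 64, 2))  # [8, 16, 32, 64]
```

Under these assumptions, a size that does not land exactly on maxbytes is simply not tested. For example, data_sizes(8, 100, 2) stops at 64.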