Installing and Configuring MPI
The HCCL Performance Tester launches processes using the Message Passing Interface (MPI), so you need to install the MPI package first.
The following operations need to be performed on each machine that participates in collective communication.
- If the communication NIC uses only the IPv4 protocol, install MPICH.
- If the communication NIC uses the IPv6 protocol, install Open MPI 4.1.5.
The following describes how to install MPICH and Open MPI.
Installing and Configuring MPICH
- Install the MPICH package.
- Download and decompress the MPICH package.
- For the Atlas Training Series Product, download MPICH 3.2.1.
Run the following command to decompress the obtained mpich-${version}.tar.gz package.
tar -zxvf mpich-${version}.tar.gz
${version} indicates the MPICH version.
- Go to the directory where the MPICH package is decompressed and configure compilation options.
cd mpich-${version}
./configure --disable-fortran --prefix=/usr/local/mpich
- --disable-fortran: disables the Fortran language.
- --prefix: indicates the MPI installation path, which can be customized.
- Compile and install MPICH.
make && make install
After the preceding command is executed, MPICH is installed in /usr/local/mpich.
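The decompress, configure, and install steps above can be collected into one helper. The following is a minimal sketch, not part of the MPICH package: the function names build_mpich and run and the DRY_RUN variable are hypothetical, and the tarball is assumed to be in the current directory. With DRY_RUN=1 the commands are only printed, which makes it easy to review them before a real build.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: build MPICH from mpich-<version>.tar.gz in the current directory.
# The body runs in a subshell (parentheses), so the caller's working directory is unchanged.
build_mpich() (
    version="$1"                      # for example 3.2.1
    prefix="${2:-/usr/local/mpich}"   # installation path, customizable
    run tar -zxvf "mpich-${version}.tar.gz" &&
    run cd "mpich-${version}" &&
    run ./configure --disable-fortran --prefix="${prefix}" &&
    run make &&
    run make install
)
```

For example, DRY_RUN=1 build_mpich 3.2.1 lists the five commands without running them, and build_mpich 3.2.1 /opt/mpich would install to a custom path.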
- Configure network node information.
Add the IP address in the operating environment to the /etc/hosts file, in the format of {IP address} {Host name}, as shown below.
172.16.0.100 node3
node3 indicates the host name, which can be obtained by running the hostname command.
If EulerOS is used, run the following command for the updated /etc/hosts file to take effect.
nmcli c reload
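Because this step is repeated on every machine, it can help to add the entries in a way that is safe to run more than once. The following is a minimal sketch; the function name add_host_entry is hypothetical, and the optional third argument exists only so that the helper can be tried on a scratch file before it touches /etc/hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "{IP address} {Host name}" to a hosts file
# unless exactly that line is already present.
add_host_entry() {
    local ip="$1" name="$2" hosts_file="${3:-/etc/hosts}"
    # -x matches the whole line, -F treats the text as a fixed string
    if grep -qxF "${ip} ${name}" "${hosts_file}" 2>/dev/null; then
        return 0
    fi
    echo "${ip} ${name}" >> "${hosts_file}"
}
```

For example, add_host_entry 172.16.0.100 node3 adds the entry shown above. Writing to /etc/hosts normally requires root permissions.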
- Configure the SSH trust relationship between the current operation node and communication nodes in the cluster, to support remote login to communication nodes in the cluster.
The following is an example:
- Generate the key information on the current operation node. (If the key information already exists in the environment, skip this step.)
ssh-keygen -t rsa
For example, the key information is generated and stored in the /root/.ssh/id_rsa.pub file.
- Copy the public key of the operation node to other communication nodes in the cluster, to log in to the remote host using the SSH key.
Refer to the code snippet below. ${nodeX_ip_address} indicates the IP address of the node that needs to communicate with the operation node.
ssh-copy-id -i /root/.ssh/id_rsa.pub ${node3_ip_address}
ssh-copy-id -i /root/.ssh/id_rsa.pub ${node4_ip_address}
- Log in remotely to the node with the SSH trust relationship and check whether the login is successful.
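For a cluster with many nodes, the copy and login check can be looped. The following is a minimal sketch under stated assumptions: the names setup_ssh_trust and run and the DRY_RUN variable are hypothetical, and the login check uses the standard OpenSSH option BatchMode=yes so that ssh fails instead of prompting for a password when the key was not accepted.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: copy the public key to each node, then verify key-based login.
# Usage: setup_ssh_trust <public key file> <node> [<node> ...]
setup_ssh_trust() {
    local pubkey="$1"
    shift
    local node
    for node in "$@"; do
        run ssh-copy-id -i "${pubkey}" "${node}" || return 1
        # Prints the remote host name if the key-based login works
        run ssh -o BatchMode=yes "${node}" hostname || return 1
    done
}
```

For example, DRY_RUN=1 setup_ssh_trust /root/.ssh/id_rsa.pub 172.16.0.100 172.16.1.200 prints the commands for two nodes without contacting them.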
Installing and Configuring Open MPI
- Download and decompress the Open MPI package.
- Edit the configuration file of the Open MPI source code and modify the maximum number of hosts supported by Open MPI.
- Go to the directory where the Open MPI source code is stored.
cd openmpi-4.1.5
- Modify the orte/mca/routed/radix/routed_radix_component.c configuration file.
vi orte/mca/routed/radix/routed_radix_component.c
Change the value of mca_routed_radix_component.radix to the total number of NICs in the cluster divided by the number of NICs on a single server. For example:
mca_routed_radix_component.radix = 1024;
Save the configuration and exit.
- Modify the orte/mca/plm/rsh/plm_rsh_component.c configuration file.
vi orte/mca/plm/rsh/plm_rsh_component.c
Change the value of mca_plm_rsh_component.num_concurrent to the total number of NICs in the cluster divided by the number of NICs on a single server. For example:
mca_plm_rsh_component.num_concurrent = 1024;
Save the configuration and exit.
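The two manual edits above can also be scripted. The following is a minimal sketch, assuming that each source file contains an assignment of the form "field = number;" as in the examples above. The function names nic_ratio and set_component_value are hypothetical; review the files after running it, because the exact layout of the source lines is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total NICs in the cluster / NICs on a single server.
nic_ratio() {
    echo $(( $1 / $2 ))
}

# Hypothetical helper: replace the integer in "<field> = <n>;" inside <file>.
# Usage: set_component_value <file> <field> <value>
set_component_value() {
    local file="$1" field="$2" value="$3" pattern tmp
    # Escape the dots in the field name so that sed treats them literally
    pattern="$(printf '%s' "${field}" | sed 's/[.]/\\./g')"
    tmp="$(mktemp)" || return 1
    sed -E "s/(${pattern}[[:space:]]*=[[:space:]]*)[0-9]+;/\1${value};/" "${file}" > "${tmp}" &&
    cat "${tmp}" > "${file}" &&   # keep the original file's permissions
    rm -f "${tmp}"
}
```

For example, with 8192 NICs in the cluster and 8 NICs per server, nic_ratio 8192 8 yields 1024, which can then be passed to set_component_value for orte/mca/routed/radix/routed_radix_component.c and orte/mca/plm/rsh/plm_rsh_component.c.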
- Configure compilation options.
./configure --disable-fortran --enable-ipv6 --prefix=/usr/local/openmpi
- --disable-fortran: disables the Fortran language.
- --enable-ipv6: enables IPv6.
- --prefix: indicates the Open MPI installation path, which can be customized.
- Compile and install Open MPI.
make && make install
After the preceding command is executed, Open MPI is installed in /usr/local/openmpi.
- Configure network node information.
Add the IP address in the operating environment to the /etc/hosts file, in the format of {IP address} {Host name} (the host name can be obtained by running the hostname command), as shown below.
172.16.0.100 node1
172.16.1.200 node2
fec0::b6ef:69dc:337d:9a12 node3
fec0::b6ef:998f:f3eb:4617 node4
If EulerOS is used, run the following command for the updated /etc/hosts file to take effect.
nmcli c reload
- Configure the SSH trust relationship between the current operation node and communication nodes in the cluster, to support remote login to communication nodes in the cluster.
The following is an example:
- Generate the key information on the current operation node. (If the key information already exists in the environment, skip this step.)
ssh-keygen -t rsa
For example, the key information is generated and stored in the /root/.ssh/id_rsa.pub file.
- Copy the public key of the operation node to other communication nodes in the cluster, to log in to the remote host using the SSH key.
- If the communication NIC uses the IPv4 address, run the following command to copy the public key.
ssh-copy-id -i /root/.ssh/id_rsa.pub ${node1_ipv4_address}
ssh-copy-id -i /root/.ssh/id_rsa.pub ${node2_ipv4_address}
For example:
ssh-copy-id -i /root/.ssh/id_rsa.pub 172.16.0.100
- If the communication NIC uses the IPv6 address, run the following command to copy the public key.
ssh-copy-id -i /root/.ssh/id_rsa.pub ${node3_ipv6_address}%{NIC name}
ssh-copy-id -i /root/.ssh/id_rsa.pub ${node4_ipv6_address}%{NIC name}
For example:
ssh-copy-id -i /root/.ssh/id_rsa.pub fec0::b6ef:998f:f3eb:4617%enp189s0f0
- Log in remotely to a node for which the SSH trust relationship has been configured and check whether the login is successful.
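The difference between the IPv4 and IPv6 forms of the ssh-copy-id target is only the %NIC-name suffix. The following is a minimal sketch of a helper that builds the target string; the function name ssh_copy_target is hypothetical, and treating any address that contains a colon as IPv6 is a simplifying assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ssh-copy-id target for an address.
# IPv6 addresses get the "%<NIC name>" suffix; IPv4 addresses are returned unchanged.
ssh_copy_target() {
    local addr="$1" nic="${2:-}"
    case "${addr}" in
        *:*)
            if [ -z "${nic}" ]; then
                echo "an IPv6 address requires a NIC name" >&2
                return 1
            fi
            echo "${addr}%${nic}"
            ;;
        *)
            echo "${addr}"
            ;;
    esac
}
```

For example, ssh-copy-id -i /root/.ssh/id_rsa.pub "$(ssh_copy_target fec0::b6ef:998f:f3eb:4617 enp189s0f0)" is equivalent to the IPv6 command shown above.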
- Configure MPICH startup options. Perform this step only when the communication NIC uses the IPv6 protocol. Skip this step if the communication NIC uses the IPv4 protocol.
export HYDRA_LAUNCHER_EXTRA_ARGS="-B {IPv6 NIC name of the node}"