CentOS OS

A Network File System (NFS) lets computers on a network share files and directories. In cluster scheduling scenarios, an NFS environment is required for training and inference jobs to run properly. Install NFS on the server or the client as required.

Installing NFS on a Server

  1. Log in to the storage node as an administrator and run the following command to install the NFS server:
    yum install nfs-utils -y
  2. Set the NFS-related services to use fixed ports, and configure firewall rules for these ports as required.
  3. Run the following commands to create a shared directory (for example, /data/atlas_dls) and change the directory permission:
    mkdir -p /data/atlas_dls
    chmod 750 /data/atlas_dls/
  4. Run the vi /etc/exports command and add the following line to the end of the file. Set the allowed IP address as required and restrict the permissions to the minimum needed:
    /data/atlas_dls service_IP_address (with necessary permissions)
  5. Run the following commands to start rpcbind:
    systemctl restart rpcbind.service
    systemctl enable rpcbind.service
  6. Run the following command to check whether rpcbind is started:
    systemctl status rpcbind.service

    If the following information is displayed, the service is running properly:

    ● rpcbind.service - RPC bind service
       Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled; vendor preset: enabled)
       Active: active (running) since Fri 2024-01-15 15:54:44 CST; 28s ago
     Main PID: 63008 (rpcbind)
       CGroup: /system.slice/rpcbind.service
               └─63008 /sbin/rpcbind -w
    
    
    Jan 15 15:54:44 centos39 systemd[1]: Starting RPC bind service...
    Jan 15 15:54:44 centos39 systemd[1]: Started RPC bind service.
  7. After rpcbind is started, run the following commands to start the NFS service:
    systemctl restart nfs-server.service 
    systemctl enable nfs-server.service 
  8. Run the following command to check whether the NFS service is started:
    systemctl status nfs-server.service 

    If the following information is displayed, the service is running properly. If the NFS service fails to start, rectify the fault by referring to Failed to Execute df -h and Failed to Start NFS.

    ● nfs-server.service - NFS server and services
       Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
      Drop-In: /run/systemd/generator/nfs-server.service.d
               └─order-with-mounts.conf
       Active: active (exited) since Fri 2024-01-15 15:56:15 CST; 8s ago
     Main PID: 67145 (code=exited, status=0/SUCCESS)
       CGroup: /system.slice/nfs-server.service
    
    
    Jan 15 15:56:15 centos39 systemd[1]: Starting NFS server and services...
    Jan 15 15:56:15 centos39 systemd[1]: Started NFS server and services.
  9. Run the following command to check the export permissions of the shared directory (for example, /data/atlas_dls):
    cat /var/lib/nfs/etab

    If information similar to the following is displayed, the shared directory is exported with the configured permissions:

    /data/atlas_dls * (rw, ... displays the configured permission.)
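
The export entry in step 4 and the firewall rules in step 2 can be sketched as follows. This is a minimal illustration, not the required configuration. The helper names exports_line and firewall_cmds, the client address 192.0.2.10, and the options rw,sync,root_squash are all assumptions chosen for the example. Ports 111 (rpcbind) and 2049 (NFS) are the standard defaults. Ports for other helper services depend on how they were fixed in step 2 and are not shown. The second helper only prints the firewalld commands and does not run them.

```shell
#!/bin/sh
# Build one /etc/exports entry. There is no space between the client and the
# option list: a space would apply the options to all hosts instead.
exports_line() {
    # $1 = shared directory, $2 = allowed client, $3 = comma-separated options
    printf '%s %s(%s)\n' "$1" "$2" "$3"
}

# Print (do not run) the firewalld commands that open the given TCP ports.
firewall_cmds() {
    for port in "$@"; do
        printf 'firewall-cmd --permanent --add-port=%s/tcp\n' "$port"
    done
    printf 'firewall-cmd --reload\n'
}

# Hypothetical example values; replace them with your own.
exports_line /data/atlas_dls 192.0.2.10 rw,sync,root_squash
firewall_cmds 111 2049
```

After the entry is added to /etc/exports, running exportfs -ra as an administrator re-reads the file, and exportfs -v lists the active exports with their options.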

Installing NFS on a Client

  1. Log in to another server as an administrator and run the following command to install the NFS client:
    yum install nfs-utils -y
  2. Run the following commands to start rpcbind:
    systemctl restart rpcbind.service
    systemctl enable rpcbind.service
  3. Run the following command to check whether rpcbind is started:
    systemctl status rpcbind.service

    If the following information is displayed, the service is running properly:

    ● rpcbind.service - RPC Bind
       Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled; vendor preset: enabled)
       Active: active (running) since Thu 2024-03-14 04:59:22 EDT; 8s ago
         Docs: man:rpcbind(8)
     Main PID: 1681425 (rpcbind)
        Tasks: 1 (limit: 3355442)
       Memory: 956.0K
       CGroup: /system.slice/rpcbind.service
               └─1681425 /usr/bin/rpcbind -w -f
    Mar 14 04:59:22 localhost.localdomain systemd[1]: Starting RPC Bind...
    Mar 14 04:59:22 localhost.localdomain systemd[1]: Started RPC Bind.
  4. After rpcbind is started, run the following commands to start the NFS service:
    systemctl restart nfs-server.service 
    systemctl enable nfs-server.service
  5. Run the following command to check whether the NFS service is started:
    systemctl status nfs-server.service
    If the following information is displayed, the service is running properly:
    ● nfs-server.service - NFS server and services
       Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
      Drop-In: /run/systemd/generator/nfs-server.service.d
               └─order-with-mounts.conf
       Active: active (exited) since Thu 2024-03-14 04:59:40 EDT; 8s ago
     Main PID: 1681567 (code=exited, status=0/SUCCESS)
        Tasks: 0 (limit: 3355442)
       Memory: 0B
       CGroup: /system.slice/nfs-server.service
    Mar 14 04:59:39 localhost.localdomain systemd[1]: Starting NFS server and services...
    Mar 14 04:59:39 localhost.localdomain exportfs[1681536]: exportfs: Failed to stat /data/atlas_dls: No such file or directory
    Mar 14 04:59:40 localhost.localdomain systemd[1]: Started NFS server and services.
  6. (Optional) If the mount command is not available on the client, run the following command to install it. NFS requires the mount and umount commands, which are usually preinstalled on the system.
    yum install -y util-linux
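
Once the client is prepared, the shared directory would typically be checked and mounted with showmount and mount. The sketch below only prints the commands so that they can be reviewed before being run as an administrator. The helper name nfs_mount_cmds, the server address 192.0.2.1, and the mount point /mnt/atlas_dls are assumptions for illustration and do not come from this procedure.

```shell
#!/bin/sh
# Print (do not run) the commands a client could use to check and mount a share.
nfs_mount_cmds() {
    # $1 = NFS server address, $2 = exported directory, $3 = local mount point
    printf 'showmount -e %s\n' "$1"              # list the server's exports
    printf 'mkdir -p %s\n' "$3"                  # create the local mount point
    printf 'mount -t nfs %s:%s %s\n' "$1" "$2" "$3"
}

# Hypothetical example values; replace them with your own.
nfs_mount_cmds 192.0.2.1 /data/atlas_dls /mnt/atlas_dls
```

To release the share later, run umount with the same mount point.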