NFS Performance Checking and Optimizing

Mounting parameters
1. soft/hard
  hard: If the NFS service on the server becomes unavailable after the client has mounted the share, the client retries the request indefinitely. Programs that access the file system (for example, through the cd, ls, or df command) hang and do not respond. After the NFS service is restarted on the server, the client resumes and the pending operations return their results after a short wait.
  
  soft: The client returns an error to the calling program after it has retried the request a limited number of times.
  
  Note: If this parameter is not set, hard is used by default.
2. timeo/retrans
  timeo/retrans: timeo specifies how long the client waits for a response before it retransmits a request, in tenths of a second. retrans specifies how many times a request is retransmitted before the client takes further action. Together, the two parameters control the behavior of the client after a request times out. The retrans limit causes an error to be returned only on soft mounts. On hard mounts, the client keeps retrying.
3. wsize
  wsize: specifies the maximum number of bytes in each WRITE request that the client sends to the server. If this parameter is not set, the value is negotiated between the client and server.
  
  Note: Generally, the negotiated default value is used. For a congested low-speed network, you can reduce the value of this parameter so that smaller request packets are sent to the server to improve NFS performance. For a high-speed network, you can increase the value of this parameter to reduce the number of request packets sent to the server and improve performance.
4. rsize
  rsize: specifies the maximum number of bytes in each READ request that the client sends to the server. If this parameter is not set, the value is negotiated between the client and server.
  
  Note: Generally, the negotiated default value is used. For a congested low-speed network, you can reduce the value of this parameter so that the server returns smaller reply packets to improve NFS performance. For a high-speed network, you can increase the value of this parameter to reduce the number of READ requests sent to the server and improve performance.
5. ac/noac
  ac/noac: To improve performance, the NFS client caches file attributes (ac is the default), checks them periodically, and refreshes them when they change. When ac is used, the acregmin, acregmax, acdirmin, acdirmax, and actimeo parameters can also be set.
  
  acregmin/acregmax: specify the minimum and maximum time (in seconds) that the NFS client caches the attributes of a regular file. After the time expires, the attributes are refreshed from the server. By default, the minimum is 3s and the maximum is 60s.
  
  acdirmin/acdirmax: specify the minimum and maximum time (in seconds) that the NFS client caches the attributes of a directory. After the time expires, the attributes are refreshed from the server. By default, the minimum is 30s and the maximum is 60s.
  
  actimeo: sets acregmin, acregmax, acdirmin, and acdirmax to the same value, in seconds.
  
  Note: When the attributes of the shared file on the server are frequently changed by multiple clients, you are advised to use the noac option, or ac with a smaller acregmin, acregmax, acdirmin, or acdirmax value to achieve better attribute consistency. If the attributes of the shared files on the server are not frequently changed, for example, the file sharing is read-only or the network delivers good performance, you are advised to use the default ac option and set acregmin, acregmax, acdirmin, or acdirmax based on the actual network status.
6. sharecache/nosharecache
  sharecache/nosharecache: When a client mounts the same NFS share on different local directories and sharecache is used, all of these mount points share one NFS data cache on the client.
  
  With nosharecache, each mount point keeps its own cache, so a file can have multiple cached copies on the client, which can lead to data inconsistency.
  
  Note: This parameter is used when a client mounts the same shared directory for multiple times. You are advised to use the default sharecache option.
7. lookupcache=mode
  lookupcache=mode: The value of mode can be all, none, or pos.
  
  all: caches both positive results (the file exists) and negative results (the file does not exist). When the same file name is looked up a second time, the client does not send LOOKUP again because it has cached the result.
  
  pos: caches only positive results. A lookup of a file name that was previously not found is sent to the server again.
  
  none: caches no lookup results. This option can quickly detect files created or deleted by other clients, but affects server performance.
  
  Note: The LOOKUP operation converts a file name into a file handle. If multiple clients frequently create or delete files, none is recommended. In other cases, all or pos is recommended.
8. cto/nocto
  cto/nocto: Linux implements the cache consistency feature "close-to-open" by comparing the GETATTR query results when a file is closed and opened next time (http://nfs.sourceforge.net/). If the query results are the same, the cache data on the client is still valid. Otherwise, the cache data should be cleared.
  
  Using cto to mount and read the same file: Before the file is read for the second time, the client sends GETATTR to obtain the attribute and compares it with the cached result and finds that the file does not change. Therefore, the client does not send READ for the second time and directly reads the file content from its cache. If the file content is changed on the server before the file is read for the second time, the client sends GETATTR and compares the result with the cached result. As the change is detected, the client sends READ for the second time to read the file content from the server. cto ensures data consistency.
  
  Using nocto to mount and read the same file: Before the file is read for the second time, the client does not send GETATTR and directly reads the file content from its cache. If the file content is changed on the server before the file is read for the second time, the client does not detect the change. Therefore, the client reads outdated data for the second time. After a period of time, the client sends a READ message to the server to read the file again and obtain the new file content.
  
  Note: If the file content seldom changes, for example, the server provides the read-only share permission (the file system is exported with the read-only permission) for customers, you are advised to use the nocto option to improve performance. If the file content changes frequently and the client has high requirements on file cache consistency, you are advised to use the cto option.
9. tcp/udp
  tcp/udp: TCP provides reliable, ordered delivery with retransmission and congestion control. UDP has lower protocol overhead and can give the client faster responses on a stable network.
  
  Note: In an unstable and complex network environment, you are advised to use tcp. In a stable network environment, you can use udp. NFSv3 supports both tcp and udp. NFSv4 requires a congestion-controlled transport such as tcp and does not support udp. NFSv2 is typically used over udp.
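  
  The mount parameters above are combined into one comma-separated option list. The following is a minimal sketch of how such a list might be assembled. The server name, export path, mount point, and the chosen values are hypothetical placeholders, not recommendations. The command is printed rather than executed, because mounting requires root privileges.

```shell
# Hypothetical server, export, and mount point; replace with real values.
server=nfs.example.com
export_path=/export/data
mount_point=/mnt/data

# hard mount over TCP; timeo is in tenths of a second (600 = 60 s);
# 1 MB read and write sizes; all attribute cache timers set to 30 s;
# both positive and negative lookup results cached.
opts="hard,tcp,timeo=600,retrans=2,rsize=1048576,wsize=1048576,actimeo=30,lookupcache=all"

# Build and print the command for review instead of running it.
mount_cmd="mount -t nfs -o $opts $server:$export_path $mount_point"
echo "$mount_cmd"
```

  After a real mount, the options that are actually in effect (including the negotiated rsize and wsize) can be read from /proc/mounts.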
NFS performance tuning options
1. Protocol options
  Maximum transport block
  
  Specifies the size of an NFS packet block, that is, the maximum payload that can be carried in a packet sent by a protocol client. The default value is 1 MB (1,048,576 bytes), which is also the maximum value.
  
  Run the echo 1048576 > /proc/fs/nfsd/max_block_size command to configure this parameter. The setting is lost when the device is restarted.
  
  Number of communication threads
  
  Sets the number of protocol communication threads. Increase the value of this parameter when the processing capability of the communication thread is insufficient.
  
  Run the echo 32 > /proc/fs/nfsd/threads command to configure this parameter.
  
  To check whether the number of communication threads is insufficient, use the netstat tool to check whether data is piling up in the send and receive buffers of the NFS connections. Pay attention to the TCP connections (TCP is used by default). Port 2049 is the NFS service port.
  
  Number of service threads
  
  Sets the number of protocol service threads. Increase the value of this parameter when the processing capability of the service thread is insufficient.
  
  Run the echo 32 > /proc/fs/nfsd/pool_threads command to configure this parameter.
2. TCP/IP options
  Linux
```
net.ipv4.tcp_rmem = 10000000 20000000 40000000
echo 10000000 20000000 40000000 > /proc/sys/net/ipv4/tcp_rmem
```
  Size of the TCP receive buffer (minimum, default, and maximum, in bytes). The default value determines the default TCP receive window. For a 10GE NIC, you are advised to set the three values to 10 MB, 20 MB, and 40 MB.
```
net.ipv4.tcp_wmem = 10000000 20000000 40000000
echo 10000000 20000000 40000000 > /proc/sys/net/ipv4/tcp_wmem
```
  Size of the TCP send buffer (minimum, default, and maximum, in bytes). For a 10GE NIC, you are advised to set the three values to 10 MB, 20 MB, and 40 MB.
```
net.ipv4.tcp_mem = 400000  800000  1600000
```
  Number of memory pages that can be used by the TCP protocol stack (low, pressure, and high thresholds). The size of each page is 4 KB, so the values above correspond to approximately 1.6 GB, 3.2 GB, and 6.4 GB.
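  
  Because tcp_mem is expressed in pages rather than bytes, it helps to convert a setting before applying it. The sketch below does the conversion for the three values shown above, assuming the 4 KB page size stated in the text. The function name pages_to_mib is made up for this illustration.

```shell
# pages_to_mib converts a page count to whole MiB, assuming 4096-byte pages.
pages_to_mib() {
    echo $(( $1 * 4096 / 1048576 ))
}

# The three tcp_mem thresholds from the setting above.
for pages in 400000 800000 1600000; do
    echo "$pages pages = $(pages_to_mib "$pages") MiB"
done
```

  The page size of a given system can be confirmed with getconf PAGESIZE.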
```
net.ipv4.tcp_window_scaling = 1
```
  Enables the TCP window scaling option. TCP windows larger than 64 KB are supported only when this option is enabled.
  
  Note: The values of tcp_rmem and tcp_wmem need to be changed only on the client.
3. NIC options
  Linux
  1. Enable jumbo frames.
```
# xxx is the name of the network interface. Either command sets the MTU.
ip link set dev xxx mtu 9000
ifconfig xxx mtu 9000
```
  2. Enable flow control.
```
ethtool -A xxx rx on
ethtool -A xxx tx on
```
  3. Enable interrupt coalescing.
```
ethtool -C xxx rx-usecs 32
```
4. System options
  Linux
  1. Number of concurrent RPC requests.
```
echo 128 > /proc/sys/sunrpc/tcp_slot_table_entries
```
    Note: After the value is changed, unmount the file system and remount it for the change to take effect.
Locating and analyzing performance problems
1. Latency analysis
  The following figure shows the layers of the NFS application. The left part indicates the client side, and the right part indicates the server side.
  
  Numbers 1 to 8 have corresponding delay statistics. You can obtain the delay between two adjacent points to accurately locate the time-consuming layer.
  
  Application layer statistics
  
  Application-level tools (such as Vdbench) report statistics such as latency, throughput, and OPS.
  
  Client NFS/SunRPC statistics
  1. nfsiostat
    The output shows real-time statistics, but covers only the read and write operations.
```
- op/s
This is the number of operations per second.
- rpc bklog
This is the length of the backlog queue.
- kB/s
This is the number of KB written/read per second.
- kB/op
This is the number of KB written/read per operation.
- retrans
This is the number of retransmissions.
- avg RTT (ms)
This is the duration between the time when client's kernel sends the RPC request and when it receives the reply.
- avg exe (ms)
This is the duration between the time when NFS client sends the RPC request to its kernel and when the request is completed. It includes the RTT time above.
```
  2. mountstats
    The output shows accumulated statistics and covers all operations. The command output contains a detailed description. Pay attention to the backlog wait, RTT, and total execute time reported for each operation.
  3. /proc/1/mountstats
    Source data of the nfsiostat and mountstats commands. Pay attention to the statistics of Xprt and statistics corresponding to commands.
    1. Xprt statistics
```
xprt:   tcp 734 0 1 0 0 2173669 2173156 0 904604335 0 10 1165672 2493580
1. srcport:         Ephemeral port
2. bind_count:      Number of rpcbind operations
3. connect_count:   Number of TCP connects
4. connect_time:    Time taken by connects
5. idle_time:       Transport idle duration
6. rpcsends:        Number of RPC requests sent
7. rpcrecvs:        Number of RPC replies received
8. badxids:         Number of unmatchable XIDs received
9. req_u:           Average requests on the wire (slot table utilization)
10. bklog_u:        Backlog queue utilization (average length of backlog queue)
11. max_slots:      Max rpc_slots used
12. sending_u:      Send queue utilization
13. pending_u:      Pending queue utilization
```
      Item 10 divided by item 6: indicates the average number of queued requests.
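      
      The division of item 10 by item 6 can be scripted. The sketch below applies it to the sample xprt line, assuming the 13-field TCP layout listed above. Other kernel versions or transports may use a different layout. The function name avg_backlog is made up for this illustration.

```shell
# xprt line copied from the sample above.
xprt='xprt:   tcp 734 0 1 0 0 2173669 2173156 0 904604335 0 10 1165672 2493580'

# avg_backlog divides bklog_u (item 10) by rpcsends (item 6).
# After the "xprt:" and "tcp" words, item N is awk field N+2.
avg_backlog() {
    echo "$1" | awk '$2 == "tcp" && $8 > 0 { printf "avg_backlog=%.3f\n", $12 / $8 }'
}

avg_backlog "$xprt"
```

      For the sample line, bklog_u is 0, so no requests were waiting in the backlog queue.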
    2. per-op statistics
```
READ: 276305 276305 0 54882824 197516342140 16993798 857262 18130986
1. operations: Number of requests done for the operation.
2. transmissions: Number of times that an RPC request was actually transmitted for the operation. This can exceed the operation count because of timeouts and retries.
3. major timeouts: Number of times that a request had a major timeout. The "nfs: server X not responding, still trying" message is displayed upon major timeouts. Timeouts and retries can occur without a major timeout.
4. bytes sent: Includes not only the RPC payload but also the RPC headers, and closely matches the on-the-wire size.
5. bytes received: Like bytes sent, this is the full size.
6. cumulative queue time: Time (in milliseconds) that all requests spent queued for transmission before they were sent.
7. cumulative response time: Time (in milliseconds) taken to get a reply back after the request was transmitted. The kernel comments call this the RPC RTT.
8. cumulative total request time: Time (in milliseconds) taken by all requests from when they were first queued until they were completely handled. The kernel calls this the RPC execution time.
```
      Item 8 divided by item 1: indicates the average total processing delay of a request.
      
      Item 7 divided by item 1: indicates the average delay between the time when the client sends a request and when it receives a response.
      
      Item 6 divided by item 1: indicates the average queuing delay on the client.
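      
      These three averages can be computed directly from a per-op line. The sketch below uses the sample READ line and assumes the eight-field layout listed above. The function name per_op_avg and its output format are made up for this illustration.

```shell
# Per-op line copied from the sample above.
line='READ: 276305 276305 0 54882824 197516342140 16993798 857262 18130986'

# per_op_avg prints the average queue, RTT, and execution time in milliseconds.
# Field 2 is the operation count; fields 7, 8, and 9 are items 6, 7, and 8.
per_op_avg() {
    echo "$1" | awk '$2 > 0 {
        op = $1; sub(/:$/, "", op)
        printf "%s queue=%.3f rtt=%.3f exe=%.3f\n", op, $7 / $2, $8 / $2, $9 / $2
    }'
}

per_op_avg "$line"
# prints: READ queue=61.504 rtt=3.103 exe=65.619
```

      In the sample, about 61.5 ms of the 65.6 ms average execution time is spent queuing on the client, while the round trip to the server takes only about 3.1 ms.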
  Client Ethernet statistics
  
  Use tcpdump to capture packets, and then use Wireshark to analyze the packets.
  1. On the Wireshark page, choose Statistics > Service Response Time > ONC-RPC.
  2. Select NFS in the Program drop-down list, set Version to 3 (NFSv3), and click Create Stat.
  3. Check the statistics. The last column indicates the average delay, in seconds.
2. Concurrency analysis
  The Linux NFS client limits the number of concurrent NFS requests through the /proc/sys/sunrpc/tcp_slot_table_entries parameter. A small value degrades I/O performance. The default value is 2 on Ubuntu 18.04.
  
  Run the echo 128 > /proc/sys/sunrpc/tcp_slot_table_entries command to configure this parameter.
  
  Note: After the configuration is complete, mount the NFS share again.
3. Network analysis
  netstat
  1. netstat -nap
    Pay attention to the second and third columns, which indicate the receive queue and send queue respectively, in bytes. If a queue stays at the TCP buffer size for a long time, a lower layer may be faulty. In this case, locate the fault in the system or NIC module.
  2. netstat -s
    Pay attention to the statistics of the Abort and Drop fields. If the values increase continuously during the service process, locate the system fault.
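    
    To focus on the NFS connections in the netstat output, the lines can be filtered by port 2049. The sketch below is one way to do this. The function name nfs_queues is made up for this illustration, and the sample input uses invented addresses and queue values.

```shell
# nfs_queues reads `netstat -tn` style lines on stdin and prints
# "recv-q send-q local remote" for every TCP connection that uses port 2049.
nfs_queues() {
    awk '$1 ~ /^tcp/ && ($4 ~ /:2049$/ || $5 ~ /:2049$/) { print $2, $3, $4, $5 }'
}

# Hypothetical sample input (invented addresses and queue sizes).
sample='tcp        0      0 192.0.2.10:22       192.0.2.1:51234     ESTABLISHED
tcp        0 131072 192.0.2.10:803      192.0.2.20:2049     ESTABLISHED'

printf '%s\n' "$sample" | nfs_queues
# prints: 0 131072 192.0.2.10:803 192.0.2.20:2049
```

    On a live system, the same filter can be fed with netstat -tn | nfs_queues and run repeatedly to see whether a queue stays full.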

soft and hard Parameters

**Table 1** Parameters
Parameter	Description
hard	If the NFS service on the server becomes unavailable after the client has mounted the share, the client retries the request indefinitely. Programs that access the file system (for example, through the cd, ls, or df command) hang and do not respond. After the NFS service is restarted on the server, the client resumes and the pending operations return their results after a short wait.
soft	The client returns an error to the calling program after it has retried the request a limited number of times.
