SMCC NFS Server Performance and Tuning Guide
CHAPTER 3

Analyzing NFS Performance


This chapter explains how to analyze NFS performance and describes the general steps for tuning your system. This chapter also describes how to verify the performance of the network, server, and each client.

Tuning the NFS Server

When you first set up the NFS server, tune it for optimal performance. Later, when a particular performance problem arises, tune the server again to address it.

Optimizing Performance

Follow these steps in sequence to improve the performance of your NFS server.
  1. Measure the current level of performance for the network, server, and each client. See "Checking Network, Server, and Client Performance" later in this chapter.

  2. Analyze the gathered data by graphing it. Look for exceptions, high disk and CPU utilization, and high disk service times. Apply thresholds or performance rules to the data.

  3. Tune the server. See Chapter 4, "Configuring the Server and the Client to Maximize NFS Performance."

  4. Repeat Steps 1 through 3 until you achieve the desired performance.

Resolving Performance Problems

Follow these steps in sequence to resolve performance problems with your NFS server.
  1. Use tools and observe the symptoms to pinpoint the source of the problem.

  2. Measure the current level of performance for the network, server, and each client. See "Checking Network, Server, and Client Performance" later in this chapter.

  3. Analyze the gathered data by graphing it. Look for exceptions, high disk and CPU utilization, and high disk service times. Apply thresholds or performance rules to the data.

  4. Tune the server. See Chapter 4, "Configuring the Server and the Client to Maximize NFS Performance."

  5. Repeat Steps 1 through 4 until you achieve the desired performance.


Checking Network, Server, and Client Performance

Before you can tune the NFS server, you must check the performance of the network, the NFS server, and each client. The first step is to check the performance of the network. If disks are operating normally, check network usage because a slow server and a slow network look the same to an NFS client.
FIGURE 3-1 illustrates the steps you must follow in sequence to check the network.
FIGURE 3-1 Flow Diagram for Checking the Network Performance

· To Check the Network

  1. Find the number of packets, collisions, or errors on each network by typing netstat -i 15.

    To look at a specific interface, use the -I option.

  server% netstat -i 15
      input   le0     output              input  (Total)    output
  packets  errs  packets  errs  colls  packets   errs  packets   errs  colls
  10798731 533   4868520  0     1078   24818184  555   14049209  157   894937
  51       0     43       0     0      238       0     139       0     0
  85       0     69       0     0      218       0     131       0     2
  44       0     29       0     0      168       0     94        0     0

A description of the arguments to the netstat command follows:
-i          Shows the state of the interfaces that are used for TCP/IP traffic
15          Collects information every 15 seconds
In the netstat -i 15 display, a machine with active network traffic should show both input packets and output packets continually increasing.
  2. Calculate the network collision rate by dividing the number of output collisions (the le0 colls column) by the number of output packets (the le0 output packets column).

    A network-wide collision rate greater than 10 percent can indicate an overloaded network, a poorly configured network, or hardware problems.

  3. Calculate the input packet error rate by dividing the number of input errors (the le0 input errs column) by the total number of input packets (the le0 input packets column).

    If the input error rate is high (over 25 percent), the host may be dropping packets.

    Transmission problems can be caused by other hardware on the network, as well as heavy traffic and low-level hardware problems. Bridges and routers can drop packets, forcing retransmissions and causing degraded performance.

    Bridges also cause delays when they examine packet headers for Ethernet addresses. During these examinations, bridge network interfaces may drop packet fragments.

    To compensate for bandwidth-limited network hardware:

  • Reduce the packet size specifications.
  • Set the read buffer size, rsize, and the write buffer size, wsize, on the mount command line or in the /etc/vfstab file. Depending on the direction in which data passes through the bridge, reduce the appropriate variable to 2048. If data passes in both directions through the bridge or other device, reduce both variables:

  server:/home  -  /home/server  nfs  -  yes  rw,rsize=2048,wsize=2048

If many read and write requests are dropped and the client is communicating with the server using the User Datagram Protocol (UDP), the entire packet is retransmitted, not just the dropped fragments.
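The collision-rate and input-error-rate calculations above are simple ratios over the le0 columns of the netstat -i display. The following is a minimal sketch, assuming a POSIX shell and awk, that reads one data line of that display and applies the 10 percent and 25 percent thresholds quoted above. The function name netstat_rates is hypothetical, and the field positions assume the single-interface layout shown in the example.

```shell
#!/bin/sh
# netstat_rates (hypothetical helper): compute the le0 collision rate and
# input error rate from one data line of `netstat -i 15` output on stdin.
# Assumed field order, as in the display above: input packets, input errs,
# output packets, output errs, colls; the (Total) columns are ignored.
netstat_rates() {
    awk '{
        in_pkts = $1; in_errs = $2; out_pkts = $3; colls = $5
        # Collision rate: output collisions / output packets
        coll_rate = (out_pkts > 0) ? 100 * colls / out_pkts : 0
        # Input error rate: input errors / input packets
        err_rate = (in_pkts > 0) ? 100 * in_errs / in_pkts : 0
        printf("collision_rate=%.4f%% input_error_rate=%.4f%%\n", coll_rate, err_rate)
        if (coll_rate > 10) print "WARNING: collision rate over 10 percent"
        if (err_rate > 25) print "WARNING: input error rate over 25 percent"
    }'
}

# Example: the first data line of the netstat -i 15 display above.
# prints collision_rate=0.0221% input_error_rate=0.0049%
echo "10798731 533 4868520 0 1078 24818184 555 14049209 157 894937" | netstat_rates
```

Both rates in the example are far below the thresholds, so no warning lines are printed.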
  4. Determine how long a round-trip echo packet takes on the network by typing ping -sRv servername on the client. This command also shows the route taken by the packets.

    If the round trip takes more than a few milliseconds, there are slow routers on the network, or the network is very busy. Ignore the results from the first ping command. The ping -sRv command also displays packet losses.

The following screen shows the output of the ping -sRv command.

  client% ping -sRv servername  
  PING server: 56 data bytes  
  64 bytes from server (129.145.72.15): icmp_seq=0. time=5. ms  
    IP options:  <record route> router (129.145.72.1), server  
  (129.145.72.15), client (129.145.70.114),  (End of record)  
  64 bytes from server (129.145.72.15): icmp_seq=1. time=2. ms  
    IP options:  <record route> router (129.145.72.1), server  
  (129.145.72.15), client (129.145.70.114),  (End of record)  

A description of the arguments to the ping command follows:
-s          Send option. One datagram is sent per second and one line of output is printed
            for every echo response received. If there is no response, no output is produced.
-R          Record route option. The Internet Protocol (IP) record route option is set so
            that the route of the packet is stored inside the IP header.
-v          Verbose option. ICMP packets other than echo responses that are received are
            listed.
If you suspect a physical problem, use ping -sRv to find the response time of several hosts on the network. If the response time (ms) from one host is not what you expect, investigate that host.
The ping command uses the ICMP protocol's echo request datagram to elicit an ICMP echo response from the specified host or network gateway. It can take a long time on a time-shared NFS server to obtain the ICMP echo. The distance from the client to the NFS server is a factor in how long it takes to obtain the ICMP echo from the server.
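The advice to ignore the first ping result can be applied mechanically. The following is a minimal sketch, assuming a POSIX shell and awk and the reply format shown above (a field such as time=5. on each reply line). The function name ping_avg is hypothetical; the usage line assumes the Solaris ping syntax that accepts a packet size and a packet count after the host name.

```shell
#!/bin/sh
# ping_avg (hypothetical helper): read `ping -sRv` output on stdin and print
# the average round-trip time of the replies, skipping the first reply.
# Assumes reply lines contain "icmp_seq=" and a field such as "time=5.".
ping_avg() {
    awk '
    /icmp_seq=/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^time=/) {
                t = substr($i, 6) + 0      # "time=5." -> 5
                replies++
                if (replies > 1) { sum += t; n++ }   # ignore the first reply
            }
        }
    }
    END {
        if (n > 0) printf("avg=%.1f ms (n=%d)\n", sum / n, n)
        else print "not enough replies"
    }'
}

# Usage (packet size 56, count 10):
#   ping -sRv servername 56 10 | ping_avg
```

With the two replies in the example screen, only the 2 ms reply would be averaged.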
FIGURE 3-2 shows the possible responses or the lack of response to the ping -sRv command.


FIGURE 3-2 ping -sRv

Checking the NFS Server


Note - The server used in the following examples is a large SPARCserver 690 configuration.

· To Check the NFS Server

  1. Determine what is being exported by typing share.


  server% share  
  -               /export/home   rw=netgroup   ""  
  -               /var/mail   rw=netgroup   ""  
  -               /cdrom/solaris_2_3_ab   ro   ""  

  2. Display the mounted file systems and the disk drive on which each file system is mounted by typing df -k.

    If a file system is over 100 percent full, it may cause NFS write errors on the clients.


  server% df -k  
  Filesystem            kbytes    used   avail capacity  Mounted on  
  /dev/dsk/c1t0d0s0    73097      36739  29058     56%   /  
  /dev/dsk/c1t0d0s3    214638     159948 33230     83%   /usr  
  /proc                 0          0       0         0% /proc  
  fd                    0          0       0         0% /dev/fd  
  swap                 501684     32     501652    0%   /tmp  
  /dev/dsk/c1t0d0s4   582128     302556  267930   53%   /var/mail  
  /dev/md/dsk/d100    7299223    687386  279377   96%   /export/home  
  /vol/dev/dsk/c0t6/solaris_2_3_ab  
                       113512     113514  0        100%   /cdrom/solaris_2_3_ab  
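To spot full file systems in a long df -k listing, the capacity column can be filtered. The following is a sketch assuming a POSIX shell and awk; the function name df_full and its optional threshold argument are hypothetical. Against the listing above it would report /cdrom/solaris_2_3_ab at 100%, and with a threshold of 90 it would also report /export/home at 96%.

```shell
#!/bin/sh
# df_full (hypothetical helper): read `df -k` output on stdin and print the
# mount point and capacity of each file system whose capacity is at or above
# a limit (default 100 percent). The capacity is found as the field ending in
# "%", so entries that df wraps onto two lines (long device names) still work.
df_full() {
    limit=${1:-100}
    awk -v limit="$limit" '{
        for (i = 1; i < NF; i++) {
            if ($i ~ /^[0-9]+%$/) {
                pct = substr($i, 1, length($i) - 1)
                if (pct + 0 >= limit + 0) print $(i + 1), $i
            }
        }
    }'
}

# Usage: df -k | df_full        (or df -k | df_full 90 for an earlier warning)
```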


Note - For this example, the /var/mail and /export/home file systems are used.

Determine on which disk the file systems returned by the df -k command are stored.
In the previous example, note that /var/mail is stored on /dev/dsk/c1t0d0s4 and /export/home is stored on /dev/md/dsk/d100, an Online: DiskSuite(TM) metadisk.
  3. If the df -k command returns an Online: DiskSuite metadisk, determine its disk numbers by typing /usr/opt/SUNWmd/sbin/metastat <metadevice>.

    In the previous example, /usr/opt/SUNWmd/sbin/metastat d100 determines which physical disks /dev/md/dsk/d100 uses.

    Note - The d100 disk is a mirrored disk. Each mirror is made up of three striped disks of one size concatenated with four striped disks of another size. There is also a hot spare disk. This system uses IPI disks (idX). SCSI disks (sdX) are treated identically.


  server% /usr/opt/SUNWmd/sbin/metastat d100  
  d100: metamirror  
      Submirror 0: d10  
        State: Okay  
      Submirror 1: d20  
        State: Okay  
      Regions which are dirty: 0%  
      Pass = 1  
      Read option = round-robin (default)  
      Write option =  parallel (default)  
      Size: 15536742 blocks  
  d10: Submirror of d100  
      State: Okay  
      Hot spare pool: hsp001  
      Size: 15536742 blocks  
      Stripe 0: (interlace : 96 blocks)  
       Device              Start Block  Dbase State        Hot Spare  
       /dev/dsk/c1t1d0s7          0     No    Okay  
       /dev/dsk/c2t2d0s7          0     No    Okay  
       /dev/dsk/c1t3d0s7          0     No    Okay  
      Stripe 1: (interlace : 64 blocks)  
       Device              Start Block  Dbase State        Hot Spare  
       /dev/dsk/c3t1d0s7          0     No    Okay  
       /dev/dsk/c4t2d0s7          0     No    Okay  
       /dev/dsk/c3t3d0s7          0     No    Okay  
       /dev/dsk/c4t4d0s7          0     No    Okay  
  d20: Submirror of d100  
      State: Okay  
      Hot spare pool: hsp001  
      Size: 15536742 blocks
      Stripe 0: (interlace : 96 blocks)  
       Device              Start Block  Dbase State        Hot Spare  
       /dev/dsk/c2t1d0s7          0     No    Okay  
       /dev/dsk/c1t2d0s7          0     No    Okay  
       /dev/dsk/c2t3d0s7          0     No    Okay  
      Stripe 1: (interlace : 64 blocks)  
       Device              Start Block  Dbase State        Hot Spare  
       /dev/dsk/c4t1d0s7          0     No    Okay  
       /dev/dsk/c3t2d0s7          0     No    Okay  
       /dev/dsk/c4t3d0s7          0     No    Okay  
       /dev/dsk/c3t4d0s7          0     No    Okay      /dev/dsk/c2t4d0s7  
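Rather than reading the metastat output by eye, the component slices can be extracted and passed to the whatdev script described in the next step. The following is a sketch assuming a POSIX shell and awk; the function name meta_disks is hypothetical. For the d100 output above it lists the 14 component slices and omits the hot spare that appears only in the last column of one row.

```shell
#!/bin/sh
# meta_disks (hypothetical helper): list the physical slices that make up a
# metadevice, from `metastat` output on stdin. Component rows in the Device
# tables begin with a /dev/dsk/ path; a hot spare listed in the last column
# of a row is not printed. Duplicates are removed by sort -u.
meta_disks() {
    awk '$1 ~ /^\/dev\/dsk\// { print $1 }' | sort -u
}

# Usage: run whatdev on every component of d100
#   /usr/opt/SUNWmd/sbin/metastat d100 | meta_disks | while read disk; do
#       whatdev "$disk"
#   done
```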

  4. Determine the /dev/dsk entries for each exported file system.

    Use the whatdev script to find the instance or nickname for the drive, or type ls -lL /dev/dsk/c1t0d0s4 and more /etc/path_to_inst to find the /dev/dsk entries. Follow either procedure, "To Determine the /dev/dsk Entries for Exported File Systems with the whatdev Script" or "To Identify the /dev/dsk Entries for Exported File Systems with ls -lL," which follow.

    To Determine the /dev/dsk Entries for Exported File Systems with the whatdev Script

    a. Type the following whatdev script using a text editor.


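  #!/bin/csh
  # whatdev: print the drive name (for example, sd0 or id8) for a /dev entry.
  # Usage: whatdev /dev/dsk/c1t0d0s4
  # First follow the /dev symbolic link to get the physical device path,
  # something like "/iommu/.../.../sd@0,0" (the ":minor" suffix is dropped).
  set dev = `/bin/ls -l $1 | nawk '{ n = split($11, a, "/"); split(a[n],b,":"); for(i = 4; i < n; i++) printf("/%s",a[i]); printf("/%s\n", b[1]) }'`
  if ( "$dev" == "" ) exit
  # Then look up that path in /etc/path_to_inst, which lists each physical
  # device path with its instance number, and print the driver name
  # ("sd" or "id") followed by the instance number.
  nawk -v dev=$dev '$1 ~ dev { n = split(dev, a, "/"); split(a[n], \
  b, "@"); printf("%s%s\n", b[1], $2) }' /etc/path_to_inst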

b. Determine the /dev/dsk entry for the file system by typing df /<filesystemname>.
In this example you would type df /var/mail.

  furious% df /var/mail  
  Filesystem            kbytes    used   avail capacity  Mounted on  
  /dev/dsk/c1t0d0s4     582128  302556  267930    53%    /var/mail  

c. Determine the disk number by typing whatdev diskname (the disk name returned by the df /<filesystemname> command).
In this example you would type whatdev /dev/dsk/c1t0d0s4. Disk number id8 is returned, which is IPI disk 8.

  server% whatdev /dev/dsk/c1t0d0s4  
  id8  

d. Repeat steps b and c for each file system not stored on a metadisk (/dev/md/dsk).

e. If the file system is stored on a metadisk (/dev/md/dsk), look at the metastat output and run the whatdev script on all drives included in the metadisk.
In this example type whatdev /dev/dsk/c2t1d0s7.
There are 14 disks in the /export/home file system. Running the whatdev script on the /dev/dsk/c2t1d0s7 disk, one of the 14 disks comprising the /export/home file system, returns the following display.

  server% whatdev /dev/dsk/c2t1d0s7  
  id17  

Note that /dev/dsk/c2t1d0s7 is disk id17; this is IPI disk 17.
f. Go to Step 5.
To Identify the /dev/dsk Entries for Exported File Systems with ls -lL
If you did not follow the procedure outlined in "To Determine the /dev/dsk Entries for Exported File Systems with the whatdev Script," follow these steps:
a. List the drive and its major and minor device numbers by typing ls -lL <diskname>.

For example, for the /var/mail file system, type: ls -lL /dev/dsk/c1t0d0s4.

  server% ls -lL /dev/dsk/c1t0d0s4
  brw-r-----  1 root      66,  68 Dec 22 21:51 /dev/dsk/c1t0d0s4  

b. Locate the minor device number in the ls -lL output.
In the previous screen example, the first number following the file ownership (root), 66, is the major number. The second number, 68, is the minor device number.
c. Determine the disk number.
i. Divide the minor device number, 68 in the previous example, by 8 (68/8 = 8.5).
ii. Truncate the fraction.
The number 8 is the disk number.
d. Determine the slice (partition) number.
Look at the number following the s (for slice) in the device name. For example, in /dev/dsk/c1t0d0s4, the 4 following the s refers to slice 4.
Now you know that the disk number is 8 and the slice number is 4. This disk is either sd8 (SCSI) or id8 (IPI).
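The divide-and-truncate arithmetic in steps i and ii is integer division. The following is a minimal sketch in POSIX shell arithmetic; the function name minor_to_disk is hypothetical, and reading the remainder as the slice number is an assumption that agrees with this example (68 = 8 * 8 + 4, and the device is slice 4).

```shell
#!/bin/sh
# minor_to_disk (hypothetical helper): split a disk's minor device number into
# a disk number and a slice number, assuming 8 slices per disk as in the
# example above (minor 68 -> disk 8, slice 4, matching /dev/dsk/c1t0d0s4).
minor_to_disk() {
    minor=$1
    disk=$((minor / 8))     # integer division truncates the fraction (steps i and ii)
    slice=$((minor % 8))    # the remainder corresponds to the slice number
    echo "disk=$disk slice=$slice"
}

minor_to_disk 68            # prints disk=8 slice=4
```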
  5. View the disk statistics for each disk by typing iostat -x 15.

    The -x option supplies extended disk statistics. The 15 means disk statistics are gathered every 15 seconds.


  server% iostat -x 15  
  extended disk statistics  
  disk      r/s  w/s   Kr/s   Kw/s wait actv  svc_t  %w  %b  
  id10      0.1  0.2    0.4    1.0  0.0  0.0   24.1   0   1  
  id11      0.1  0.2    0.4    0.9  0.0  0.0   24.5   0   1  
  id17      0.1  0.2    0.4    1.0  0.0  0.0   31.1   0   1  
  id18      0.1  0.2    0.4    1.0  0.0  0.0   24.6   0   1  
  id19      0.1  0.2    0.4    0.9  0.0  0.0   24.8   0   1  
  id20      0.0  0.0    0.1    0.3  0.0  0.0   25.4   0   0  
  id25      0.0  0.0    0.1    0.2  0.0  0.0   31.0   0   0  
  id26      0.0  0.0    0.1    0.2  0.0  0.0   30.9   0   0  
  id27      0.0  0.0    0.1    0.3  0.0  0.0   31.6   0   0  
  id28      0.0  0.0    0.0    0.0  0.0  0.0    5.1   0   0  
  id33      0.0  0.0    0.1    0.2  0.0  0.0   36.1   0   0  
  id34      0.0  0.2    0.1    0.3  0.0  0.0   25.3   0   1  
  id35      0.0  0.2    0.1    0.4  0.0  0.0   26.5   0   1  
  id36      0.0  0.0    0.1    0.3  0.0  0.0   35.6   0   0  
  id8       0.0  0.1    0.2    0.7  0.0  0.0   47.8   0   0  
  id9       0.1  0.2    0.4    1.0  0.0  0.0   24.8   0   1  
  sd15      0.1  0.1    0.3    0.5  0.0  0.0   84.4   0   0  
  sd16      0.1  0.1    0.3    0.5  0.0  0.0   93.0   0   0  
  sd17      0.1  0.1    0.3    0.5  0.0  0.0   79.7   0   0  
  sd18      0.1  0.1    0.3    0.5  0.0  0.0   95.3   0   0  
  sd6       0.0  0.0    0.0    0.0  0.0  0.0  109.1   0   0  

The iostat -x 15 display identifies each disk by its disk number (for example, id8 or sd15) in the extended disk statistics. In the next step you will use a sed script to translate these disk numbers into file system names.
The output for the extended disk statistics is:
r/s         Reads per second
w/s         Writes per second
Kr/s        Kbytes read per second
Kw/s        Kbytes written per second
wait        Average number of transactions waiting for service (queue length)
actv        Average number of transactions actively being serviced
svc_t       Average service time (milliseconds)
%w          Percentage of time the queue is not empty
%b          Percentage of time the disk is busy
  6. Translate disk numbers into file system names.

    You can do this for both iostat and sar output. One quick way is to use a sed script.

    a. Using a text editor, create a sed script similar to the following d2fs.server script.

    Your sed script should substitute the file system name for the disk number.

    In this example, /var/mail is substituted for disk id8, and /export/home is substituted for disks id9, id10, id11, id17, id18, id19, id25, id26, id27, id28, id33, id34, id35, and id36.


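  #!/bin/sh
  # d2fs.server: replace each disk number in iostat or sar output with the
  # name of the file system stored on that disk. The "/" in each file system
  # name is escaped because "/" is also the sed delimiter, and the trailing
  # space in each pattern keeps, for example, id1 from matching id10.
  sed 's/id8 /var\/mail /
       s/id9 /export\/home /
       s/id10 /export\/home /
       s/id11 /export\/home /
       s/id17 /export\/home /
       s/id18 /export\/home /
       s/id19 /export\/home /
       s/id25 /export\/home /
       s/id26 /export\/home /
       s/id27 /export\/home /
       s/id28 /export\/home /
       s/id33 /export\/home /
       s/id34 /export\/home /
       s/id35 /export\/home /
       s/id36 /export\/home /'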

b. Run the iostat -xc 15 command through the sed script by typing iostat -xc 15 | d2fs.server.
The options to the previous iostat -xc 15 | d2fs.server command are explained below.
-x          Supplies extended disk statistics
-c          Reports the percentage of time the system was in user mode (us), system mode
            (sy), waiting for I/O (wt), and idling (id)
15          Means disk statistics are gathered every 15 seconds
The following explains the output and headings of CODE EXAMPLE 3-1.

  % iostat -xc 15 | d2fs.server  
  extended disk statistics          cpu  
  disk              r/s  w/s   Kr/s   Kw/s wait actv  svc_t  %w  %b  us sy wt id  
  export/home       0.1  0.2    0.4    1.0  0.0  0.0   24.1   0   1   0 11  2 86  
  export/home      0.1  0.2    0.4    0.9  0.0  0.0   24.5   0   1  
  export/home      0.1  0.2    0.4    1.0  0.0  0.0   31.1   0   1  
  export/home      0.1  0.2    0.4    1.0  0.0  0.0   24.6   0   1  
  export/home      0.1  0.2    0.4    0.9  0.0  0.0   24.8   0   1  
  id20             0.0  0.0    0.1    0.3  0.0  0.0   25.4   0   0  
  export/home      0.0  0.0    0.1    0.2  0.0  0.0   31.0   0   0  
  export/home      0.0  0.0    0.1    0.2  0.0  0.0   30.9   0   0  
  export/home      0.0  0.0    0.1    0.3  0.0  0.0   31.6   0   0  
  export/home      0.0  0.0    0.0    0.0  0.0  0.0    5.1   0   0  
  export/home      0.0  0.0    0.1    0.2  0.0  0.0   36.1   0   0  
  export/home      0.0  0.2    0.1    0.3  0.0  0.0   25.3   0   1  
  export/home      0.0  0.2    0.1    0.4  0.0  0.0   26.5   0   1  
  export/home      0.0  0.0    0.1    0.3  0.0  0.0   35.6   0   0  
  var/mail         0.0  0.1    0.2    0.7  0.0  0.0   47.8   0   0  
  id9              0.1  0.2    0.4    1.0  0.0  0.0   24.8   0   1  
  sd15             0.1  0.1    0.3    0.5  0.0  0.0   84.4   0   0  
  sd16             0.1  0.1    0.3    0.5  0.0  0.0   93.0   0   0  
  sd17             0.1  0.1    0.3    0.5  0.0  0.0   79.7   0   0  
  sd18             0.1  0.1    0.3    0.5  0.0  0.0   95.3   0   0  
  sd6              0.0  0.0    0.0    0.0  0.0  0.0  109.1   0   0  

CODE EXAMPLE 3-1 Output for the iostat -xc 15 Command
The following is a description of the output for the iostat -xc 15 | d2fs.server command.
disk        Name of disk device
r/s         Average read operations per second
w/s         Average write operations per second
Kr/s        Average Kbytes read per second
Kw/s        Average Kbytes written per second
wait        Number of requests outstanding in the device driver queue
actv        Number of requests active in the disk hardware queue
%w          Occupancy of the wait queue
%b          Occupancy of the active queue (device busy)
svc_t       Average service time in milliseconds for a complete disk request; includes
            wait time, active queue time, seek, rotation, and transfer latency
us          Percentage of time in user mode
sy          Percentage of time in system mode
wt          Percentage of time waiting for I/O
id          Percentage of time idle
c. Run the sar -d 15 1000 command through the sed script by typing sar -d 15 1000 | d2fs.server.

  server% sar -d 15 1000 | d2fs.server  
  12:44:17   device    %busy       avque   r+w/s  blks/s avwait  avserv  
  12:44:18  export/home   0         0.0       0       0     0.0   0.0  
            export/home   0         0.0       0       0     0.0   0.0  
            export/home   0         0.0       0       0     0.0   0.0  
            export/home   0         0.0       0       0     0.0   0.0  
            export/home   0         0.0       0       0     0.0   0.0  
            id20          0         0.0       0       0     0.0   0.0  
            export/home   0         0.0       0       0     0.0   0.0  
            export/home   0         0.0       0       0     0.0   0.0  
            export/home   0         0.0       0       0     0.0   0.0  
            export/home   0         0.0       0       0     0.0   0.0  
            export/home   0         0.0       0       0     0.0   0.0  
            export/home   0         0.0       0       0     0.0   0.0  
            export/home   0         0.0       0       0     0.0   0.0  
            export/home   0         0.0       0       0     0.0   0.0  
            var/mail      0         0.0       0       0     0.0   0.0  
            export/home   0         0.0       0       0     0.0   0.0  
            sd15          7         0.1       4     127     0.0  17.6  
            sd16          6         0.1       3     174     0.0  21.6  
            sd17          5         0.0       3     127     0.0  15.5  

CODE EXAMPLE 3-2 Output of the sar -d 15 1000 | d2fs.server Command
In CODE EXAMPLE 3-2, the sar -d option reports the activities of the disk devices. The 15 means that data is collected every 15 seconds. The 1000 means that data is collected 1000 times. The following terms and abbreviations explain the output.
device      Name of the disk device being monitored
%busy       Percentage of time the device spent servicing a transfer request (same as
            iostat %b)
avque       Average number of requests outstanding during the monitored period (measured
            only when the queue was occupied) (same as iostat actv)
r+w/s       Number of read and write transfers to the device, per second (same as
            iostat r/s + w/s)
blks/s      Number of 512-byte blocks transferred to the device, per second (same
            as iostat 2*(Kr/s + Kw/s))
avwait      Average time, in milliseconds, that transfer requests wait in the queue
            (measured only when the queue is occupied) (iostat wait gives the length of
            this queue)
avserv      Average time, in milliseconds, for a transfer request to be completed by the
            device (for disks, this includes seek, rotational latency, and data transfer
            times)
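The equivalences given above for r+w/s and blks/s can be checked by converting iostat -x figures into sar -d terms. The following is a sketch assuming a POSIX shell and awk and the iostat -x column order shown earlier; the function name iostat_as_sar is hypothetical. For the id10 line of the earlier iostat display it prints r+w/s=0.3 and blks/s=2.8.

```shell
#!/bin/sh
# iostat_as_sar (hypothetical helper): convert `iostat -x` data lines on stdin
# into the sar -d equivalents described above:
#   r+w/s  = r/s + w/s
#   blks/s = 2 * (Kr/s + Kw/s)   (512-byte blocks, so 2 blocks per Kbyte)
# Assumed columns: disk r/s w/s Kr/s Kw/s wait actv svc_t %w %b
# Header lines are skipped because their second field is not numeric.
iostat_as_sar() {
    awk 'NF == 10 && $2 ~ /^[0-9.]+$/ {
        printf("%s r+w/s=%.1f blks/s=%.1f\n", $1, $2 + $3, 2 * ($4 + $5))
    }'
}

# Usage: iostat -x 15 | iostat_as_sar
```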
d. For file systems that are exported via NFS, check the %b/%busy value.
If it is more than 30 percent, check the svc_t value.
The %b value, the percentage of time the disk is busy, is returned by the iostat command. The %busy value, the percentage of time the device spent servicing a transfer request, is returned by the sar command. If the %b and the %busy values are greater than 30 percent, go to Step e. Otherwise, go to Step 7.
e. Determine the average service time: the svc_t value from iostat, or the sum avserv + avwait from sar.
The svc_t value, the average service time in milliseconds, is returned by the iostat command. The avserv value, the average time (milliseconds) for a transfer request to be completed by the device, is returned by the sar command. Add avwait to avserv to get the same measure as svc_t.
If the svc_t value, the average total service time in milliseconds, is more than 40 ms, the disk is taking a long time to respond, and NFS requests that require disk I/O will appear slow to the NFS clients. The NFS response time should average less than 50 ms, to allow for NFS protocol processing and network transmission time; the disk response should be less than 40 ms.
The average service time in milliseconds depends on the disk: fast disks should show a lower average service time than slow disks.
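Steps d and e amount to a two-threshold filter over the iostat output. The following is a minimal sketch, assuming a POSIX shell and awk and the iostat -x column order shown earlier; the function name busy_slow_disks is hypothetical, and its defaults are the 30 percent and 40 ms figures from the text. None of the disks in the sample iostat display above would be reported, because none is more than 1 percent busy.

```shell
#!/bin/sh
# busy_slow_disks (hypothetical helper): apply the rule in steps d and e to
# `iostat -x` data lines on stdin: report a disk whose %b exceeds 30 percent
# and whose svc_t exceeds 40 ms. Both limits can be overridden as arguments.
# Assumed columns: disk r/s w/s Kr/s Kw/s wait actv svc_t %w %b
busy_slow_disks() {
    busy=${1:-30}
    svc=${2:-40}
    awk -v busy="$busy" -v svc="$svc" '
    NF == 10 && $10 ~ /^[0-9.]+$/ {
        if ($10 + 0 > busy + 0 && $8 + 0 > svc + 0)
            printf("%s busy=%s%% svc_t=%s ms\n", $1, $10, $8)
    }'
}

# Usage: iostat -x 15 | d2fs.server | busy_slow_disks
```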
  7. Collect data on a regular basis by uncommenting the lines in the sys crontab file so that sar collects data for one month.

    Performance data will be continuously collected to provide a history of sar results.


  root# crontab -l sys  
  #ident  "@(#)sys  1.5  92/07/14 SMI"  /* SVr4.0 1.2  */
  #  
  # The sys crontab should be used to do performance collection.  
  # See cron and performance manual pages for details on startup.  
  0 * * * 0-6 /usr/lib/sa/sa1  
  20,40 8-17 * * 1-5 /usr/lib/sa/sa1  
  5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A  


Note - A few hundred Kbytes will be used at most in /var/adm/sa.

  8. Spread the load over the disks.

    If the disks are overloaded, stripe the file system over multiple disks using Solstice(TM) DiskSuite or Online: DiskSuite (see "Using Solstice DiskSuite or Online: DiskSuite to Spread Disk Access Load," in Chapter 4). Use a Prestoserve write cache to reduce the number of accesses and spread peak access loads out in time.

  9. Adjust the buffer cache if you have read-only file systems (see "Adjusting the Buffer Cache (bufhwm)," in Chapter 4).

  10. Display server statistics to identify NFS problems by typing nfsstat -s. See CODE EXAMPLE 3-3.

    The -s option displays server statistics.


  server% nfsstat -s  
  Server rpc:  
  calls      badcalls   nullrecv   badlen     xdrcall  
  480421     0          0          0          0  
  Server nfs:  
  calls      badcalls  
  480421     2  
  null       getattr    setattr    root       lookup     readlink   read  
  95  0%     140354 29% 10782  2%  0  0%      110489 23% 286  0%    63095 13%  
  wrcache    write      create     remove     rename     link       symlink  
  0  0%      139865 29% 7188  1%   2140  0%   91  0%     19  0%     231  0%  
  mkdir      rmdir      readdir    statfs  
  435  0%    127  0%    2514  1%   2710  1%  

CODE EXAMPLE 3-3 Using the nfsstat -s Command to Display Server Statistics
The NFS server display shows the number of NFS calls received (calls) and rejected (badcalls), and the counts and percentages for the various calls that were made. The number and percentage of calls returned by the nfsstat -s command are shown in CODE EXAMPLE 3-3.
The following terms explain the output of CODE EXAMPLE 3-3.
calls       Total number of RPC calls received
badcalls    Total number of calls rejected by the RPC layer (the sum of badlen and xdrcall)
nullrecv    Number of times an RPC call was not available when it was thought to be
            received
badlen      Number of RPC calls with a length shorter than a minimum-sized RPC call
xdrcall     Number of RPC calls whose header could not be XDR decoded
TABLE 3-1 explains the nfsstat -s command output (CODE EXAMPLE 3-3) and what actions to take.
TABLE 3-1 nfsstat -s
If                        Then
writes > 5%**             Install a Prestoserve NFS accelerator (SBus card or NVRAM-NVSIMM) for
                          peak performance. See "Prestoserve NFS Accelerator," in Chapter 4.
Any badcalls              Badcalls are calls rejected by the RPC layer and are the sum of badlen
                          and xdrcall. The network may be overloaded. Identify an overloaded
                          network using network interface statistics.
readlink > 10% of total   NFS clients are using excessive symbolic links that are on the file
lookup calls on NFS       systems exported by the server. Replace the symbolic link with a
servers                   directory. Mount both the underlying file system and the symbolic
                          link's target on the NFS client. See Step 11.
getattr > 40%             Increase the client attribute cache using the actimeo option. Make sure
                          that the DNLC and inode caches are large. Use vmstat -s to determine
                          the percent hit rate (cache hits) for the DNLC and, if needed,
                          increase ncsize in the /etc/system file. See Step 12 later in this
                          chapter and "Directory Name Lookup Cache (DNLC)," in Chapter 4.
** The number of writes, 29% in CODE EXAMPLE 3-3, is very high.
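The TABLE 3-1 rules can be applied directly to the nfsstat -s display. The following is a sketch assuming a POSIX shell and awk and the output layout in CODE EXAMPLE 3-3; the function name nfs_server_rules and its messages are hypothetical, and only the four rows of the table are checked. For CODE EXAMPLE 3-3 it reports the 29% write share and the 2 NFS badcalls.

```shell
#!/bin/sh
# nfs_server_rules (hypothetical helper): read `nfsstat -s` output on stdin
# and apply the TABLE 3-1 rules. nfsstat prints a row of names followed by a
# row of values; operation rows hold "count pct%" pairs, other rows hold
# plain counts.
nfs_server_rules() {
    awk '
    /^Server rpc:/ { sect = "rpc"; next }
    /^Server nfs:/ { sect = "nfs"; next }
    $1 ~ /^[a-z]/ { n = split($0, name); next }     # row of column names
    $1 ~ /^[0-9]/ {                                 # matching row of values
        step = ($2 ~ /%$/) ? 2 : 1
        for (i = 1; i <= n; i++) {
            key = sect "." name[i]
            cnt[key] = $((i - 1) * step + 1)
            if (step == 2) { p = $(i * 2); sub(/%$/, "", p); pct[key] = p }
        }
    }
    END {
        if (pct["nfs.write"] + 0 > 5)
            print "write " pct["nfs.write"] "%: over 5 percent; consider a Prestoserve NFS accelerator"
        if (cnt["rpc.badcalls"] + 0 > 0 || cnt["nfs.badcalls"] + 0 > 0)
            print "badcalls rpc=" cnt["rpc.badcalls"] " nfs=" cnt["nfs.badcalls"] ": check for an overloaded network"
        if (cnt["nfs.lookup"] + 0 > 0 && cnt["nfs.readlink"] / cnt["nfs.lookup"] > 0.10)
            print "readlink over 10 percent of lookup: replace symbolic links"
        if (pct["nfs.getattr"] + 0 > 40)
            print "getattr " pct["nfs.getattr"] "%: over 40 percent; check attribute and DNLC caches"
    }'
}

# Usage: nfsstat -s | nfs_server_rules
```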
  11. Eliminate symbolic links.

If readlink calls exceed ten percent of lookup calls in the output of the nfsstat -s command (see CODE EXAMPLE 3-3 and TABLE 3-1), eliminate symbolic links. In the following example, /usr/dist/bin is a symbolic link to /usr/tools/dist/sun4.
a. Type rm /usr/dist/bin to eliminate the symbolic link for /usr/dist/bin.

  # rm /usr/dist/bin  

b. Make /usr/dist/bin a directory by typing mkdir /usr/dist/bin.

  # mkdir /usr/dist/bin  

c. Mount the directories by typing the following:

  client# mount server:/usr/dist/bin
  client# mount server:/usr/tools/dist/sun4
  client# mount

  12. View the Directory Name Lookup Cache (DNLC) hit rate by typing vmstat -s.

    This command returns the hit rate (cache hits).


  % vmstat -s  
  ... lines omitted  
  79062 total name lookups (cache hits 94%)  
  16 toolong  

a. If the hit rate is less than 90 percent and the number of long names (toolong) is not a problem, increase the ncsize variable by adding a line such as the following to the /etc/system file:

  set ncsize=5000  

Only directory names shorter than 30 characters are cached; names that are too long to be cached are counted and reported on the toolong line of the vmstat -s output.
The default value of ncsize is: ncsize (name cache) = 17 * maxusers + 90
  • For NFS server benchmarks ncsize has been set as high as 16000.
  • For maxusers = 2048 ncsize would be set at 34906.
For more information on the Directory Name Lookup Cache, see "Directory Name Lookup Cache (DNLC)," in Chapter 4.
b. Reboot the system.
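The default ncsize formula and the 90 percent hit-rate guideline can be expressed as two small shell functions. The following is a sketch assuming a POSIX shell and awk and the vmstat -s line format shown above; both function names are hypothetical. For the example above, dnlc_hit_rate reports 94, so no change would be suggested.

```shell
#!/bin/sh
# ncsize_default (hypothetical helper): default DNLC size from the formula
# above, ncsize = 17 * maxusers + 90.
ncsize_default() {
    echo $((17 * $1 + 90))
}

# dnlc_hit_rate (hypothetical helper): print the DNLC hit percentage from the
# "total name lookups (cache hits NN%)" line of `vmstat -s` output on stdin,
# and suggest raising ncsize when it is below 90 percent.
dnlc_hit_rate() {
    awk '/total name lookups/ {
        rate = $0
        sub(/.*cache hits /, "", rate)    # leaves, for example, "94%)"
        sub(/%.*/, "", rate)              # leaves "94"
        print rate
        if (rate + 0 < 90) print "hit rate below 90 percent: consider raising ncsize"
    }'
}

ncsize_default 2048                       # prints 34906, as in the text above
# Usage: vmstat -s | dnlc_hit_rate
```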
  13. Check the system state if the system has a Prestoserve NFS accelerator by typing /usr/sbin/presto. Verify that it is in the UP state.


  server% /usr/sbin/presto  
  state = UP, size = 0xfff80 bytes  
  statistics interval: 1 day, 23:17:50  (170270 seconds)  
  write cache efficiency: 65%  
  All 2 batteries are ok  

  • If it is in the DOWN state, type presto -u.

  server% presto -u  

  • If it is in the error state, see the Prestoserve User's Guide.
TABLE 3-2 describes the NFS operations and their functions for versions 2 and 3.
TABLE 3-2
Operation   Function in Version 2                             Change in Version 3
create      Creates a file system node; may be a file or a    No change
            symbolic link
statfs      Gets dynamic file system information              Replaced by fsstat
getattr     Gets file or directory attributes such as file    No change
            type, size, permissions, and access times
link        Creates a hard link in the remote file system     No change
lookup      Searches a directory for a file and returns the   No change
            file handle
mkdir       Creates a directory                               No change
null        Does nothing; used for testing and timing of      No change
            server response
read        Reads an 8-Kbyte block of data                    Block of data up to 4 Gbytes
readdir     Reads a directory entry                           No change
readlink    Follows a symbolic link on the server             No change
rename      Changes the directory name entry                  No change
remove      Removes a file system node                        No change
rmdir       Removes a directory                               No change
root        Retrieves the root of the remote file system      Removed
            (not presently used)
setattr     Changes file or directory attributes              No change
symlink     Makes a symbolic link in a remote file system     No change
wrcache     Writes an 8-Kbyte block of data to the remote     Removed
            cache (not presently used)
write       Writes an 8-Kbyte block of data                   Block of data up to 4 Gbytes
This completes the steps you use to check the server. Continue by checking each client.

Checking Each Client

The overall tuning process must include client tuning. Sometimes, tuning the client yields more improvement than fixing the server. For example, adding 4 Mbytes of memory to each of 100 clients dramatically decreases the load on an NFS server.

· To Check Each Client

  1. Check the client statistics for NFS problems by typing nfsstat -c at the % prompt (see CODE EXAMPLE 3-4).

Look for errors and retransmits.

  client % nfsstat -c  
  Client rpc:  
  calls      badcalls   retrans    badxids    timeouts   waits      newcreds  
  384687     1          52         7          52         0          0  
  badverfs   timers     toobig     nomem      cantsend   bufulocks  
  0          384        0          0          0          0  
  Client nfs:  
  calls      badcalls   clgets     cltoomany  
  379496     0          379558     0  
  Version 2: (379599 calls)  
  null       getattr    setattr    root       lookup     readlink   read  
  0 0%       178150 46% 614 0%     0 0%       39852 10%  28 0%      89617 23%  
  wrcache    write      create     remove     rename     link       symlink  
  0 0%       56078 14%  1183 0%    1175 0%    71 0%      51 0%      0 0%  
  mkdir      rmdir      readdir    statfs  
  49 0%      0 0%       987 0%     11744 3%  

CODE EXAMPLE 3-4 Output of the nfsstat -c Command
The output of CODE EXAMPLE 3-4 shows that there were only 52 retransmits (retrans) and 52 time-outs (timeouts) out of 384687 calls, or about 0.014 percent.
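To apply the 5 percent retransmission rule from TABLE 3-3 to saved output, you can extract the Client rpc counters programmatically. The Python sketch below assumes the exact layout shown in CODE EXAMPLE 3-4 (a row of field names followed by a row of integer values, repeated until the Client nfs: line). The function names are hypothetical, and other releases of nfsstat may format the output differently.

```python
SAMPLE = """\
Client rpc:
calls      badcalls   retrans    badxids    timeouts   waits      newcreds
384687     1          52         7          52         0          0
badverfs   timers     toobig     nomem      cantsend   bufulocks
0          384        0          0          0          0
Client nfs:
calls      badcalls   clgets     cltoomany
379496     0          379558     0
"""


def parse_rpc_counters(text):
    """Return the 'Client rpc:' counters of `nfsstat -c` output as a dict."""
    lines = [line.strip() for line in text.splitlines()]
    if "Client rpc:" not in lines:
        raise ValueError("no 'Client rpc:' section found")
    i = lines.index("Client rpc:") + 1
    counters = {}
    # Consume (names, values) row pairs until the next section header,
    # which ends with a colon (for example, "Client nfs:").
    while i + 1 < len(lines) and lines[i] and not lines[i].endswith(":"):
        names = lines[i].split()
        values = [int(v) for v in lines[i + 1].split()]
        counters.update(zip(names, values))
        i += 2
    return counters


def retrans_percent(counters):
    """Retransmissions as a percentage of all RPC calls."""
    calls = counters["calls"]
    return 100.0 * counters["retrans"] / calls if calls else 0.0


if __name__ == "__main__":
    rpc = parse_rpc_counters(SAMPLE)
    pct = retrans_percent(rpc)
    print(f"retrans = {pct:.3f}% of calls")
    if pct > 5:
        print("retrans > 5% of calls: requests are not reaching the server")
```

Only the Client rpc section is read here; the Client nfs counters that follow use some of the same field names (calls, badcalls), so the parser stops at the next header to avoid overwriting them.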
The nfsstat -c display in CODE EXAMPLE 3-4 contains the following fields:
Field | Description
--- | ---
calls | Total number of calls sent
badcalls | Total number of calls rejected by RPC
retrans | Total number of retransmissions
badxids | Number of times that a duplicate acknowledgment was received for a single NFS request
timeouts | Number of calls that timed out
waits | Number of times a call had to wait because no client handle was available
newcreds | Number of times the authentication information had to be refreshed
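The per-operation counts under "Version 2:" in CODE EXAMPLE 3-4 show the client's workload mix, using the operation names described in TABLE 3-2. In this sample, getattr accounts for 46 percent of the calls, followed by read (23 percent) and write (14 percent). The Python sketch below again assumes the exact layout of CODE EXAMPLE 3-4 and uses hypothetical function names; it extracts those counts so that you can compare the mix across clients or over time.

```python
import re

SAMPLE = """\
Version 2: (379599 calls)
null       getattr    setattr    root       lookup     readlink   read
0 0%       178150 46% 614 0%     0 0%       39852 10%  28 0%      89617 23%
wrcache    write      create     remove     rename     link       symlink
0 0%       56078 14%  1183 0%    1175 0%    71 0%      51 0%      0 0%
mkdir      rmdir      readdir    statfs
49 0%      0 0%       987 0%     11744 3%
"""

# One "count percent%" cell, for example "178150 46%".
CELL = re.compile(r"(\d+)\s+(\d+)%")


def parse_op_mix(text):
    """Return {operation: (count, percent)} from a 'Version N:' block."""
    lines = [line.strip() for line in text.splitlines()]
    starts = [n for n, line in enumerate(lines) if line.startswith("Version ")]
    if not starts:
        raise ValueError("no 'Version N:' section found")
    i = starts[0] + 1
    ops = {}
    # Each pair of rows is a line of operation names followed by a line
    # of "count percent%" cells in the same order.
    while i + 1 < len(lines) and lines[i]:
        names = lines[i].split()
        cells = CELL.findall(lines[i + 1])
        if len(cells) != len(names):
            break  # not a names/cells pair; stop rather than misalign
        for name, (count, pct) in zip(names, cells):
            ops[name] = (int(count), int(pct))
        i += 2
    return ops


def busiest(ops, n=3):
    """Names of the n operations with the highest call counts."""
    return sorted(ops, key=lambda name: ops[name][0], reverse=True)[:n]


if __name__ == "__main__":
    ops = parse_op_mix(SAMPLE)
    for name in busiest(ops):
        count, pct = ops[name]
        print(f"{name:8} {count:>8} {pct:>3}%")
```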
TABLE 3-2, shown earlier in this chapter, describes the NFS operations. TABLE 3-3 explains the output of the nfsstat -c command and what action to take.
TABLE 3-3 nfsstat -c
If | Then
--- | ---
retrans > 5% of the calls | The requests are not reaching the server.
badxids is approximately equal to badcalls | The network is slow. Consider installing a faster network or installing subnets.
badxids is approximately equal to timeouts | Most requests are reaching the server, but the server is slower than expected. Check the response times with nfsstat -m.
badxids is close to 0 while timeouts is high | The network is dropping requests. Reduce rsize and wsize in the mount options.
null > 0 | A large number of null calls suggests that the automounter is retrying the mount frequently because the timeout values for the mount are too short. Increase the mount timeout parameter, timeo, on the automounter command line.
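The rules in TABLE 3-3 can be encoded as a simple checker over the Client rpc counters. This Python sketch is not a tool from this guide, and its names are hypothetical. The guide gives no numeric meaning for "approximately equal" or "close to 0", so the sketch assumes a 20 percent tolerance; it also assumes that the badxids rows are worth interpreting only when retransmissions exceed the 5 percent threshold. With those assumptions, the counters from CODE EXAMPLE 3-4 produce no findings.

```python
def approx_equal(a, b, tol):
    """True if a and b differ by at most tol (a fraction) of the larger one."""
    return abs(a - b) <= tol * max(a, b, 1)


def diagnose_rpc(c, null_calls=0, tol=0.2):
    """Apply the TABLE 3-3 rules to nfsstat -c counters.

    c must contain calls, badcalls, retrans, badxids, and timeouts.
    tol (20 percent) for "approximately equal" and "close to 0" is an
    assumption, not a value from the guide.
    """
    findings = []
    if c["calls"] and c["retrans"] > 0.05 * c["calls"]:
        findings.append("retrans > 5% of calls: requests are not reaching the server")
        # Assumption: interpret badxids only when retransmissions are significant.
        bx, to = c["badxids"], c["timeouts"]
        if bx <= tol * to:
            findings.append("badxids close to 0: network is dropping requests; reduce rsize/wsize")
        elif approx_equal(bx, to, tol):
            findings.append("badxids ~ timeouts: server is slower than expected; check nfsstat -m")
        elif approx_equal(bx, c["badcalls"], tol):
            findings.append("badxids ~ badcalls: network is slow; consider a faster network or subnets")
    if null_calls > 0:
        findings.append("null > 0: automounter may be retrying mounts; increase timeo")
    return findings


if __name__ == "__main__":
    # Counters from CODE EXAMPLE 3-4.
    sample = {"calls": 384687, "badcalls": 1, "retrans": 52,
              "badxids": 7, "timeouts": 52}
    print(diagnose_rpc(sample) or "no TABLE 3-3 rule triggered")
```

The "close to 0" test is checked first so that a client with almost no badxids is never also reported as matching badcalls when both counters happen to be small.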
Third-party tools you can use to analyze NFS and network performance include:
  • NetMetrix (Hewlett-Packard)
  • SharpShooter (AIM Technology)
  2. Display statistics for each NFS-mounted file system by typing nfsstat -m.

    The statistics include the server name and address, mount flags, current read and write sizes, retransmission count, and the timers used for dynamic retransmission.


  client % nfsstat -m  
  /export/home from server:/export/home  
   Flags:  
  vers=2,hard,intr,dynamic,rsize=8192,wsize=8192,retrans=5  
   Lookups: srtt=10 (25ms), dev=4 (20ms), cur=3 (60ms)  
   Reads:   srtt=9 (22ms), dev=7 (35ms), cur=4 (80ms)  
   Writes:  srtt=7 (17ms), dev=3 (15ms), cur=2 (40ms)  
   All:     srtt=11 (27ms), dev=4 (20ms), cur=3 (60ms)  

The output of the nfsstat -m command uses the following terms:
Term | Description
--- | ---
srtt | Smoothed round-trip time
dev | Estimated deviation of the round-trip time
cur | Current backed-off timeout value
The numbers in parentheses in the previous code example are the actual times in milliseconds. The other values are unscaled values kept by the operating system kernel, and you can ignore them. Response times are shown for lookups, reads, writes, and a combination of all of these operations (All). TABLE 3-4 shows the action to take for each nfsstat -m symptom.
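To track response times across many mount points, you can pull the millisecond values out of nfsstat -m output. The Python sketch below assumes the single-mount layout of the example above, keeps only the values in parentheses (ignoring the unscaled kernel values, as the text suggests), and also splits the option list that follows the Flags: line into entries such as rsize and wsize. The function name and regular expression are illustrative assumptions; check them against the output of your release.

```python
import re

SAMPLE = """\
/export/home from server:/export/home
 Flags:
vers=2,hard,intr,dynamic,rsize=8192,wsize=8192,retrans=5
 Lookups: srtt=10 (25ms), dev=4 (20ms), cur=3 (60ms)
 Reads:   srtt=9 (22ms), dev=7 (35ms), cur=4 (80ms)
 Writes:  srtt=7 (17ms), dev=3 (15ms), cur=2 (40ms)
 All:     srtt=11 (27ms), dev=4 (20ms), cur=3 (60ms)
"""

# Operation name, then srtt/dev/cur as "unscaled (milliseconds)" pairs;
# only the millisecond values are captured.
TIMER = re.compile(
    r"^\s*(\w+):\s+srtt=\d+ \((\d+)ms\),\s*dev=\d+ \((\d+)ms\),\s*cur=\d+ \((\d+)ms\)"
)


def parse_mount_stats(text):
    """Return (flags, timers) for a single mount point in nfsstat -m output."""
    flags, timers = {}, {}
    lines = text.splitlines()
    for n, line in enumerate(lines):
        m = TIMER.match(line)
        if m:
            srtt, dev, cur = (int(g) for g in m.groups()[1:])
            timers[m.group(1)] = {"srtt_ms": srtt, "dev_ms": dev, "cur_ms": cur}
        elif line.strip() == "Flags:" and n + 1 < len(lines):
            # The comma-separated option list is on the line after "Flags:".
            for opt in lines[n + 1].strip().split(","):
                key, _, value = opt.partition("=")
                # Numeric options become ints; bare options (hard, intr) become True.
                flags[key] = int(value) if value.isdigit() else (value or True)
    return flags, timers


if __name__ == "__main__":
    flags, timers = parse_mount_stats(SAMPLE)
    print("rsize", flags.get("rsize"), "wsize", flags.get("wsize"))
    for op, t in timers.items():
        print(f"{op:8} srtt {t['srtt_ms']:>3} ms  cur {t['cur_ms']:>3} ms")
```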

Note - These statistics are generated only for NFS over UDP (the default for version 2). NFS over TCP does not need retransmit timers and is the default for version 3.

TABLE 3-4 nfsstat -m
If | Then
--- | ---
srtt > 50 ms | That mount point is slow. Check the network and the server for the disk(s) that provide that mount point. See "To Check the Network" and "To Check the NFS Server" earlier in this chapter.
The message "NFS server not responding" is displayed | Try increasing the timeo parameter in the /etc/vfstab file to eliminate the messages and improve performance. Doubling the initial timeo value is a good baseline. After changing the timeo value in the vfstab file, run the nfsstat -c command and observe the badxids value it reports. Follow the recommendations for the nfsstat -c command earlier in this section.
Lookups: cur > 80 ms | The requests are taking too long to process. This indicates a slow network or a slow server.
Reads: cur > 150 ms | The requests are taking too long to process. This indicates a slow network or a slow server.
Writes: cur > 250 ms | The requests are taking too long to process. This indicates a slow network or a slow server.
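TABLE 3-4 can likewise be expressed as a check over the per-operation millisecond values reported by nfsstat -m. In the Python sketch below, the thresholds come from TABLE 3-4; applying the srtt limit to every row, including All, is an interpretation. The hypothetical double_timeo helper illustrates the "double the initial timeo" baseline on a comma-separated mount-option string such as the options field of an /etc/vfstab entry. It assumes that an absent timeo means the default of 11 tenths of a second for NFS over UDP; confirm that default in the mount_nfs man page for your release.

```python
# Thresholds from TABLE 3-4, in milliseconds.
SRTT_LIMIT_MS = 50
CUR_LIMITS_MS = {"Lookups": 80, "Reads": 150, "Writes": 250}


def check_mount_timers(timers):
    """Return TABLE 3-4 findings for {operation: {"srtt_ms": n, "cur_ms": n}}."""
    findings = []
    for op, t in timers.items():
        if t["srtt_ms"] > SRTT_LIMIT_MS:
            findings.append(f"{op}: srtt {t['srtt_ms']} ms > {SRTT_LIMIT_MS} ms; mount point is slow")
        limit = CUR_LIMITS_MS.get(op)  # the All row has no cur limit in TABLE 3-4
        if limit is not None and t["cur_ms"] > limit:
            findings.append(f"{op}: cur {t['cur_ms']} ms > {limit} ms; slow network or slow server")
    return findings


def double_timeo(options, default_timeo=11):
    """Double timeo in a comma-separated mount-option string (hypothetical helper).

    default_timeo is an assumed UDP default (11 tenths of a second) and is
    used only when the string has no timeo option.
    """
    opts = [o for o in options.split(",") if o]
    for i, opt in enumerate(opts):
        if opt.startswith("timeo="):
            value = int(opt.split("=", 1)[1])
            opts[i] = f"timeo={2 * value}"
            return ",".join(opts)
    opts.append(f"timeo={2 * default_timeo}")
    return ",".join(opts)


if __name__ == "__main__":
    # Millisecond values from the nfsstat -m example in this section.
    sample = {
        "Lookups": {"srtt_ms": 25, "cur_ms": 60},
        "Reads": {"srtt_ms": 22, "cur_ms": 80},
        "Writes": {"srtt_ms": 17, "cur_ms": 40},
        "All": {"srtt_ms": 27, "cur_ms": 60},
    }
    print(check_mount_timers(sample) or "no TABLE 3-4 thresholds exceeded")
    print(double_timeo("hard,intr"))
```

After applying a doubled timeo, the table's advice still holds: rerun nfsstat -c and compare the badxids value before deciding whether further changes are needed.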