3 Analyzing NFS Performance
- This chapter covers how to analyze NFS performance and briefly describes the general steps needed to tune your system. This chapter also describes how to check the network, server, and each client.
3.1 Tuning Steps
- You will occasionally be required to tune your NFS server. You may want to tune a server for best performance when you set it up, and later to improve performance in response to a problem.
· General Performance Improvement Tuning Steps
- When you tune to improve performance, either on an existing system or on a newly set up system:
1. Measure the current level of performance for the network, server, and each client.
See Section 3.2, "Checking the Network, Server, and Each Client."
2. Analyze the data gathered.
3. Tune the server.
See Chapter 4, "Configuration Recommendations for NFS Performance."
4. Repeat steps 1 through 3 until you achieve the desired performance level.
· Performance Problem Resolution Tuning Steps
- When you tune as the result of a performance problem:
1. Observe symptoms and use tools to pinpoint the source of the problem.
2. Measure the current level of performance for the network, server, and each client.
See Section 3.2, "Checking the Network, Server, and Each Client."
3. Analyze the data gathered.
4. Tune the server.
See Chapter 4, "Configuration Recommendations for NFS Performance."
5. Repeat steps 1 through 4 until you achieve the desired performance level.
3.2 Checking the Network, Server, and Each Client
- Before you can tune the NFS server, you must check the performance of the network, the NFS server, and each client. The first step is to check the performance of the network.
3.2.1 Checking the Network
- If disks are operating normally, check network usage because a slow server and a slow network look the same to an NFS client.
-
Figure 3-1 illustrates the steps you must follow in sequence to check the network.

Figure 3-1
· To find the number of packets and collisions/errors on each network
-
-
Type netstat -i 15.
-
server% netstat -i 15
input le0 output input (Total) output
packets errs packets errs colls packets errs packets errs colls
10798731 533 4868520 0 1078 24818184 555 14049209 157 894937
51 0 43 0 0 238 0 139 0 0
85 0 69 0 0 218 0 131 0 2
44 0 29 0 0 168 0 94 0 0
|
- Use -I (Capital I) interface to look at other interfaces.
-
| -i | Shows the state of the interfaces that are used for TCP/IP traffic |
| 15 | Collects information every 15 seconds |
- In the netstat -i 15 display, a machine with active network traffic should show both input packets and output packets continually increasing.
-
a. Calculate the network collision rate by dividing the number of output collision counts (Output Colls - le) by the number of output packets (le).
- A network-wide collision rate greater than 10 percent can indicate problems such as an overloaded network, a poorly configured network, or hardware problems.
-
b. Calculate the input packet error rate by dividing the number of input errors (le) by the total number of input packets (le).
- If the input error rate is high (over 25 percent), the host may be dropping packets.
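The two rate calculations in steps a and b can be sketched in Python. This is a minimal illustration rather than part of the original procedure: the function names are hypothetical, and the sample counters are the cumulative le0 values from the first line of the netstat -i 15 display shown earlier.

```python
def collision_rate(output_collisions, output_packets):
    """Return output collisions as a percentage of output packets."""
    if output_packets == 0:
        return 0.0
    return 100.0 * output_collisions / output_packets


def input_error_rate(input_errors, input_packets):
    """Return input errors as a percentage of input packets."""
    if input_packets == 0:
        return 0.0
    return 100.0 * input_errors / input_packets


# Cumulative le0 counters taken from the sample netstat -i 15 display.
colls = collision_rate(1078, 4868520)
errs = input_error_rate(533, 10798731)

print(f"collision rate:   {colls:.4f}%")
print(f"input error rate: {errs:.4f}%")
```

For the sample counters both rates are a small fraction of one percent, well under the 10 percent collision threshold given above.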
- Other hardware on the network, as well as heavy traffic and low-level hardware problems, can introduce transmission problems. Bridges and routers can drop packets, forcing retransmissions and causing degraded performance.
- Bridges also cause delays when they examine packet headers for Ethernet addresses. During these examinations, bridge network interfaces may drop packet fragments.
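- To compensate for bandwidth-limited network hardware, reduce the read and write transfer sizes (rsize and wsize) in the mount options. The following entry sets both to 2048:
-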
server:/home /home/server nfs rw,rsize=2048,wsize=2048 0 0
|
- If many read and write requests are dropped and the client communicates with the server using the User Datagram Protocol (UDP), the entire packet is retransmitted, not just the dropped packets.
· To determine how long a round trip echo packet takes on the network and to display packet losses
-
* Type ping -sRv servername from the client to show the route taken by the packets.
- If the round trip takes more than a few milliseconds (ms), the network is slow, there are slow routers on the network, or the network is very busy. Ignore the results from the first ping command.
-
client% ping -sRv servername
PING dreadnought: 56 data bytes
64 bytes from server (129.145.72.15): icmp_seq=0. time=5. ms
IP options: <record route> router (129.145.72.1), server
(129.145.72.15), client (129.145.70.114), (End of record)
64 bytes from server (129.145.72.15): icmp_seq=1. time=2. ms
IP options: <record route> router (129.145.72.1), server
(129.145.72.15), client (129.145.70.114), (End of record)
|
-
| s | Sends one datagram per second and prints one line of output for every echo response it receives. If there is no response, no output is produced. |
| R | Record route. Sets the Internet Protocol (IP) record option which stores the route of the packet inside the IP header. |
| v | Verbose option. Lists any ICMP packets other than echo response that are received. |
- If you suspect a physical problem, use ping -sRv to find the response time of several hosts on the network. If the response time (ms) from one host is not what you would expect, investigate that host.
- The ping command uses the ICMP protocol's echo request datagram to elicit an ICMP echo response from the specified host or network gateway. It can take a long time on a time-shared NFS server to obtain the ICMP echo. The distance from the client to the NFS server is a factor in how long it takes to obtain the ICMP echo from the server.
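When comparing several hosts, it can help to pull the round-trip times out of saved ping -sRv output. The Python sketch below is one possible way to do that, assuming the "time=5. ms" format shown in the sample display. The function name and the regular expression are illustrative and are not part of any Solaris tool. Following the advice above, the first reply is dropped by default.

```python
import re

# Matches the round-trip time in lines such as "... icmp_seq=0. time=5. ms".
TIME_RE = re.compile(r"time=(\d+(?:\.\d*)?)\s*ms")


def round_trip_times(ping_output, skip_first=True):
    """Return round-trip times in ms, optionally dropping the first reply."""
    times = [float(m.group(1)) for m in TIME_RE.finditer(ping_output)]
    return times[1:] if skip_first else times


# Reply lines copied from the sample ping -sRv display.
sample = """\
64 bytes from server (129.145.72.15): icmp_seq=0. time=5. ms
64 bytes from server (129.145.72.15): icmp_seq=1. time=2. ms
"""

times = round_trip_times(sample)
print(times)
```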
-
Figure 3-2 shows the possible responses or the lack of response to the ping -sRv command.

Figure 3-2 ping -sRv
3.2.2 Checking the NFS Server
-
Note - The server used in the following examples is a large SPARCserver 690 configuration.
-
Figure 3-3, which follows, illustrates the steps you must follow in sequence to check the NFS server.

Figure 3-3
· To see what is being exported
-
* Type share at the % prompt.
-
server% share
- /export/home rw=netgroup ""
- /var/mail rw=netgroup ""
- /cdrom/solaris_2_3_ab ro ""
|
· To display the file systems mounted and the actual disk drive on which the file system is mounted
-
* Type df -k at the % prompt. If a file system is over 100 percent full, it may cause NFS write errors on the clients.
-
server% df -k
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c1t0d0s0 73097 36739 29058 56% /
/dev/dsk/c1t0d0s3 214638 159948 33230 83% /usr
/proc 0 0 0 0% /proc
fd 0 0 0 0% /dev/fd
swap 501684 32 501652 0% /tmp
/dev/dsk/c1t0d0s4 582128 302556 267930 53% /var/mail
/dev/md/dsk/d100 7299223 687386 279377 96% /export/home
/vol/dev/dsk/c0t6/solaris_2_3_ab
113512 113514 0 100% /cdrom/solaris_2_3_ab
|
-
Note - For this example, the /var/mail and /export/home file systems are used.
· Determine on which disk number the file systems returned by the df -k command are stored
- In the previous example, you'll note that /var/mail is stored on /dev/dsk/c1t0d0s4 and /export/home is stored on /dev/md/dsk/d100, an Online: DiskSuite(TM) meta disk.
· If an Online: DiskSuite metadisk is returned by the df -k command, determine the disk number
-
* Type /usr/opt/SUNWmd/sbin/metastat <disknumber>. In the previous example, /usr/opt/SUNWmd/sbin/metastat d100 determines what physical disk /dev/md/dsk/d100 uses.
-
Note - The d100 disk is a mirrored disk. Each mirror is made up of three striped disks of one size concatenated with four striped disks of another size. There is also a hot spare disk. This system uses IPI disks, idX. SCSI disks, sdX, are treated identically.
-
server% /usr/opt/SUNWmd/sbin/metastat d100
d100: metamirror
Submirror 0: d10
State: Okay
Submirror 1: d20
State: Okay
Regions which are dirty: 0%
Pass = 1
Read option = round-robin (default)
Write option = parallel (default)
Size: 15536742 blocks
d10: Submirror of d100
State: Okay
Hot spare pool: hsp001
Size: 15536742 blocks
Stripe 0: (interlace : 96 blocks)
Device Start Block Dbase State Hot Spare
/dev/dsk/c1t1d0s7 0 No Okay
/dev/dsk/c2t2d0s7 0 No Okay
/dev/dsk/c1t3d0s7 0 No Okay
Stripe 1: (interlace : 64 blocks)
Device Start Block Dbase State Hot Spare
/dev/dsk/c3t1d0s7 0 No Okay
/dev/dsk/c4t2d0s7 0 No Okay
/dev/dsk/c3t3d0s7 0 No Okay
/dev/dsk/c4t4d0s7 0 No Okay
d20: Submirror of d100
State: Okay
Hot spare pool: hsp001
Size: 15536742 blocks
Stripe 0: (interlace : 96 blocks)
Device Start Block Dbase State Hot Spare
/dev/dsk/c2t1d0s7 0 No Okay
/dev/dsk/c1t2d0s7 0 No Okay
/dev/dsk/c2t3d0s7 0 No Okay
Stripe 1: (interlace : 64 blocks)
Device Start Block Dbase State Hot Spare
/dev/dsk/c4t1d0s7 0 No Okay
/dev/dsk/c3t2d0s7 0 No Okay
/dev/dsk/c4t3d0s7 0 No Okay
/dev/dsk/c3t4d0s7 0 No Okay /dev/dsk/c2t4d0s7
|
· To determine the /dev/dsk entries for each exported file system
- Use the whatdev script to find the instance or nickname for the drive or type ls -lL /dev/dsk/c1t0d0s4 and more /etc/path_to_inst to find the /dev/dsk entries.
- Follow either the procedure in the section "Using the whatdev Script" or the section "Using ls -lL to Identify /dev/dsk Entries."
-
Using the whatdev Script
-
a. Type the whatdev script using a text editor.
-
#!/bin/csh
# print out the drive name - st0 or sd0 - given the /dev entry
# first get something like "/iommu/.../.../sd@0,0"
set dev = `/bin/ls -l $1 | nawk '{ n = split($11, a, "/");
split(a[n],b,":"); for(i = 4; i < n; i++) printf("/%s",a[i]);
printf("/%s\n", b[1]) }'`
if ( "$dev" == "" ) exit
# then get the instance number and concatenate with the "sd"
nawk -v dev=$dev '$1 ~ dev { n = split(dev, a, "/"); split(a[n], \
b, "@"); printf("%s%s\n", b[1], $2) }' /etc/path_to_inst
|
-
b. Type df /<filesystemname> to determine the /dev/dsk entry for the file system.
- In this example you would type df /var/mail.
-
furious% df /var/mail
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c1t0d0s4 582128 302556 267930 53% /var/mail
|
-
c. Type whatdev diskname (the disk name returned by the df /<filesystemname> command) to determine the disk number. In this example you would type whatdev /dev/dsk/c1t0d0s4.
- Disk number id8 is returned. This is IPI disk 8.
-
server% whatdev /dev/dsk/c1t0d0s4
id8
|
-
d. Repeat steps b and c for each file system not stored on a meta disk (/dev/md/dsk).
-
e. If the file system is stored on a meta disk (/dev/md/dsk), look at the metastat output and run the whatdev script on all the drives making up the meta disk.
- In this example type whatdev /dev/dsk/c2t1d0s7.
- There are 14 disks in the /export/home file system. Running the whatdev script on the /dev/dsk/c2t1d0s7 disk, one of the 14 disks comprising the /export/home file system, returns the following display.
-
server% whatdev /dev/dsk/c2t1d0s7
id17
|
- Note that /dev/dsk/c2t1d0s7 is disk id17. This is IPI disk 17.
-
f. Go to the procedure "To see the disk statistics for each disk."
-
Using ls -lL to Identify /dev/dsk Entries
- If you followed the procedure "Using the whatdev Script," skip this section and go to the procedure "To see the disk statistics for each disk."
- If you did not follow the procedure outlined in "Using the whatdev Script":
-
a. Type ls -lL <disknumber> to list the drive and its major and minor device numbers.
- For example, for the /var/mail file system, type: ls -lL /dev/dsk/c1t0d0s4.
-
ls -lL /dev/dsk/c1t0d0s4
brw-r----- 1 root 66, 68 Dec 22 21:51 /dev/dsk/c1t0d0s4
|
-
b. In the ls -lL output, locate the minor device number. In the previous screen example, the first number following the file ownership (root), 66, is the major number. The second number, 68, is the minor device number.
-
c. Determine the disk number.
- Divide the minor device number, 68 in the previous example, by 8.
- Disk number = 68/8
- The answer is 8.5. Truncate the fraction. The number, 8, is the disk number.
-
d. Determine the slice (partition) number.
- Look at the number following the s (for slice) in the disk number. For example, in /dev/dsk/c1t0d0s4, the 4 following the s refers to slice 4.
- Now you know that the disk number is 8 and the slice number is 4. This disk is either sd8 (SCSI) or id8 (IPI).
-
e. Go to the procedure "To see the disk statistics for each disk" which follows.
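The arithmetic in step c can be sketched in Python as an integer division. This is only an illustration: the function name is hypothetical, and the divisor of 8 is taken from the procedure above. In this example the remainder (4) also matches the slice number read from the device name in step d, but that is an observation about this one example, not a rule stated in the procedure.

```python
def disk_and_slice(minor, slices_per_disk=8):
    """Split a minor device number into (disk number, remainder).

    The disk number is the truncated quotient, as in step c.
    """
    return divmod(minor, slices_per_disk)


# Minor device number 68 from the ls -lL /dev/dsk/c1t0d0s4 example.
disk, remainder = disk_and_slice(68)
print(f"disk number {disk}, remainder {remainder}")
```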
· To see the disk statistics for each disk
-
* Type iostat -x 15. The -x option supplies extended disk statistics. The 15 means disk statistics are gathered every 15 seconds.
-
server% iostat -x 15
extended disk statistics
disk r/s w/s Kr/s Kw/s wait actv svc_t %w %b
id10 0.1 0.2 0.4 1.0 0.0 0.0 24.1 0 1
id11 0.1 0.2 0.4 0.9 0.0 0.0 24.5 0 1
id17 0.1 0.2 0.4 1.0 0.0 0.0 31.1 0 1
id18 0.1 0.2 0.4 1.0 0.0 0.0 24.6 0 1
id19 0.1 0.2 0.4 0.9 0.0 0.0 24.8 0 1
id20 0.0 0.0 0.1 0.3 0.0 0.0 25.4 0 0
id25 0.0 0.0 0.1 0.2 0.0 0.0 31.0 0 0
id26 0.0 0.0 0.1 0.2 0.0 0.0 30.9 0 0
id27 0.0 0.0 0.1 0.3 0.0 0.0 31.6 0 0
id28 0.0 0.0 0.0 0.0 0.0 0.0 5.1 0 0
id33 0.0 0.0 0.1 0.2 0.0 0.0 36.1 0 0
id34 0.0 0.2 0.1 0.3 0.0 0.0 25.3 0 1
id35 0.0 0.2 0.1 0.4 0.0 0.0 26.5 0 1
id36 0.0 0.0 0.1 0.3 0.0 0.0 35.6 0 0
id8 0.0 0.1 0.2 0.7 0.0 0.0 47.8 0 0
id9 0.1 0.2 0.4 1.0 0.0 0.0 24.8 0 1
sd15 0.1 0.1 0.3 0.5 0.0 0.0 84.4 0 0
sd16 0.1 0.1 0.3 0.5 0.0 0.0 93.0 0 0
sd17 0.1 0.1 0.3 0.5 0.0 0.0 79.7 0 0
sd18 0.1 0.1 0.3 0.5 0.0 0.0 95.3 0 0
sd6 0.0 0.0 0.0 0.0 0.0 0.0 109.1 0 0
|
- The iostat -x 15 command lets you see the disk number for the extended disk statistics. In the next procedure you will use a sed script to translate the disk names into disk numbers.
- The output for the extended disk statistics is:
-
| r/s | Reads per second |
| w/s | Writes per second |
| Kr/s | Kbytes read per second |
| Kw/s | Kbytes written per second |
| wait | Average number of transactions waiting for service (queue length) |
| actv | Average number of transactions actively being serviced |
| svc_t | Average service time (milliseconds) |
| %w | Percentage of time the queue is not empty |
| %b | Percentage of time the disk is busy |
· To translate the disk names into disk numbers
- Use iostat and sar. One quick way to do this is to use a sed script.
-
1. Using a text editor, type in a sed script similar to the following d2fs.server sed script.
Your sed script should substitute the file system name for the disk number. In this example, /var/mail is substituted for disk id8, and /export/home is substituted for disks id9, id10, id11, id17, id18, id19, id25, id26, id27, id28, id33, id34, id35, and id36.
-
sed 's/id8 /var\/mail/
s/id9 /export\/home/
s/id10 /export\/home/
s/id11 /export\/home/
s/id17 /export\/home/
s/id18 /export\/home/
s/id19 /export\/home/
s/id25 /export\/home/
s/id26 /export\/home/
s/id27 /export\/home/
s/id28 /export\/home/
s/id33 /export\/home/
s/id34 /export\/home/
s/id35 /export\/home/
s/id36 /export\/home/'
|
-
2. Type iostat -xc 15 | d2fs.server to run the iostat -xc 15 command through the sed script.
-
| -x | Supplies extended disk statistics. |
| -c | Reports the percentage of time the system was in user mode (us), system mode (sy), waiting for I/O (wt), and idling (id). |
| 15 | Means disk statistics are gathered every 15 seconds. |
-
% iostat -xc 15 | d2fs.server
extended disk statistics cpu
disk r/s w/s Kr/s Kw/s wait actv svc_t %w %b us sy wt id
export/home 0.1 0.2 0.4 1.0 0.0 0.0 24.1 0 1 0 11 2 86
export/home 0.1 0.2 0.4 0.9 0.0 0.0 24.5 0 1
export/home 0.1 0.2 0.4 1.0 0.0 0.0 31.1 0 1
export/home 0.1 0.2 0.4 1.0 0.0 0.0 24.6 0 1
export/home 0.1 0.2 0.4 0.9 0.0 0.0 24.8 0 1
id20 0.0 0.0 0.1 0.3 0.0 0.0 25.4 0 0
export/home 0.0 0.0 0.1 0.2 0.0 0.0 31.0 0 0
export/home 0.0 0.0 0.1 0.2 0.0 0.0 30.9 0 0
export/home 0.0 0.0 0.1 0.3 0.0 0.0 31.6 0 0
export/home 0.0 0.0 0.0 0.0 0.0 0.0 5.1 0 0
export/home 0.0 0.0 0.1 0.2 0.0 0.0 36.1 0 0
export/home 0.0 0.2 0.1 0.3 0.0 0.0 25.3 0 1
export/home 0.0 0.2 0.1 0.4 0.0 0.0 26.5 0 1
export/home 0.0 0.0 0.1 0.3 0.0 0.0 35.6 0 0
var/mail 0.0 0.1 0.2 0.7 0.0 0.0 47.8 0 0
id9 0.1 0.2 0.4 1.0 0.0 0.0 24.8 0 1
sd15 0.1 0.1 0.3 0.5 0.0 0.0 84.4 0 0
sd16 0.1 0.1 0.3 0.5 0.0 0.0 93.0 0 0
sd17 0.1 0.1 0.3 0.5 0.0 0.0 79.7 0 0
sd18 0.1 0.1 0.3 0.5 0.0 0.0 95.3 0 0
sd6 0.0 0.0 0.0 0.0 0.0 0.0 109.1 0 0
|
-
| disk | Name of disk device. |
| r/s | Average read operations per second. |
| w/s | Average write operations per second. |
| Kr/s | Average Kbytes read per second |
| Kw/s | Average Kbytes written per second. |
| wait | Number of requests outstanding in the device driver queue. |
| actv | Number of requests active in the disk hardware queue. |
| %w | Occupancy of the wait queue. |
| %b | Occupancy of the active queue--device busy. |
| svc_t | Average service time in milliseconds for a complete disk request. This includes wait time, active queue time, seek rotation, and transfer latency. |
| us | User CPU time. |
| sy | System time. |
| wt | Wait for I/O time. |
| id | Idle time. |
-
3. Type sar -d 15 1000 | d2fs.server to run the sar -d 15 1000 command through the sed script.
-
server% sar -d 15 1000 | d2fs.server
12:44:17 device %busy avque r+w/s blks/s avwait avserv
12:44:18 export/home 0 0.0 0 0 0.0 0.0
export/home 0 0.0 0 0 0.0 0.0
export/home 0 0.0 0 0 0.0 0.0
export/home 0 0.0 0 0 0.0 0.0
export/home 0 0.0 0 0 0.0 0.0
id20 0 0.0 0 0 0.0 0.0
export/home 0 0.0 0 0 0.0 0.0
export/home 0 0.0 0 0 0.0 0.0
export/home 0 0.0 0 0 0.0 0.0
export/home 0 0.0 0 0 0.0 0.0
export/home 0 0.0 0 0 0.0 0.0
export/home 0 0.0 0 0 0.0 0.0
export/home 0 0.0 0 0 0.0 0.0
export/home 0 0.0 0 0 0.0 0.0
var/mail 0 0.0 0 0 0.0 0.0
export/home 0 0.0 0 0 0.0 0.0
sd15 7 0.1 4 127 0.0 17.6
sd16 6 0.1 3 174 0.0 21.6
sd17 5 0.0 3 127 0.0 15.5
|
- The sar -d option reports the activities of disk devices. The 15 means that data is collected every 15 seconds. The 1000 means that data is collected 1000 times.
-
| device | Name of the disk device being monitored. |
| %busy | Percentage of time the device spent servicing a transfer request (same as iostat %b). |
| avque | Average number of requests outstanding during the monitored period (measured only when the queue was occupied) (same as iostat actv). |
| r+w/s | Number of read and write transfers to the device per second. |
| blks/s | Number of 512-byte blocks transferred to the device per second. |
| avwait | Average time, in milliseconds, that transfer requests wait idly in the queue (measured only when the queue is occupied) (same as iostat wait). |
| avserv | Average time, in milliseconds, for a transfer request to be completed by the device (for disks, this includes seek, rotational latency, and data transfer times). |
-
4. For the file systems that are exported by way of NFS, check the %b/%busy value. If it is more than 30 percent, check the svc_t value.
The %b value, the percentage of time the disk is busy, is returned by the iostat command. The %busy value, the percentage of time the device spent servicing a transfer request, is returned by the sar command. If the %b and the %busy values are greater than 30 percent, go to step 5. Otherwise, go to the procedure "To collect data on a long-term basis," which follows.
5. Check the svc_t/avserv value.
The svc_t value, the average service time in milliseconds for a complete disk request, is returned by the iostat command. The avserv value, the average time in milliseconds for a transfer request to be completed by the device, is returned by the sar command. If the svc_t value is more than 40 ms, the disk is taking a long time to respond, and any NFS request that involves disk I/O will appear slow to the NFS clients. The NFS response time should be less than 50 ms on average to allow for NFS protocol processing and network transmission time. The disk response should be less than 40 ms. The average service time is a function of the disk: fast disks should show a lower average service time than slow disks.
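The checks in steps 4 and 5 can be sketched in Python as a simple filter. This is a minimal illustration with a hypothetical function name. The 30 percent and 40 ms thresholds come from the steps above. The first two rows are copied from the sample iostat display, and the third row is an invented overloaded disk added only to show a positive result.

```python
BUSY_THRESHOLD_PCT = 30.0  # %b / %busy threshold from step 4
SVC_T_THRESHOLD_MS = 40.0  # svc_t threshold from step 5


def slow_disks(rows):
    """Return the names of disks that exceed both thresholds.

    rows is an iterable of (name, busy_percent, svc_t_ms) tuples.
    """
    return [
        name
        for name, busy, svc_t in rows
        if busy > BUSY_THRESHOLD_PCT and svc_t > SVC_T_THRESHOLD_MS
    ]


rows = [
    ("var/mail", 0, 47.8),      # from the sample display: slow but idle
    ("export/home", 1, 24.1),   # from the sample display
    ("example-disk", 45, 62.0), # hypothetical overloaded disk
]

flagged = slow_disks(rows)
print(flagged)
```

A disk with a high svc_t but a low %b, such as var/mail in the sample display, is not flagged, because step 4 sends you to step 5 only when the disk is more than 30 percent busy.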
· To collect data on a long-term basis
-
* Uncomment the lines in the sys user's crontab file so that sar collects the data for one month.
- This continuously collects performance data and provides you with a history of sar results.
-
Note - A few hundred Kbytes will be used at most in /var/adm/sa.
-
root# crontab -l sys
#ident "@(#)sys 1.5 92/07/14 SMI" /* SVr4.0 1.2 */
#
# The sys crontab should be used to do performance collection.
# See cron and performance manual pages for details on startup.
0 * * * 0-6 /usr/lib/sa/sa1
20,40 8-17 * * 1-5 /usr/lib/sa/sa1
5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A
|
· If disks are overloaded, spread the load out
-
* Use Online: DiskSuite to stripe the file system over multiple disks. Use a Prestoserve write cache to reduce the number of accesses and spread peak access loads out in time.
- See Section 4.3.4, "Using Online: Disk Suite to Spread Disk Access Load."
· If you have read-only file systems
-
* Increase the buffer cache.
- See Section 4.7.5, "Adjusting the Buffer Cache: bufhwm."
· To identify NFS problems, display server statistics
-
* Type nfsstat -s at the % prompt. The -s option displays server statistics.
-
server% nfsstat -s
Server rpc:
calls badcalls nullrecv badlen xdrcall
480421 0 0 0 0
Server nfs:
calls badcalls
480421 2
null getattr setattr root lookup readlink read
95 0% 140354 29% 10782 2% 0 0% 110489 23% 286 0% 63095 13%
wrcache write create remove rename link symlink
0 0% 139865 29% 7188 1% 2140 0% 91 0% 19 0% 231 0%
mkdir rmdir readdir statfs
435 0% 127 0% 2514 1% 2710 1%
|
- The server NFS display shows the number of NFS calls received (calls) and rejected (badcalls), and the counts and percentages for the various calls that were made. The number and percentage of calls returned by the nfsstat -s command are described in Table 3-2, later in this chapter.
-
| calls | Total number of RPC calls received |
| badcalls | Total number of calls rejected by the RPC layer (the sum of badlen and xdrcall) |
| nullrecv | Number of times an RPC call was not available when it was thought to be received |
| badlen | Number of RPC calls with a length shorter than a minimum-sized RPC call |
| xdrcall | Number of RPC calls whose header could not be XDR decoded |
-
Table 3-1 explains the nfsstat -s command output and what actions to take.
-
Table 3-1 nfsstat -s
| If | Then |
| writes > 5% | Install a Prestoserve NFS accelerator (SBus card or NVRAM-NVSIMM) for peak performance. See Section 4.6, "Prestoserve NFS Accelerator." The number of writes, 29% in the previous example, is very high. |
| There are any badcalls | Badcalls are calls rejected by the RPC layer and are the sum of badlen and xdrcall. The network may be overloaded. Identify an overloaded network using network interface statistics. |
| readlink > 10% of total lookup calls on NFS servers | NFS clients are making excessive use of symbolic links that are on the file systems exported by the server. Replace the symbolic link with a directory. Mount both the underlying file system and the symbolic link's target on the NFS client. See the procedure that follows, "If symlink is greater than 10 percent in the output of the nfsstat -s command, eliminate symbolic links." |
| getattr > 40% | Increase the client attribute cache using the actimeo option. Make sure that the DNLC and inode caches are large. Use vmstat -s to determine the percent hit rate (cache hits) for the DNLC and if needed increase ncsize in the /etc/system file. See the procedure "To show the Directory Name Lookup Cache (DNLC) hit rate," in this chapter and Section 4.7.6, "Directory Name Lookup Cache (DNLC)" in Chapter 4. |
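The rules in Table 3-1 can be applied to the nfsstat -s counters with a short Python sketch. This is an illustration under stated assumptions: the function name and the finding strings are hypothetical, and the counts are copied from the sample nfsstat -s display earlier in this section.

```python
def check_server_stats(total_calls, badcalls, getattr_calls, lookup_calls,
                       readlink_calls, write_calls):
    """Apply the Table 3-1 thresholds and return a list of findings."""
    findings = []
    if total_calls == 0:
        return findings

    if 100.0 * write_calls / total_calls > 5:
        findings.append("writes > 5%")
    if badcalls > 0:
        findings.append("badcalls present")
    # readlink is compared with lookup calls, not with total calls.
    if lookup_calls and 100.0 * readlink_calls / lookup_calls > 10:
        findings.append("readlink > 10% of lookups")
    if 100.0 * getattr_calls / total_calls > 40:
        findings.append("getattr > 40%")
    return findings


# Counts from the sample nfsstat -s display.
findings = check_server_stats(
    total_calls=480421,
    badcalls=2,
    getattr_calls=140354,
    lookup_calls=110489,
    readlink_calls=286,
    write_calls=139865,
)
print(findings)
```

For the sample display, only the write percentage and the badcalls count are flagged. getattr is at 29 percent, and readlink is well under 10 percent of lookup calls.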
· If symlink is greater than 10 percent in the output of the nfsstat -s command, eliminate symbolic links
- In this example, /usr/tools/dist/sun4 is a symbolic link for /usr/dist/bin.
-
1. Type rm /usr/dist/bin to eliminate the symbolic link for /usr/dist/bin.
-
# rm /usr/dist/bin
|
-
2. Type mkdir /usr/dist/bin to make /usr/dist/bin a directory.
-
# mkdir /usr/dist/bin
|
-
3. Mount the directories. Type:
-
client# mount server: /usr/dist/bin
client# mount server: /usr/tools/dist/sun4
client# mount
|
· To show the Directory Name Lookup Cache (DNLC) hit rate
-
1. Type vmstat -s.
This command returns the hit rate (cache hits).
-
% vmstat -s
... lines omitted
79062 total name lookups (cache hits 94%)
16 toolong
|
-
2. If the hit rate is less than 90 percent and there is not a problem with too many long names, increase the variable, ncsize, in the /etc/system file.
-
set ncsize=5000
|
Directory names less than 30 characters long are cached, and names that are too long to be cached are also reported. The default value of ncsize is:
ncsize (name cache) = 17 * maxusers + 90
· For NFS server benchmarks it has been set as high as 16000.
· For maxusers = 2048 it would be set at 34906.
For more information on the Directory Name Lookup Cache, see Section 4.7.6, "Directory Name Lookup Cache (DNLC)."
3. Reboot the system.
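The default ncsize formula and the 90 percent hit-rate rule can be checked with a short Python sketch. The function names are hypothetical. The formula and the threshold are the ones given in this procedure, and the hit rate is the value from the sample vmstat -s display.

```python
def default_ncsize(maxusers):
    """Default DNLC size: ncsize = 17 * maxusers + 90."""
    return 17 * maxusers + 90


def should_increase_ncsize(hit_rate_percent, threshold_percent=90):
    """True when the DNLC hit rate is below the 90 percent guideline."""
    return hit_rate_percent < threshold_percent


ncsize_2048 = default_ncsize(2048)
print(ncsize_2048)

# The sample vmstat -s display reports cache hits of 94%.
print(should_increase_ncsize(94))
```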
· If the system has a Prestoserve NFS accelerator, check its state
-
1. Type /usr/sbin/presto at the % prompt.
Verify that it is in the UP state.
-
server% /usr/sbin/presto
state = UP, size = 0xfff80 bytes
statistics interval: 1 day, 23:17:50 (170270 seconds)
write cache efficiency: 65%
All 2 batteries are ok
|
-
2. If it is in the DOWN state, type presto -u at the % prompt.
-
server% presto -u
|
3. If it is in the error state, see the Prestoserve User's Guide.
-
Table 3-2 NFS Operations
| Operation | Function |
| create | Create a file system node; may be a file or symbolic link. |
| statfs | Get dynamic file system information. |
| getattr | Get file/directory attributes such as file type, size, permissions, and access times. |
| link | Create a hard link in the remote file system. |
| lookup | Search directory for file and return file handle. |
| mkdir | Create a directory. |
| null | Do nothing. Used for testing and timing of server response. |
| read | Read an 8 KByte block of data. |
| readdir | Read a directory entry. |
| readlink | Follow a symbolic link on the server. |
| rename | Change the file's directory name entry. |
| remove | Remove a file system node. |
| rmdir | Remove a directory. |
| root | Retrieve the root of the remote file system (not presently used). |
| setattr | Change file/directory attributes. |
| symlink | Make a symbolic link in a remote file system. |
| wrcache | Write an 8 KByte block of data to the remote cache (not presently used). |
| write | Write an 8 KByte block of data. |
3.2.3 Checking Each Client
- The overall tuning process must include client tuning. Sometimes, tuning the client yields more improvement than fixing the server. For example, adding 4 Mbytes of memory to each of 100 clients dramatically decreases the load on an NFS server.
-
Figure 3-4 illustrates the steps you must follow in sequence to check each client.

Figure 3-4
· To check the client statistics to see if the client is having NFS problems
-
* Type nfsstat -c at the % prompt. Look for errors and retransmits.
-
client % nfsstat -c
Client rpc:
calls badcalls retrans badxids timeouts waits newcreds
384687 1 52 7 52 0 0
badverfs timers toobig nomem cantsend bufulocks
0 384 0 0 0 0
Client nfs:
calls badcalls clgets cltoomany
379496 0 379558 0
Version 2: (379599 calls)
null getattr setattr root lookup readlink read
0 0% 178150 46% 614 0% 0 0% 39852 10% 28 0% 89617 23%
wrcache write create remove rename link symlink
0 0% 56078 14% 1183 0% 1175 0% 71 0% 51 0% 0 0%
mkdir rmdir readdir statfs
49 0% 0 0% 987 0% 11744 3%
|
- The output shows that there were only 52 retransmits (retrans) and 52 time-outs (timeouts) out of 384687 calls.
- The nfsstat -c display shows the following fields:
-
| calls | Total number of calls sent |
| badcalls | Total number of calls rejected by RPC |
| retrans | Total number of retransmissions |
| badxid | Number of times that a duplicate acknowledgment was received for a single NFS request |
| timeout | Number of calls that timed out |
| wait | Number of times a call had to wait because no client handle was available |
| newcred | Number of times the authentication information had to be refreshed |
-
Table 3-2, earlier in this chapter, describes the NFS operations. Table 3-3 explains the output of the nfsstat -c command and what action to take.
-
Table 3-3 nfsstat -c
| If | Then |
| retrans > 5% of the calls | The requests are not reaching the server. |
| badxid is approximately equal to badcalls | The network is slow. Determine the cause of the slow network. To remedy the problem, consider installing a faster network or installing subnets. |
| badxid is approximately equal to timeouts | Most requests are reaching the server but the server is slower than you expect. Watch expected times using nfsstat -m. |
| badxid is close to 0 | The network is dropping requests. Reduce rsize and wsize in the mount options. |
| null > 0 | A large number of null calls suggests that the automounter is retrying the mount frequently. The timeout values for the mount are too short. Increase the mount timeout parameter, timeo, on the automounter command line. |
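The first rule in Table 3-3 depends on the retransmission rate, which can be computed as sketched below in Python. The function names are hypothetical, the 5 percent threshold comes from the table, and the counters are from the sample nfsstat -c display. The badxid comparisons are not sketched, because the table does not define how close "approximately equal" must be.

```python
def retrans_percent(retrans, calls):
    """Return retransmissions as a percentage of total RPC calls."""
    if calls == 0:
        return 0.0
    return 100.0 * retrans / calls


def exceeds_retrans_threshold(retrans, calls, threshold_percent=5.0):
    """True when retrans is more than 5 percent of the calls."""
    return retrans_percent(retrans, calls) > threshold_percent


# Client rpc counters from the sample nfsstat -c display.
pct = retrans_percent(52, 384687)
print(f"retrans: {pct:.4f}% of calls")
print(exceeds_retrans_threshold(52, 384687))
```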
- Some of the third party tools you can use for NFS/networks include:
-
- NetMetrix (Hewlett-Packard)
- SharpShooter (AIM Technology)
· To display statistics for each NFS mounted file system
-
* Type nfsstat -m at the % prompt. The statistics include the server name and address, mount flags, current read and write sizes, transmission count, and the timers used for dynamic transmission.
-
client % nfsstat -m
/export/home from server:/export/home
Flags:
vers=2,hard,intr,dynamic,rsize=8192,wsize=8192,retrans=5
Lookups: srtt=10 (25ms), dev=4 (20ms), cur=3 (60ms)
Reads: srtt=9 (22ms), dev=7 (35ms), cur=4 (80ms)
Writes: srtt=7 (17ms), dev=3 (15ms), cur=2 (40ms)
All: srtt=11 (27ms), dev=4 (20ms), cur=3 (60ms)
|
-
| srtt | Smoothed round-trip time |
| dev | The estimated deviation |
| cur | Current backed-off timeout value |
- The numbers in parentheses are the actual times in milliseconds. The other values are unscaled values kept by the operating system kernel. You can ignore the unscaled values. Response times are shown for lookups, reads, writes and a combination of all of these operations (all). Table 3-4 shows the appropriate action for the nfsstat -m command.
-
Table 3-4 nfsstat -m
| If | Then |
| srtt > 50 ms | That mount point is slow. Check the network and the server for the disk(s) that provide that mount point. See the steps earlier in this chapter. |
| You have been noticing the message "NFS server not responding" | Try increasing the timeo parameter, in the /etc/vfstab file, to eliminate the messages and to improve performance. Doubling the initial timeo parameter value is a good baseline. After changing the timeo value in the vfstab file, invoke the nfsstat -c command and observe the badxid value returned by the command. Follow the recommendations for the nfsstat -c command earlier in this section. |
| Lookups: cur > 80 ms | The requests are taking too long to get processed. This indicates a slow network or a slow server. |
| Reads: cur > 150 ms | The requests are taking too long to get processed. This indicates a slow network or a slow server. |
| Writes: cur > 250 ms | The requests are taking too long to get processed. This indicates a slow network or a slow server. |
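The numeric thresholds in Table 3-4 can be applied to the millisecond values from nfsstat -m with a sketch such as the following Python function. The function name and the dictionary layout are hypothetical, and applying the srtt limit to every operation type is an assumption, since the table does not say which srtt line it refers to. The values are the parenthesized times from the sample nfsstat -m display.

```python
SRTT_LIMIT_MS = 50
CUR_LIMITS_MS = {"Lookups": 80, "Reads": 150, "Writes": 250}


def slow_mount_findings(timers):
    """Check srtt and cur values (in ms) against the Table 3-4 limits.

    timers maps an operation name to a dict with "srtt" and "cur" keys.
    """
    findings = []
    for op, values in timers.items():
        if values["srtt"] > SRTT_LIMIT_MS:
            findings.append(f"{op}: srtt {values['srtt']} ms > {SRTT_LIMIT_MS} ms")
        limit = CUR_LIMITS_MS.get(op)
        if limit is not None and values["cur"] > limit:
            findings.append(f"{op}: cur {values['cur']} ms > {limit} ms")
    return findings


# Millisecond values from the sample nfsstat -m display.
sample = {
    "Lookups": {"srtt": 25, "cur": 60},
    "Reads": {"srtt": 22, "cur": 80},
    "Writes": {"srtt": 17, "cur": 40},
    "All": {"srtt": 27, "cur": 60},
}

sample_findings = slow_mount_findings(sample)
print(sample_findings)
```

The sample mount point stays under every limit, so the list of findings is empty.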