SMCC NFS Server Performance and Tuning Guide
CHAPTER 5

Troubleshooting



Troubleshooting Tools

This chapter presents troubleshooting tips for the following types of problems:
  • General troubleshooting tuning tips
  • Client bottlenecks
  • Server bottlenecks
  • Network-related bottlenecks

General Troubleshooting Tuning Tips

This section (see TABLE 5-1) lists the actions to perform when you encounter a tuning problem.
TABLE 5-1

| Command/Tool | Command Output/Result | Action |
| --- | --- | --- |
| netstat -i | (Collis + Ierrs + Oerrs) / (Ipkts + Opkts) > 2% | Check the Ethernet hardware. |
| netstat -i | Collis / Opkts > 10% | Add an Ethernet interface and distribute the client load. |
| netstat -i | Ierrs / Ipkts > 25% | The host may be dropping packets, causing a high input error rate. To compensate for bandwidth-limited network hardware, reduce the packet size: set the read buffer size (rsize), the write buffer size (wsize), or both to 2048 when using mount or in the /etc/vfstab file. See "To Check the Network" in Chapter 3. |
| nfsstat -s | readlink > 10% | Replace symbolic links with mount points. |
| nfsstat -s | writes > 5% | Install a Prestoserve NFS accelerator (SBus card or NVRAM-NVSIMM) for peak performance. See "Prestoserve NFS Accelerator" in Chapter 4. |
| nfsstat -s | There are any badcalls. | The network may be overloaded. Identify an overloaded network using network interface statistics. |
| nfsstat -s | getattr > 40% | Increase the client attribute cache using the actimeo option. Make sure the DNLC and inode caches are large. Use vmstat -s to determine the percent hit rate (cache hits) for the DNLC and, if needed, increase ncsize in the /etc/system file. See "Directory Name Lookup Cache (DNLC)" in Chapter 4. |
| vmstat -s | Hit rate (cache hits) < 90% | Increase ncsize in the /etc/system file. |
| Ethernet monitor, for example: SunNet Manager(TM), SharpShooter, NetMetrix | Load > 35% | Add an Ethernet interface and distribute the client load. |
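
The ratio checks in TABLE 5-1 can be applied mechanically once the counters have been read off the command output. The following is a minimal Python sketch, not part of the original guide. It assumes the netstat -i counters and the nfsstat -s per-operation call counts have already been collected as integers; the function names, the dictionary layout, and the operation key "write" (standing in for the "writes" row) are hypothetical choices made for illustration.

```python
def netstat_actions(ipkts, ierrs, opkts, oerrs, collis):
    """Return the TABLE 5-1 actions triggered by one interface's netstat -i counters."""
    actions = []
    total = ipkts + opkts
    # (Collis + Ierrs + Oerrs) / (Ipkts + Opkts) > 2%
    if total and (collis + ierrs + oerrs) / total > 0.02:
        actions.append("Check the Ethernet hardware.")
    # Collis / Opkts > 10%
    if opkts and collis / opkts > 0.10:
        actions.append("Add an Ethernet interface and distribute the client load.")
    # Ierrs / Ipkts > 25%
    if ipkts and ierrs / ipkts > 0.25:
        actions.append("Set rsize and/or wsize to 2048.")
    return actions


def nfsstat_server_actions(op_counts, badcalls=0):
    """Return the TABLE 5-1 actions for an nfsstat -s operation mix.

    op_counts maps an operation name to its call count (assumed layout).
    """
    total = sum(op_counts.values())

    def share(op):
        # Fraction of all calls made up by this operation.
        return op_counts.get(op, 0) / total if total else 0.0

    actions = []
    if share("readlink") > 0.10:
        actions.append("Replace symbolic links with mount points.")
    if share("write") > 0.05:
        actions.append("Install a Prestoserve NFS accelerator.")
    if badcalls > 0:
        actions.append("Check network interface statistics for overload.")
    if share("getattr") > 0.40:
        actions.append("Increase actimeo; check DNLC and inode cache sizes.")
    return actions
```

For example, an interface with 1000 input packets, 1000 output packets, and 150 collisions trips both the 2% error check and the 10% collision check, so netstat_actions returns two actions.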

Client Bottlenecks

This section (see TABLE 5-2) shows potential client bottlenecks and how to remedy them.
TABLE 5-2

| Symptom(s) | Command/Tool | Cause | Solution |
| --- | --- | --- | --- |
| NFS server hostname not responding or slow response to commands when using NFS-mounted directories | nfsstat | User's path variable | List directories on local file systems first, critical directories on remote file systems second, and then the rest of the remote file systems. |
| NFS server hostname not responding or slow response to commands when using NFS-mounted directories | nfsstat | Running executable from an NFS-mounted file system | Copy the application locally (if used often). |
| NFS server hostname not responding; badxid > 5% of total calls and badxid = timeout | nfsstat -rc | Client times out before server responds | Check for server bottleneck. If the server's response time isn't improved, increase the timeo parameter in the /etc/vfstab file of clients. Try increasing timeo to 25, 50, 100, 200 (tenths of seconds). Wait one day between modifications and check to see if the number of time-outs decreases. |
| badxid = 0 | nfsstat -rc | Slow network | Increase rsize and wsize in the /etc/vfstab file. Check interconnection devices (bridges, routers, gateways). |
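
The two nfsstat -rc rows in TABLE 5-2 amount to a small decision rule, and the timeo advice to a fixed escalation ladder. The Python sketch below illustrates both; it is not taken from the guide. Two details are assumptions: badxid is treated as "equal" to timeout when the two differ by at most 10% (the tolerance parameter), and the "badxid = 0" row is applied only when at least one time-out has been recorded. The function and constant names are hypothetical.

```python
TIMEO_STEPS = (25, 50, 100, 200)  # tenths of a second, as listed in TABLE 5-2


def diagnose_client_rpc(calls, badxid, timeout, tolerance=0.10):
    """Classify client-side nfsstat -rc counters using the TABLE 5-2 rows."""
    if calls <= 0:
        return "no data"
    # Assumption: badxid = 0 only points to the network if time-outs occurred.
    if badxid == 0 and timeout > 0:
        return "slow network: check rsize/wsize and interconnection devices"
    # badxid > 5% of total calls and badxid roughly equal to timeout.
    close = abs(badxid - timeout) <= tolerance * timeout
    if badxid / calls > 0.05 and close:
        return "client times out before server responds: check the server, then raise timeo"
    return "no TABLE 5-2 rule matched"


def next_timeo(current):
    """Return the next larger timeo value to try, or None after the last step."""
    for step in TIMEO_STEPS:
        if step > current:
            return step
    return None
```

Per the table, wait one day after each timeo change before deciding whether to move to the next value.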

Server Bottlenecks

This section (see TABLE 5-3) shows server bottlenecks and how to remedy them.
TABLE 5-3

| Symptom(s) | Command/Tool | Cause | Solution |
| --- | --- | --- | --- |
| NFS server hostname not responding | vmstat -s or iostat | Cache hit rate is < 90% | Adjust the suggested parameters for the DNLC, then check whether the symptom is gone. If not, reset the parameters for the DNLC. Adjust the parameters for the buffer cache, then the inode cache, following the same procedure as for the DNLC. |
| NFS server hostname not responding | netstat -m or nfsstat | Server not keeping up with request arrival rate | Check the network. If the problem is not the network, add an appropriate Prestoserve NFS accelerator, or upgrade the server. |
| High I/O wait time or CPU idle time; slow disk access times or NFS server hostname not responding | iostat -x | I/O load not balanced across disks; the svc_t value is greater than 40 ms | Take a large sample (~2 weeks). Balance the load across disks; add disks as necessary. Add a Prestoserve NFS accelerator for synchronous writes. To reduce disk and network traffic, use tmpfs for /tmp for both server and clients. Measure system cache efficiencies. |
| Slow response when accessing remote files | netstat -s or snoop | Ethernet interface dropping packets | If retransmissions are indicated, increase buffer size. For information on how to use snoop, see "snoop" in Appendix A. |
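
Two of the server checks in TABLE 5-3 are numeric: a cache hit rate below 90% and an iostat -x svc_t value above 40 ms. The Python sketch below shows one way to apply them to values that have already been read from the command output. It is an illustration under assumptions, not the guide's procedure: the function names and the disk-name-to-svc_t dictionary are hypothetical, and the hit rate is computed as hits divided by total lookups.

```python
def cache_hit_rate(hits, lookups):
    """Return the cache hit rate as a percentage (0.0 when there were no lookups)."""
    return 100.0 * hits / lookups if lookups else 0.0


def needs_cache_tuning(hits, lookups, minimum_percent=90.0):
    """True when the hit rate falls below the 90% threshold from TABLE 5-3."""
    return lookups > 0 and cache_hit_rate(hits, lookups) < minimum_percent


def slow_disks(svc_t_by_disk, limit_ms=40.0):
    """Return the names of disks whose svc_t exceeds the 40 ms threshold, sorted."""
    return sorted(name for name, svc_t in svc_t_by_disk.items() if svc_t > limit_ms)
```

A single reading says little about load balance; the table recommends sampling over roughly two weeks, so a check like slow_disks is best run over many samples before disks are rebalanced or added.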

Network Bottlenecks

This section (see TABLE 5-4) shows network-related bottlenecks and how to remedy them.
TABLE 5-4

| Symptoms | Command/Tool | Cause | Solution |
| --- | --- | --- | --- |
| Poor response time when accessing directories mounted on different subnets or NFS server hostname not responding | netstat -rs | NFS requests being routed | Keep clients on the subnet directly connected to the server. |
| Poor response time when accessing directories mounted on different subnets or NFS server hostname not responding | nfsstat | Dropped packets | Make protocol queues deeper. |
| Poor response time when accessing directories mounted on different subnets or NFS server hostname not responding | netstat -s shows incomplete or bad headers, bad data length fields, bad checksums. | Network problems | Check the network hardware. |
| Poor response time when accessing directories mounted on different subnets or NFS server hostname not responding; sum of input and output packets for an interface is over 600 per second | netstat -i | Network overloaded | The network segment is very busy. If this is a recurring problem, consider adding another (le) network interface. |
| Network interface collisions are over 120 per second | netstat -i | Network overloaded | Reduce the number of machines on the network or check the network hardware. |
| Poor response time when accessing directories mounted on different subnets or NFS server hostname not responding | netstat -i | High packet collision rate (Collis/Opkts > 0.10) | If packets are corrupted, the cause may be a corrupted MUX box; use the Network General Sniffer product or another protocol analyzer to find the cause. Check for an overloaded network; if there are too many nodes, create another subnet. Check the network hardware; the cause could be a bad tap, transceiver, or hub on 10BASE-T. Check cable length and termination. |
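
The per-second thresholds in TABLE 5-4 (600 packets and 120 collisions) cannot be read from a single set of cumulative netstat -i counters; they need two samples taken a known interval apart. The Python sketch below illustrates that calculation. It is an assumption-laden example rather than the guide's method: the dictionary keys, function names, and finding strings are hypothetical, and counter wraparound between samples is ignored.

```python
def interface_rates(before, after, seconds):
    """Return (packets per second, collisions per second) between two samples.

    before and after are dicts with cumulative "ipkts", "opkts", and "collis"
    counters for one interface (assumed layout).
    """
    packets = (after["ipkts"] - before["ipkts"]) + (after["opkts"] - before["opkts"])
    collisions = after["collis"] - before["collis"]
    return packets / seconds, collisions / seconds


def network_findings(before, after, seconds):
    """List the TABLE 5-4 netstat -i conditions met over the sampling interval."""
    pps, cps = interface_rates(before, after, seconds)
    sent = after["opkts"] - before["opkts"]
    collided = after["collis"] - before["collis"]
    findings = []
    if pps > 600:
        findings.append("over 600 packets per second: network overloaded")
    if cps > 120:
        findings.append("over 120 collisions per second: network overloaded")
    if sent > 0 and collided / sent > 0.10:
        findings.append("Collis/Opkts over 0.10: high packet collision rate")
    return findings
```

For instance, if an interface gains 4000 input packets, 3000 output packets, and 200 collisions in 10 seconds, the packet rate is 700 per second and only the first condition is reported.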