NFS Administration Guide

NFS Troubleshooting

4

This chapter describes problems that may occur on computers using NFS services. It contains procedures for tracking down and fixing NFS problems, and it includes a reference section. To skip the background information that explains NFS internals and proceed directly to step-by-step instructions, use the following list to find the section where instructions for a specific task begin.
Strategies for NFS Troubleshooting
NFS Troubleshooting Procedures
How to Check Connectivity on an NFS Client
How to Remotely Check the NFS Server
How to Verify the NFS Service on the Server
How to Restart NFS Services
How to Warm Start rpcbind
Common NFS Error Messages

Strategies for NFS Troubleshooting

When tracking down an NFS problem, keep in mind that there are three main points of possible failure: the server, the client, and the network. The strategy outlined in this section tries to isolate each individual component to find the one that is not working. In all cases, the mountd and nfsd daemons must be running on the server for remote mounts to succeed.

Note - The mountd and nfsd daemons start automatically at boot time only if there are NFS share entries in the /etc/dfs/dfstab file. Therefore, mountd and nfsd must be started manually when setting up sharing for the first time.

The intr option is set by default for all mounts. If a program hangs with a "server not responding" message, it can be killed with the keyboard interrupt Control-c.
When the network or server has problems, programs that access hard-mounted remote files will fail differently than those that access soft-mounted remote files. Hard-mounted remote file systems cause the client's kernel to retry the requests until the server responds again. Soft-mounted remote file systems cause the client's system calls to return an error after trying for a while. Because these errors may result in unexpected application errors, soft mounting is not recommended.
When a file system is hard mounted, a program that tries to access it hangs if the server fails to respond. In this case, the NFS system displays the following message on the console.

  NFS server hostname not responding still trying  

When the server finally responds, the following message appears on the console.

  NFS server hostname ok  
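
The two console messages above can be paired up to see which servers are still unresponsive. The following is a minimal sketch, not part of the NFS software. It assumes that the console messages have been captured as text in the wording shown above (possibly behind a prefix such as a timestamp), and the function name nfs_servers_down is hypothetical.

```shell
#!/bin/sh
# nfs_servers_down: read console or log lines on stdin and print each
# server whose most recent message is "not responding" rather than "ok".
# Assumes the message wording shown above; any text before the words
# "NFS server" (such as a timestamp) is skipped.
nfs_servers_down() {
    awk '
        {
            for (i = 1; i + 2 <= NF; i++) {
                if ($i == "NFS" && $(i + 1) == "server") {
                    host = $(i + 2)
                    rest = ""
                    for (j = i + 3; j <= NF; j++) rest = rest " " $j
                    if (rest ~ /^ not responding/) down[host] = 1
                    else if (rest == " ok") down[host] = 0
                    break
                }
            }
        }
        END {
            for (h in down) if (down[h]) print h
        }
    ' | sort
}
```

The function only reads its standard input, so it can be fed from whatever file or command holds the console messages on a given system.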

A program accessing a soft-mounted file system whose server is not responding will generate the following message:

  NFS operation failed for server hostname: error # (error_message)  


Note - Because of possible errors, do not soft-mount file systems with read-write data or file systems from which executables will be run. Writable data could be corrupted if the application ignores the errors. Mounted executables may not load properly and can fail.
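
One way to audit a client against the note above is to scan its mount table for NFS entries that are soft-mounted but not read-only. The following is a minimal sketch under stated assumptions: the function name soft_rw_mounts is hypothetical, the input is assumed to use the seven-field /etc/vfstab layout, and only the soft and ro mount options are examined.

```shell
#!/bin/sh
# soft_rw_mounts: read vfstab-format lines on stdin and print the mount
# point of every NFS entry that is soft-mounted without the ro option.
# Field layout assumed (as in /etc/vfstab): device to mount, device to
# fsck, mount point, FS type, fsck pass, mount at boot, mount options.
soft_rw_mounts() {
    awk '
        /^[ \t]*#/ || NF < 7 { next }      # skip comments and short lines
        $4 == "nfs" {
            soft = 0; ro = 0
            n = split($7, opt, ",")
            for (i = 1; i <= n; i++) {
                if (opt[i] == "soft") soft = 1
                if (opt[i] == "ro")   ro = 1
            }
            if (soft && !ro) print $3
        }
    '
}
```

A typical invocation would redirect /etc/vfstab into the function. Entries mounted by autofs maps are not covered by this sketch.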

NFS Troubleshooting Procedures

To determine where the NFS service has failed, it is necessary to follow several procedures to isolate the failure. The following items need to be checked:
  • Can the client reach the server?
  • Can the client contact the NFS services on the server?
  • Are the NFS services running on the server?
In the process of checking these items, it may become apparent that other portions of the network are not functioning, such as the name service or the physical network hardware. Debugging procedures for the NIS+ name service are found in the NIS+ and FNS Administration Guide. It may also become obvious that the problem is not at the client end (for instance, if you get at least one trouble call from every subnet in your work area). In that case, it saves time to assume that the problem is the server or the network hardware near the server, and to start the debugging process at the server, not at the client.

· How to Check Connectivity on an NFS Client

  1. Make sure that the NFS server is reachable from the client. On the client, type the following command.


  % /usr/sbin/ping bee  
  bee is alive  

If the command reports that the server is alive, remotely check the NFS server (see "How to Remotely Check the NFS Server").
  2. If the server is not reachable from the client, make sure that the local name service is running. For NIS+ clients, type the following:


  % /usr/lib/nis/nisping -u  
  Last updates for directory eng.acme.com. :  
  Master server is eng-master.acme.com.  
          Last update occurred at Mon Jun  5 11:16:10 1995  
  
  Replica server is eng1-replica-58.acme.com.  
          Last Update seen was Mon Jun  5 11:16:10 1995  

  3. If the name service is running, but the server is not reachable from the client, run the ping command from another client.

    If the command run from a second client fails, see "How to Verify the NFS Service on the Server".

  4. If the server is reachable from the second client, use ping to check connectivity of the first client to other systems on the local net. If this fails, check the networking software configuration on the client (/etc/netmasks, /etc/nsswitch.conf, and so forth).

  5. If the software is correct, check the networking hardware.

    Try moving the client onto a second net drop.
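
The ping-based part of this procedure can be sketched as a small script. This is a simplified illustration, not a replacement for the steps above: it skips the name service check, and it cannot run ping from a second client on your behalf. The function names reachable and triage and the PING_CMD override are hypothetical. The default assumes the Solaris ping shown above, which exits with a success status when the host is alive.

```shell
#!/bin/sh
# PING_CMD is a hypothetical override so that the logic can be exercised
# without a network. The default is the command used in step 1.
PING_CMD=${PING_CMD:-/usr/sbin/ping}

# reachable HOST: succeed if HOST answers ping.
reachable() {
    $PING_CMD "$1" >/dev/null 2>&1
}

# triage SERVER NEIGHBOR: print the next step to take on this client.
# NEIGHBOR is any other system on the local net.
triage() {
    if reachable "$1"; then
        echo "server reachable: remotely check the NFS server"
    elif reachable "$2"; then
        echo "local net works: ping the server from a second client"
    else
        echo "client cannot reach anything: check client network configuration, then hardware"
    fi
}
```

For example, calling triage with bee as the server and wasp as the neighbor prints one of the three suggestions, depending on which hosts respond.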

· How to Remotely Check the NFS Server

  1. Check that the server's nfsd processes are responding. On the client, type the following command.


  % /usr/bin/rpcinfo -u bee nfs  
  program 100003 version 2 ready and waiting  
  program 100003 version 3 ready and waiting  

If the server is running, it prints a list of program and version numbers. Using the -t option will test the TCP connection. If this fails, skip to "How to Verify the NFS Service on the Server".
  2. Check that the server's mountd is responding by typing the following command.


  % /usr/bin/rpcinfo -u bee mountd  
  program 100005 version 1 ready and waiting  
  program 100005 version 2 ready and waiting  
  program 100005 version 3 ready and waiting  

Using the -t option will test the TCP connection. If either fails, skip to "How to Verify the NFS Service on the Server".
  3. Check the local autofs service, if it is being used:


  % cd /net/wasp  

Choose a /net or /home mount point that you know should work properly. If this doesn't work, then as root on the client, type the following to restart the autofs service.

  # /etc/init.d/autofs stop  
  # /etc/init.d/autofs start  

  4. Verify that the file system is shared as expected on the server.


  % /usr/sbin/showmount -e bee  
  /usr/src              eng  
  /export/share/man (everyone)  

Check the entry on the server and the local mount entry for errors. Also check the name space. In this instance, if the first client is not in the eng netgroup, it cannot mount the /usr/src file system.
Make sure to check the entries in all of the local files that include mounting information. The list includes /etc/vfstab and all of the /etc/auto_* files.
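
The output of the rpcinfo and showmount commands in this procedure can also be checked by a script. The following is a minimal sketch, not part of the NFS software. The function names rpc_versions and export_clients are hypothetical, and the parsing assumes the output formats shown in the examples above.

```shell
#!/bin/sh
# rpc_versions: read the output of rpcinfo -u HOST SERVICE on stdin and
# print, on one line, the version numbers reported as "ready and waiting".
# Prints an empty line if no version is reported.
rpc_versions() {
    awk '
        $1 == "program" && $3 == "version" && /ready and waiting/ {
            list = list (list == "" ? "" : " ") $4
        }
        END { print list }
    '
}

# export_clients PATH: read the output of showmount -e on stdin and print
# the access list shown for PATH (for example a netgroup name, or
# "(everyone)"). Prints nothing if PATH is not in the export list.
export_clients() {
    awk -v path="$1" '
        $1 == path {
            $1 = ""
            sub(/^[ \t]+/, "")
            print
        }
    '
}
```

Piping the rpcinfo output from step 1 into rpc_versions prints 2 3, and piping the showmount output from step 4 into export_clients with the argument /usr/src prints eng.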

· How to Verify the NFS Service on the Server

  1. Log onto the server as root.

  2. Make sure that the server can reach the clients.


  # ping lilac  
  lilac is alive  

  3. If the client is not reachable from the server, make sure that the local name service is running. For NIS+ clients, type the following:


  % /usr/lib/nis/nisping -u  
  Last updates for directory eng.acme.com. :  
  Master server is eng-master.acme.com.  
          Last update occurred at Mon Jun  5 11:16:10 1995  
  
  Replica server is eng1-replica-58.acme.com.  
          Last Update seen was Mon Jun  5 11:16:10 1995  

  4. If the name service is running, check the networking software configuration on the server (/etc/netmasks, /etc/nsswitch.conf, and so forth).

  5. Type the following command to check whether the nfsd daemon is running.


  # rpcinfo -u localhost nfs  
  program 100003 version 2 ready and waiting  
  program 100003 version 3 ready and waiting  
  # ps -ef | grep nfsd  
  root   232   1 0      Apr 07 ?      0:01 /usr/lib/nfs/nfsd -a 16  
  root  3127 24621 09:32:57 pts/3 0:00 grep nfsd  

Also use the -t option with rpcinfo to check the TCP connection. If these commands fail, restart the NFS service (see "How to Restart NFS Services").
  6. Type the following command to check whether the mountd daemon is running.


  # /usr/bin/rpcinfo -u localhost mountd  
  program 100005 version 1 ready and waiting  
  program 100005 version 2 ready and waiting  
  program 100005 version 3 ready and waiting  
  # ps -ef | grep mountd  
  root   145   1 0      Apr 07 ?     21:57 /usr/lib/autofs/automountd  
  root   234   1 0      Apr 07 ?      0:04 /usr/lib/nfs/mountd  
  root  3084 24621 09:30:20 pts/3 0:00 grep mountd  

Also use the -t option with rpcinfo to check the TCP connection. If these commands fail, restart the NFS service (see "How to Restart NFS Services").
  7. Type the following command to check whether the rpcbind daemon is running.


  # /usr/bin/rpcinfo -u localhost rpcbind  
  program 100000 version 1 ready and waiting  
  program 100000 version 2 ready and waiting  
  program 100000 version 3 ready and waiting  

If rpcbind seems to be hung, either reboot the server or follow the steps in "How to Warm Start rpcbind".
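
As the ps examples in this procedure show, a plain grep also matches the grep command itself, and grep mountd also matches automountd. The following is a minimal sketch of a stricter filter. The function name daemon_listed is hypothetical, and the filter assumes that the daemon appears in the ps output with a full path, as in the examples above.

```shell
#!/bin/sh
# daemon_listed NAME: read the output of ps -ef on stdin and succeed if
# some field is a path whose last component is exactly NAME.
# A bare word such as the argument of "grep mountd" contains no slash
# and is therefore ignored.
daemon_listed() {
    awk -v name="$1" '
        {
            for (i = 1; i <= NF; i++) {
                n = split($i, parts, "/")
                if (n > 1 && parts[n] == name) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, piping the output of ps -ef into daemon_listed with the argument mountd succeeds only when a line such as /usr/lib/nfs/mountd is present.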

· How to Restart NFS Services

* To enable daemons without rebooting, become superuser and type the following commands.

  # /etc/init.d/nfs.server stop  
  # /etc/init.d/nfs.server start  

These commands stop the daemons and then restart them only if there is an entry in /etc/dfs/dfstab.
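
Because the daemons are restarted only when /etc/dfs/dfstab has an entry, it can help to check the file before restarting. The following is a minimal sketch. The function name has_share_entries is hypothetical, and treating any line that is neither blank nor a comment as an entry is an assumption, not necessarily the exact test that the nfs.server script applies.

```shell
#!/bin/sh
# has_share_entries FILE: succeed if FILE (in /etc/dfs/dfstab format)
# is readable and contains at least one line that is neither blank nor
# a comment starting with "#".
has_share_entries() {
    [ -r "$1" ] || return 1
    awk '
        /^[ \t]*#/ { next }
        NF > 0     { found = 1 }
        END        { exit (found ? 0 : 1) }
    ' "$1"
}
```

Calling the function with /etc/dfs/dfstab as its argument before running the restart commands shows whether the start step is expected to bring the daemons back.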

· How to Warm Start rpcbind

If the NFS server cannot be rebooted because of work in progress, you can restart rpcbind without restarting all of the services that use RPC by completing a warm start, as described below.
  1. As root on the server, run ps to get the PID for rpcbind (the value in the second column).


  # ps -ef |grep rpcbind  
      root   115     1  0   May 31 ?        0:14 /usr/sbin/rpcbind  
      root 13000  6944  0 11:11:15 pts/3    0:00 grep rpcbind  

  2. Send a SIGTERM signal to the rpcbind process. In this example, term is the signal to be sent and 115 is the PID of the process (see the kill(1) man page). This causes rpcbind to save a list of the currently registered services in /tmp/portmap.file and /tmp/rpcbind.file.


  # kill -s term 115  


Note - If the rpcbind process is not killed with the -s option, then a warm start of rpcbind is not possible.

  3. Restart rpcbind.

    Do a warm restart so that rpcbind consults the files created by the kill command and resumes without requiring all of the RPC services to be restarted (see the rpcbind(1M) man page).


  # /usr/sbin/rpcbind -w  
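
Step 1 of this procedure, reading the PID from the ps output, can be sketched as a small filter. The function name rpcbind_pid is hypothetical, and the filter assumes that rpcbind appears in the ps output with a full path, as in the example above.

```shell
#!/bin/sh
# rpcbind_pid: read the output of ps -ef on stdin and print the PID
# (second column) of the first process whose command path ends in
# /rpcbind. The "grep rpcbind" line is ignored because it contains no
# such path.
rpcbind_pid() {
    awk '$0 ~ /\/rpcbind( |$)/ { print $2; exit }'
}
```

With the ps output shown in step 1, the function prints 115. That value can then be passed to kill -s term, followed by /usr/sbin/rpcbind -w, as in steps 2 and 3.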

Common NFS Error Messages

mount: ... server not responding: RPC_PMAP_FAILURE - RPC_TIMED_OUT

The server sharing the file system you are trying to mount is down or unreachable, at the wrong run level, or its rpcbind is dead or hung.

mount: ... server not responding: RPC_PROG_NOT_REGISTERED

mount reached rpcbind on the server, but the NFS mount daemon mountd is not registered.

mount: ... No such file or directory

Either the remote directory or the local directory does not exist. Check the spelling of the directory names. Run ls on both directories.

mount: ...: Permission denied

Your computer name may not be in the list of clients or netgroup allowed access to the file system you want to mount. Use showmount -e to verify the access list.

NFS server hostname not responding still trying

If programs hang while doing file-related work, your NFS server may be dead. This message indicates that NFS server hostname is down or that there is a problem with the server or with the network. Start with "How to Check Connectivity on an NFS Client".

NFS fsstat failed for server hostname: RPC: Authentication error

This error can have many causes. One of the most difficult to debug occurs when a user is in too many groups. Currently, a user who accesses files through NFS mounts may be in no more than 16 groups. If a user must be in more than 16 groups, and Solaris 2.5 is running on the NFS server and the NFS clients, use ACLs to provide the needed access privileges.
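
When this error is suspected, a quick check is to count the user's group memberships against the limit of 16 stated above. The following is a minimal sketch. The names NFS_MAX_GROUPS and over_group_limit are hypothetical, and the input is assumed to contain only group names separated by white space, for example as printed by the groups command.

```shell
#!/bin/sh
# Limit on group memberships for NFS access, as stated above.
NFS_MAX_GROUPS=16

# over_group_limit: read a white-space-separated list of group names on
# stdin and succeed if it names more than NFS_MAX_GROUPS groups.
over_group_limit() {
    count=$(wc -w | tr -d '[:space:]')
    [ "$count" -gt "$NFS_MAX_GROUPS" ]
}
```

If the command that lists the groups prints anything besides group names, such as the user name, remove that text before counting.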