Solstice DiskSuite 4.0 Administration Guide

Appendix C: Recovery From Failed Boots

Introduction

Because DiskSuite enables you to mirror root, swap, and /usr, special problems can arise when you boot your system, whether through hardware failure or operator error. This appendix presents examples of such problems and provides possible solutions.

Improper /etc/vfstab Entries

A common problem that prevents the system from booting is failing to make proper entries in the /etc/vfstab file. While this problem may seem disastrous, the solution is actually fairly simple. The following example shows how you can edit the /etc/vfstab file to recover from a failed boot.
If you fail to make the proper entry in the /etc/vfstab file when mirroring root, the machine at first appears to boot properly. In the following example, root is mirrored with a two-way mirror. The root entry in /etc/vfstab has reverted to the original component of the file system, but /etc/system still indicates that the system boots from a metadevice. The most likely causes are that the metaroot command was not used to maintain /etc/system and /etc/vfstab, or that an old copy of /etc/vfstab was copied back.
To remedy this situation, you need to edit /etc/vfstab while in single-user mode.
The incorrect /etc/vfstab file would look something like the following:

  #device            device             mount   FS    fsck  mount    mount
  #to mount          to fsck            point   type  pass  at boot  options
  #
  /dev/dsk/c0t3d0s0  /dev/rdsk/c0t3d0s0 /       ufs   1     no       -
  /dev/dsk/c0t3d0s1  -                  -       swap  -     no       -
  /dev/dsk/c0t3d0s6  /dev/rdsk/c0t3d0s6 /usr    ufs   2     no       -
  #
  /proc              -                  /proc   proc  -     no       -
  fd                 -                  /dev/fd fd    -     no       -
  swap               -                  /tmp    tmpfs -     yes      -
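
As a rough illustration of the check an administrator makes by eye here, the following Python sketch reads vfstab-style text and reports whether the root entry names a metadevice. It is not part of DiskSuite. The function names and sample strings are assumptions made for this example, and it assumes that metadevice block paths begin with /dev/md/dsk/, as they do in this appendix.

```python
def root_entry(vfstab_text):
    """Return (device to mount, device to fsck) for the / entry, or None."""
    for line in vfstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # The third field is the mount point in the vfstab layout.
        if len(fields) >= 3 and fields[2] == "/":
            return fields[0], fields[1]
    return None


def root_is_metadevice(vfstab_text):
    """True if the root entry mounts a device under /dev/md/dsk/."""
    entry = root_entry(vfstab_text)
    return entry is not None and entry[0].startswith("/dev/md/dsk/")


# Hypothetical sample data modeled on the entries in this appendix.
BROKEN = """\
#device            device             mount   FS    fsck  mount    mount
/dev/dsk/c0t3d0s0  /dev/rdsk/c0t3d0s0 /       ufs   1     no       -
/dev/dsk/c0t3d0s6  /dev/rdsk/c0t3d0s6 /usr    ufs   2     no       -
"""

FIXED = """\
/dev/md/dsk/d0     /dev/md/rdsk/d0    /       ufs   1     no       -
/dev/dsk/c0t3d0s6  /dev/rdsk/c0t3d0s6 /usr    ufs   2     no       -
"""

if __name__ == "__main__":
    print("broken file, root on metadevice:", root_is_metadevice(BROKEN))
    print("fixed file, root on metadevice:", root_is_metadevice(FIXED))
```

A check like this only flags the mismatch; the repair itself is still the manual edit in single-user mode.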

Because of the errors, you automatically go into single-user mode when the machine is booted:

  ok boot  
  Booting from: sd(0,0,0)  
  SunOS Release 5.1 Version Generic [UNIX(R) System V Release 4.0]  
  Copyright (c) 1983-1992, Sun Microsystems, Inc.  
  ...  
  Hostname: demo  
  dump on /dev/dsk/c0t3d0s1 size 34816K  
  mount: /dev/dsk/c0t3d0s0 is not this fstype.  
  setmnt: Cannot open /etc/mnttab for writing  
  
  INIT: Cannot create /var/adm/utmp or /var/adm/utmpx  
  
  INIT: failed write of utmpx entry:"  "  
  
  INIT: failed write of utmpx entry:"  "  
  
  INIT: SINGLE USER MODE  
  
  Type Ctrl-d to proceed with normal startup,  
  (or give root password for system maintenance):  
  Entering System Maintenance Mode  
  
  SunOS Release 5.1 Version Generic [UNIX(R) System V Release 4.0]  

At this point, root and /usr are mounted read-only. Follow these steps:
  1. Run fsck and remount root read/write so you can edit the /etc/vfstab file.


Note - Be careful to use the correct metadevice for root.


  # fsck /dev/md/rdsk/d0  
  ** /dev/md/rdsk/d0  
  ** Currently Mounted on /  
  ** Phase 1 - Check Blocks and Sizes  
  ** Phase 2 - Check Pathnames  
  ** Phase 3 - Check Connectivity  
  ** Phase 4 - Check Reference Counts  
  ** Phase 5 - Check Cyl groups  
  2274 files, 11815 used, 10302 free (158 frags, 1268 blocks, 0.7%  
  fragmentation)  
  
  # mount -o rw,remount /dev/md/dsk/d0 /  
  mount: warning: cannot lock temp file </etc/.mnt.lock>  

  2. Edit the /etc/vfstab file to contain the correct metadevice entries.


  # vi /etc/vfstab  

The root entry in the /etc/vfstab file should be edited to appear as follows:

  #device            device             mount   FS    fsck  mount    mount
  #to mount          to fsck            point   type  pass  at boot  options
  #
  /dev/md/dsk/d0     /dev/md/rdsk/d0    /       ufs   1     no       -
  /dev/dsk/c0t3d0s1  -                  -       swap  -     no       -
  /dev/dsk/c0t3d0s6  /dev/rdsk/c0t3d0s6 /usr    ufs   2     no       -
  #
  /proc              -                  /proc   proc  -     no       -
  fd                 -                  /dev/fd fd    -     no       -
  swap               -                  /tmp    tmpfs -     yes      -

  3. Reboot the machine with the reboot command.


  # reboot  

Stale Metadevice Database

In this example, a disk that contains half of the database replicas and submirrors for root, swap, and /usr fails. Because half of the replicas are missing, the system cannot be rebooted.
Solving this problem involves:
  1. Deleting the stale replicas and rebooting

  2. Repairing the disk

  3. Adding back the database replicas

  4. Re-enabling the broken submirrors

To remedy the example situation, you would perform the following steps:
  1. Boot the machine to determine which replicas are down.


  ok boot  
  Booting from: sd(0,0,0)  
  ...  
  SunOS Release 5.1 Version Generic [UNIX(R) System V Release 4.0]  
  Copyright (c) 1983-1992, Sun Microsystems, Inc.  
  ...  
  WARNING: md: State database is stale  
  ...  
  metainit: stale databases  
  Insufficient metadevice database replicas  
  located. Use metadb to delete databases which  
  are no longer in existence. Exit the shell  
  when done to continue the boot process.  
  
  Type Ctrl-d to proceed with normal startup,  
  (or give root password for system maintenance):  
  Entering System Maintenance Mode  
  
  SunOS Release 5.1 Version Generic [UNIX(R) System V Release 4.0]  

  2. Use the metadb command to look at the database.


  # /usr/opt/SUNWmd/sbin/metadb -i  
     flags      first blk      block count  
      a m  p  lu    16                1034                  /dev/dsk/c0t3d0s3  
      a   p  l      1050              1034                  /dev/dsk/c0t3d0s3  
      M  p        unknown      unknown                /dev/dsk/c1t2d0s3  
      M  p        unknown      unknown                /dev/dsk/c1t2d0s3  
   o - replica active prior to last mddb configuration change  
   u - replica is up to date  
   l - locator for this replica was read successfully  
   c - replica's location was in /etc/opt/SUNWmd/mddb.cf  
   p - replica's location was patched in kernel  
   m - replica is master, this is replica selected as input  
   W - replica has device write errors  
   a - replica is active, commits are occurring to this replica  
   M - replica had problem with master blocks  
   D - replica had problem with data blocks  
   F - replica had format problems  
   S - replica is too small to hold current data base  
   R - replica had device read errors  

  3. Delete the stale database replicas using the -d option to the metadb command.

    Because the root file system is still read-only at this point, ignore the mddb.cf error messages:


  # /usr/opt/SUNWmd/sbin/metadb -d /dev/dsk/c1t2d0s3  
  metadb: stale databases  
  metadb: /etc/opt/SUNWmd/mddb.cf.new: Read-only file system  
  metadb: databases installed but kernel not patched  
          and new mddb.cf file not generated  
  metadb: could not open temp mddb.cf  
  Usage:  metadb [-s setname] -a [options] mddbnnn  
          metadb [-s setname] -a [options] device ...  
          metadb [-s setname] -d [options] mddbnnn  
          metadb [-s setname] -d [options] device ...  
          metadb [-s setname] -i  
          metadb -p [options] [ mddb.cf-file ]  
  options:  
  -c count       number of replicas (for use with -a only)  
  -f             force adding or deleting of replicas  
  -k filename    alternate /etc/system file  
  -l length      specify size of replica (for use with -a only)  
  # /usr/opt/SUNWmd/sbin/metadb  
      flags        first blk       block count  
       a m  p  lu         16               1034            /dev/dsk/c0t3d0s3  
       a    p  l          1050            1034            /dev/dsk/c0t3d0s3  

  4. Reboot the system.


  # reboot  
  rebooting...  

  5. Once you have a replacement disk, halt the system, replace the failed disk, and reboot. Then use the format command to partition the new disk as it was before the failure.


  # halt  
  ...  
  boot  
  ...  
  # format /dev/rdsk/c1t2d0s0  
  ...  

  6. Use the metadb command to add back the database replicas, then run it again to verify that the replicas are correct.


  # /usr/opt/SUNWmd/sbin/metadb -c 2 -a /dev/dsk/c1t2d0s3  
  # /usr/opt/SUNWmd/sbin/metadb  
     flags        first blk  block count  
    a m  p  luo     16            1034                    /dev/dsk/c0t3d0s3  
    a    p  luo     1050          1034                    /dev/dsk/c0t3d0s3  
    a       u       16           1034                   /dev/dsk/c1t2d0s3  
    a       u       1050         1034                   /dev/dsk/c1t2d0s3  

  7. Use the metareplace command to re-enable the submirrors.


  # /usr/opt/SUNWmd/sbin/metareplace -e d0 /dev/dsk/c1t2d0s0  
  Device /dev/dsk/c1t2d0s0 is enabled  
  
  # /usr/opt/SUNWmd/sbin/metareplace -e d1 /dev/dsk/c1t2d0s1  
  Device /dev/dsk/c1t2d0s1 is enabled  
  
  # /usr/opt/SUNWmd/sbin/metareplace -e d2 /dev/dsk/c1t2d0s6  
  Device /dev/dsk/c1t2d0s6 is enabled  

The submirrors will now resync.
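
The replica listing used in this procedure can also be summarized mechanically. The Python sketch below is a hypothetical helper, not a DiskSuite tool. It parses metadb-style output, treats a replica as usable when it carries the a flag and none of the error flags (M, W, D, F, S, R) from the legend, and assumes a strict-majority rule. That rule is an assumption chosen to match this appendix: with half of the replicas missing the system cannot reboot, while with two of six failed it can.

```python
ERROR_FLAGS = set("MWDFSR")  # error flags listed in the metadb -i legend


def parse_replicas(metadb_text):
    """Return a list of (flags, device) pairs, one per replica line."""
    replicas = []
    for line in metadb_text.splitlines():
        tokens = line.split()
        # A replica line ends with: first blk, block count, device path.
        if len(tokens) < 4 or not tokens[-1].startswith("/dev/"):
            continue
        flags = set("".join(tokens[:-3]))
        replicas.append((flags, tokens[-1]))
    return replicas


def is_usable(flags):
    """Assumed rule: active and free of every error flag."""
    return "a" in flags and not (flags & ERROR_FLAGS)


def summarize(metadb_text):
    """Return (usable count, total count, sorted devices with bad replicas)."""
    replicas = parse_replicas(metadb_text)
    usable = sum(1 for flags, _ in replicas if is_usable(flags))
    bad = sorted({dev for flags, dev in replicas if not is_usable(flags)})
    return usable, len(replicas), bad


def has_majority(usable, total):
    """Assumed rule: more than half of the replicas must be usable."""
    return usable * 2 > total


# Sample text copied from the metadb -i listing in this procedure.
STALE = """\
   flags      first blk      block count
    a m  p  lu    16                1034                  /dev/dsk/c0t3d0s3
    a   p  l      1050              1034                  /dev/dsk/c0t3d0s3
    M  p        unknown      unknown                /dev/dsk/c1t2d0s3
    M  p        unknown      unknown                /dev/dsk/c1t2d0s3
 o - replica active prior to last mddb configuration change
"""

if __name__ == "__main__":
    usable, total, bad = summarize(STALE)
    print(usable, "of", total, "replicas usable; bad devices:", bad)
    print("majority available:", has_majority(usable, total))
```

The device list returned by summarize corresponds to the argument passed to metadb -d in step 3.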

Boot Device Fails

If your boot device fails, you'll need to set up an alternate boot device. In the following example, the boot device containing two of the six database replicas and the root, swap, and /usr submirrors fails. The basic procedure is to repair the disk, boot from another root submirror, and then restore the database and mirrors to their original state.
Initially, when the boot device fails, you'll see a message similar to the following. This message may differ among various architectures.

  Booting from: sd(0,0,0)/kernel/unix  
  The selected SCSI device is not responding  
  Can't open boot device  
  ...  

When you see this message, make a note of the device. Then follow these steps:
  1. Boot from another root submirror.

    Because only two of the six database replicas in this example are in error, you can still boot. If this were not the case, you would need to delete the stale database replicas in single-user mode. This procedure is described in "Stale Metadevice Database".


Note - Having database replicas on at least three disks would make this procedure unnecessary in the case of a single disk failure.


  ok boot sd(0,2,0)  
  Booting from: sd(0,2,0)  
  ...  
  Copyright (c) 1983-1992, Sun Microsystems, Inc.  
  Hostname: demo  
  ...  
  demo console login: root  
  Password:  
  Last login: Wed Dec 16 13:15:42 on console  
  SunOS Release 5.1 Version Generic [UNIX(R) System V Release 4.0]  
  ...  
  #  

  2. Use the metadb command to determine that two database replicas have failed.


  # /usr/opt/SUNWmd/sbin/metadb  
         flags         first blk    block count  
      M     p          unknown      unknown      /dev/dsk/c0t3d0s3  
      M     p          unknown      unknown      /dev/dsk/c0t3d0s3  
      a m  p  luo      16           1034         /dev/dsk/c0t2d0s3  
      a    p  luo      1050         1034         /dev/dsk/c0t2d0s3  
      a    p  luo      16           1034         /dev/dsk/c0t1d0s3  
      a    p  luo      1050         1034         /dev/dsk/c0t1d0s3  

  3. Use the metastat command to determine that half of the root, swap, and /usr mirrors have failed.


  # /usr/opt/SUNWmd/sbin/metastat  
  d0: Mirror  
      Submirror 0: d10  
        State: Needs maintenance  
      Submirror 1: d20  
        State: Okay  
      Pass: 1  
      Read option: roundrobin (default)  
      Write option: parallel (default)  
      Size: 47628 blocks  
  
  d10: Submirror of d0  
      State: Needs maintenance  
      Invoke: "metareplace d0 /dev/dsk/c0t3d0s0 <new device>"  
      Size: 47628 blocks  
      Stripe 0:  
       Device              Start Block  Dbase State        Hot Spare  
       /dev/dsk/c0t3d0s0          0     No    Maintenance  
  
  d20: Submirror of d0  
      State: Okay  
      Size: 47628 blocks  
      Stripe 0:  
       Device              Start Block  Dbase State        Hot Spare  
       /dev/dsk/c0t2d0s0          0     No    Okay  
  
  d1: Mirror  
      Submirror 0: d11  
        State: Needs maintenance  
      Submirror 1: d21  
        State: Okay  
      Pass: 2  
      Read option: roundrobin (default)  
      Write option: parallel (default)  
      Size: 69660 blocks  
  
  d11: Submirror of d1  
      State: Needs maintenance  
      Invoke: "metareplace d1 /dev/dsk/c0t3d0s1 <new device>"  
      Size: 69660 blocks  
      Stripe 0:  
       Device              Start Block  Dbase State        Hot Spare  
       /dev/dsk/c0t3d0s1          0     No    Maintenance  
  
  d21: Submirror of d1  
      State: Okay  
      Size: 69660 blocks  
      Stripe 0:  
       Device              Start Block  Dbase State        Hot Spare  
       /dev/dsk/c0t2d0s1          0     No    Okay  
  
  d2: Mirror  
      Submirror 0: d12  
        State: Needs maintenance  
      Submirror 1: d22  
        State: Okay  
      Pass: 3  
      Read option: roundrobin (default)  
      Write option: parallel (default)  
      Size: 286740 blocks  
  
  d12: Submirror of d2  
      State: Needs maintenance  
      Invoke: "metareplace d2 /dev/dsk/c0t3d0s6 <new device>"  
      Size: 286740 blocks  
      Stripe 0:  
       Device              Start Block  Dbase State        Hot Spare  
       /dev/dsk/c0t3d0s6          0     No    Maintenance  
  
  d22: Submirror of d2  
      State: Okay  
      Size: 286740 blocks  
      Stripe 0:  
       Device              Start Block  Dbase State        Hot Spare  
       /dev/dsk/c0t2d0s6          0     No    Okay  

  4. Halt the system, repair the disk, and reboot.

    Note that you must reboot from the other half of the root mirror.


  # halt  
  ...  
  Halted  
  ...  
  ok boot sd(0,2,0)  
  Booting from: sd(0,2,0)  
  ...  
  SunOS Release 5.1 Version Generic [UNIX(R) System V Release 4.0]  
  Copyright (c) 1983-1992, Sun Microsystems, Inc.  
  ...  
  Hostname: demo  
  The system is coming up.  Please wait.  
  ...  
  The system is ready.  
  
  demo console login: root  
  Password:  
  Last login: Wed Dec 16 13:36:29 on console  
  SunOS Release 5.1 Version Generic [UNIX(R) System V Release 4.0]  
  #  

  5. Use the metadb command to delete the failed replicas and then add them back.


  # /usr/opt/SUNWmd/sbin/metadb  
         flags         first blk    block count  
      M     p          unknown      unknown      /dev/dsk/c0t3d0s3  
      M     p          unknown      unknown      /dev/dsk/c0t3d0s3  
      a m  p  luo      16           1034         /dev/dsk/c0t2d0s3  
      a    p  luo      1050         1034         /dev/dsk/c0t2d0s3  
      a    p  luo      16           1034         /dev/dsk/c0t1d0s3  
      a    p  luo      1050         1034         /dev/dsk/c0t1d0s3  
  # /usr/opt/SUNWmd/sbin/metadb -d c0t3d0s3  
  # /usr/opt/SUNWmd/sbin/metadb -c 2 -a /dev/dsk/c0t3d0s3  
  # /usr/opt/SUNWmd/sbin/metadb  
         flags         first blk    block count  
       a        u      16           1034         /dev/dsk/c0t3d0s3  
       a        u      1050         1034         /dev/dsk/c0t3d0s3  
       a m  p  luo     16           1034         /dev/dsk/c0t2d0s3  
       a    p  luo     1050         1034         /dev/dsk/c0t2d0s3  
       a    p  luo     16           1034         /dev/dsk/c0t1d0s3  
       a    p  luo     1050         1034         /dev/dsk/c0t1d0s3  

  6. Use the metareplace command to re-enable the submirrors.


  # /usr/opt/SUNWmd/sbin/metareplace -e d0 /dev/dsk/c0t3d0s0  
  Device /dev/dsk/c0t3d0s0 is enabled  
  
  # /usr/opt/SUNWmd/sbin/metareplace -e d1 /dev/dsk/c0t3d0s1  
  Device /dev/dsk/c0t3d0s1 is enabled  
  
  # /usr/opt/SUNWmd/sbin/metareplace -e d2 /dev/dsk/c0t3d0s6  
  Device /dev/dsk/c0t3d0s6 is enabled  

After some time, the resyncs will complete. You can now return to booting from the original device.
At this point, running the metastat command would display the following:

  # /usr/opt/SUNWmd/sbin/metastat  
  d0: Mirror  
      Submirror 0: d10  
        State: Okay  
      Submirror 1: d20  
        State: Okay  
      Pass: 1  
      Read option: roundrobin (default)  
      Write option: parallel (default)  
      Size: 47628 blocks  
  
  d10: Submirror of d0  
      State: Okay  
      Size: 47628 blocks  
      Stripe 0:  
       Device              Start Block  Dbase State        Hot Spare  
       /dev/dsk/c0t3d0s0          0     No    Okay  
  
  d20: Submirror of d0  
      State: Okay  
      Size: 47628 blocks  
      Stripe 0:  
       Device              Start Block  Dbase State        Hot Spare  
       /dev/dsk/c0t2d0s0          0     No    Okay  
  
  d1: Mirror  
      Submirror 0: d11  
        State: Okay  
      Submirror 1: d21  
        State: Okay  
      Pass: 2  
      Read option: roundrobin (default)  
      Write option: parallel (default)  
      Size: 69660 blocks  
  
  d11: Submirror of d1  
      State: Okay  
      Size: 69660 blocks  
      Stripe 0:  
       Device              Start Block  Dbase State        Hot Spare  
       /dev/dsk/c0t3d0s1          0     No    Okay  
  
  d21: Submirror of d1  
      State: Okay  
      Size: 69660 blocks  
      Stripe 0:  
       Device              Start Block  Dbase State        Hot Spare  
       /dev/dsk/c0t2d0s1          0     No    Okay  
  
  d2: Mirror  
      Submirror 0: d12  
        State: Okay  
      Submirror 1: d22  
        State: Okay  
      Pass: 3  
      Read option: roundrobin (default)  
      Write option: parallel (default)  
      Size: 286740 blocks  
  
  d12: Submirror of d2  
      State: Okay  
      Size: 286740 blocks  
      Stripe 0:  
       Device              Start Block  Dbase State        Hot Spare  
       /dev/dsk/c0t3d0s6          0     No    Okay  
  
  d22: Submirror of d2  
      State: Okay  
      Size: 286740 blocks  
      Stripe 0:  
       Device              Start Block  Dbase State        Hot Spare  
       /dev/dsk/c0t2d0s6          0     No    Okay
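
The before-and-after metastat listings in this section can be compared without reading every block by hand. The Python sketch below is a hypothetical helper, not a DiskSuite tool. It assumes the layout shown in this appendix, where each submirror block starts with a line such as "d10: Submirror of d0" followed by a "State:" line, and it reports the submirrors whose state is anything other than Okay.

```python
import re

# Matches block headers such as "d0: Mirror" or "d10: Submirror of d0".
HEADER = re.compile(r"^(d\d+): (Mirror|Submirror of (d\d+))$")


def submirror_states(metastat_text):
    """Map each submirror name to (parent mirror, state)."""
    states = {}
    current = None  # (name, parent) while inside a submirror block
    for raw in metastat_text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # Only submirror headers carry a parent (group 3).
            parent = match.group(3)
            current = (match.group(1), parent) if parent else None
            continue
        if current and line.startswith("State:"):
            states[current[0]] = (current[1], line.split(":", 1)[1].strip())
            current = None  # take only the first State line of the block
    return states


def needs_attention(metastat_text):
    """Return the sorted names of submirrors that are not in the Okay state."""
    states = submirror_states(metastat_text)
    return sorted(name for name, (_, state) in states.items() if state != "Okay")


# Abbreviated sample modeled on the failed-disk listing in this section.
SAMPLE = """\
d0: Mirror
    Submirror 0: d10
      State: Needs maintenance
    Submirror 1: d20
      State: Okay
    Pass: 1

d10: Submirror of d0
    State: Needs maintenance
    Invoke: "metareplace d0 /dev/dsk/c0t3d0s0 <new device>"
    Size: 47628 blocks

d20: Submirror of d0
    State: Okay
    Size: 47628 blocks
"""

if __name__ == "__main__":
    print("submirrors needing attention:", needs_attention(SAMPLE))
```

Once the resyncs complete and every submirror reports Okay, needs_attention returns an empty list for the corresponding output.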