SPARCcluster HA Server Software Administration Guide

2 Preparing for Administration

This chapter offers advice about preparing for administration of a Solstice High Availability configuration.
Use the following table to locate specific information in this chapter:
Saving Device Information
Restoring Device Information
Recording the path_to_inst Information
Instance Numbering
Solstice HA Processes
Logging Into the Servers as Root

2.1 Saving Device Information

It is useful to have a record of the disk partitioning you have selected for the disks in your multi-host disksets. This multi-host partitioning information is needed when a disk replacement must be made.
The disk partitioning information for the local disks is not as critical because the local disks on both servers should have been partitioned identically. You can obtain the local disk partitioning from the sibling server in the event a local disk fails.
When a multi-host disk is replaced, the replacement disk must have the same partitioning as the disk it is replacing. Depending on how a disk has failed, this information may or may not be available from the failed disk when the replacement is performed. Having a saved record is especially important if you have several different partitioning schemes in your disksets.
A simple way of recording this information is shown in the example script in Figure 2-1. This type of script should be run following Solstice HA configuration. In this example, the files containing the volume table of contents (VTOC) information are written to the /etc/opt/SUNWhadf/vtoc directory by the prtvtoc(1M) command.

  #! /bin/sh  
  DIR=/etc/opt/SUNWhadf/vtoc  
  mkdir -p $DIR  
  cd /dev/rdsk  
  for i in *s7  
  do prtvtoc $i >$DIR/$i || rm $DIR/$i  
  done  

Figure 2-1 Example Script for Saving VTOC Information
The script in Figure 2-1 will work because it is a requirement that each of the disks in a diskset has a slice 7, where the metadevice state database replicas reside.
If a local disk also has a valid slice 7, its VTOC information will also be saved by the example script in Figure 2-1. However, this should not occur for the boot disk, because a local disk typically does not have a valid slice 7.

Note - Make certain that the script is run while none of the disks are owned by the sibling host. The script will work if the logical hosts are in maintenance mode (MAINT), if the logical hosts are owned by the local host, or if Solstice HA is not running.

You should duplicate this information on both servers in the Solstice HA configuration.
It is important that this information be kept up to date as new disks are added to the disksets and when any of the disks are repartitioned.
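One way to notice that the saved information has fallen out of date is to compare the slice 7 device names against the saved files. The following is a minimal sketch, not part of Solstice HA: the function name missing_vtocs is hypothetical, the default directories are the ones used in Figure 2-1, and a POSIX-compatible sh is assumed.

```shell
#!/bin/sh
# Hypothetical helper: list slice 7 device names that have no saved
# VTOC file. The defaults match the directories used in Figure 2-1.
missing_vtocs() {
    devdir=${1:-/dev/rdsk}
    vtocdir=${2:-/etc/opt/SUNWhadf/vtoc}
    for dev in "$devdir"/*s7
    do
        # Skip the unexpanded pattern when no slice 7 entries exist.
        [ -e "$dev" ] || continue
        name=$(basename "$dev")
        # A missing or empty file means no VTOC was saved for this disk.
        [ -s "$vtocdir/$name" ] || echo "$name"
    done
}
```

Each name printed has no saved VTOC. Disks for which prtvtoc failed when the script in Figure 2-1 was run (for example, a local disk without a valid slice 7) also appear in the list, because that script removes their files.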

2.2 Restoring Device Information

If you have saved the VTOC information for all multi-host disks, this information can be used when a disk is replaced. The example script shown in Figure 2-2 would use the VTOC information saved when you ran the script shown in Figure 2-1 to give the replacement disk the same partitioning as the failed disk.

  #! /bin/sh  
  DIR=/etc/opt/SUNWhadf/vtoc  
  cd /dev/rdsk  
  for i in c1t0d0s7  
  do fmthard -s $DIR/$i $i  
  done  

Figure 2-2 Example Script for Restoring VTOC Information

Note - The replacement drive must be of the same size and geometry (generally the same model from the same manufacturer) as the failed drive. Otherwise the original VTOC may not be appropriate for the replacement drive.
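Before running fmthard(1M), it can be useful to confirm that a saved VTOC file actually exists for the disk being replaced. The following is a minimal sketch under stated assumptions: the function name restore_cmd is hypothetical, and it only prints the command that Figure 2-2 would run, so nothing is written to the disk.

```shell
#!/bin/sh
# Hypothetical helper: print the fmthard command for one disk, but only
# if a non-empty saved VTOC file exists for it. Nothing is written to disk.
restore_cmd() {
    disk=$1
    dir=${2:-/etc/opt/SUNWhadf/vtoc}
    if [ -s "$dir/$disk" ]
    then
        echo "fmthard -s $dir/$disk /dev/rdsk/$disk"
    else
        echo "no saved VTOC for $disk in $dir" >&2
        return 1
    fi
}
```

For example, restore_cmd c1t0d0s7 prints the command to review. A non-zero return status means no saved file was found, in which case the procedures described later in this section apply.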

If you have failed to record this VTOC information, but you have mirrored slices on a disk-by-disk basis (that is, the VTOCs of both sides of the mirror are the same), it is possible to simply copy the VTOC information from the other mirror disk to the replacement disk. An example of how to perform this procedure is shown in Figure 2-3.

  #! /bin/sh  
  cd /dev/rdsk  
  OTHER_MIRROR_DISK=c1t0d0s7  
  REPLACEMENT_DISK=c2t0d0s7  
  prtvtoc $OTHER_MIRROR_DISK | fmthard -s - $REPLACEMENT_DISK  

Figure 2-3 Example Script to Copy VTOC Information From a Mirror
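After the copy, you may want to confirm that both disks report the same partition table. The following is a minimal sketch: the function name same_partitioning is hypothetical, and it relies on the fact that the header lines in prtvtoc(1M) output begin with an asterisk, so only the partition rows are compared.

```shell
#!/bin/sh
# Hypothetical helper: compare two files of saved prtvtoc output,
# ignoring the comment lines that begin with "*".
# Returns 0 if the partition rows match, 1 if they differ, and 2 if
# either file is unreadable or contains no partition rows.
same_partitioning() {
    a=$(grep -v '^\*' "$1") || return 2
    b=$(grep -v '^\*' "$2") || return 2
    [ "$a" = "$b" ]
}
```

To use it, save the prtvtoc output of each disk to a file, then pass the two file names to same_partitioning and check the return status.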
If you failed to save the VTOC information and did not mirror on a disk-by-disk basis, it is possible to examine the component sizes reported by the metastat(1M) command and reverse engineer the VTOC information. Because the computations used in this procedure are so complex, the procedure should only be performed by a trained service representative.

2.3 Recording the path_to_inst Information

It is important for you to record the /etc/path_to_inst information on removable media (that is, floppy disk or backup tape). The path_to_inst(4) file contains the minor unit numbers for disks in each SPARCstorage Array. This information will be necessary if the boot disk on either Solstice HA server fails and has to be replaced.
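The copy can be scripted so that it is easy to repeat after hardware changes. The following is a minimal sketch: the function name save_path_to_inst is hypothetical, and the destination directory is assumed to be a directory on removable media that you have already mounted.

```shell
#!/bin/sh
# Hypothetical helper: copy path_to_inst into a destination directory,
# naming the copy after this host so that the copies from both servers
# can share one medium. Prints the name of the copy on success.
save_path_to_inst() {
    dest=$1
    src=${2:-/etc/path_to_inst}
    copy="$dest/path_to_inst.$(uname -n)"
    # Verify the copy byte for byte before reporting success.
    cp "$src" "$copy" && cmp -s "$src" "$copy" && echo "$copy"
}
```

The host name suffix is a design choice for this sketch, not a Solstice HA convention.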

2.4 Instance Numbering

Instance names are occasionally reported in driver error messages. An instance name refers to a system device, such as ssd20 or le5.
You can determine the binding of an instance name to a physical name by looking at /var/adm/messages or dmesg(1M) output:

  ssd20 at SUNW,pln0:  
  ssd20 is /io-unit@f,e0200000/sbi@0,0/SUNW,soc@3,0/SUNW,pln@a0000800,20183777/ssd@4,0  
  
  le5 at lebuffer5:  SBus3 slot 0 0x60000 SBus level 4 sparc ipl 7  
  le5 is /io-unit@f,e3200000/sbi@0,0/lebuffer@0,40000/le@0,60000  

Once an instance name has been assigned to a device, it remains bound to that device.
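The lookup in the messages file can be sketched as a small shell function. This is an illustration only: the function name instance_path is hypothetical, it assumes the "instance is physical-name" wording shown above, and it assumes an awk that accepts the -v option (nawk on older Solaris releases).

```shell
#!/bin/sh
# Hypothetical helper: print the physical name most recently bound to
# an instance name in a messages file. Returns non-zero if not found.
instance_path() {
    inst=$1
    log=${2:-/var/adm/messages}
    awk -v inst="$inst" '
        {
            # Look for the words "<instance> is <physical name>" anywhere
            # on the line, so that syslog prefixes do not matter.
            for (i = 1; i <= NF - 2; i++)
                if ($i == inst && $(i + 1) == "is")
                    path = $(i + 2)
        }
        END {
            if (path == "") exit 1
            print path
        }
    ' "$log"
}
```

For example, instance_path ssd20 would print the physical name for ssd20 if the binding appears in /var/adm/messages.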
Instance numbers are encoded in a device's minor number. To keep instance numbers persistent across reboots, the system records them in the /etc/path_to_inst file. This file is read only at boot time, and is currently updated by the add_drv(1M) and drvconfig(1M) commands. For additional information, refer to the path_to_inst(4) man page.
When you perform a Solaris installation on a server, instance numbers can change if hardware was added or removed since the last Solaris installation. For this reason, use caution whenever you add or remove SBus cards on Solstice HA servers. Always install an SBus card in the next available empty SBus slot for that type of device.
The following example highlights the instance number problems that can arise in a configuration. In this example, the Solstice HA configuration consists of three SPARCstorage Arrays with FC/S cards installed in SBus slots 1, 2, and 4 on each of the servers. The controller numbers are c1, c2, and c3. If the system administrator adds another SPARCstorage Array to the configuration using an FC/S card in SBus slot 3, the corresponding controller number will be c4. If Solaris is then reinstalled on one of the servers, the controller numbers c3 and c4 on that server will refer to different SPARCstorage Arrays than before. The other Solstice HA server will still refer to the SPARCstorage Arrays with the original instance numbers. Solstice DiskSuite will not communicate with the disks connected to the c3 and c4 controllers.
Other problems can arise with instance numbering associated with the Ethernet connections. For example, suppose each of the Solstice HA servers has three Ethernet SBus cards installed in slots 1, 2, and 3, and the instance numbers are le1, le2, and le3. If the middle card (le2) is removed and Solaris is reinstalled, the third SBus card will be renamed from le3 to le2.
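If a copy of /etc/path_to_inst was recorded before the change, renumbering of this kind can be detected by comparing the copy with the current file. The following is a minimal sketch: the function name inst_changes is hypothetical, and it assumes that each non-comment line begins with the quoted physical name followed by the instance number.

```shell
#!/bin/sh
# Hypothetical helper: report physical names whose instance number
# differs between a saved path_to_inst file and a newer one.
# Output format: physical-name old-number -> new-number
inst_changes() {
    awk '
        # Skip comments and lines without at least two fields.
        /^#/ || NF < 2 { next }
        # First file: remember the instance number for each physical name.
        FILENAME == ARGV[1] { old[$1] = $2; next }
        # Second file: report names whose number has changed.
        ($1 in old) && old[$1] != $2 { print $1, old[$1], "->", $2 }
    ' "$1" "$2"
}
```

No output means that every device present in both files kept its instance number. Devices that appear in only one of the files are not reported by this sketch.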

2.4.1 Reconfiguration Reboots

During some of the administrative procedures documented in this manual, you are told to perform a reconfiguration reboot. A reconfiguration reboot is performed via the OpenBoot PROM boot -r command or by creating the file /reconfigure on the server and then rebooting.

Note - It is not necessary to perform a reconfiguration reboot to add disks to an existing SPARCstorage Array.

Be certain to avoid performing Solaris reconfiguration reboots when any hardware (especially a SPARCstorage Array or other disk) is not operational (powered off or otherwise defective). In such situations the reconfiguration reboot will remove the inodes in /devices and symbolic links in /dev/dsk and /dev/rdsk associated with the disk devices. These disks will become inaccessible to Solaris until a later reconfiguration reboot. A subsequent reconfiguration reboot may not restore the original controller minor unit numbering, which causes Solstice DiskSuite to reject the disks. When the original numbering is restored, Solstice DiskSuite can access the associated metadevices.
If all hardware is operational, a reconfiguration reboot may be safely performed to add a disk controller to a server. Such controllers must be added symmetrically to both servers (though a temporary imbalance is allowed while the servers are upgraded). Similarly, if all hardware is operational, it is safe to perform a reconfiguration reboot to remove hardware.

2.5 Solstice HA Processes

The Solstice HA software has several processes running at any time on the two servers.

Caution - Never stop or kill(1) a Solstice HA process unless you are told to do this as part of a maintenance procedure in this manual or in one of the other documents included in the SPARCcluster Binder Set.

The processes that are running on the servers include the following:

  /bin/sh /opt/SUNWhadf/bin/haload  
  /bin/sh /opt/SUNWhadf/clust_progs/runclocksync  
  /bin/sh /opt/SUNWhadf/fault_progs/faultdloop  
  /bin/sh /opt/SUNWhadf/fault_progs/net_probe_brother private  
  /bin/sh /opt/SUNWhadf/fault_progs/net_probe_brother public  
  /bin/sh /opt/SUNWhadf/fault_progs/nfs_probe_local_restart /var/opt/SUNWhadf/had  
  /bin/sh /opt/SUNWhadf/fault_progs/nfs_probe_one_common host1 1 0  
  /bin/sh /opt/SUNWhadf/fault_progs/nfs_probe_one_common host2 0 0  
  /bin/sh /opt/SUNWhadf/fault_progs/nfs_probe_remote  
        /var/opt/SUNWhadf/hadf/ha_env FOREIGN  
  /bin/sh /opt/SUNWhadf/fault_progs/nfs_probe_local  
        /var/opt/SUNWhadf/hadf/ha_env NATIVE  
  /opt/SUNWcluster/bin/clustd -f /etc/opt/SUNWhadf/hadf/cmm_confcdb  
  /usr/lib/autofs/automountd  
  /usr/lib/nfs/lockd -g 90  
  /usr/lib/nfs/statd -a host1 -p /host1  
  faultd  
  fdl_load -i 30 -p 90 host1 host2-priv1  
  haclksyn host1-priv1 host2-priv2  
  net_periodic_ping_other 8640000 30 5 120  
        /var/opt/SUNWhadf/hadf/tmp/net_probe_brother.up.992  
        /var/opt/SUNWhadf/hadf/tmp/net_probe_brother.down.992  
  net_periodic_ping_other 8640000 30 5 120  
        /var/opt/SUNWhadf/hadf/tmp/net_probe_brother.up.1135  
        /var/opt/SUNWhadf/hadf/tmp/net_probe_brother.down.1135  
  nfs_mon host1 host2 1 /var/opt/SUNWhadf/hadf/tmp/nfs_mon.cmd.host1.175  
  nfs_mon host2 host1 0 /var/opt/SUNWhadf/hadf/tmp/nfs_mon.cmd.host2.198  
  nfs_monitor_pids -i 10 116 847 862 840 838  
  scsirstd  


Caution - Never stop or kill any executable found in the /opt/SUNWhadf tree.
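Checking which processes are present, as opposed to stopping them, is harmless. The following is a minimal sketch of such a check: the function name missing_procs is hypothetical, it reads a process listing on standard input, and it assumes an awk that accepts the -v option. It matches only the first word of each command line, so it suits daemons such as clustd, faultd, and scsirstd, not the /bin/sh scripts in the list above.

```shell
#!/bin/sh
# Hypothetical helper: read a listing of command lines on standard
# input and print each expected program name that does not appear as
# the first word of any line (directory prefixes are ignored).
missing_procs() {
    listing=$(cat)
    for name in "$@"
    do
        printf '%s\n' "$listing" | awk -v n="$name" '
            { cmd = $1; sub(/.*\//, "", cmd); if (cmd == n) found = 1 }
            END { exit !found }
        ' || echo "$name"
    done
}
```

A typical invocation would be ps -e -o args | missing_procs clustd faultd scsirstd, which prints nothing when all three are running.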

2.6 Logging Into the Servers as Root

If you want to log in to Solstice HA servers as root through a terminal other than the console, you must edit the /etc/default/login file and comment out the following line:

  CONSOLE=/dev/console  

This allows root logins via rlogin(1), telnet(1), and other programs.
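The edit can also be prepared with sed(1) instead of a text editor. The following is a minimal sketch: the function name comment_out_console is hypothetical, and it writes the edited text to standard output, not to the file, so that the result can be reviewed before it replaces /etc/default/login.

```shell
#!/bin/sh
# Hypothetical helper: print the login defaults file with the CONSOLE
# line commented out. The file itself is not modified.
comment_out_console() {
    sed 's/^CONSOLE=/#CONSOLE=/' "${1:-/etc/default/login}"
}
```

Lines that are already commented out, and all other settings, pass through unchanged.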