SPARCcluster HA Server Software Administration Guide

9 General Solstice HA Maintenance

This chapter gives instructions for general maintenance procedures such as restarting failed servers in Solstice HA configurations.
Use the following list to locate specific information in this chapter.
9.1 Forcing a Membership Reconfiguration
9.2 Public Network Administration
9.3 Shutting Down Solstice HA Servers
9.4 Switching Over Data Services
9.5 Stopping the Membership Monitor
9.6 Changing the Time in Solstice HA Configurations
9.7 Setting the OpenBoot PROM
9.8 Maintenance of the /var File System
9.9 Solstice HA Packages Maintenance
9.10 Bringing Up Servers Without Starting Solstice HA
9.11 Changing the Host Name of a Server or a Logical Host

9.1 Forcing a Membership Reconfiguration

A membership reconfiguration can be forced by changing ownership of a logical host.
A switchover (using haswitch(1M)) accomplishes this task. However, you must then perform a second switchover to restore the original configuration, that is, to return the logical hosts to their default masters.
Another method to perform a membership reconfiguration is to use the internal utility, clustm. To perform a membership reconfiguration, enter the following:

  # /etc/SUNWcluster/bin/clustm reconfigure hadf  

9.2 Public Network Administration

Adding and removing public network connections in Solstice HA configurations involves software procedures in addition to making the hardware connections. The instructions for adding a public network can be found in Chapter 5, "Adding Hardware." Use the following procedure to remove a public network.

Caution - If you perform an initial install of Solaris in the future, the removal of the network interface SBus cards may cause the numbering of the remaining network interfaces to change.

· How to Remove a Public Network

  1. Notify users the subnet is going to be removed.

    Make sure the users are off the subnet.

  2. Remove, or comment out, the appropriate HOSTNAME line in the /etc/opt/SUNWhadf/hadf/hadfconfig file on both Solstice HA servers.

  3. Perform a membership monitor reconfiguration.

    The logical hosts on the associated network will cease offering services following the membership reconfiguration. To perform a membership reconfiguration, enter the following:


  # /etc/SUNWcluster/bin/clustm reconfigure hadf  

  4. On each server, determine which interface and logical interfaces will be removed.

    After the membership reconfiguration, Solstice HA no longer manages the network, but it does not completely clean up. The ifconfig -a command will report that the associated logical interfaces are still configured, as follows:


  host1# ifconfig -a  
  le5: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu  
       1500 inet 192.9.76.12 netmask ffffff00 broadcast 192.9.76.255  
       ether 8:0:20:1c:b2:92  
  le5:1: flags=843<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500  
          inet 192.9.76.18 netmask ffffff00 broadcast 192.9.76.255  
  le5:2: flags=842<BROADCAST,RUNNING,MULTICAST> mtu 1500  
          inet 192.9.77.13 netmask ffffff00 broadcast 192.9.77.255  

  5. Take down the logical and physical interfaces by entering the following ifconfig commands on both servers.


  host1# ifconfig le5:1 down  
  host1# ifconfig le5:2 down  
  host1# ifconfig le5 down  
  host1# ifconfig le5 unplumb  

  6. On each server, remove the /etc/hostname.nnn file that is associated with the interface.

    This step is only necessary if the hardware is removed from both servers.


  host1# rm /etc/hostname.le5*  

  7. Switch ownership of both logical hosts to one Solstice HA server using a command similar to the following:


  host1# haswitch host1 logicalhost1 logicalhost2  

  8. Stop the membership monitor and halt the server that will have the hardware removed first.

    After entering the following commands, turn off the server.


  host2# /etc/init.d/SUNWhadf stop  
  host2# halt  

  9. Remove the hardware from the server that has been halted.

    Use the procedure from the SPARCcluster High Availability Server Service Manual to remove the hardware.

  10. Perform a reconfiguration reboot on the server.


  ok boot -r  

  11. Switch ownership of both logical hosts to the server that has already had the hardware removed.


  host2# haswitch host2 logicalhost1 logicalhost2  

  12. Stop the membership monitor and halt the other server.

    After entering the following commands, turn off the server.


  host1# /etc/init.d/SUNWhadf stop  
  host1# halt  

  13. Remove the hardware from the server that has been halted.

    Use the procedure from the SPARCcluster High Availability Server Service Manual to remove the hardware.

  14. Perform a reconfiguration reboot on the server.


  ok boot -r  

  15. Switch ownership of the logical hosts to the default master.


  host1# haswitch host1 logicalhost1  
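
The ifconfig commands used in the procedure above to take the interfaces down can be derived from the ifconfig -a output. The following is a minimal sketch and is not part of Solstice HA. The function name gen_down_cmds is hypothetical, and the sketch assumes an awk that supports the -v option (nawk on Solaris). It only prints the commands, so they can be reviewed before being entered on each server.

```shell
#!/bin/sh
# Print the ifconfig commands needed to take down a physical interface
# and its logical interfaces, given `ifconfig -a` output on stdin.
# $1 = physical interface name (for example, le5).
# Nothing is executed; the commands are only printed for review.
gen_down_cmds() {
    base=$1
    awk -v base="$base" '
        # Interface header lines start in column 1 with "name: flags=".
        /^[a-z]+[0-9]+(:[0-9]+)?: flags=/ {
            name = $1
            sub(/:$/, "", name)
            # Logical interfaces are named base:N; take those down first.
            if (index(name, base ":") == 1) print "ifconfig " name " down"
        }
        END {
            # The physical interface goes down last and is then unplumbed.
            print "ifconfig " base " down"
            print "ifconfig " base " unplumb"
        }
    '
}
```

For example, running ifconfig -a | gen_down_cmds le5 against the sample output shown in the procedure would print the same four commands that the procedure lists for le5.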

9.3 Shutting Down Solstice HA Servers

You may have to shut down one or both Solstice HA servers to perform hardware maintenance such as adding or removing SBus cards. The following sections describe the procedure for shutting down a single server or the entire configuration.

· How to Shut Down One Server

If you want the data in a logical host (diskset) to remain available when a server is shut down, you must first switch ownership of the diskset to the other server using the haswitch command.
If it is not necessary to have the data available, the logical host (diskset) can be placed in maintenance mode. Refer to Section 9.4.1, "Putting Logical Hosts in Maintenance Mode," for additional information.

Note - It is possible to halt (halt(1M)) a Solstice HA server and allow a takeover to restore the logical host services on the other server. The halt may cause the server to panic. However, the haswitch command offers a more reliable method of switching ownership of the logical hosts.

To stop running Solstice HA on a server while leaving services running on the sibling, enter the following commands:

  host1# haswitch host2 logicalhost1 logicalhost2  
  host1# /etc/init.d/SUNWhadf stop  

At this point you should halt the server.

  host1# halt  
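
The order of these commands matters: the switchover comes first, then the membership monitor is stopped, and only then is the server halted. The following dry-run sketch prints that sequence for review. The function name shutdown_plan is hypothetical and is not part of Solstice HA; nothing is executed.

```shell
#!/bin/sh
# Print, in order, the commands for taking one server out of service.
# $1 = sibling server that will take over; remaining args = logical hosts.
# This is a dry run: the commands are printed, not executed.
shutdown_plan() {
    if [ "$#" -lt 2 ]; then
        echo "usage: shutdown_plan sibling logicalhost..." >&2
        return 1
    fi
    sibling=$1
    shift
    # 1. Move the logical hosts to the sibling so services stay available.
    echo "haswitch $sibling $*"
    # 2. Stop the membership monitor on the local server.
    echo "/etc/init.d/SUNWhadf stop"
    # 3. Halt the local server.
    echo "halt"
}
```

For example, shutdown_plan host2 logicalhost1 logicalhost2 prints the three commands used in this section, to be entered on host1.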

· How to Shut Down a Solstice HA Configuration

You may want to shut down both servers in a Solstice HA configuration if an adverse environmental condition exists, such as a cooling failure or a severe lightning storm.
  1. Stop the membership monitor and halt one of the servers.


  host1# /etc/init.d/SUNWhadf stop  
  host1# halt  

  2. Stop the membership monitor and halt the sibling server.


  host2# /etc/init.d/SUNWhadf stop  
  host2# halt  

· How to Halt a Solstice HA Server

Either server can be shut down using halt or uadmin(1M). If the membership monitor is running when a server is shut down, the server will most likely take a "Failfast timeout" and display the following message:

  Panic: Failfast timeout unit "abort_thread"  

This can be avoided by stopping the membership monitor before shutting down the server. Refer to Section 9.5, "Stopping the Membership Monitor," for additional information.

9.4 Switching Over Data Services

Use the haswitch command to move data services from one Solstice HA server to the other. The command also allows you to put logical hosts in maintenance mode.
For example, to execute a switchover of data services from host1 to host2 (with the logical hosts being named logicalhost1 and logicalhost2), enter the following command:

  host2# haswitch host2 logicalhost1 logicalhost2  

9.4.1 Putting Logical Hosts in Maintenance Mode

To put the disksets of a logical host in maintenance mode, use the -m option of the haswitch command. Maintenance mode is useful for some administration on file systems and disksets.

Note - Unlike other types of logical host ownership, maintenance mode persists across server reboots. A logical host can be removed from maintenance mode only by a manual switchover to a specific Solstice HA server.

An example use of the maintenance option would be:

  # haswitch -m logicalhost1  

This command stops the data services associated with logicalhost1 on the Solstice HA server that currently owns the diskset and also halts the fault monitoring services associated with both Solstice HA servers. The command will also execute an unshare(1M) and umount(1M) of any file systems on the logical host. The associated diskset ownership will be released.
The command may be run on either host, regardless of current ownership of the logical host and diskset.

9.5 Stopping the Membership Monitor

To put the server in any mode other than multi-user, or to halt or reboot the server, you must first stop the Solstice HA membership monitor. You can then use your site's preferred method for further server maintenance.
The membership monitor can be stopped only when no logical hosts are owned by the local Solstice HA server. To stop the membership monitor on one host, run the following commands:

  host1# haswitch host2 logicalhost2  
  host1# /etc/init.d/SUNWhadf stop  

If a logical host is owned by the server when the stop command is run, ownership will be transferred to the other Solstice HA host before the membership monitor is stopped.
If the other Solstice HA server is down, the command will take down the data services in addition to stopping the membership monitor.
To stop the membership monitor on both Solstice HA servers, run the haswitch command and stop the membership monitor on one of the servers. Then run the following command on the second server:

  # /etc/init.d/SUNWhadf stop  

9.6 Changing the Time in Solstice HA Configurations

A simple time synchronization protocol is run on both Solstice HA servers that ensures the clocks stay close to each other. The "window of error" is roughly three seconds. If a failover or switchover occurs during that period, the time stamp difference for HA-NFS clients will go unnoticed.
This synchronization is only within the Solstice HA configuration. No reference is made by the servers to external time standards that may be used at your site. For this reason, the time on the Solstice HA servers may drift out of sync with other hosts on the network.

Caution - There is no way for an administrator to adjust the time of the servers in a Solstice HA configuration. Never attempt to perform a time change using either the date(1) or the rdate(1M) command.

9.7 Setting the OpenBoot PROM

For correct Solstice HA operation, the OpenBoot PROM options on both servers should be set to the factory defaults, with the exception of the watchdog-reboot? variable, which should be set to true. The default setting for auto-boot?, which is true, should not be changed. These settings ensure that Solstice HA servers will boot upon power up and after a kernel watchdog reset.

Note - Under some circumstances, the Solstice HA software may execute a halt(1M) command on a server rather than a reboot(1M) command. This is ordinarily confined to initial configuration problems. In this case, the server must be manually booted to return to service.

· How to Set the OpenBoot PROM

  1. Boot the Solstice HA server in single user mode or run the eeprom(1M) command.

  2. Run the following commands to set the variables from the OpenBoot PROM. The OpenBoot printenv command is used here to check the values.


  ok set-defaults  
  Setting NVRAM parameters to default values.  
  ok setenv watchdog-reboot? true  
  watchdog-reboot?= true  
  ok printenv  
  Parameter Name       Value            Default Value  
  ...  
  sbus-probe-list1     0123             0123  
  sbus-probe-list0     0123             0123  
  fcode-debug?         false            false  
  auto-boot?           true             true  
  watchdog-reboot?     true             false  
  ...  

Alternatively, the eeprom command can be used to set the OpenBoot PROM variables. For example:

  # eeprom 'auto-boot?'  
  auto-boot?=true  
  # eeprom 'watchdog-reboot?'  
  watchdog-reboot?=false  
  # eeprom 'watchdog-reboot?=true'  
  # eeprom 'watchdog-reboot?'  
  watchdog-reboot?=true  
  #  
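
The two settings this section requires can also be checked from a script. The following is a minimal sketch, assuming eeprom prints each variable as name=value as in the example above. The function name check_obp is hypothetical and is not part of Solstice HA or Solaris.

```shell
#!/bin/sh
# Verify the OpenBoot PROM settings that Solstice HA expects.
# Reads "variable=value" lines, as printed by eeprom, on stdin.
# Prints one line per problem and returns non-zero if any is found.
check_obp() {
    awk -F= '
        $1 == "auto-boot?" {
            seen_ab = 1
            if ($2 != "true") { print "auto-boot? should be true"; bad = 1 }
        }
        $1 == "watchdog-reboot?" {
            seen_wr = 1
            if ($2 != "true") { print "watchdog-reboot? should be true"; bad = 1 }
        }
        END {
            if (!seen_ab) { print "auto-boot? not reported"; bad = 1 }
            if (!seen_wr) { print "watchdog-reboot? not reported"; bad = 1 }
            exit (bad ? 1 : 0)
        }
    '
}
```

A possible use on each server is eeprom 'auto-boot?' 'watchdog-reboot?' | check_obp, which prints nothing and returns zero when both variables are set to true.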

9.8 Maintenance of the /var File System

Because Solaris and Solstice HA software error messages are written to the /var/adm/messages file, the /var file system may become full. If this happens when the server is running, the server will continue to run. In most instances, you will not be able to log into the server that has the full /var file system. Should the server go down, Solstice HA will not start and a login will not be possible.
If the server goes down you must reboot in single user mode (boot -s).
If the server reports a full /var file system and continues to run Solstice HA services, follow the steps in the following section. In the following procedure, host1 has a full /var file system.

· How to Repair a Full /var File System

  1. Perform a switchover.


  host2# haswitch host2 logicalhost1 logicalhost2  

  2. Stop the Solstice HA services.

    If you have an existing shell open to host1, enter the following:


  host1# /etc/init.d/SUNWhadf stop  

    If you do not have a shell open to host1, use the procedure in "How to Enter the OpenBoot PROM on a Solstice HA Server" to connect to the console and halt the server.

  3. Reboot the server in single user mode.


  ok boot -s  
  INIT: SINGLE USER MODE  
  
  Type Ctrl-d to proceed with normal startup,  
  (or give root password for system maintenance): root_password  
  Entering System Maintenance Mode  
  #  

  4. When the server boots, locate the offending file, and then remove it or copy it to another location.

    Removing the /var/adm/messages file, or copying it to another location, will not affect Solstice HA operation.


  # find /var -size +20480 -print  
  /var/adm/messages  
  # rm /var/adm/messages  

Alternatively, you can search for files that have been recently modified. The following command shows all the files modified in the past 24 hours.

  # find /var -mtime -1 -print  
  /var/adm/messages  
  /var/adm/utmp  
  /var/adm/utmpx  
  ...  
  # rm /var/adm/messages  

  5. Enter multi-user mode.

    When you enter the following, the server will come up in multi-user mode and will automatically rejoin the configuration.


  # exit  
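
Because a full /var file system can prevent logins, it can help to notice the condition before it occurs. The following is a minimal sketch that reads df -k output and reports how full /var is. The function names var_capacity and var_is_nearly_full, and any threshold passed to them, are hypothetical and are not part of Solstice HA.

```shell
#!/bin/sh
# Print the capacity percentage (without the % sign) from `df -k /var`
# output on stdin. The capacity column is the next-to-last field of the
# last line, which also holds when df wraps a long device name.
var_capacity() {
    awk 'NR > 1 { cap = $(NF - 1) } END { sub(/%$/, "", cap); print cap }'
}

# Succeed when the capacity read from stdin is at or above $1 percent.
var_is_nearly_full() {
    cap=$(var_capacity)
    [ -n "$cap" ] && [ "$cap" -ge "$1" ]
}
```

A possible use is df -k /var | var_is_nearly_full 90 && echo "/var is nearly full", run periodically on each server. The 90 percent threshold is only an illustration.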

9.9 Solstice HA Packages Maintenance

The only Solstice HA or Solstice DiskSuite packages that can safely be removed are the AnswerBook documents. By default, the AnswerBooks for Solstice DiskSuite reside in /opt/SUNWabmd and the AnswerBooks for Solstice HA reside in /opt/SUNWabha. To remove these packages use the procedure in "How to Remove Solstice HA Packages."
If new distributions of the Solstice HA packages arrive, you can upgrade the servers using the procedure in "How to Upgrade Solstice HA Packages."

· How to Remove Solstice HA Packages

  1. Remove the packages from each of the Solstice HA servers by using the pkgrm(1M) command on each server.


  host1# pkgrm SUNWabha SUNWabmd  


  host2# pkgrm SUNWabha SUNWabmd  

· How to Upgrade Solstice HA Packages

  1. Switch ownership of both logical hosts to the Solstice HA server that will not be upgraded first.

    In this example, host2 will be the first one upgraded with new packages, so host1 is taking ownership of both logical hosts.


  host2# haswitch host1 logicalhost1 logicalhost2  

  2. Stop the membership monitor on host2.


  host2# /etc/init.d/SUNWhadf stop  
  ...  

  3. Remove the existing packages by using the pkgrm command. It does not matter in which order the packages are removed.


  host2# pkgrm SUNWhaor SUNWhanfs SUNWhagen SUNWcmm SUNWff ...  

  4. Insert the CD that contains the new software into the CD-ROM drive and change directories to root.

    There is a brief delay while Solaris scans the CD and mounts the proper file systems.


  host2# cd /  

  5. Enter the following command to install the new packages. Enter y at any prompts about changing modes on directories.


  host2# pkgadd -d /cdrom/cdrom0 SUNWhagen SUNWhanfs ...  

  6. Check the contents of the packages.


  host2# pkgchk -n SUNWhagen SUNWhanfs ...  

  7. Start the membership monitor.


  host2# /etc/init.d/SUNWhadf start  

  8. Switch ownership of both logical hosts to the host that has just been upgraded.


  host2# haswitch host2 logicalhost1 logicalhost2  

  9. Repeat the upgrade procedure shown in Step 2 through Step 7 on the other server.

  10. Switch ownership of the logical hosts back to the appropriate default master.

    For example:


  host1# haswitch host1 logicalhost1  

9.10 Bringing Up Servers Without Starting Solstice HA

You may need to bring up a server without starting the Solstice HA software, for example if the vfstab.logicalhost file is lost or becomes corrupt. This is possible because the Solstice HA software is started at run level 3, so a server brought up to a lower run level does not start it.

· How to Bring Up Servers Without Starting Solstice HA

  1. Boot the server to single-user mode.


  ok boot -s  
  ...  
  INIT: SINGLE USER MODE  
  
  Type Ctrl-d to proceed with normal startup,  
  (or give root password for system maintenance): root_password  
  Entering System Maintenance Mode  
  
  #  

  2. Bring the server up to run level 2 by using the init(1M) command. Run level 2 has all normal file systems mounted and non-Solstice HA network services started.


  # init 2  
  INIT: New run level: 2  
  The system is coming up.  Please wait.  
  checking ufs filesystems  
  /dev/rdsk/c0t0d0s7: is stable.  
  /dev/rdsk/c0t0d0s5: 665 files, 73552 used, 111295 free  
  /dev/rdsk/c0t0d0s5: (199 frags, 13887 blocks, 0.1% fragmentation)  
  NIS domainname is host1.West.COM  
  starting router discovery.  
  starting rpc services: rpcbind keyserv ypbind kerbd done.  
  Setting netmask of lo0 to 255.0.0.0  
  Setting netmask of be0 to 255.255.255.0  
  Setting netmask of be1 to 255.255.255.0  
  Setting netmask of le0 to 255.255.255.0  
  Setting default interface for multicast: add net 555.0.0.0:  
  gateway host1-drag  
  syslog service starting.  
  volume management starting.  
  The system is ready.  
  soc0: port 0: Fibre Channel is ONLINE  
  soc1: port 0: Fibre Channel is ONLINE  
  soc2: port 0: Fibre Channel is ONLINE  
  
  console login:  

  3. After you perform the desired maintenance procedure, start Solstice HA by rebooting the server.

    If the server is brought back up using init 3, some daemons are restarted and many system error messages appear. Rebooting with the following command avoids these messages:


  # reboot  
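
Because Solstice HA is started at run level 3, it can be useful to confirm that the server is at run level 2 before starting maintenance. The following is a minimal sketch that extracts the run level from who -r output. The function name run_level is hypothetical, and the sketch assumes the output contains the word run-level followed by the level.

```shell
#!/bin/sh
# Print the current run level, given `who -r` output on stdin.
# The level is the field that follows the word "run-level".
run_level() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "run-level") { print $(i + 1); exit }
    }'
}
```

A possible use is [ "$(who -r | run_level)" = "2" ] || echo "server is not at run level 2".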

9.11 Changing the Host Name of a Server or a Logical Host

Changing the host name of a server in a Solstice HA configuration is a complex procedure. This procedure should only be performed by a trained service representative.
Renaming a logical host is not possible.