SPARCcluster HA Server Software Planning and Installation Guide

Verification and Validation

11

After the Solstice HA systems are configured, you must verify and validate the configuration to ensure high availability.
This chapter contains the following sections:
  • Overview of Tasks
  • Running the hacheck Command
  • Running the haswitch Command
  • Performing Physical and Manual Tests
  • Performing the Fault Detection Test
  • Verify HA-ORACLE Setup
  • Verify SPARCstorage Array Firmware
  • Power Sequence Testing a SPARCcluster 1000
  • Power Sequence Testing a SPARCcluster 2000

11.1 Overview of Tasks

Verifying and validating the Solstice HA configuration involves:
  • Running the hacheck(1M) command
  • Running the haswitch(1M) command (this is done as part of the physical and manual tests)
  • Performing physical and manual tests
  • Testing the fault detection
  • Verifying the HA-ORACLE setup
  • Verifying the SPARCstorage Array firmware
  • Verifying the power sequencing

11.2 Running the hacheck Command

You should run the hacheck command on both servers in the Solstice HA configuration. The command automatically performs the following on both servers in the Solstice HA configuration:
  • Checks that HA-ORACLE is installed properly.
  • Checks and verifies all Solstice HA configuration information.
Run hacheck by entering the following command on both servers:

  host1# hacheck  

Error messages reported by hacheck are documented in Appendix A, "Error Messages."

11.3 Running the haswitch Command

The haswitch command transfers the specified disksets along with the associated data services and logical IP addresses to the specified server.
Run the haswitch command from each side of the Solstice HA configuration. The following example command line associates the diskset named logicalhost1, along with its data services and logical IP addresses, with physical host host2.

  host1# haswitch host2 logicalhost1  
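
Running haswitch "from each side" can be planned as a round trip for each diskset. The following is a minimal sketch, not part of the Solstice HA software: the function name plan_switchover is hypothetical, and the ordering (switch away, then switch back, for each logical host) is an assumption about how the test is run. It only prints the command lines, using the haswitch syntax shown above, so that they can be reviewed before being entered on each console.

```shell
# Print (do not run) the haswitch commands for a round-trip switchover test.
# Arguments: host_a host_b lh_a lh_b
#   lh_a is the logical host normally mastered by host_a,
#   lh_b is the logical host normally mastered by host_b.
# The function name and argument order are illustrative assumptions.
plan_switchover() {
    host_a=$1 host_b=$2 lh_a=$3 lh_b=$4
    # From host_a: move its logical host to host_b, then move it back.
    echo "$host_a# haswitch $host_b $lh_a"
    echo "$host_b# haswitch $host_a $lh_a"
    # From host_b: move its logical host to host_a, then move it back.
    echo "$host_b# haswitch $host_a $lh_b"
    echo "$host_a# haswitch $host_b $lh_b"
}
```

For example, plan_switchover host1 host2 logicalhost1 logicalhost2 prints four command lines, the first of which matches the example above.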

11.4 Performing Physical and Manual Tests

You must conduct both physical and manual tests of the Solstice HA configuration to make sure one system will take over if the other fails. You must perform each of the tests listed:
  1. Make sure diskset ownership moves from one server to the other when the power is turned off on the default master.

    Perform the following steps to conduct this test:

    a. Turn off the power on one of the servers.

    You will begin seeing be0 error messages on the console of the server that remains running.

    b. Verify the other system has taken ownership of the diskset that was mastered by the server you turned off.

    c. Turn the power back on. Let the system reboot. The system will automatically start the membership monitor software. The host then rejoins the configuration.

    d. Perform a switchover (using haswitch) to give ownership of the diskset back to the default master.

    e. Repeat the procedure, by turning off the power to the second server.

  2. Ensure a takeover occurs when one system is halted.

    Perform the following steps to conduct this test:

    a. As root, invoke uadmin(1M) on one host. For example:


  host1# /sbin/uadmin 1 0  
  Program terminated  
  Type help for more information  
  <#0> ok  

    b. Make sure the sibling host has taken over the diskset that was mastered by the system you halted.

    c. Reboot the server.

    d. Perform a switchover (using haswitch), moving ownership of the diskset back to the default master.

    e. Repeat the procedure on the other server.

11.5 Performing the Fault Detection Test

There are several ways to test the fault detection monitor that runs on each system in a Solstice HA configuration.
  1. Disconnect one of the private network connections.

    You can verify that this action is recognized by the Solstice HA software when error messages are displayed on the Solstice HA consoles on each host or in the /var/adm/messages file. This fault does not result in a takeover.

  2. Reconnect the private network connection.

  3. Disconnect all the public network connections on one of the servers.

    It may take several minutes before error messages are generated and a takeover occurs.

  4. Reconnect the network and wait for the server to reboot.
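
When checking /var/adm/messages during these tests, it can help to look only at the lines written after the cable was disconnected. The sketch below is one way to do that under simple assumptions: the helper names count_lines and new_lines_since are hypothetical, the log is assumed not to be rotated or truncated during the test, and nothing is assumed about the wording of the Solstice HA messages themselves.

```shell
# Count the lines currently in a log file (prints 0 if the file is missing).
count_lines() {
    if [ -f "$1" ]; then
        wc -l < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# Print only the lines appended to the log after the recorded baseline.
# Arguments: logfile baseline_line_count
new_lines_since() {
    logfile=$1 baseline=$2
    # tail -n +N starts output at line N, so skip the first "baseline" lines.
    tail -n +"$((baseline + 1))" "$logfile"
}
```

A typical use would be to record baseline=$(count_lines /var/adm/messages) before disconnecting a network connection, and then run new_lines_since /var/adm/messages "$baseline" afterwards to review what was logged.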

11.6 Verify HA-ORACLE Setup

Perform the following verification tests to ensure the HA-ORACLE installation was performed correctly.
  • Run sqlplus(1M) to connect from one server to the other. For instance:

  host1# sqlplus user/password@T:logicalhost2:test_db  

  • Run a sample application that connects to the database from a remote system.
  • Make sure that a sample application can access the database, regardless of which Solstice HA server owns the logical hosts.

11.7 Verify SPARCstorage Array Firmware

Perform a verification test to ensure the SPARCstorage Arrays have the latest version of the firmware installed. This test can be performed on either server. It only needs to be run on one of the servers.
For each SPARCstorage Array cn, run the command:

  host1# ssaadm display cn  

Check the output for a line documenting the firmware revision. The firmware must be at Firmware Rev: 2.3 or higher for Solstice HA.
There may be later SPARCstorage Array patches than those shipped with Solstice HA (102432-07). Consult the file /var/sadm/patch/102432-nn/README.102432.nn for instructions about upgrading the firmware and the appropriate firmware revision.
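
The revision check can be scripted when there are several arrays to inspect. The following sketch reads command output on standard input and compares the revision against the 2.3 minimum stated above. It assumes the output contains a line with the text "Firmware Rev:" followed by a MAJOR.MINOR number, as quoted in this section; the exact layout of ssaadm output is not reproduced here, and the function name check_firmware_rev is hypothetical.

```shell
# Read ssaadm-style output on stdin and check the firmware revision.
# Assumes a line containing "Firmware Rev:" followed by MAJOR.MINOR.
# Returns 0 if the revision is 2.3 or higher, 1 if lower, 2 if not found.
check_firmware_rev() {
    min_major=2 min_minor=3
    rev=$(sed -n 's/.*Firmware Rev:[ ]*\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -1)
    if [ -z "$rev" ]; then
        echo "no firmware revision line found"
        return 2
    fi
    major=${rev%%.*}   # text before the first dot
    minor=${rev#*.}    # text after the first dot
    # Compare component by component, so that 2.10 counts as newer than 2.3.
    if [ "$major" -gt "$min_major" ] ||
       { [ "$major" -eq "$min_major" ] && [ "$minor" -ge "$min_minor" ]; }; then
        echo "firmware $rev OK"
    else
        echo "firmware $rev is below 2.3"
        return 1
    fi
}
```

It would be used by piping the display output into it, for example ssaadm display cn | check_firmware_rev, once for each SPARCstorage Array.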

11.8 Power Sequence Testing a SPARCcluster 1000

Before putting your Solstice HA configuration into service, you should perform a basic validation of the power cabling and sequencing. In a SPARCcluster 1000 configuration, if both SPARCserver 1000s, the three SPARCstorage Arrays, and the terminal concentrator are all powered through the cabinet power sequencer, your system is vulnerable to an outage if your power source fails or if some component internal to the power sequencer fails.
It is only possible for the configuration to recover if the power is cabled correctly. Perform the following test to verify the proper power cable connections.
  1. Turn off the key switch at the front of the cabinet.

  2. Turn off the circuit breaker at the back of the cabinet.

  3. Confirm that both the SPARCserver 1000s, all SPARCstorage Arrays, and the terminal concentrator do not have power.

  4. Turn on the circuit breaker.

  5. Confirm that the terminal concentrator has power even though the key switch is in the off position.

    No other component should have power at this step. It is important for the terminal concentrator to remain on even if the key switch is in the off position. This ensures that existing open connections to the terminal concentrator are not interrupted when the key switch is turned off. This also allows the power-on messages from the SPARCserver 1000s to be visible on those connections. If the terminal concentrator is not properly connected, messages may not be displayed for up to 90 seconds while it is booting.

  6. Turn on the key switch.

  7. Confirm that all SPARCstorage Arrays power up as soon as the key switch is turned on.

    All SPARCstorage Arrays should be on the "switched 1" outlets. It is also important that the SPARCstorage Arrays be ready before Solaris begins accessing the devices. If the SPARCstorage Arrays are not ready, Solaris cannot access them until the next reboot. The 20-second head start helps ensure that the SPARCstorage Arrays are ready before Solaris attempts to access them.

    The position of the DIAG switch on the SPARCstorage Arrays is critical to ensuring they are ready before the SPARCserver 1000s begin booting the Solaris operating environment. The correct position of the switch is DIAG, which is opposite the DIAG EXT setting.

  8. Confirm that the SPARCserver 1000s power up approximately 20 seconds after the SPARCstorage Arrays.

  9. Ensure that all fans are powered at or before the time the SPARCserver 1000s obtain power.

    Depending on the cabling, the fans may come on earlier in this procedure. If the fans are not running you should shut down the system now and check the power cabling.
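
If the power-up times are noted with a watch during this test, the head start can be checked with simple arithmetic. The sketch below is illustrative only: the function name check_head_start is hypothetical, and the 5-second tolerance around the 20-second figure is an assumption, since this chapter says only "approximately 20 seconds".

```shell
# Report the head start the arrays received, from two observed times.
# Arguments: array_on server_on [tolerance]
#   array_on, server_on: observed power-up times in whole seconds
#   tolerance: allowed deviation from 20 seconds (assumed default: 5)
check_head_start() {
    array_on=$1 server_on=$2 tolerance=${3:-5}
    expected=20
    delay=$((server_on - array_on))
    low=$((expected - tolerance))
    high=$((expected + tolerance))
    if [ "$delay" -ge "$low" ] && [ "$delay" -le "$high" ]; then
        echo "head start ${delay}s: within ${low}-${high}s"
    else
        echo "head start ${delay}s: outside ${low}-${high}s, check power cabling"
        return 1
    fi
}
```

For instance, if the arrays came on at second 100 and the servers at second 121, check_head_start 100 121 reports a 21-second head start.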

11.9 Power Sequence Testing a SPARCcluster 2000

If the SPARCstorage Arrays and the terminal concentrator are powered from the SPARCcenter 2000 power controller, you should perform the power sequence testing in this section.
  1. Turn off the key switch at the front of each SPARCcenter 2000.

  2. Turn off the circuit breakers at the rear of the SPARCcenter 2000 cabinets.

  3. Confirm that there is no power to either SPARCcenter 2000, the local disks, the SPARCstorage Arrays, or the terminal concentrator.

  4. Turn on the circuit breaker in the cabinet that has the two SPARCstorage Arrays and the terminal concentrator.

  5. Verify that the SPARCstorage Arrays, the terminal concentrator, and the SPARCcenter 2000 local disks power up.

    If these do not power up, ensure that the power controller remote/local switch is in the local position.

  6. Turn on the key switch on the front of the cabinet.

    The SPARCcenter 2000 should receive power and begin power-on self testing. It is important for the SPARCstorage Arrays and the terminal concentrator power to remain independent of the SPARCcenter 2000 key switch. This ensures that open connections to the server console ports remain even while the SPARCcenter 2000 is powered down for service.

  7. Turn on the circuit breaker at the rear of the other cabinet.

  8. Verify that the SPARCstorage Array and the SPARCcenter 2000 local disks power up.

    If these do not power up, ensure that the power controller remote/local switch is in the local position.

  9. Turn on the key switch on the front of the other cabinet.

  10. Verify that the SPARCcenter 2000s power up correctly.

    Note - The local disks in the SPARCcenter 2000 configuration remain powered even though the processor chassis has no power. If you will be removing the SCSI cables to the local disks they should be individually powered off using the desktop storage pack power switch. Unless the SPARCcenter 2000 power controller must be serviced, key switch power off should suffice for most service needs such as SBus card replacement.

    In the event of an external power failure, the SPARCstorage Arrays, SPARCcenter 2000s, and terminal concentrator will all obtain power simultaneously. In this configuration no head start is afforded the SPARCstorage Array before the SPARCcenter 2000s begin to boot. The position of the DIAG switch on the SPARCstorage Array is critical to ensuring the SPARCstorage Array is ready before the SPARCcenter 2000s begin booting the Solaris operating environment. The correct position of this switch is DIAG, which is opposite the DIAG EXT setting.