SPARCcluster HA Server Software Planning and Installation Guide

1 Introduction

Solstice High Availability 1.0 (Solstice HA) is a software product that provides fault-tolerant support and automatic data service failover for specific dual-server hardware configurations. The configurations will recover from server, disk, and network failures, as well as software failures.
Solstice HA uses Solstice DiskSuite 4.0 software to provide the diskset, mirroring, concatenation, striping, hot spare disks, file system growing, and UNIX file system logging capabilities.
Solstice HA allows configurations to be either symmetric or asymmetric. In an asymmetric configuration, one of the servers acts as a hot standby for the other. In a symmetric configuration, both servers can actively offer data services.
The data services provided by Solstice High Availability are:
  • Highly available NFS(R), Sun's distributed computing file system (Solstice High Availability/HA-NFS)
  • Highly available ORACLE(R) database management system (Solstice High Availability/HA-ORACLE)

1.1 Features Provided

The key features of the Solstice HA software package include:
  • Tolerance of single-point software failures or crashes
  • Tolerance of single-point hardware faults
  • Appearance of continuous availability of data service (NFS clients will only see "Server Not Responding" messages during a takeover)
  • Automatic detection of system and data service failure
  • Automatic takeover, recovery, and service restoration
  • Automatic post-takeover redirection of HA-NFS clients
  • Automatic restart of fault monitoring on a server after it is repaired
Additionally, Solstice HA provides online serviceability, which enables administrators to take an appropriately configured server offline for repair or routine maintenance while the data services remain available from the other server in the configuration.

1.2 Data Protection Provided

The disks holding the data exported by the data services are multi-host and mirrored. This allows Solstice HA configurations to tolerate the following types of single-point failures:
  • Server operating system failure because of either a crash or a panic
  • Data service application failure
  • Server hardware failure
  • Network interface failure
  • Disk media failure

1.3 Supported Hardware

Several hardware configurations are supported by Solstice HA. The supported servers are the SPARCserver(TM) 1000 and SPARCcenter(TM) 2000. These two types of servers are configured with SPARCstorage(TM) Arrays Model 100 Series.
Each SPARCstorage Array can contain up to 30 SCSI disks. Both 1.05-Gbyte and 2.1-Gbyte disks can be used in SPARCstorage Arrays. Table 1-1 shows the minimum and maximum number of SPARCstorage Arrays supported in the SPARCserver 1000 and SPARCcenter 2000 configurations.
Table 1-1
Servers Used (2 each)   Minimum Number of SPARCstorage Arrays   Maximum Number of SPARCstorage Arrays
SPARCserver 1000        3                                       8
SPARCcenter 2000        3                                       20
Each SPARCserver 1000 must have a minimum of four CPUs (two in each of the two System Boards). Each SPARCcenter 2000 must have a minimum of six CPUs (two in each of three System Boards).
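As a rough illustration of what the limits in Table 1-1 mean for storage, the following sketch computes the raw capacity of a configuration. It assumes that every array is fully populated with 30 disks of a single size, that 1 Gbyte equals 1000 Mbytes, and that two-way mirroring halves the usable space. The function names are hypothetical and are not part of Solstice HA.

```python
# Illustrative arithmetic only; these names are not part of Solstice HA.
DISKS_PER_ARRAY = 30           # maximum SCSI disks per SPARCstorage Array
DISK_SIZES_MB = (1050, 2100)   # 1.05-Gbyte and 2.1-Gbyte disks (assuming 1 Gbyte = 1000 Mbytes)

# Array limits as read from Table 1-1: (minimum, maximum) per configuration.
ARRAY_LIMITS = {
    "SPARCserver 1000": (3, 8),
    "SPARCcenter 2000": (3, 20),
}


def raw_capacity_mb(arrays, disk_mb, disks_per_array=DISKS_PER_ARRAY):
    """Raw capacity in Mbytes of `arrays` fully populated arrays."""
    return arrays * disks_per_array * disk_mb


def mirrored_capacity_mb(arrays, disk_mb, copies=2):
    """Usable capacity in Mbytes, assuming every disk is part of a `copies`-way mirror."""
    return raw_capacity_mb(arrays, disk_mb) // copies


for server, (fewest, most) in ARRAY_LIMITS.items():
    smallest = raw_capacity_mb(fewest, min(DISK_SIZES_MB))
    largest = raw_capacity_mb(most, max(DISK_SIZES_MB))
    print(f"{server}: {smallest} to {largest} Mbytes raw")
```

Actual usable space is lower than these figures, because hot spare disks and metadevice state database replicas also consume disk space.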

1.4 Getting Help

If you have problems installing or using Solstice HA, call the distributor from which you purchased the software and hardware and provide the following information:
  • Your name and electronic mail address (if available)
  • Your company name, address, and phone number
  • The model and serial numbers of your systems (factory assembled configurations come with the same serial number for each server)
  • The release number of the operating system (for example, Solaris 2.4)
  • Any information that will help diagnose the problem

1.5 Glossary

The terminology used in this manual includes the following:
Asymmetric configuration - A configuration that contains a single diskset. In an asymmetric configuration, one server acts as the default master of the diskset and the other server acts as a hot standby.
Concatenation - A metadevice created by sequentially mapping blocks on several physical slices (partitions) to a logical device. Two or more physical components can be concatenated. The slices are accessed sequentially rather than interlaced (as with stripes).
Data service - A network service that implements read-write access to disk-based data from clients on a network. Examples of data services include NFS and ORACLE*SERVER. The data service may be composed of multiple processes that work together.
Default master - The server that becomes master of a diskset if both servers reboot simultaneously. The default master is specified when the system is initially configured.
Diskset - A group of disks that move as a unit between the two servers in a Solstice HA configuration.
DiskSuite - See Solstice DiskSuite.
Fault detection - Solstice HA provides programs that detect two types of failures. The first type includes low-level failures such as system panics and hardware faults (that is, failures that cause the entire server to be inoperable). These failures can be detected quickly. The second type includes failures related to a data service, such as HA-ORACLE or HA-NFS. Failures of this type take longer to detect.
HA-NFS - Highly available NFS. HA-NFS provides highly available remote mount service, status monitor service, and network locking service.
HA-ORACLE - HA-DBMS for ORACLE*SERVER.
Heartbeat - A periodic message exchanged between the two membership monitors. Lack of a heartbeat after a specified interval and number of retries may trigger a takeover.
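For illustration, the timeout rule in the Heartbeat entry might be sketched as follows. This is a minimal sketch and not the Solstice HA implementation: the function names, the time units, and the choice to count whole silent intervals are all assumptions.

```python
# Hypothetical sketch of a heartbeat timeout rule; not the Solstice HA implementation.

def missed_heartbeats(last_heard, now, interval):
    """Number of whole heartbeat intervals that have elapsed with no message."""
    if now <= last_heard:
        return 0
    return int((now - last_heard) // interval)


def should_consider_takeover(last_heard, now, interval, retries):
    """True once more than `retries` consecutive intervals have passed in silence."""
    return missed_heartbeats(last_heard, now, interval) > retries
```

With an interval of 2 seconds and 3 retries, for example, this rule stays quiet for the first three silent intervals and flags a possible takeover only once a fourth has elapsed.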
Highly available data service - A data service that appears to remain continuously available, despite single-point failures of server hardware or software components.
Hot standby - In an asymmetric (single diskset) configuration, the machine that is not the default master of the diskset. If both servers reboot simultaneously, the hot standby will not master the diskset and thus will not be running any Solstice HA data services.
Local disks - Disks attached to a single Solstice HA server. The local disks contain the Solaris 2.4 distribution and the Solstice HA and DiskSuite software packages. Local disks must not contain data exported by any Solstice HA data service.
Logical host - A diskset and its collection of logical host names and their associated IP addresses. Each logical host is mastered by one physical host at a time.
Logical host name - The name assigned to one of the logical network interfaces. A logical host name is used by clients on the network to refer to the location of data and data services. The logical host name is the name for a path to the logical host. Because a host may be on multiple networks, there may be multiple logical host names for a single logical host.
Logical network interface - In the Internet architecture, a host may have one or more IP addresses. Solstice HA configures additional logical network interfaces to establish a mapping between several logical network interfaces and a single physical network interface. This allows a single physical network interface to respond to multiple logical network interfaces. This also enables the IP address to move from one Solstice HA server to the other in the event of a takeover or haswitch(1M), without requiring additional hardware interfaces.
Master - The server with exclusive read and write access to a diskset. The current master host for the logical host runs the data service and has the logical IP addresses mapped to its Ethernet address.
Membership monitor - A process running on both Solstice HA servers that monitors the servers. The membership monitor sends heartbeats to, and receives heartbeats from, its sibling host. The monitor can initiate a takeover if the heartbeat stops. It also keeps track of which servers are active.
Metadevice - A group of components accessed as a single logical device by concatenating, striping, mirroring, or logging the physical devices. Metadevices are sometimes called pseudo devices.
Metadevice state database - Information kept in nonvolatile storage (on disk) for preserving the state and configuration of metadevices.
Metadevice state database replica - A copy of the state database. Keeping multiple copies of the state database protects against the loss of state and configuration information. This information is critical to all metadevice operations.
Mirroring - Replicating all writes made to a single logical device (the mirror) to multiple devices (the submirrors), while distributing read operations. This provides data redundancy in the event of a failure.
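The behavior described in the Mirroring entry can be modeled in a few lines. The sketch below is a toy model only: real submirrors are DiskSuite metadevices, the class and method names are hypothetical, and round-robin is just one possible way of distributing reads.

```python
# Toy model of a mirror; real submirrors are DiskSuite metadevices, not dictionaries.
from itertools import cycle


class Mirror:
    def __init__(self, submirror_count=2):
        # Each submirror is modeled as a mapping from block number to data.
        self.submirrors = [dict() for _ in range(submirror_count)]
        self._next_reader = cycle(range(submirror_count))

    def write(self, block, data):
        # Every write is replicated to all submirrors.
        for submirror in self.submirrors:
            submirror[block] = data

    def read(self, block):
        # Reads are distributed across submirrors (round-robin is an assumption).
        index = next(self._next_reader)
        return index, self.submirrors[index][block]
```

Because every submirror holds the same blocks, a read can be satisfied by any of them, which is what makes the data available after a single disk failure.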
Multi-host disk - A disk configured for potential accessibility from multiple servers via multiple ports. Solstice HA software enables data on a multi-host disk to be exported to network clients via a highly available data service.
Sibling host - One of the two physical servers in a Solstice HA configuration.
Solstice HA - See Solstice High Availability.
Solstice High Availability - A software package that enables two machines to act as read-write data servers while acting as backups for each other.
Solstice DiskSuite - A software product that provides data reliability through disk striping, concatenation, mirroring, UFS logging, dynamic growth of metadevices and file systems, and metadevice state database replicas.
Stripe - Similar to a concatenation, except that the addressing of the component blocks is interlaced on the slices (partitions), rather than placed sequentially. Striping is used to gain performance. By striping data across disks on separate controllers, multiple controllers can access data simultaneously.
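The difference between sequential and interlaced addressing can be shown with two small mapping functions. This is a sketch under stated assumptions, not DiskSuite code: the function names are hypothetical, all slices in the stripe are taken to be the same size, and `interlace` stands for the number of consecutive blocks placed on one slice before moving to the next.

```python
# Hypothetical block-mapping functions; not DiskSuite code.

def concat_map(block, slice_sizes):
    """Map a logical block to (slice index, block within slice) for a concatenation."""
    for index, size in enumerate(slice_sizes):
        if block < size:
            return index, block
        block -= size  # skip past this slice and try the next one
    raise IndexError("logical block beyond end of concatenation")


def stripe_map(block, slice_count, interlace):
    """Map a logical block to (slice index, block within slice) for a stripe."""
    chunk, within = divmod(block, interlace)   # which interlace-sized chunk, and offset in it
    row, index = divmod(chunk, slice_count)    # chunks rotate across the slices
    return index, row * interlace + within
```

In a concatenation of two 100-block slices, logical blocks 0 through 99 all land on the first slice. In a two-slice stripe with an interlace of 16 blocks, blocks 0 through 15 land on the first slice and blocks 16 through 31 on the second, so a large sequential transfer keeps both slices busy.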
Submirror - A metadevice that is part of a mirror. See also mirroring.
Switchover - The coordinated moving of a logical host (diskset) from one operational Solstice HA server to the other. A switchover is initiated by an administrator using the haswitch(1M) command.
Symmetric configuration - A Solstice HA configuration that contains two disksets. In a symmetric configuration, each server is the default master for one diskset.
Takeover - The automatic moving of a logical host from one Solstice HA server to the other after a failure has been detected. The Solstice HA server that has the failure is forced to give up mastery of the logical host.
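The relationship between logical hosts, default masters, and a takeover can be sketched with a small model. Everything here is hypothetical and for illustration only, including the class, the function, and the server and logical host names; it does not show how Solstice HA detects the failure or moves the diskset.

```python
# Hypothetical model of logical hosts and a takeover; not the Solstice HA implementation.

class LogicalHost:
    def __init__(self, name, default_master):
        self.name = name
        self.default_master = default_master
        self.master = default_master  # current master starts as the default master


def takeover(logical_hosts, failed_server, surviving_server):
    """Move every logical host mastered by the failed server to the survivor."""
    moved = []
    for host in logical_hosts:
        if host.master == failed_server:
            host.master = surviving_server
            moved.append(host.name)
    return moved


# A symmetric configuration: each server is the default master of one logical host.
hosts = [LogicalHost("lhost-a", "server1"), LogicalHost("lhost-b", "server2")]
print(takeover(hosts, "server1", "server2"))
```

After the takeover in this example, the surviving server masters both logical hosts. In an asymmetric configuration the list would hold a single logical host, and a failure of the hot standby would move nothing.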
Trans device - A pseudo device responsible for managing the contents of a UFS log.
UFS - An acronym for the UNIX file system.
UFS logging - Recording UFS updates to a log (the logging device) before the updates are applied to the UFS (the master device).
UFS logging device - The component of a trans device that contains the UFS log.
UFS master device - The component of a trans device that contains the UFS file system.
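The ordering that UFS logging relies on, log first and master device second, can be illustrated with a toy model of a trans device. The class and method names are hypothetical, and a real UFS log records file system updates on disk rather than key-value pairs in memory.

```python
# Toy model of a trans device; names are hypothetical and storage is in memory only.

class TransDevice:
    def __init__(self):
        self.log = []     # stands in for the UFS logging device
        self.master = {}  # stands in for the UFS master device

    def update(self, key, value):
        # Record the update in the log first; the master device is not touched yet.
        self.log.append((key, value))

    def roll(self):
        # Apply the logged updates to the master device in order, then empty the log.
        applied = len(self.log)
        for key, value in self.log:
            self.master[key] = value
        self.log.clear()
        return applied
```

Because an update reaches the log before it reaches the master device, any update that is still pending can be applied again from the log.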