2 Troubleshooting Overview
2.1 SPARCcluster Architecture
- A SPARCcluster HA Server consists of redundant, on-line components that allow system operation to continue through the failure, repair, and relocation of a single assembly or device. To maintain a high level of availability, replace failed components as soon as possible. In addition, take service precautions to keep the cluster operating while maintenance is performed. See Section 2.2, "Maintenance Authorization."
2.2 Maintenance Authorization
- Contact the site system administrator to prepare a node for maintenance and, after maintenance, to return the node to cluster membership. The procedures in this manual note the points at which the system administrator must be contacted. However, the equipment owner's administrative requirements supersede the procedures contained herein.
2.3 Troubleshooting a Remote Site
- Use telnet to communicate with either node in a cluster.
2.4 Troubleshooting Flow
2.4.1 Takeover
- The Solstice HA software allows one node to take over when a critical hardware or software failure is detected. When a failure is detected, an error message is sent to the system console and, if required by the system maintenance contract, the service provider is notified. When a takeover occurs, the node assuming control becomes the I/O master for the disksets of the failed node and redirects the failed node's clients to itself. Figure 2-1 shows the troubleshooting flow for a takeover.
2.4.2 Switchover
- Administrators can manually direct one node to take over the data services of the other node. This is referred to as a switchover (refer to the SPARCcluster High Availability Server Software Administration Guide).
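The ownership changes in a takeover and a switchover can be pictured with a small model. This is a sketch under stated assumptions only: `Node`, `take_over`, and `switch_over` are hypothetical names, not part of the Solstice HA software, and the model covers only diskset I/O mastership and client redirection, not failure detection or console notification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a cluster node; not a Solstice HA interface."""
    name: str
    disksets: set[str] = field(default_factory=set)  # disksets this node is I/O master for
    clients: set[str] = field(default_factory=set)   # clients currently served by this node
    up: bool = True


def _move_services(dest: Node, src: Node) -> None:
    """Transfer diskset mastership and client service from src to dest."""
    dest.disksets |= src.disksets
    dest.clients |= src.clients
    src.disksets.clear()
    src.clients.clear()


def take_over(survivor: Node, failed: Node) -> None:
    """Critical failure: the survivor assumes the failed node's disksets and clients."""
    failed.up = False
    _move_services(survivor, failed)


def switch_over(target: Node, source: Node) -> None:
    """Administrator-directed switchover: services move, but the source node stays up."""
    _move_services(target, source)


if __name__ == "__main__":
    a = Node("node-a", {"ds1"}, {"client1"})
    b = Node("node-b", {"ds2"}, {"client2"})
    take_over(b, a)
    print(sorted(b.disksets), a.up)  # prints ['ds1', 'ds2'] False
```

The only difference modeled between the two operations is whether the node giving up its services is marked as failed; in both cases the data services end up on a single node.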
2.4.3 Failures Where There Is No Takeover
- For noncritical failures, there is no software takeover. However, to continue providing HA data services, perform troubleshooting in the following order:
Warning - DO NOT plug a keyboard directly into a node system board. A keyboard plugged into a system board becomes the default console input device, which prevents input from the system administration workstation/terminal concentrator serial port. In addition, plugging a keyboard directly into a node system board while power is applied to the node sends a break signal to the Solaris operating system, just as if you had typed L1-A on the console.
- The system administrator contacts you to replace a defective part or to further isolate a system defect to a failed part.
- Ask the system administrator to prepare the assembly containing the defective part for service.
- Isolate the fault to the smallest replaceable part.
- Shut down the specific assembly containing the defective part.
- Replace the defective part.
- Contact the system administrator to return the repaired assembly to the cluster.
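The order of these steps matters: the system administrator must prepare the assembly before it is shut down and must return it to the cluster afterward. As a sketch only, the checklist below encodes the steps as a hypothetical `Step` enumeration and a hypothetical `ServiceChecklist` class that rejects any step recorded out of sequence. It is not a Sun tool and does not interact with the cluster.

```python
from enum import Enum, auto


class Step(Enum):
    """Noncritical-failure service steps, in the order listed in Section 2.4.3."""
    ADMIN_REQUESTS_SERVICE = auto()    # administrator asks for a replacement or further isolation
    ADMIN_PREPARES_ASSEMBLY = auto()   # administrator prepares the affected assembly for service
    ISOLATE_FAULT = auto()             # isolate the fault to the smallest replaceable part
    SHUT_DOWN_ASSEMBLY = auto()        # shut down only the assembly containing the part
    REPLACE_PART = auto()              # replace the defective part
    ADMIN_RETURNS_ASSEMBLY = auto()    # administrator returns the assembly to the cluster


class ServiceChecklist:
    """Hypothetical checklist that rejects any step recorded out of order."""

    def __init__(self) -> None:
        self.completed: list[Step] = []

    def done(self) -> bool:
        return len(self.completed) == len(Step)

    def record(self, step: Step) -> None:
        if self.done():
            raise ValueError("procedure already complete")
        # The next acceptable step is the first one not yet completed.
        expected = list(Step)[len(self.completed)]
        if step is not expected:
            raise ValueError(f"expected {expected.name}, got {step.name}")
        self.completed.append(step)


if __name__ == "__main__":
    checklist = ServiceChecklist()
    for step in Step:
        checklist.record(step)
    print(checklist.done())  # prints True
```

For example, recording `Step.SHUT_DOWN_ASSEMBLY` first raises `ValueError`, because the administrator has not yet requested service or prepared the assembly.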

Figure 2-1
2.5 Fault Classes/Principal Assemblies
- SPARCcluster HA troubleshooting involves several principal assemblies and classes of faults. The fault classes and their associated assemblies are:
- SPARCstorage Array faults
· Data disk
· Controller
· Fibre Channel Optical Module
· Fibre Channel SBus card
· Fiber optic cables/interfaces
- Node (SPARCcenter 2000 or SPARCserver 1000) faults
· Boot disk
· System board
· Control board
· Fibre Channel Optical Module
· Fibre Channel SBus card
· Fiber optic cables/interfaces
· Client net SBus card
· Client net/connections
· SunFastEthernet SBus card/interfaces (SunFastEthernet)
- Terminal concentrator/serial connection faults
- Software faults
· Application program died
· System crash (panic)
· Hung system (lock up)
· Cluster-wide failures
- All troubleshooting begins at the system console. Check the console regularly, along with any other source of operator information, such as the output of the hastat command. For more information on the hastat command, refer to the SPARCcluster High Availability Software Administration Guide.
2.6 Error Messages/Symptoms
Table 2-1 lists error messages or symptoms together with the probable cause and troubleshooting reference.

Table 2-1 Error Message/Symptom
| Error Message/Symptom | Probable Cause | Cluster Service Reference | Troubleshooting Reference |
| Nodes | | | |
| …reboots; …e; …response | SPARCcenter 2000 or SPARCserver 1000 | Section 3.3, "Node Failures" | SPARCcenter 2000/SPARCserver 1000 System Service Manual |
| Private Net | | | |
| …reboots; …e; …response | SPARCcenter 2000 or SPARCserver 1000 | Section 3.3, "Node Failures" | SPARCcenter 2000/SPARCserver 1000 System Service Manual |
| | SunFastEthernet | Section 3.4.1, "Private Net SunFastEthernet Adapter Failure (SunFastEthernet)" | SunFastEthernet Adapter User Guide |
| Client Network | | | |
| | Client net | As applicable | Refer to your client network documentation |
| Public Network | | | |
| | Cable | Not applicable | See Chapters 9 and 10 of the SPARCcluster HA Hardware Planning and Installation Manual for cable detail |
2.7 Device to Troubleshooting Cross Reference
Table 2-2 cross-references each device to the appropriate troubleshooting manual.
Table 2-2
| Device/Trouble Area | Reference | Part Number |
| Array Controller/Fibre Optic Connector/Fibre Channel Optical Module | SPARCstorage Array Model 100 Series Service Manual (Chapter 2, "Troubleshooting") | 801-2206 |
| Disk drive | SPARCstorage Array Model 100 Series Service Manual | 801-2206 |
| Terminal concentrator | SPARCcluster HA Server Service Manual (Section 3.5, "Terminal Concentrator/Serial Connection Faults") | 802-3512 |
| SPARCcenter 2000 | SPARCcenter 2000 Service Manual (Chapter 2, "Troubleshooting Overview") | 801-2007 |
| SPARCserver 1000 | SPARCserver 1000 System Service Manual (Chapter 2, "Troubleshooting Overview") | 801-2895 |
| SunFastEthernet adapter | SunFastEthernet Adapter User Guide (Appendix C, "Running Diagnostics") | 801-6109 |
2.8 Device Replacement Cross Reference
Table 2-3 cross-references each device to its replacement procedure.
Table 2-3
| Device | Cross Reference | Part Number (SPARCcenter 2000) | Part Number (SPARCserver 1000) |
| Disk drive | Disk Drive Installation Manual for the SPARCstorage Array | 801-2207 | 801-2207 |
| Optical Module | Fibre Channel Optical Module Installation Manual | 801-6326 | 801-6326 |
| SunFastEthernet | SunFastEthernet Adapter User Guide | 801-6109 | 801-6109 |
| System board, control board, power supply, SPARC module, boot disk | SPARCcenter 2000 or SPARCserver 1000 System Service Manual | 801-2007 | 801-2895 |