Chapter 3 System-Specific Issues

This chapter describes issues specific to Sun midrange and high-end servers. Current Sun servers are part of the Sun Fire system family. Older servers are part of the Sun Enterprise system family.

Note – The Sun Validation Test Suite release notes are now a separate document and can be found at http://sun.com.

Note – Some of the issues and bugs in this chapter have been fixed in subsequent Solaris 10 releases. If you have upgraded your Solaris software, certain issues and bugs in this chapter might no longer apply. To see which bugs and issues no longer apply to your specific Solaris 10 software, refer to Appendix A, Table of Integrated Bug Fixes in the Solaris 10 Operating System.

Dynamic Reconfiguration on Sun Fire High-End Systems

This section describes major domain-side DR bugs on the following Sun Fire high-end systems that run the Solaris 10 software:
For information about DR bugs on System Management Services (SMS), see the SMS Release Notes for the SMS version that is running on your system.

Note – This information applies only to DR as it runs on the servers listed in this section. For information about DR on other servers, see the Release Notes or Product Notes documents or sections that describe those servers.

Known Software and Hardware Bugs

The following software and hardware bugs apply to Sun Fire high-end systems.

Network Device Removal Fails When a Program Is Holding the Device Open (5054195)

If a process is holding open a network device, any DR operation that would involve that device fails. Daemons and processes that hold reference counts stop DR operations from completing.

Workaround: As superuser, perform the following steps:
Deleteboard Shows Leakage Error (4730142)

Warnings might be displayed when a DR command is executing on a system that is configured with the SunSwift PCI card, Option 1032. These warnings appear on domains that are running the Solaris 8, Solaris 9, or Solaris 10 software. The following warning is an example:
These warnings are benign. The Direct Virtual Memory Access (DVMA) space is properly refreshed during the DR operation. No true kernel memory leak occurs.

Workaround: To prevent the warning from being displayed, add the following line to /etc/system:
GigaSwift Ethernet MMF Link Fails With CISCO 4003 Switch After DR Attach

The link fails between a system with a Sun GigaSwift Ethernet MMF Option X1151A and certain CISCO switches. The failure occurs when you attempt to run a DR operation on such a system that is attached to one of the following switches:
This problem is not seen on a CISCO 6509 switch.

Workaround: Use another switch. Alternatively, you can consult Cisco for a patch for the listed switches.

Dynamic Reconfiguration on Sun Fire Midrange Systems

This section describes major issues that are related to DR on the following Sun Fire midrange systems:
Note – This information applies only to DR as it runs on the servers listed in this section. For information about DR on other servers, see the Release Notes or Product Notes documents or sections that describe those servers.

Minimum System Controller Firmware

Table 3–1 shows acceptable combinations of Solaris software and System Controller (SC) firmware for each Sun Fire midrange system to run DR.

Note – To best utilize the latest firmware features and bug fixes, run the most recent SC firmware on your Sun Fire midrange system. For the latest patch information, see http://sunsolve.sun.com.

Table 3–1 Minimum SC Firmware for Each Platform and Solaris Release
You can upgrade the system firmware for your Sun Fire midrange system by connecting to an FTP or HTTP server where the firmware images are stored. For more information, refer to the README and Install.info files. These files are included in the firmware releases that are running on your domains. You can download Sun patches from http://sunsolve.sun.com.

Known DR Software Bugs

This section lists important DR bugs.

Network Device Removal Fails When a Program Is Holding the Device Open (5054195)

If a process is holding open a network device, any DR operation that would involve that device fails. Daemons and processes that hold reference counts stop DR operations from completing.

Workaround: As superuser, perform the following steps:
Cannot Unconfigure cPCI Board With a Disabled Port 0 (4798990)

On Sun Fire midrange systems, a CompactPCI (cPCI) I/O board cannot be unconfigured when Port 0 (P0) on that board is disabled. This problem exists in Solaris 10 and Solaris 9 software. It also exists in Solaris 8 software that has one or more of the following patches installed:
The error occurs only during DR operations that involve cPCI boards. An error message similar to the following example is displayed:
N0.IB7 is a CompactPCI I/O Board with P0 disabled.

Workaround: Disable the slots instead of Port 0.

Sun Enterprise 10000 Release Notes

This section describes issues that involve the following features on the Sun Enterprise 10000 server:
Note – The Solaris 10 software can be run on individual domains within a Sun Enterprise 10000 system. However, the Sun Enterprise 10000 System Service Processor is not supported by this release.

System Service Processor Requirement

The SSP 3.5 software is required on your System Service Processor (SSP) to support the Solaris 10 software. Install the SSP 3.5 on your SSP first. Then you can install or upgrade to the Solaris 10 OS on a Sun Enterprise 10000 domain.

The SSP 3.5 software is also required so that the domain can be properly configured for DR Model 3.0.

Dynamic Reconfiguration Issues

This section describes different issues that involve dynamic reconfiguration on Sun Enterprise 10000 domains.

DR Model 3.0

You must use DR 3.0 on Sun Enterprise 10000 domains that run the Solaris OS beginning with the Solaris 9 12/03 release. DR model 3.0 refers to the functionality that uses the following commands on the SSP to perform domain DR operations:
You can run the cfgadm command on domains to obtain board status information. DR model 3.0 also interfaces with the Reconfiguration Coordination Manager (RCM) to coordinate the DR operations with other applications that are running on a domain.

For details about DR model 3.0, refer to the Sun Enterprise 10000 Dynamic Reconfiguration User Guide.

DR and Bound User Processes

For this Solaris release, DR no longer automatically unbinds user processes from CPUs that are being detached. You must perform this operation before initiating a detach sequence. The drain operation fails if CPUs are found with bound processes.

Network Device Removal Fails When a Program Is Holding the Device Open (5054195)

If a process is holding open a network device, any DR operation that would involve that device fails. Daemons and processes that hold reference counts stop DR operations from completing.

Workaround: As superuser, perform the following steps:
Enabling DR 3.0 Requires an Extra Step in Certain Situations (4507010)

The SSP 3.5 software is required for a domain to be properly configured for DR 3.0. After upgrading your SSP to SSP 3.5, when DR 3.0 is enabled on the domain, run the following command:
InterDomain Networks

For a domain to become part of an InterDomain Network, all boards with active memory in that domain must have at least one active CPU.

OpenBoot PROM Variables

Before you issue the boot net command from the OpenBoot PROM prompt (OK), verify that the local-mac-address? variable is set to false. This setting is the factory default setting. If the variable is set to true, you must ensure that this value is an appropriate local configuration.

A local-mac-address? that is set to true might prevent the domain from successfully booting over the network.

In a netcon window, you can use the following command at the OpenBoot PROM prompt to display the values of the OpenBoot PROM variables:

ok printenv
To reset the local-mac-address? variable to the default setting, use the setenv command:

ok setenv local-mac-address? false
Dynamic Reconfiguration on Sun Enterprise Midrange Systems

This section contains the latest information about dynamic reconfiguration (DR) functionality for the following midrange servers that are running the Solaris 10 software:
For more information about Sun Enterprise Server Dynamic Reconfiguration, refer to the Dynamic Reconfiguration User's Guide for Sun Enterprise 3x00/4x00/5x00/6x00 Systems. The Solaris 10 release includes support for all CPU/memory boards and most I/O boards in the systems that are mentioned in the preceding list.

Supported Hardware

Before proceeding, make sure that the system supports dynamic reconfiguration. If your system is of an older design, the following message appears on your console or in your console logs. Such a system is not suitable for dynamic reconfiguration.
The following I/O boards are not currently supported:
Software Notes

This section provides general software information about DR.

Enabling Dynamic Reconfiguration

To enable dynamic reconfiguration, you must set two variables in the /etc/system file. You must also set an additional variable to enable the removal of CPU/memory boards. Perform the following steps:
Quiesce Test

You start the quiesce test with the following command:
On a large system, the quiesce test might run for up to a minute. During this time no messages are displayed if cfgadm does not find incompatible drivers.

Disabled Board List

Attempting to connect a board that is on the disabled board list might produce an error message:
To override the disabled condition, two options are available:
To remove all boards from the disabled board list, choose one of two options depending on the prompt from which you issue the command:
For further information about the disabled-board-list setting, refer to the “Specific NVRAM Variables” section in the Platform Notes: Sun Enterprise 3x00, 4x00, 5x00, and 6x00 Systems manual. This manual is part of the documentation set in this release.

Disabled Memory List

Information about the OpenBoot PROM disabled-memory-list setting is published in this release. See “Specific NVRAM Variables” in the Platform Notes: Sun Enterprise 3x00, 4x00, 5x00, and 6x00 Systems in the Solaris on Sun Hardware documentation.

Unloading Detach-Unsafe Drivers

If you need to unload detach-unsafe drivers, use the modinfo command to find the module IDs of the drivers. You can then pass those module IDs to the modunload command to unload the drivers.

Self-Test Failure During a Connect Sequence

Remove the board from the system as soon as possible if the following error message is displayed during a DR connect sequence:
The board has failed self-test, and removing the board avoids possible reconfiguration errors that can occur during the next reboot.

The failed self-test status does not allow further operations. Therefore, if you want to retry the failed operation immediately, you must first remove and then reinsert the board.

Known Bugs

The following list is subject to change at any time.

Network Device Removal Fails When a Program Is Holding the Device Open (5054195)

If a process is holding open a network device, any DR operation that would involve that device fails. Daemons and processes that hold reference counts stop DR operations from completing.

Workaround: As superuser, perform the following steps:
Memory Interleaving Set Incorrectly After a Fatal Reset (4156075)

Memory interleaving is left in an incorrect state when a Sun Enterprise x500 server is rebooted after a fatal reset. Subsequent DR operations fail. The problem occurs only on systems with memory interleaving set to min.

Workaround: Choose one of the following options:
Cannot Unconfigure a CPU/Memory Board That Has Interleaved Memory (4210234)

To unconfigure and subsequently disconnect a CPU board with memory or a memory-only board, first unconfigure the memory. However, if the memory on the board is interleaved with memory on other boards, the memory cannot currently be unconfigured dynamically.

Memory interleaving can be displayed by using the prtdiag or the cfgadm commands.

Workaround: Shut down the system before servicing the board, then reboot afterward. To permit future DR operations on the CPU/memory board, set the NVRAM memory-interleave property to min. See also Memory Interleaving Set Incorrectly After a Fatal Reset (4156075) for a related discussion about interleaved memory.

Cannot Unconfigure a CPU/Memory Board That Has Permanent Memory (4210280)

To unconfigure and subsequently disconnect a CPU board with memory or a memory-only board, first unconfigure the memory. However, some memory cannot currently be relocated. This memory is considered permanent.

Permanent memory on a board is marked “permanent” in the cfgadm status display:
In this example, the board in slot3 has permanent memory and so cannot be removed.

Workaround: Shut down the system before servicing the board, then reboot afterward.

cfgadm Disconnect Fails When Running Concurrent cfgadm Commands (4220105)

If a cfgadm process is running on one board, an attempt to simultaneously disconnect a second board fails. The following error message is displayed:
Workaround: Run only one cfgadm operation at a time. Allow a cfgadm operation that is running on one board to finish before you start a cfgadm disconnect operation on a second board.
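One way to follow the one-at-a-time rule when several administrators or scripts share a system is to wrap DR commands in a simple lock. The following is a minimal sketch and is not part of the Solaris software: the run_serialized function and the LOCKDIR path are hypothetical names chosen for illustration, and the attachment point in the usage comment is a placeholder. The sketch assumes that every DR command is started through the wrapper, and it relies on mkdir being atomic so that only one caller holds the lock at a time.

```shell
#!/bin/sh
# Hypothetical helper: run one command at a time under a directory lock.
# LOCKDIR and run_serialized are illustrative names, not Solaris interfaces.
LOCKDIR="${LOCKDIR:-/tmp/dr-cfgadm.lock}"

run_serialized() {
    # mkdir either creates the directory or fails, so it acts as a lock.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "another DR operation appears to be running; retry later" >&2
        return 1
    fi
    status=0
    "$@" || status=$?      # run the wrapped command and keep its exit status
    rmdir "$LOCKDIR"       # release the lock
    return "$status"
}

# Usage (the attachment point is a placeholder):
# run_serialized cfgadm -c disconnect <ap_id>
```

If the wrapper is interrupted before it releases the lock, the lock directory remains and must be removed by hand before the next operation.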