Sun Enterprise 6x00, 5x00, 4x00, and 3x00 Systems Dynamic Reconfiguration User's Guide

Chapter 4 Troubleshooting

Diagnostic Messages

The following table lists examples of cfgadm diagnostic messages. (Syntax messages are not included in this list.)


cfgadm: Configuration administration not supported on this machine
cfgadm: hardware component is busy, try again
cfgadm: operation: configuration operation not supported on this machine
cfgadm: operation: Data error: error_text
cfgadm: operation: Hardware specific failure: error_text
cfgadm: operation: Insufficient privileges
cfgadm: operation: Operation requires a service interruption
cfgadm: System is busy, try again

See config_admin(3X) for additional error message detail.

Troubleshooting Specific Failures

There are several common types of failure:

Driver Does Not Support DR

  1. Some drivers do not yet support DR operations. A DR-compatible driver must be suspendable. Use this command to test whether the drivers are suspendable:


    # cfgadm -x quiesce-test sysctrl#:slot#
    

  2. DR may not yet support some types of I/O and CPU/memory boards in Enterprise 3x00, 4x00, 5x00, and 6x00 systems. Use the quiesce test (above) or refer to the latest release notes.
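As a small illustration of how the sysctrl#:slot# placeholders are filled in, the sketch below prints the quiesce-test command for a given sysctrl instance and slot. The function name quiesce_test_cmd is hypothetical, and the function only prints the command; it does not run cfgadm.

```shell
# Print (do not run) the quiesce-test command for one attachment point.
# $1 = sysctrl instance number, $2 = slot number; both must be digits.
# quiesce_test_cmd is a hypothetical helper name, not part of cfgadm.
quiesce_test_cmd() {
    for n in "$1" "$2"; do
        case $n in
            ''|*[!0-9]*)
                echo "usage: quiesce_test_cmd INSTANCE SLOT" >&2
                return 1
                ;;
        esac
    done
    printf 'cfgadm -x quiesce-test sysctrl%s:slot%s\n' "$1" "$2"
}
```

For instance 0 and slot 3, the function prints cfgadm -x quiesce-test sysctrl0:slot3, which can then be run as superuser.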

Unable to Unconfigure

Before you attempt a DR unconfigure operation:

  • Devices must not be in use by the operating system

  • Drivers must be detachable or suspendable

A device cannot be unconfigured or disconnected while it is in use. Disks attached to an I/O board must be unmounted before any attempt is made to unconfigure or disconnect that board. Any attempt to unconfigure or disconnect a board whose devices are still in use generates an error.

If an unconfigure operation fails because an I/O board has a busy or open device, the operation stops at the busy device and leaves the board only partially unconfigured.

To regain access to the devices which were not unconfigured, the board must be completely unconfigured and then reconfigured.

In such a case, the system will log messages similar to the following:


NOTICE: unconfiguring dual-pci board in slot 7
NOTICE: dual-pci board in slot 7 partially unconfigured 
reason:sysc iohelp unconfigure: Device busy
output from sysctrl unconfigure is:detach failed: /pci@f,4000/SUNW,isptwo@3/sd@2,0
is busy

To continue the unconfigure operation, unmount the device and retry the unconfigure operation. The board must be in the unconfigured state before you try to configure this board.
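To find out which device blocked the operation, the device path can be pulled out of the logged messages. The following is a minimal sketch that assumes the log text has the same shape as the sample above; the function name failed_device_paths is hypothetical.

```shell
# Read log text on stdin and print the device path from each
# "detach failed:" line. The pattern also matches "attach failed:" lines,
# which a failed configure operation logs in the same shape.
# failed_device_paths is a hypothetical helper name.
failed_device_paths() {
    sed -n 's|^.*tach failed: *\(/[^ ]*\).*$|\1|p'
}

# Possible use (not run here): dmesg | failed_device_paths
```

With the sample messages above as input, the function prints /pci@f,4000/SUNW,isptwo@3/sd@2,0. Because the entries under /dev/dsk are symbolic links to physical device paths, a listing such as ls -l /dev/dsk filtered for the last path component (here sd@2,0) can help identify the logical disk name to unmount.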

Unable to Configure

A configure operation may fail because a device on an I/O board does not currently support hot-plugging. The operation stops at the unsupported device and leaves the board only partially configured. The board must be brought back to the unconfigured state before another configure attempt. In such a case, the system logs messages similar to the following:


NOTICE: configuring dual-sbus-soc+ board in slot 4
NOTICE: dual-sbus-soc+ board in slot 4 partially configured
reason:sysc iohelp configure: Bad address
output from sysctrl configure is:attach failed: /sbus@8,0/SUNW,foo@d,10000/bar

To continue the configure operation, either remove the unsupported device's driver or replace it with a new version of the driver that will support hot-plugging.

Problems with Network Devices

DR does not automatically terminate use of all network interfaces on the board that is being disconnected. You must manually terminate the use of each interface.

DR rejects an unconfigure operation, and displays an error message, on any interface that meets either of the following conditions:

  • The interface is the primary network interface for the machine; that is, the interface whose IP address corresponds to the network interface name contained in the file /etc/nodename. Note that bringing down the primary network interface for the machine prevents network information name services from operating, which results in the inability to make network connections to remote hosts using applications such as ftp(1), rsh(1), rcp(1), and rlogin(1). NFS client and server operations are also affected.

  • The interface is the active alternate for an Alternate Pathing (AP) meta device when the AP meta device is plumbed. Interfaces used by the AP system should not be the active path when the board is being unconfigured. Manually switch the active path to one that is not on the board being unconfigured. If no such path exists, manually execute the ifconfig down and ifconfig unplumb commands on the AP interface. (To manually switch an active path, use the apconfig(1M) command.)
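The first condition can be checked before the unconfigure attempt. The sketch below rests on an assumption that is not stated in this guide: that each interface's host name is stored as the first word of /etc/hostname.<interface>, so an interface is treated as primary when that name equals the contents of /etc/nodename. It does not handle files that hold a literal IP address. The function name is_primary_interface and the interface name hme0 are hypothetical.

```shell
# Succeed (exit status 0) when the host name configured for interface $1
# equals the node name. $2 overrides the directory to read (default: /etc).
# Assumption: <dir>/hostname.<interface> holds a host name as its first word.
# is_primary_interface is a hypothetical helper name.
is_primary_interface() {
    iface=$1
    etc_dir=${2:-/etc}
    [ -r "$etc_dir/nodename" ] && [ -r "$etc_dir/hostname.$iface" ] || return 1
    node=$(awk 'NR == 1 { print $1 }' "$etc_dir/nodename")
    host=$(awk 'NR == 1 { print $1 }' "$etc_dir/hostname.$iface")
    [ -n "$node" ] && [ "$node" = "$host" ]
}

# Possible use (not run here):
#   is_primary_interface hme0 && echo "hme0 is the primary interface"
```

If the check fails for an interface and the interface is not the active AP path, its use can be ended by hand with ifconfig down followed by ifconfig unplumb on that interface.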

Problems with I/O Devices

All I/O devices must be closed before they are unconfigured. To see which processes have these devices open, use the fuser(1M) command.

Perform the following tasks for I/O devices.

  • If the redundancy features of Alternate Pathing or Solstice DiskSuite mirroring are used to access a device connected to the board, reconfigure these subsystems so that the device or network is accessible by way of controllers on other system boards.

  • Unmount file systems, including Solstice DiskSuite meta-devices that have a board-resident partition. (Example: umount /partit)

  • Remove Solstice DiskSuite or Alternate Pathing databases from board-resident partitions. The location of Solstice DiskSuite or Alternate Pathing databases is explicitly chosen by the user and can be changed.

  • Remove any private regions used by Sun Volume Manager or Veritas Volume Manager. Volume Manager by default uses a private region on each device that it controls, so such devices must be removed from Volume Manager control before they can be detached.

  • Any RSM 2000 controllers on the board that is being detached should be taken offline, using the rm6 or rdacutil commands.

  • Remove disk partitions from the swap configuration.

  • Either kill any process that directly opens a device or raw partition, or direct it to close the open device on the board.

  • If a detach-unsafe device is present on the board, close all instances of the device and use modunload(1M) to unload the driver.
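Several of the tasks above map onto short command sequences. As a hedged sketch, the function below prints (without running) the commands for one board-resident file system: fuser -c to list the processes using it, umount to unmount it, and swap -d to remove a swap partition. The function name print_release_steps and the device path shown in the usage note are hypothetical; the other tasks in the list (DiskSuite, Alternate Pathing, Volume Manager, RSM 2000, driver unloading) are not covered.

```shell
# Dry run: print the release commands for one board-resident file system.
# Nothing is executed; review the output, then run the lines as superuser.
# $1 = mount point (for example /partit), $2 = optional swap device path.
# print_release_steps is a hypothetical helper name.
print_release_steps() {
    mount_point=$1
    swap_device=${2:-}
    printf 'fuser -c %s\n' "$mount_point"     # list processes using the file system
    printf 'umount %s\n' "$mount_point"       # unmount once nothing holds it open
    if [ -n "$swap_device" ]; then
        printf 'swap -d %s\n' "$swap_device"  # remove the partition from swap
    fi
}
```

For example, print_release_steps /partit /dev/dsk/c1t0d0s1 prints three command lines in the order they would be run, where /dev/dsk/c1t0d0s1 stands in for a swap partition on the board. Printing instead of executing keeps the sketch safe to try on a running system.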


Caution - Unmounting file systems may affect NFS client systems.


RPC Time-out or Loss of Connection

RPC time-outs occur by default after two minutes. Administrators may need to increase this time-out value to avoid time-outs during a DR-induced operating system quiescence, which may take longer than two minutes. These changes affect both the client and server machines.