SPARCcluster HA Server Service Manual

SPARCstorage Array Firmware and Device Driver Error Messages

D

This appendix gives error messages specific to the SPARCstorage Array.

D.1 Message Formats

Error indications from the SPARCstorage array drivers (pln and soc) are always sent to syslog (/var/adm/messages). Depending on the type of event that generated it, a message may also be sent to the console; console messages are limited to significant events such as cable disconnections. Messages sent to the console are in the form:

  [WARNING:]  instance:  <message>  

Syslog messages may contain additional text in the form of a message ID, which identifies the message, its producer, and its severity:

  ID[SUNWssa.soc.messageid.####] instance: <message>  

Some examples:

  soc3: Transport error:  Fibre Channel Online Timeout  
  ID[SUNWssa.soc.link.6010] soc1: port: 0 Fibre Channel is ONLINE  

In the following discussion, messages are presented with the message ID and the message text, even though the message ID is not displayed on the console. The character # stands for a numeric quantity and ... stands for a string of characters or numbers. The prefix ID[SUNWssa] is implied and is not shown.

  soc.link.6010     soc#: port: # Fibre Channel is ONLINE  
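
The two message forms described above can be separated mechanically. The following is a minimal sketch, not part of the driver software: it assumes the syslog timestamp and host name have already been stripped from the line, and the names SsaMessage and parse_ssa_message are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SsaMessage(NamedTuple):
    message_id: Optional[str]  # e.g. "soc.link.6010"; None for console-style lines
    instance: str              # driver instance, e.g. "soc1"
    text: str                  # message text after "instance:"


# Optional "ID[SUNWssa.<id>]" prefix, optional "WARNING:", then "instance: text".
_PATTERN = re.compile(
    r"^\s*(?:ID\[SUNWssa\.(?P<id>[A-Za-z_]+\.[A-Za-z_]+\.\d+)\]\s+)?"
    r"(?:WARNING:\s+)?"
    r"(?P<instance>(?:soc|pln)\d+):\s*"
    r"(?P<text>.*?)\s*$"
)


def parse_ssa_message(line: str) -> Optional[SsaMessage]:
    """Split one message line into ID, instance, and text; None if no match."""
    match = _PATTERN.match(line)
    if match is None:
        return None
    return SsaMessage(match.group("id"), match.group("instance"), match.group("text"))


if __name__ == "__main__":
    print(parse_ssa_message("ID[SUNWssa.soc.link.6010] soc1: port: 0 Fibre Channel is ONLINE"))
    print(parse_ssa_message("soc3: Transport error:  Fibre Channel Online Timeout"))
```

Lines that carry no soc or pln instance name are returned as None rather than guessed at.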

Note that most disk drive and media-related errors will result in messages from the ssd driver. See the manual pages for sd(7), pln(7), and soc(7) for information on these messages.

D.2 System Configuration Errors

This class of errors may occur because of insufficient system resources (for example, not enough memory to complete installation of the driver), or because of hardware restrictions of the machine into which the SPARCstorage array host adapter is installed.
This class of errors may also occur when your host system encounters a hardware error on the host's system board, such as a failed SIMM.

D.2.1 soc Driver


  soc.attach.4004 soc#: attach failed: bad soft state  
  soc.attach.4010 soc#: attach failed: unable to map eeprom  
  soc.attach.4020 soc#: attach failed: unable to map XRAM  
  soc.attach.4030 soc#: attach failed: unable to map registers  
  soc.attach.4040 soc#: attach failed: unable to access status register  
  soc.attach.4050 soc#: attach failed: unable to access hostadapter XRAM  
  soc.attach.4060 soc#: attach failed: unable to install interrupt handl  
  soc.attach.4003 soc#: attach failed: alloc soft state  
  soc.attach.4070 soc#: attach failed: offline packet structure allocat  

These messages indicate that the initialization of the soc driver was unable to complete due to insufficient system virtual address mapping resources or kernel memory space for some of its internal structures. The host adapter(s) associated with these messages will not be functional.

  soc.driver.4020   soc#: alloc of request queue failed  
  soc.driver.4040   soc#: DVMA request queue alloc failed  
  soc.driver.4050   soc#: alloc of response queue failed  
  soc.driver.4060   soc#: DVMA response queue alloc failed  
  soc.driver.4070   soc#: alloc failed  
  soc.driver.4090   soc#: alloc failed  
  soc.driver.4100   soc#: DMA address setup failed  
  soc.driver.4110   soc#: DVMA alloc failed  

These messages indicate there are not enough system DVMA or kernel heap resources available to complete driver initialization. The associated host adapter(s) will be inoperable if any of these conditions occurs.

  soc.attach.4001 soc#: attach failed: device in slave-only slot  
  soc.attach.4002 soc#: attach failed: hilevel interrupt unsupported  
  soc.driver.4001 soc#: Not self-identifying  

The SBus slot into which the host adapter is installed cannot support the features required to operate the SPARCstorage array. The host adapter should be relocated to a different SBus slot. If you see one of these messages, you may be running an unsupported configuration (for example, the SPARCstorage array may be connected to a server that is not supported).
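
The three groups of soc configuration messages in this section can be summarized as a lookup table. This is a minimal sketch for illustration only: the message IDs are copied from the listings above, while the set names and the describe_config_error function are hypothetical.

```python
# Message IDs copied from the listings in section D.2.1; each set corresponds
# to the explanatory paragraph that follows its listing.
ATTACH_RESOURCE_IDS = {
    "soc.attach.4003", "soc.attach.4004", "soc.attach.4010",
    "soc.attach.4020", "soc.attach.4030", "soc.attach.4040",
    "soc.attach.4050", "soc.attach.4060", "soc.attach.4070",
}
QUEUE_ALLOC_IDS = {
    "soc.driver.4020", "soc.driver.4040", "soc.driver.4050",
    "soc.driver.4060", "soc.driver.4070", "soc.driver.4090",
    "soc.driver.4100", "soc.driver.4110",
}
SLOT_IDS = {"soc.attach.4001", "soc.attach.4002", "soc.driver.4001"}


def describe_config_error(message_id: str) -> str:
    """Return the cause this section gives for a soc configuration message ID."""
    prefix = "SUNWssa."
    if message_id.startswith(prefix):
        # Accept IDs written with or without the implied prefix.
        message_id = message_id[len(prefix):]
    if message_id in ATTACH_RESOURCE_IDS:
        return "insufficient virtual address mapping resources or kernel memory"
    if message_id in QUEUE_ALLOC_IDS:
        return "insufficient DVMA or kernel heap resources"
    if message_id in SLOT_IDS:
        return "SBus slot cannot support the host adapter; relocate the adapter"
    return "not a soc configuration error listed in this section"


if __name__ == "__main__":
    print(describe_config_error("soc.attach.4001"))
```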

D.2.1.1 pln Driver


  pln_ctlr_attach: controller struct alloc failed  
  pln_ctlr_attach: scsi_device alloc failed  
  pln_ctlr_attach: pln_address alloc failed  

If one of these messages is displayed, the pln driver was unable to obtain enough kernel memory space for some of its internal structures. The SPARCstorage array(s) associated with these messages will not be functional.

  pln_init: mod_install failed error=%d  

Module installation of the pln driver failed. None of the SPARCstorage arrays connected to the machine will be operable.

D.3 Hardware Errors

Errors under this classification are generally due to hardware failures (transient or permanent), or improper configuration of some subsystem components.

D.3.0.1 soc Driver


  soc.wwn.3010      soc#: No SSA World Wide Name, using defaults  

The associated SPARCstorage array has an invalid World Wide Name (WWN), so the software assumes a default World Wide Name. The system will still function with a default World Wide Name only if a single SSA gives this message, because every array that reports it would be using the same default WWN. A valid World Wide Name should be programmed into the SPARCstorage array (refer to the ssacli(1M) man page for more information).

  soc.wwn.3020     soc#: Could not get port world wide name  

This message is displayed if there is a failure on the SPARCstorage array and the driver software is unable to obtain the device's WWN.

  soc.wwn.5020     soc#: INCORRECT WWN: Found: ... Expected: ...  

This message is usually the result of plugging the wrong fibre channel cable into a host adapter. It indicates that the World Wide Name of the device connected to the host adapter does not match the World Wide Name of the device connected when the system was booted.

  soc.driver.3010   soc#: host adapter fw date code: <not available>  

This may appear if no date code is present in the host adapter microcode. This situation should not occur under normal circumstances and may indicate the use of invalid SPARCstorage array drivers or a failed host adapter.
For reference, the expected message is:

  soc.driver.1010   soc#: host adapter fw date code: ...  

This is printed at boot time to indicate the revision of the microcode loaded into the host adapter.

  soc.link.4060     soc#: invalid FC packet; ...  

The soc driver has detected some invalid fields in a packet received from the host adapter. The cause of this is most likely incorrectly functioning hardware (either the host adapter itself or some other SBus hardware).

  soc.link.4020     soc#: Unsupported Link Service command: ...  
  soc.link.4030     soc#: Unknown FC-4 command: ...  
  soc.link.4040     soc#: unsupported FC frame R_CTL: ...  
  soc.link.4010     soc#: incomplete continuation entry  
  soc.link.3010     soc#: unknown LS_Command  

D.3.0.2 pln Driver


  Transport error:  Received P_RJT status, but no header  
  Transport error:  Fibre Channel P_RJT  
  Transport error:  Fibre Channel P_BSY  

These messages indicate the presence of invalid fields in the fibre channel frames received by the host adapter. This may indicate that a fibre channel device other than Sun's fibre channel device for the SPARCstorage array is connected. The messages may also be caused by a failed host adapter, Fibre Channel Optical Module, fibre optic cable, or array controller.

  soc.link.4080 soc#: Connections via Fibre Channel Fabric are unsupported  

The current SPARCstorage array software does not support fibre channel fabric (switch) operation. This message indicates that the software has detected the presence of a fabric.

  soc.login.5010    soc#: Fibre Channel login failed  
  soc.login.5020    soc#: fabric login failed  
  soc.login.5030    soc#: N-PORT login not successful  
  soc.login.5040    soc#: N-PORT login failure  

These messages may occur if part of the fibre channel link initialization or login procedures fail. Retries of the login procedure will be performed.

  soc.login.6010    soc#: Fibre Channel login succeeded  

The soc driver will display this message following a successful fibre channel login procedure (part of link initialization) if the link had previously gone from an operable to an inoperable state. The "login succeeded" message indicates the link has again become fully functional.

  soc.login.4020    soc#: login retry count exceeded for port: #  
  soc.login.4040    soc#: login retry count exceeded  

These errors indicate that the login retry procedure is not working and the port/card associated with the message is terminating the login attempt. The associated SPARCstorage array will be inaccessible to the system.
Note that the fibre channel specification requires each device to attempt a login to a fibre channel fabric, even though one may not be present. A failure of the fabric login procedure due to link errors (even in a point-to-point topology) may result in the printing of fabric login failure messages even with no fabric present.

  Link errors detected  

A number of retryable errors may have occurred on the fibre channel link. This message may be displayed if the number of link errors exceeds the allowable link bit error rate (1 bit in 10^12 bits). If you see this message, clean the fibre optic cable according to the instructions given in the SPARCstorage Array Service Manual. If the problem still exists, replace either the fibre optic cable or the Fibre Channel Optical Module.
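
The allowable rate of 1 bit error in 10^12 bits becomes an error count once a link rate is chosen. The sketch below does that arithmetic. The 1 Gbit/s rate is an arbitrary round number chosen for illustration, not the actual SPARCstorage array link speed, and the function names are hypothetical.

```python
ALLOWABLE_BER = 1e-12  # 1 bit error in 10^12 bits, as quoted above


def expected_bit_errors(bit_rate_bps: float, seconds: float,
                        ber: float = ALLOWABLE_BER) -> float:
    """Expected number of bit errors on a link running at the allowable rate."""
    return bit_rate_bps * seconds * ber


def seconds_per_error(bit_rate_bps: float, ber: float = ALLOWABLE_BER) -> float:
    """Average time between bit errors at the given link rate."""
    return 1.0 / (bit_rate_bps * ber)


if __name__ == "__main__":
    rate = 1e9  # ASSUMPTION: illustrative link rate in bits per second
    print("errors per hour:", expected_bit_errors(rate, 3600))   # about 3.6
    print("seconds between errors:", seconds_per_error(rate))    # about 1000
```

Substitute the real link rate to estimate how many errors per hour a link can show and still be within the allowable rate.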

D.3.0.3 pln Driver


  Transport error:  FCP_RSP_CMD_INCOMPLETE  
  Transport error:  FCP_RSP_CMD_DMA_ERR  
  Transport error:  FCP_RSP_CMD_TRAN_ERR  
  Transport error:  FCP_RSP_CMD_RESET  
  Transport error:  FCP_RSP_CMD_ABORTED  

An error internal to the SPARCstorage array controller has occurred during an I/O operation. This may be due to a hardware failure in a SCSI interface of the SPARCstorage array controller, a failure of the associated SCSI bus (drive tray) in the SPARCstorage array package, or a faulty disk drive.

  Transport error:  FCP_RSP_CMD_TIMEOUT  

The SCSI interface logic on the SPARCstorage array controller board has timed out on a command issued to a disk drive. This may be caused by a faulty drive, drive tray, or array controller.

  Transport error:  FCP_RSP_CMD_OVERRUN  

This error (on an individual I/O operation) may indicate a hardware failure of a disk drive in the SPARCstorage array, a failure of the associated drive tray, or a fault in the SCSI interface on the SPARCstorage array controller. The system will try to access the failed hardware again after you see this message.

  Transport error:  FCP_RSP_SCSI_PORT_ERR  

The firmware on the SPARCstorage array controller has detected the failure of the associated SCSI interface chip. Any I/O operations to drives connected to this particular SCSI bus will fail. If you see this message, you may have to replace the array controller.

  Transport error:  Fibre Channel Offline  
  soc.link.6010     soc#: port: # Fibre Channel is ONLINE  

If you see these messages together, the system was able to recover from the error, so no action is necessary.

  Transport error:  Fibre Channel Offline  
  Transport error:  Fibre Channel Online Timeout  

If you see these messages together, an I/O operation to a SPARCstorage array drive has failed because the fibre channel link has become inoperable. The driver will detect the transition of the link to an inoperable state and will then initiate a timeout period. Within the timeout period, if the link should become usable again, any waiting I/O operations will be resumed. However, if the timeout should expire before the link becomes operational, any I/O operations will fail.
The latter message (timeout) means that the host adapter microcode has detected a timeout on a particular I/O operation. This message will be printed (and the associated I/O operation will fail) only if the retry count of the driver for this class of link errors has been exhausted.
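
The two message pairings described above (Offline followed by ONLINE, and Offline followed by Online Timeout) can be told apart by scanning the messages in order. This is a minimal sketch that assumes the messages all belong to a single host adapter and arrive in chronological order; the function name and the outcome labels are hypothetical.

```python
from typing import Iterable, List


def classify_offline_events(messages: Iterable[str]) -> List[str]:
    """Return one outcome per 'Fibre Channel Offline' transport error.

    'recovered'  - followed by 'Fibre Channel is ONLINE'; no action is necessary
    'failed'     - followed by 'Fibre Channel Online Timeout'; the I/O operation failed
    'unresolved' - no follow-up message seen for this Offline error
    """
    outcomes: List[str] = []
    pending = False
    for raw in messages:
        text = " ".join(raw.split())  # collapse runs of spaces in the message text
        if "Fibre Channel Offline" in text:
            if pending:
                outcomes.append("unresolved")
            pending = True
        elif pending and "Fibre Channel is ONLINE" in text:
            outcomes.append("recovered")
            pending = False
        elif pending and "Fibre Channel Online Timeout" in text:
            outcomes.append("failed")
            pending = False
    if pending:
        outcomes.append("unresolved")
    return outcomes


if __name__ == "__main__":
    log = [
        "Transport error:  Fibre Channel Offline",
        "soc.link.6010     soc0: port: 0 Fibre Channel is ONLINE",
        "Transport error:  Fibre Channel Offline",
        "Transport error:  Fibre Channel Online Timeout",
    ]
    print(classify_offline_events(log))
```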

  Transport error:  CMD_DATA_OVR  
  Transport error:  Unknown CQ type  
  Transport error:  Bad SEG CNT  
  Transport error:  Fibre Channel Invalid X_ID  
  Transport error:  Fibre Channel Exchange Busy  
  Transport error:  Insufficient CQEs  
  Transport error:  ALLOC FAIL  
  Transport error:  Fibre Channel Invalid S_ID  
  Transport error:  Fibre Channel Seq Init Error  
  Transport error:  Unknown FC Status  

These errors indicate the driver or host adapter microcode has detected a condition from which it cannot recover. The associated I/O operation will fail. These messages should be followed or preceded by other error messages; refer to those messages to determine what action you should take to fix the problem.

  Timeout recovery failed, resetting  

This message may be displayed by the pln driver if the normal I/O timeout error recovery procedures were unsuccessful. In this case, the software will perform a hardware reset of the host adapter and attempt to continue system operation.

  reset recovery failed  

This message will be printed only if the hardware reset error recovery has failed, following the failure of normal fibre channel link error recovery. The associated SPARCstorage array(s) will be inaccessible to the system. This situation should only occur due to failed host adapter hardware.

D.4 Informational Messages

Messages in this category convey information about the configuration or state of various SPARCstorage array subsystem components.

D.4.0.1 soc Driver


  soc.driver.1010   soc#: host adapter fw date code: ...  

This string will be printed at boot time to indicate the revision of the microcode loaded into the host adapter.

  soc.link.6010     soc#: port: # Fibre Channel is ONLINE  
  soc.link.5010     soc#: port: # Fibre Channel is OFFLINE  

Under a variety of circumstances, the fibre channel link may appear to the host adapter to have entered an inoperable state. Frequently, such a condition is temporary.
The following are possible causes for the fibre channel link to appear to go "offline":
  • A temporary burst of errors on the fibre cable. In this case, the "OFFLINE" message should be followed by an "ONLINE" message shortly afterwards.
  • Unplugging of the fibre channel cable from either the host adapter or the SPARCstorage array
  • Powering off a connected SPARCstorage array
  • Failure of a Fibre Channel Optical Module in either the host adapter or the SPARCstorage array
  • Failure of an optical cable
  • Failure of a SPARCstorage array controller
  • Failure of a host adapter card
Note that any pending I/O operations to the SPARCstorage array will be held by the driver for a period of time (one to two minutes) following a link "offline" in case the link should return to an operable state, so that pending operations can be completed. However, if a sufficient time elapses following the transition of the link to "offline" without a corresponding "online" transition, the driver will fail the I/O operations associated with the formerly connected SPARCstorage array.
It is normal to see the ONLINE message for each connected SPARCstorage array when the system is booting.
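
The hold behavior described above can be modeled as a simple timing rule. The sketch below is only a model of the description, not driver code: the 90-second default is an assumed midpoint of the "one to two minutes" stated above, and the function name and return labels are hypothetical.

```python
from typing import Optional


def pending_io_outcome(offline_at: float, online_at: Optional[float],
                       hold_seconds: float = 90.0) -> str:
    """Model what happens to I/O held by the driver after a link goes offline.

    offline_at   - time of the OFFLINE message, in seconds
    online_at    - time of the next ONLINE message, or None if there was none
    hold_seconds - ASSUMPTION: hold period; the text gives one to two minutes
    """
    if hold_seconds <= 0:
        raise ValueError("hold_seconds must be positive")
    if online_at is not None and online_at < offline_at:
        raise ValueError("online_at must not precede offline_at")
    if online_at is not None and online_at - offline_at <= hold_seconds:
        return "completed"  # link returned in time; held operations can finish
    return "failed"         # hold period elapsed; the driver fails the operations


if __name__ == "__main__":
    print(pending_io_outcome(0.0, 5.0))    # brief burst of errors
    print(pending_io_outcome(0.0, None))   # cable left unplugged
```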

  soc.link.1010  soc#: message: ...  

Peripheral devices on the Fibre Channel (like the SPARCstorage array) can cause messages to be printed on the system console/syslog under certain circumstances.
Under normal operation at boot time, the SPARCstorage array will display the revision date of its firmware following a fibre channel login. This message will be of the form:

  soc.link.1010 soc#: message:SSA EEprom date: Fri May 27 12:35:46 1994  

Other messages from the controller may indicate the presence of warning or failure conditions detected by the controller firmware.

D.5 Internal Software Errors

These messages may be printed by the driver in a situation where it has detected some inconsistency in the state of the machine. These may sometimes be the result of failed hardware, usually either the SPARCstorage array host adapter or SBus hardware.
These are not expected to occur under normal operation.

D.5.0.1 soc Driver


  soc.driver.4010   soc#: Illegal state: SOC_COMPLETE == 0  
  soc.driver.4030   soc#: too many continuation entries  
  soc.driver.4080   soc#: no unsolicited commands to get  
  soc.link.3020     soc#: unknown status: ...  
  soc.link.4050     soc#: unsolicited: Illegal state: flags: ...  
  soc.link.4070     soc#: invalid fc_ioclass  
  soc.login.1010    soc#: reset with resets disabled  

D.5.0.2 pln Driver


  ddi_dma_sync failed (rsp)  
  Invalid transport status  
  Unknown state change  
  Grouped disks not supported  
  pln_scsi_pktfree: freeing free packet