Writing Device Drivers

Advanced Topics

B

This appendix contains a collection of topics. Not all drivers need to be concerned with the issues addressed.

Multithreading

This section supplements the guidelines presented in Chapter 4, "Multithreading," for writing an MT-safe driver, a driver that safely supports multiple threads.

Lock Granularity

Here are some issues to consider when deciding on how many locks to use in a driver:
  • The driver should allow as many threads as possible into the driver: this leads to fine-grained locking.
  • However, it should not spend too much time executing the locking primitives: this approach leads to coarse-grained locking.
  • Moreover, the code should be simple and maintainable.
  • Avoid lock contention for shared data.
  • Write reentrant code wherever possible. This makes it possible for many threads to execute without grabbing any locks.
  • Use locks to protect the data and not the code path.
  • Keep in mind the level of concurrency provided by the device: if the controller can only handle one request at a time, there is no point in spending a lot of time making the driver handle multiple threads.
A little thought in reorganizing the ordering and types of locks around such data can lead to considerable savings.

Avoiding Unnecessary Locks

  • Use the MT semantics of the entry points to your advantage. If an element of a device's state structure is read-mostly--for example, initialized in attach( ), and destroyed in detach( ), but only read in other entry points--there is no need to acquire a mutex to read that element of the structure. This may sound obvious, but blindly adding calls to mutex_enter(9F) and mutex_exit(9F) around every access to such a variable can lead to unnecessary locking overhead.
  • Make all entry points reentrant and reduce the amount of shared data, by changing static variables to automatic, or by adding them to your state structure.

Note - Kernel-thread stacks are small (currently 8 Kbytes), so do not allocate large automatic variables and avoid deep recursion.
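The read-mostly idea above can be sketched in user space. This is only a model: POSIX threads stand in for kernel threads, pthread_mutex_lock( ) stands in for mutex_enter(9F), and struct xx_state, xx_attach( ), xx_entry_point( ), and xx_run( ) are hypothetical names, not DDI interfaces. The blksize member is written once before any worker starts, so it is read without a lock; only the byte counter, which many threads update, is protected.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define XX_NTHREADS 4
#define XX_NLOOPS   10000

/* Hypothetical per-instance state, modeled on a driver state structure. */
struct xx_state {
    unsigned long   blksize;    /* set once in xx_attach(), then read-only */
    pthread_mutex_t lock;       /* protects nbytes only */
    unsigned long   nbytes;     /* updated by many threads */
};

/* Analog of attach(): runs before any other entry point sees the state. */
static void
xx_attach(struct xx_state *xsp, unsigned long blksize)
{
    xsp->blksize = blksize;
    xsp->nbytes = 0;
    pthread_mutex_init(&xsp->lock, NULL);
}

/* Analog of a read/write entry point, called by many threads at once. */
static void *
xx_entry_point(void *arg)
{
    struct xx_state *xsp = arg;
    int i;

    for (i = 0; i < XX_NLOOPS; i++) {
        /* Unlocked read: nothing writes blksize after xx_attach(). */
        unsigned long len = xsp->blksize;

        /* The lock guards the shared counter, and is held briefly. */
        pthread_mutex_lock(&xsp->lock);
        xsp->nbytes += len;
        pthread_mutex_unlock(&xsp->lock);
    }
    return (NULL);
}

/* Runs XX_NTHREADS workers and returns the final byte count. */
static unsigned long
xx_run(unsigned long blksize)
{
    struct xx_state xs;
    pthread_t tid[XX_NTHREADS];
    int i, rc;

    xx_attach(&xs, blksize);
    for (i = 0; i < XX_NTHREADS; i++) {
        rc = pthread_create(&tid[i], NULL, xx_entry_point, &xs);
        assert(rc == 0);
    }
    for (i = 0; i < XX_NTHREADS; i++)
        (void) pthread_join(tid[i], NULL);
    pthread_mutex_destroy(&xs.lock);
    return (xs.nbytes);
}
```

Calling xx_run(512) starts four workers that each add the block size 10000 times. The unlocked read is safe in this model only because the write in xx_attach( ) happens before the threads are created.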

Locking Order

When acquiring multiple mutexes, be sure to acquire them in the same order on each code path. For example, mutexes A and B are used to protect two resources in the following ways:
Code Path 1           Code Path 2
mutex_enter(&A);      mutex_enter(&B);
    ...                   ...
mutex_enter(&B);      mutex_enter(&A);
    ...                   ...
mutex_exit(&B);       mutex_exit(&A);
    ...                   ...
mutex_exit(&A);       mutex_exit(&B);

If thread 1 is executing code path 1 and thread 2 is executing code path 2, the following could occur:
  1. Thread 1 acquires mutex A.

  2. Thread 2 acquires mutex B.

  3. Thread 1 needs mutex B, so it blocks holding mutex A.

  4. Thread 2 needs mutex A, so it blocks holding mutex B.

These threads are now deadlocked. This kind of bug is hard to track down, all the more so because real code paths are rarely this straightforward. It also does not occur on every run, because it depends on the relative timing of threads 1 and 2.
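A minimal user-space sketch of the fix follows, assuming POSIX threads as a stand-in for kernel threads and pthread_mutex_lock( ) as a stand-in for mutex_enter(9F). The names code_path_1( ), code_path_2( ), and xx_run_both_paths( ) are hypothetical. Both code paths take A before B, so neither thread can hold B while waiting for A.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define XX_NLOOPS 10000

static pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;  /* protects resource_a */
static pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;  /* protects resource_b */
static long resource_a;
static long resource_b;

/* Code path 1: acquires A, then B. */
static void *
code_path_1(void *arg)
{
    int i;

    (void) arg;
    for (i = 0; i < XX_NLOOPS; i++) {
        pthread_mutex_lock(&A);
        pthread_mutex_lock(&B);
        resource_a += 1;
        resource_b += 1;
        pthread_mutex_unlock(&B);
        pthread_mutex_unlock(&A);
    }
    return (NULL);
}

/* Code path 2: does different work, but uses the same A-then-B order. */
static void *
code_path_2(void *arg)
{
    int i;

    (void) arg;
    for (i = 0; i < XX_NLOOPS; i++) {
        pthread_mutex_lock(&A);
        pthread_mutex_lock(&B);
        resource_b += 2;
        resource_a += 2;
        pthread_mutex_unlock(&B);
        pthread_mutex_unlock(&A);
    }
    return (NULL);
}

/* Runs both paths concurrently; returns resource_a + resource_b, or -1. */
static long
xx_run_both_paths(void)
{
    pthread_t t1, t2;

    resource_a = 0;
    resource_b = 0;
    if (pthread_create(&t1, NULL, code_path_1, NULL) != 0)
        return (-1);
    if (pthread_create(&t2, NULL, code_path_2, NULL) != 0) {
        (void) pthread_join(t1, NULL);
        return (-1);
    }
    (void) pthread_join(t1, NULL);
    (void) pthread_join(t2, NULL);
    return (resource_a + resource_b);
}
```

Because the order is fixed, the interleaving described in steps 1 through 4 cannot arise: whichever thread obtains A first is the only one that can go on to request B.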

Scope of a Lock

Experience has shown that it is easier to deal with locks that are either held throughout the execution of a routine, or locks that are both acquired and released in one routine. Avoid nesting like this:
static void
xxfoo(...)
{
    mutex_enter(&softc->lock);
    ...
    xxbar();
}

static void
xxbar(...)
{
    ...
    mutex_exit(&softc->lock);
}

This example works, but will almost certainly lead to maintenance problems.
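One possible restructuring is sketched below as a user-space model, with pthread_mutex_lock( ) standing in for mutex_enter(9F). The structure struct xx_softc and the routine xx_scope_demo( ) are hypothetical. The lock is acquired and released in xxfoo( ), and xxbar( ) simply documents that it expects the caller to hold the lock. In kernel code, a routine like xxbar( ) could check that expectation with ASSERT(mutex_owned(&softc->lock)).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical soft-state structure for the sketch. */
struct xx_softc {
    pthread_mutex_t lock;   /* protects foo */
    int             foo;
};

/* Callee: the caller must hold softc->lock; it is still held on return. */
static void
xxbar(struct xx_softc *softc)
{
    softc->foo++;
}

/* Caller: the lock is acquired and released in this one routine. */
static void
xxfoo(struct xx_softc *softc)
{
    pthread_mutex_lock(&softc->lock);
    xxbar(softc);
    pthread_mutex_unlock(&softc->lock);
}

/* Calls xxfoo() twice; returns foo, or -1 if the lock was left held. */
static int
xx_scope_demo(void)
{
    struct xx_softc softc;
    int rc;

    softc.foo = 0;
    rc = pthread_mutex_init(&softc.lock, NULL);
    assert(rc == 0);

    xxfoo(&softc);
    xxfoo(&softc);

    /* trylock succeeds only if xxfoo() released the lock. */
    rc = pthread_mutex_trylock(&softc.lock);
    if (rc == 0)
        pthread_mutex_unlock(&softc.lock);
    pthread_mutex_destroy(&softc.lock);
    return (rc == 0 ? softc.foo : -1);
}
```

With this layout, a reader of xxfoo( ) can see the whole lifetime of the lock without following the call into xxbar( ).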
If contention is likely in a particular code path, try to hold locks for a short time. In particular, arrange to drop locks before calling kernel routines that might block. For example:
mutex_enter(&softc->lock);
...
softc->foo = bar;
softc->thingp = kmem_alloc(sizeof(thing_t), KM_SLEEP);
...
mutex_exit(&softc->lock);

This is better coded as:
thingp = kmem_alloc(sizeof(thing_t), KM_SLEEP);
mutex_enter(&softc->lock);
...
softc->foo = bar;
softc->thingp = thingp;
...
mutex_exit(&softc->lock);

Potential Panics

Here is a set of mutex-related panics:
panic: recursive mutex_enter. mutex %x caller %x

Mutexes are not reentrant by the same thread. If you already own the mutex, you cannot acquire it again. Doing so leads to the above panic.

panic: mutex_adaptive_exit: mutex not held by thread

Releasing a mutex that the current thread does not hold causes the above panic.

panic: lock_set: lock held and only one CPU

This panic occurs only on a uniprocessor. It indicates that a spin mutex is held and that the thread trying to acquire it would spin forever, because there is no other CPU to release it. This could happen because the driver forgot to release the mutex on one code path, or blocked while holding it.
A common cause of this panic is that the device's interrupt is high-level (see ddi_intr_hilevel(9F) and Intro(9F)), and the interrupt handler calls a routine that blocks while holding a spin mutex. This is obvious if the driver explicitly calls cv_wait(9F), but may be less so if the handler blocks while acquiring an adaptive mutex with mutex_enter(9F).

Note - In principle, this is only a problem for drivers that operate above lock level.
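The first two panics have a rough user-space analog that can be used to experiment with the rules. The sketch below assumes POSIX threads and an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK), which reports these mistakes as error codes instead of panicking. It is not kernel code, and xx_errorcheck_init( ), xx_recursive_enter( ), and xx_exit_not_held( ) are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initializes m as an error-checking mutex. */
static void
xx_errorcheck_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(m, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);
}

/* Analog of "recursive mutex_enter": lock a mutex this thread already owns. */
static int
xx_recursive_enter(void)
{
    pthread_mutex_t m;
    int rc;

    xx_errorcheck_init(&m);
    pthread_mutex_lock(&m);
    rc = pthread_mutex_lock(&m);    /* second acquisition by the owner */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return (rc);
}

/* Analog of "mutex not held by thread": unlock a mutex that is not owned. */
static int
xx_exit_not_held(void)
{
    pthread_mutex_t m;
    int rc;

    xx_errorcheck_init(&m);
    rc = pthread_mutex_unlock(&m);  /* never locked by this thread */
    pthread_mutex_destroy(&m);
    return (rc);
}
```

For an error-checking mutex, POSIX specifies EDEADLK for the relock and EPERM for the unlock by a non-owner. The kernel mutexes described here panic in the corresponding situations.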

Sun Disk Device Drivers

Sun disk devices represent an important class of block device drivers. A Sun disk device is one that is supported by disk utility commands such as format(1M) and newfs(1M).

Disk I/O Controls

Sun disk drivers need to support a minimum set of I/O controls specific to Sun disk drivers. These I/O controls are specified in the dkio(7) manual page. Disk I/O controls transfer disk information to or from the device driver. When data is copied out of the driver to the user, use ddi_copyout(9F) to copy the information into the user's address space. When data is copied from the user to the driver, use ddi_copyin(9F) to copy the data into the kernel's address space. Table B-1 lists the mandatory Sun disk I/O controls.
Table B-1
I/O Control    Description
DKIOCINFO      Return information describing the disk controller.
DKIOCGAPART    Return a disk's partition map.
DKIOCSAPART    Set a disk's partition map.
DKIOCGGEOM     Return a disk's geometry.
DKIOCSGEOM     Set a disk's geometry.
DKIOCGVTOC     Return a disk's Volume Table of Contents.
DKIOCSVTOC     Set a disk's Volume Table of Contents.
Sun disks may also support a number of optional ioctls listed in the hdio(7) manual page. Table B-2 lists optional Sun disk ioctls:
Table B-2
I/O Control    Description
HDKIOCGTYPE    Return the disk's type.
HDKIOCSTYPE    Set the disk's type.
HDKIOCGBAD     Return the bad sector map of the device.
HDKIOCSBAD     Set the bad sector map for the device.
HDKIOCGDIAG    Return the diagnostic information regarding the most recent command.

Disk Performance

The Solaris 2.x DDI/DKI provides facilities to optimize I/O transfers for improved file system performance. It supports a mechanism to manage the list of I/O requests so as to optimize disk access for a file system. See "Asynchronous Data Transfers" on page 184 for a description of enqueuing an I/O request.
The diskhd structure is used to manage a linked list of I/O requests.
struct diskhd {
    long        b_flags;            /* not used, needed for consistency */
    struct buf  *b_forw, *b_back;   /* queue of unit queues */
    struct buf  *av_forw, *av_back; /* queue of bufs for this unit */
    long        b_bcount;           /* active flag */
};

The diskhd data structure has two buf pointers that the driver can manipulate. The av_forw pointer points to the first active I/O request. The second pointer, av_back, points to the last active request on the list.
A pointer to this structure is passed as an argument to disksort(9F), along with a pointer to the current buf structure being processed. The disksort(9F) routine sorts the buf requests in a fashion that optimizes disk seeks and then inserts the buf pointer into the diskhd list. It uses the value in b_resid of the buf structure as the sort key; it is up to the driver to set this value. Most Sun disk drivers use the cylinder group as the sort key, which tends to optimize file system read-ahead accesses.
Once a request has been added to the diskhd list, the device needs to transfer the data. If the device is not busy processing a request, the xxstart( ) routine pulls the first buf structure off the diskhd list and starts a transfer.
If the device is busy, the driver should return from the xxstrategy( ) entry point. Once the hardware is done with the data transfer, it generates an interrupt. The driver's interrupt routine is then called to service the device. After servicing the interrupt, the driver can call the xxstart( ) routine to process the next buf structure in the diskhd list.
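The flow described above (enqueue by sort key, start if idle, restart from the interrupt routine) can be modeled in user space as follows. This is a simplified sketch, not the disksort(9F) algorithm: it uses plain ascending-key insertion. The types struct xx_buf and struct xx_diskhd and the routines xx_sort_insert( ), xx_start( ), xx_intr( ), and xx_queue_demo( ) are hypothetical stand-ins that borrow only the member names av_forw, av_back, and b_resid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buf: only the members the model needs. */
struct xx_buf {
    struct xx_buf *av_forw;   /* next request on the list */
    long           b_resid;   /* sort key, set by the driver */
};

/* Hypothetical stand-in for struct diskhd. */
struct xx_diskhd {
    struct xx_buf *av_forw;   /* first request on the list */
    struct xx_buf *av_back;   /* last request on the list */
    int            busy;      /* nonzero while a transfer is in progress */
};

/* Inserts bp in ascending key order; equal keys keep their arrival order. */
static void
xx_sort_insert(struct xx_diskhd *dp, struct xx_buf *bp)
{
    struct xx_buf **pp = &dp->av_forw;

    assert(bp != NULL);
    while (*pp != NULL && (*pp)->b_resid <= bp->b_resid)
        pp = &(*pp)->av_forw;
    bp->av_forw = *pp;
    *pp = bp;
    if (bp->av_forw == NULL)
        dp->av_back = bp;
}

/* Analog of xxstart(): if the device is idle, take the first request. */
static struct xx_buf *
xx_start(struct xx_diskhd *dp)
{
    struct xx_buf *bp;

    if (dp->busy || dp->av_forw == NULL)
        return (NULL);
    bp = dp->av_forw;
    dp->av_forw = bp->av_forw;
    if (dp->av_forw == NULL)
        dp->av_back = NULL;
    bp->av_forw = NULL;
    dp->busy = 1;             /* the transfer is now in progress */
    return (bp);
}

/* Analog of the interrupt routine: transfer done, start the next one. */
static struct xx_buf *
xx_intr(struct xx_diskhd *dp)
{
    dp->busy = 0;
    return (xx_start(dp));
}

/* Queues keys 40, 10, 25 and records the order in which they are served. */
static int
xx_queue_demo(long order[3])
{
    struct xx_diskhd hd = { NULL, NULL, 0 };
    struct xx_buf reqs[3] = { { NULL, 40 }, { NULL, 10 }, { NULL, 25 } };
    struct xx_buf *bp;
    int i, n = 0;

    for (i = 0; i < 3; i++)
        xx_sort_insert(&hd, &reqs[i]);
    for (bp = xx_start(&hd); bp != NULL; bp = xx_intr(&hd))
        order[n++] = bp->b_resid;
    return (n);
}
```

In the demo, requests arrive with keys 40, 10, and 25 and are served in key order. A real driver would also hold a mutex around the list manipulation, which the model omits.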

SCSA

Global Data Definitions

The following is information for debugging, useful when a driver runs into bus-wide problems. There is one global data variable that has been defined for the SCSA implementation: scsi_options. This variable is a SCSA configuration longword used for debug and control. The defined bits in the scsi_options longword can be found in the file <sys/scsi/conf/autoconf.h>, and have the following meanings when set:
Table B-3
Option                Description
SCSI_OPTIONS_DR       Enable global disconnect/reconnect.
SCSI_OPTIONS_SYNC     Enable global synchronous transfer capability.
SCSI_OPTIONS_PARITY   Enable global parity support.
SCSI_OPTIONS_TAG      Enable global tagged queuing support.
SCSI_OPTIONS_FAST     Enable global FAST SCSI support: 10 MB/sec transfers, as opposed to 5 MB/sec.
SCSI_OPTIONS_WIDE     Enable global WIDE SCSI.

Note - The setting of scsi_options affects all host adapter and target drivers present on the system (as opposed to scsi_ifsetcap(9F)). Refer to scsi_hba_attach(9F) in the Solaris 2.4 Reference Manual AnswerBook for information on controlling these options for a particular host adapter.

The default setting for scsi_options has these values set:
  • SCSI_OPTIONS_DR
  • SCSI_OPTIONS_SYNC
  • SCSI_OPTIONS_PARITY
  • SCSI_OPTIONS_TAG
  • SCSI_OPTIONS_FAST
  • SCSI_OPTIONS_WIDE
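Testing and clearing individual bits in an options longword can be sketched as below. The XX_OPTIONS_* names and their bit values are placeholders chosen for this illustration only. A driver would use the SCSI_OPTIONS_* definitions from <sys/scsi/conf/autoconf.h> and the scsi_options variable itself.

```c
#include <assert.h>

/* Placeholder bit values for illustration; not the real header values. */
#define XX_OPTIONS_DR      0x01L
#define XX_OPTIONS_SYNC    0x02L
#define XX_OPTIONS_PARITY  0x04L
#define XX_OPTIONS_TAG     0x08L
#define XX_OPTIONS_FAST    0x10L
#define XX_OPTIONS_WIDE    0x20L

/* Model of the default setting: every option listed above is enabled. */
#define XX_OPTIONS_DEFAULT \
    (XX_OPTIONS_DR | XX_OPTIONS_SYNC | XX_OPTIONS_PARITY | \
    XX_OPTIONS_TAG | XX_OPTIONS_FAST | XX_OPTIONS_WIDE)

/* Returns nonzero if the given option bit is set in options. */
static int
xx_option_enabled(long options, long bit)
{
    assert(bit != 0);
    return ((options & bit) != 0);
}

/* Returns options with the given bit cleared, leaving the others alone. */
static long
xx_option_clear(long options, long bit)
{
    assert(bit != 0);
    return (options & ~bit);
}
```

For example, xx_option_clear(XX_OPTIONS_DEFAULT, XX_OPTIONS_TAG) models a system on which tagged queueing has been disabled globally while the other options stay enabled.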

Tagged Queueing

For a definition of tagged queueing refer to the SCSI-2 specification. To support tagged queueing, first check the scsi_options flag SCSI_OPTIONS_TAG to see if tagged queueing is enabled globally. Next, check to see if the target is a SCSI-2 device and whether it has tagged queueing enabled. If this is all true, attempt to enable tagged queueing by using scsi_ifsetcap(9F). Code Example B-1 shows an example of supporting tagged queueing.
Code Example B-1 Supporting SCSI Tagged Queueing
#define ROUTE (&sdp->sd_address)
    ...
    /*
     * If SCSI-2 tagged queueing is supported by the disk drive and
     * by the host adapter then we will enable it.
     */
    xsp->tagflags = 0;
    if ((scsi_options & SCSI_OPTIONS_TAG) &&
        (sdp->sd_inq->inq_rdf == RDF_SCSI2) &&
        (sdp->sd_inq->inq_cmdque)) {
        if (scsi_ifsetcap(ROUTE, "tagged-qing", 1, 1) == 1) {
             xsp->tagflags = FLAG_STAG;
             xsp->throttle = 256;
        } else if (scsi_ifgetcap(ROUTE, "untagged-qing", 0) == 1) {
             xsp->dp->options |= XX_QUEUEING;
             xsp->throttle = 3;
        } else {
             xsp->dp->options &= ~XX_QUEUEING;
             xsp->throttle = 1;
        }
    }

Untagged Queueing

If tagged queueing fails, you can attempt to set untagged queueing. In this mode, you submit as many commands as you think necessary or optimal to the host adapter driver. The host adapter then queues the commands to the target one at a time (as opposed to tagged queueing, where the host adapter submits as many commands as it can until the target indicates that the queue is full).

Auto-Request-Sense Mode

Auto-request-sense mode is most desirable if tagged or untagged queueing is used. A contingent allegiance condition is cleared by any subsequent command and, consequently, the sense data is lost. Most HBA drivers will start the next command before performing the target driver callback. However, some HBA drivers may use a separate and lower priority thread to perform the callbacks, which may increase the time it takes to notify the target driver that the packet completed with a check condition. In this case, the target driver may not be able to submit a request sense command in time to retrieve the sense data.
To avoid this loss of sense data, the HBA driver, or controller, should issue a request sense command as soon as a check condition has been detected; this mode is known as auto-request-sense mode. Note that not all HBA drivers are capable of auto-request-sense mode, and some can only operate with auto-request-sense mode enabled.
A target driver enables auto-request-sense mode by using scsi_ifsetcap(9F). Code Example B-2 is an example of enabling auto request sense.
Code Example B-2 Enabling auto request sense
static int
xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
    struct xxstate *xsp;
    struct scsi_device *sdp = (struct scsi_device *)
        ddi_get_driver_private(dip);
    ...
    /*
     * enable auto-request-sense; an auto-request-sense cmd may fail
     * due to a BUSY condition or transport error. Therefore, it is
     * recommended to allocate a separate request sense packet as
     * well.
     * Note that scsi_ifsetcap(9F) may return -1, 0, or 1.
     */
    xsp->sdp_arq_enabled =
        ((scsi_ifsetcap(ROUTE, "auto-rqsense", 1, 1) == 1) ? 1 : 0);
    /*
     * if the HBA driver supports auto request sense then the
     * status blocks should be sizeof (struct scsi_arq_status); else
     * one byte is sufficient
     */
    xsp->sdp_cmd_stat_size =  (xsp->sdp_arq_enabled ?
        sizeof (struct scsi_arq_status) : 1);
    ...
}

When a packet is allocated using scsi_init_pkt(9F) and auto request sense is desired on this packet then the target driver must request additional space for the status block to hold the auto request sense structure (as Code Example B-3 illustrates). The sense length used in the request sense command is sizeof (struct scsi_extended_sense).
The scsi_arq_status structure contains the following members:
    struct scsi_status          sts_status;
    struct scsi_status          sts_rqpkt_status;
    u_char                      sts_rqpkt_reason;      /* reason completion */
    u_char                      sts_rqpkt_resid;       /* residue */
    u_long                      sts_rqpkt_state;       /* state of command */
    u_long                      sts_rqpkt_statistics;  /* statistics */
    struct scsi_extended_sense  sts_sensedata;

Auto request sense can be disabled per individual packet by just allocating sizeof (struct scsi_status) for the status block.
Code Example B-3 Allocating a packet with auto request sense
pkt = scsi_init_pkt(ROUTE, NULL, bp, CDB_GROUP1,
    xsp->sdp_cmd_stat_size, PP_LEN, 0, func, (caddr_t) xsp);

The packet is submitted using scsi_transport(9F) as usual. When a check condition occurs on this packet, the host adapter driver:
  • Issues a request sense command if the controller doesn't have auto-request-sense capability.
  • Obtains the sense data.
  • Fills in the scsi_arq_status information in the packet's status block.
  • Sets STATE_ARQ_DONE in the packet's pkt_state field.
  • Calls the packet's callback handler (pkt_comp).
The target driver's callback routine should verify that sense data is available by checking the STATE_ARQ_DONE bit in pkt_state. When this bit is set, a check condition has occurred and a request sense has been performed. If auto-request-sense has been temporarily disabled in a packet, there is no guarantee that the sense data can be retrieved at a later time.
The target driver should then verify whether the auto request sense command completed successfully and decode the sense data.
Code Example B-4 Checking for auto request sense
static void
xxcallback(struct scsi_pkt *pkt)
{
    ...
    if (pkt->pkt_state & STATE_ARQ_DONE) {
        /*
         * The transport layer successfully completed an
         * auto-request-sense.
         * Decode the auto request sense data here
         */
        ....
    }
    ...
}

The sample SCSI drivers in Appendixes E and F show in more detail how to interpret the auto request sense data structure.
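As a rough user-space model of the two checks described above (was a request sense performed, and did it complete successfully), consider the sketch below. All types and constants are hypothetical stand-ins: struct xx_arq_status and struct xx_pkt mirror only a few members of the real structures, the sense data is reduced to a single sense key, and the values of XX_STATE_ARQ_DONE and XX_CMD_CMPLT are placeholders. A driver would use the definitions from the SCSA headers.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values for illustration; not taken from the SCSA headers. */
#define XX_STATE_ARQ_DONE  0x40UL  /* stand-in for STATE_ARQ_DONE */
#define XX_CMD_CMPLT       0       /* stand-in for "completed normally" */

/* Reduced model of the auto request sense status block. */
struct xx_arq_status {
    unsigned char sts_rqpkt_reason;  /* how the request sense completed */
    unsigned char sense_key;         /* simplified model of the sense data */
};

/* Reduced model of a packet: state bits plus a pointer to its status. */
struct xx_pkt {
    unsigned long         pkt_state;
    struct xx_arq_status *pkt_scbp;
};

/*
 * Returns the sense key if an auto request sense was performed and
 * completed successfully; returns -1 if no valid sense data is available.
 */
static int
xx_sense_key(const struct xx_pkt *pkt)
{
    assert(pkt != NULL);

    /* No auto request sense was performed for this packet. */
    if ((pkt->pkt_state & XX_STATE_ARQ_DONE) == 0)
        return (-1);

    /* The request sense command itself did not complete normally. */
    if (pkt->pkt_scbp == NULL ||
        pkt->pkt_scbp->sts_rqpkt_reason != XX_CMD_CMPLT)
        return (-1);

    return (pkt->pkt_scbp->sense_key);
}
```

When the function returns -1, the target driver in this model would fall back to the separately allocated request sense packet mentioned in Code Example B-2.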