Writing Device Drivers

Appendix F: Advanced Topics

This appendix collects a number of advanced topics. Not every driver needs to be concerned with the issues addressed here.

Multithreading

This section supplements the guidelines presented in Chapter 4, "Multithreading," for writing an MT-safe driver, a driver that safely supports multiple threads.

Lock Granularity

Here are some issues to consider when deciding on how many locks to use in a driver:
  • The driver should allow as many threads as possible into the driver: this leads to fine-grained locking.
  • However, it should not spend too much time executing the locking primitives: this approach leads to coarse-grained locking.
  • Moreover, the code should be simple and maintainable.
  • Avoid lock contention for shared data.
  • Write reentrant code wherever possible. This makes it possible for many threads to execute without grabbing any locks.
  • Use locks to protect the data and not the code path.
  • Keep in mind the level of concurrency provided by the device: if the controller can only handle one request at a time, there is no point in spending a lot of time making the driver handle multiple threads.
A little thought about the ordering and types of locks around shared data can lead to considerable savings.

Avoiding Unnecessary Locks

  • Use the MT semantics of the entry points to your advantage. If an element of a device's state structure is read-mostly (for example, initialized in attach(9E), destroyed in detach(9E), and only read in other entry points), there is no need to acquire a mutex to read that element of the structure. This may sound obvious, but blindly adding calls to mutex_enter(9F) and mutex_exit(9F) around every access to such a variable leads to unnecessary locking overhead.
  • Make all entry points reentrant and reduce the amount of shared data, by changing static variables to automatic, or by adding them to your state structure.

Note - Kernel-thread stacks are small (currently 8 Kbytes), so do not allocate large automatic variables and avoid deep recursion.
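
As a rough illustration of the read-mostly case, the following userland sketch uses POSIX mutexes in place of the kernel mutex routines. The xx_state_t structure and the xx_ function names are hypothetical; in a driver, the same pattern would use mutex_enter(9F) and mutex_exit(9F). The unit field is written once before any other entry point can run and is afterward read without a lock, while busy_count is always accessed under the mutex.

```c
#include <pthread.h>

/* Hypothetical per-device state; a userland stand-in for a driver state structure. */
typedef struct xx_state {
    int             unit;        /* set once at "attach", then only read */
    pthread_mutex_t lock;        /* protects busy_count only */
    int             busy_count;  /* modified by several entry points */
} xx_state_t;

/* Plays the role of attach: runs before any other entry point. */
void
xx_attach(xx_state_t *sp, int unit)
{
    sp->unit = unit;
    sp->busy_count = 0;
    pthread_mutex_init(&sp->lock, NULL);
}

/* Read-mostly field: no lock is needed to read it. */
int
xx_unit(const xx_state_t *sp)
{
    return (sp->unit);
}

/* Shared, changing field: every access is made under the lock. */
int
xx_mark_busy(xx_state_t *sp)
{
    int n;

    pthread_mutex_lock(&sp->lock);
    n = ++sp->busy_count;
    pthread_mutex_unlock(&sp->lock);
    return (n);
}

/* Plays the role of detach: runs after all other entry points are done. */
void
xx_detach(xx_state_t *sp)
{
    pthread_mutex_destroy(&sp->lock);
}
```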

Locking Order

When acquiring multiple mutexes, be sure to acquire them in the same order on each code path. For example, mutexes A and B are used to protect two resources in the following ways:
Code Path 1           Code Path 2
mutex_enter(&A);      mutex_enter(&B);
    ...                   ...
mutex_enter(&B);      mutex_enter(&A);
    ...                   ...
mutex_exit(&B);       mutex_exit(&A);
    ...                   ...
mutex_exit(&A);       mutex_exit(&B);

If thread one is executing code path 1 and thread two is executing code path 2, the following could occur:
  1. Thread one acquires mutex A.

  2. Thread two acquires mutex B.

  3. Thread one needs mutex B, so it blocks holding mutex A.

  4. Thread two needs mutex A, so it blocks holding mutex B.

These threads are now deadlocked. A deadlock of this kind is hard to track down, all the more so because real code paths are rarely this straightforward. It also does not occur on every run, because it depends on the relative timing of threads one and two.
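
One way to remove the deadlock is to rewrite code path 2 so that it takes the mutexes in the same order as code path 1. The sketch below is a userland analogue that uses POSIX mutexes in place of mutex_enter(9F) and mutex_exit(9F). The res_a and res_b variables are hypothetical stand-ins for the two protected resources.

```c
#include <pthread.h>

/* Userland stand-ins for the kernel mutexes A and B. */
pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;

int res_a;  /* hypothetical resource protected by A */
int res_b;  /* hypothetical resource protected by B */

/* Code path 1: acquires A, then B. */
void
code_path_1(void)
{
    pthread_mutex_lock(&A);
    res_a += 1;
    pthread_mutex_lock(&B);
    res_b += 1;
    pthread_mutex_unlock(&B);
    pthread_mutex_unlock(&A);
}

/*
 * Code path 2, reordered: it needs B first logically, but it still
 * acquires A before B so that both paths use the same lock order.
 */
void
code_path_2(void)
{
    pthread_mutex_lock(&A);
    pthread_mutex_lock(&B);
    res_b += 10;
    pthread_mutex_unlock(&B);
    res_a += 10;
    pthread_mutex_unlock(&A);
}
```

With both paths acquiring A before B, a thread that holds A can always go on to acquire B, so the circular wait described above cannot form.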

Scope of a Lock

Experience has shown that it is easier to deal with locks that are either held throughout the execution of a routine, or locks that are both acquired and released in one routine. Avoid nesting like this:
static void
xxfoo(...)
{
    mutex_enter(&softc->lock);
    ...
    xxbar();
}

static void
xxbar(...)
{
    ...
    mutex_exit(&softc->lock);
}

This example works, but will almost certainly lead to maintenance problems.
If contention is likely in a particular code path, try to hold locks for a short time. In particular, arrange to drop locks before calling kernel routines that might block. For example:
mutex_enter(&softc->lock);
...
softc->foo = bar;
softc->thingp = kmem_alloc(sizeof(thing_t), KM_SLEEP);
...
mutex_exit(&softc->lock);

This is better coded as:
thingp = kmem_alloc(sizeof(thing_t), KM_SLEEP);
mutex_enter(&softc->lock);
...
softc->foo = bar;
softc->thingp = thingp;
...
mutex_exit(&softc->lock);

Potential Panics

Here is a set of mutex-related panics:
panic: recursive mutex_enter. mutex %x caller %x

Mutexes are not reentrant by the same thread. If a thread already owns the mutex, it cannot acquire it again. Attempting to do so causes this panic.

panic: mutex_adaptive_exit: mutex not held by thread

Releasing a mutex that the current thread does not hold causes this panic.

panic: lock_set: lock held and only one CPU

This panic occurs only on a uniprocessor. It indicates that a spin mutex is held and that the thread trying to acquire it would spin forever, because there is no other CPU to release it. This can happen because the driver forgot to release the mutex on one code path, or blocked while holding it.
A common cause of this panic is that the device's interrupt is high-level (see ddi_intr_hilevel(9F) and Intro(9F)) and the interrupt handler calls a routine that blocks while holding a spin mutex. This is obvious if the driver explicitly calls cv_wait(9F), but may be less so if the handler blocks while acquiring an adaptive mutex with mutex_enter(9F).

Note - In principle, this is only a problem for drivers that operate above lock level.

Sun Disk Device Drivers

Sun disk devices represent an important class of block device drivers. A Sun disk device is one that is supported by disk utility commands such as format(1M) and newfs(1M).

Disk I/O Controls

Sun disk drivers need to support a minimum set of I/O controls specific to Sun disk drivers. These I/O controls are specified in the dkio(7) manual page. Disk I/O controls transfer disk information to or from the device driver. When data is copied out of the driver to the user, use ddi_copyout(9F) to copy the information into the user's address space. When data is copied from the user to the driver, use ddi_copyin(9F) to copy the data into the kernel's address space. Table F-1 lists the mandatory Sun disk I/O controls.
Table F-1
I/O Control    Description
DKIOCINFO      Return information describing the disk controller.
DKIOCGAPART    Return a disk's partition map.
DKIOCSAPART    Set a disk's partition map.
DKIOCGGEOM     Return a disk's geometry.
DKIOCSGEOM     Set a disk's geometry.
DKIOCGVTOC     Return a disk's Volume Table of Contents.
DKIOCSVTOC     Set a disk's Volume Table of Contents.

Sun disks may also support a number of optional ioctls listed in the hdio(7) manual page. Table F-2 lists optional Sun disk ioctls:
Table F-2
I/O Control    Description
HDKIOCGTYPE    Return the disk's type.
HDKIOCSTYPE    Set the disk's type.
HDKIOCGBAD     Return the bad sector map of the device.
HDKIOCSBAD     Set the bad sector map for the device.
HDKIOCGDIAG    Return the diagnostic information regarding the most recent command.

Disk Performance

The Solaris 2.x DDI/DKI provides facilities to optimize I/O transfers for improved file system performance. It supports a mechanism for managing the list of I/O requests so as to optimize disk access for a file system. See "Asynchronous Data Transfers" for a description of enqueuing an I/O request.
The diskhd structure is used to manage a linked list of I/O requests.
struct diskhd {
    long        b_flags;             /* not used, needed for consistency */
    struct buf  *b_forw, *b_back;    /* queue of unit queues */
    struct buf  *av_forw, *av_back;  /* queue of bufs for this unit */
    long        b_bcount;            /* active flag */
};

The diskhd data structure has two buf pointers that the driver can manipulate. The av_forw pointer points to the first active I/O request. The second pointer, av_back, points to the last active request on the list.
A pointer to this structure is passed as an argument to disksort(9F), along with a pointer to the current buf structure being processed. The disksort(9F) routine sorts the buf requests in a fashion that optimizes disk seeks and inserts the buf pointer into the diskhd list. The routine uses the value in the b_resid field of the buf structure as the sort key. It is up to the driver to set this value. Most Sun disk drivers use the cylinder group as the sort key, which tends to optimize file system read-ahead accesses.
Once a request has been added to the diskhd list, the device needs to transfer the data. If the device is not busy processing a request, the xxstart() routine pulls the first buf structure off the diskhd list and starts a transfer.
If the device is busy, the driver should simply return from the xxstrategy() entry point. Once the hardware is done with the data transfer, it generates an interrupt. The driver's interrupt routine is then called to service the device. After servicing the interrupt, the driver can call the xxstart() routine to process the next buf structure in the diskhd list.
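
The flow described above can be pictured with a small userland simulation. This is only a sketch under stated assumptions: req_t and unit_t are hypothetical stand-ins for struct buf and struct diskhd, the sim_ functions are not DDI/DKI routines, and sim_sort_insert() uses a plain ascending-key insertion as a simplification rather than the actual ordering that disksort(9F) applies.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct buf. */
typedef struct req {
    struct req *next;
    long        key;     /* sort key, playing the role of b_resid */
} req_t;

/* Hypothetical stand-in for struct diskhd plus a busy indicator. */
typedef struct unit {
    req_t *head;         /* first queued request (like av_forw) */
    req_t *tail;         /* last queued request (like av_back) */
    req_t *active;       /* request the simulated hardware is working on */
} unit_t;

/* Insert in ascending key order; a simplification of disksort(9F). */
void
sim_sort_insert(unit_t *u, req_t *r)
{
    req_t **pp = &u->head;

    while (*pp != NULL && (*pp)->key <= r->key)
        pp = &(*pp)->next;
    r->next = *pp;
    *pp = r;
    if (r->next == NULL)
        u->tail = r;
}

/* Like xxstart(): if the device is idle, pull the first request and start it. */
void
sim_start(unit_t *u)
{
    if (u->active != NULL || u->head == NULL)
        return;
    u->active = u->head;
    u->head = u->active->next;
    if (u->head == NULL)
        u->tail = NULL;
    u->active->next = NULL;
}

/* Like xxstrategy(): queue the request, then try to start the device. */
void
sim_strategy(unit_t *u, req_t *r)
{
    sim_sort_insert(u, r);
    sim_start(u);        /* returns immediately if the device is busy */
}

/* Like the interrupt routine: complete the active request, start the next. */
req_t *
sim_intr(unit_t *u)
{
    req_t *done = u->active;

    u->active = NULL;
    sim_start(u);
    return (done);
}
```

In this sketch, a request submitted while the device is idle becomes active at once. Requests submitted while it is busy wait on the sorted list until sim_intr() completes the active request and starts the next one.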

SCSA

Global Data Definitions

The following information is useful for debugging when a driver runs into bus-wide problems. One global data variable has been defined for the SCSA implementation: scsi_options. This variable is a SCSA configuration longword used for debug and control. The defined bits in the scsi_options longword can be found in the file <sys/scsi/conf/autoconf.h>, and have the following meanings when set:
Table F-3
Option                 Description
SCSI_OPTIONS_DR        Enable global disconnect/reconnect
SCSI_OPTIONS_SYNC      Enable global synchronous transfer capability
SCSI_OPTIONS_PARITY    Enable global parity support
SCSI_OPTIONS_TAG       Enable global tagged queueing support
SCSI_OPTIONS_FAST      Enable global FAST SCSI support: 10 Mbytes/sec transfers, as opposed to 5 Mbytes/sec
SCSI_OPTIONS_WIDE      Enable global WIDE SCSI

Note - The setting of scsi_options affects all host adapter and target drivers present on the system (as opposed to scsi_ifsetcap(9F)). Refer to scsi_hba_attach(9F) in the Solaris 2.5 Reference Manual AnswerBook for information on controlling these options for a particular host adapter.

The default setting for scsi_options has these values set:
  • SCSI_OPTIONS_DR
  • SCSI_OPTIONS_SYNC
  • SCSI_OPTIONS_PARITY
  • SCSI_OPTIONS_TAG
  • SCSI_OPTIONS_FAST
  • SCSI_OPTIONS_WIDE
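
Because scsi_options is a bit mask, a driver tests an option with a bitwise AND, and an option can be turned off by clearing its bit. The userland sketch below shows only that arithmetic. The SIM_OPTIONS_ names and their bit positions are made up for illustration and merely mirror the SCSI_OPTIONS_ names; the real definitions are in <sys/scsi/conf/autoconf.h>.

```c
/*
 * Illustrative bit positions only (assumptions, not the real values).
 * The authoritative definitions are in <sys/scsi/conf/autoconf.h>.
 */
#define SIM_OPTIONS_DR      0x01L
#define SIM_OPTIONS_SYNC    0x02L
#define SIM_OPTIONS_PARITY  0x04L
#define SIM_OPTIONS_TAG     0x08L
#define SIM_OPTIONS_FAST    0x10L
#define SIM_OPTIONS_WIDE    0x20L

/* Mirrors the default described above: every listed option is set. */
#define SIM_OPTIONS_DEFAULT                                     \
    (SIM_OPTIONS_DR | SIM_OPTIONS_SYNC | SIM_OPTIONS_PARITY |   \
    SIM_OPTIONS_TAG | SIM_OPTIONS_FAST | SIM_OPTIONS_WIDE)

/* Return nonzero if the given option bit is set in options. */
int
sim_option_enabled(long options, long bit)
{
    return ((options & bit) != 0);
}

/* Return options with the given option bit cleared. */
long
sim_option_clear(long options, long bit)
{
    return (options & ~bit);
}
```

For example, clearing the tagged queueing bit from the default value leaves the other five options enabled.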

Tagged Queueing

For a definition of tagged queueing refer to the SCSI-2 specification. To support tagged queueing, first check the scsi_options flag SCSI_OPTIONS_TAG to see if tagged queueing is enabled globally. Next, check to see if the target is a SCSI-2 device and whether it has tagged queueing enabled. If this is all true, attempt to enable tagged queueing by using scsi_ifsetcap(9F). Code Example F-1 shows an example of supporting tagged queueing.
Code Example F-1 Supporting SCSI Tagged Queueing
#define ROUTE (&devp->sd_address)
    ...
    /*
     * If SCSI-2 tagged queueing is supported by the disk drive and
     * by the host adapter then we will enable it.
     */
    xsp->tagflags = 0;
    if ((scsi_options & SCSI_OPTIONS_TAG) &&
        (devp->sd_inq->inq_rdf == RDF_SCSI2) &&
        (devp->sd_inq->inq_cmdque)) {
        if (scsi_ifsetcap(ROUTE, "tagged-qing", 1, 1) == 1) {
             xsp->tagflags = FLAG_STAG;
             xsp->throttle = 256;
        } else if (scsi_ifgetcap(ROUTE, "untagged-qing", 0) == 1) {
             xsp->dp->options |= XX_QUEUEING;
             xsp->throttle = 3;
        } else {
             xsp->dp->options &= ~XX_QUEUEING;
             xsp->throttle = 1;
        }
    }

Untagged Queueing

If tagged queueing fails, you can attempt to set untagged queueing. In this mode, you submit as many commands as you think necessary or optimal to the host adapter driver. The host adapter then queues the commands to the target one at a time (as opposed to tagged queueing, where the host adapter submits as many commands as it can until the target indicates that the queue is full).