Writing Device Drivers

DMA

7

Many devices can temporarily take control of the bus and perform data transfers to (and from) main memory or other devices. Since the device is doing the work without the help of the CPU, this type of data transfer is known as a direct memory access (DMA). DMA transfers can be performed between two devices, between a device and memory, or between memory and memory. This chapter describes transfers between a device and memory only.

The DMA Model

The Solaris 2.x DDI/DKI provides a high-level, architecture-independent model for DMA. This allows the framework (the DMA routines) to hide architecture-specific details such as:
  • Setting up DMA mappings
  • Building scatter-gather lists
  • Ensuring I/O and CPU caches are consistent
There are several abstractions that are used in the DDI/DKI to describe aspects of a DMA transaction. These include:
  • DMA Object

    Memory that is the source or destination of a DMA transfer.

  • DMA Handle

    An opaque object returned from a successful DMA setup call. The DMA handle can be used in successive DMA subroutine calls to refer to the DMA object.

  • DMA Window

    A DMA window describes all or a portion of a DMA object that is ready to accept data transfers.

  • DMA Segment

    A DMA segment is a contiguous portion of a DMA window that is entirely addressable by the device.

  • DMA Cookie

    A ddi_dma_cookie(9S) structure (ddi_dma_cookie_t) describes a DMA segment. It contains DMA addressing information required to program the DMA engine.

Rather than knowing that a platform needs to map an object (typically a memory buffer) into a special DMA area of the kernel address space, device drivers instead allocate DMA resources for the object. The DMA routines then perform any platform-specific operations needed to set the object up for DMA access. The driver receives a DMA handle to identify the DMA resources allocated for the object. This handle is opaque to the device driver; the driver must save the handle and pass it in subsequent calls to DMA routines, but should not interpret it in any way.
Operations are defined on a DMA handle that provide the following services:
  • Manipulating DMA resources
  • Synchronizing DMA objects
  • Retrieving attributes of the allocated resources
Figure 7-1 shows the relationship between the DMA object, the DMA handle, and the DMA windows, segments, and cookies.


Figure 7-1

Types of Device DMA

Devices may perform one of the following three types of DMA:
  • Bus-master DMA

    If the device is capable of acting as a true bus master, the driver should program the device's DMA registers directly. The transfer address and count are obtained from the cookie and given to the device. Devices on current SPARC platforms use this form of DMA exclusively.

  • Third-party DMA

    Third-party DMA uses a system DMA engine resident on the main system board, which has several DMA channels available for use by devices. The device relies on the system's DMA engine to perform the data transfers between the device and memory. The driver uses DMA engine routines (see ddi_dmae(9F)) to initialize and program the DMA engine. For each DMA data transfer, the driver programs the DMA engine and then gives the device a command to initiate the transfer in cooperation with that engine.

  • First-party DMA

    Under first-party DMA, the device drives its own DMA bus cycles using a channel from the system's DMA engine. The ddi_dmae_1stparty(9F) function is used to configure this channel in a cascade mode such that the DMA engine will not interfere with the transfer.

DMA and DVMA

The platform that the device operates on may provide one of two types of memory access: Direct Memory Access (DMA) or Direct Virtual Memory Access (DVMA).
On platforms that support DMA, the device is provided with a physical address by the system in order to perform transfers. In this case, one logical transfer may actually consist of a number of physically discontiguous transfers. An example of this occurs when an application transfers a buffer that spans several contiguous virtual pages that map to physically discontiguous pages. In order to deal with the discontiguous memory, devices for these platforms usually have some kind of scatter/gather DMA capability. Typically, x86 platforms provide physical addresses for direct memory transfers.
On platforms that support DVMA, the device is provided with a virtual address by the system in order to perform transfers. In this case, the underlying platform provides some form of MMU which translates device accesses to these virtual addresses into the proper physical addresses. The device transfers to and from a contiguous virtual image that may be mapped to discontiguous physical pages. Devices that operate on these platforms do not need scatter/gather DMA capability. Typically, SPARC platforms provide virtual addresses for direct memory transfers.
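The difference between the two models can be illustrated with a small user-space sketch. It splits a virtually contiguous buffer into physically contiguous runs, which is roughly the list a scatter/gather engine would need on a platform that hands out physical addresses. The page size, the page_map table, and all names here are hypothetical and are not part of the DDI/DKI; a real driver obtains this information from the DMA routines.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size for this model */

/* One physically contiguous piece of a transfer (hypothetical type). */
struct phys_run {
    uint32_t addr;   /* physical start address */
    uint32_t len;    /* length in bytes */
};

/*
 * Split the virtual range [vaddr, vaddr + len) into physically contiguous
 * runs. page_map[i] is the physical page frame number backing virtual
 * page i. Neighboring pieces are merged when they happen to be physically
 * adjacent. Returns the number of runs written, or -1 if more than
 * max_runs would be needed.
 */
static int
split_transfer(const uint32_t *page_map, uint32_t vaddr, uint32_t len,
    struct phys_run *runs, int max_runs)
{
    int n = 0;

    while (len > 0) {
        uint32_t vpage = vaddr / PAGE_SIZE;
        uint32_t off = vaddr % PAGE_SIZE;
        uint32_t chunk = PAGE_SIZE - off;   /* bytes left in this page */
        uint32_t paddr;

        if (chunk > len)
            chunk = len;
        paddr = page_map[vpage] * PAGE_SIZE + off;

        if (n > 0 && runs[n - 1].addr + runs[n - 1].len == paddr) {
            runs[n - 1].len += chunk;       /* extends the previous run */
        } else {
            if (n == max_runs)
                return (-1);
            runs[n].addr = paddr;
            runs[n].len = chunk;
            n++;
        }
        vaddr += chunk;
        len -= chunk;
    }
    return (n);
}
```

On a DVMA platform the device would instead see a single run, because the MMU presents the whole buffer at contiguous virtual addresses.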

Handles, Windows, Segments and Cookies

A DMA handle is an opaque pointer representing an object (usually a memory buffer or address) on which a device can perform DMA transfers. The handle is used in several different calls to DMA routines to identify the DMA resources allocated for the object.
An object represented by a DMA handle is completely covered by one or more DMA windows. The system uses the information in the DMA limit structure, and the memory location and alignment of the target object, to decide how to divide an object into multiple windows in order to fit the request within system resource limitations. The ddi_dma_nextwin(9F) function takes a DMA handle obtained from a DMA setup function and a previous window (or NULL for the first window) and passes back the next (or first) window of the object. An active DMA window may represent allocated resources, such as intermediate buffers. The resources will be released upon the next call to ddi_dma_nextwin(9F) or when the DMA resources are freed using ddi_dma_free(9F).
A DMA window can span several discontiguous pages of system memory. If the DMA engine does not have a memory map, a DMA window might have to be broken into multiple DMA segments, each representing a contiguous piece of memory to or from which the DMA engine can transfer data. The ddi_dma_nextseg(9F) function takes a DMA window obtained from ddi_dma_nextwin(9F) and a previous segment (or NULL for the first segment) and returns the next (or first) segment in the window. A segment represents a contiguous object that is completely addressable in one DMA cookie.
The DMA cookie is a data structure that contains information (such as the transfer address and count) needed to program the DMA engine (see ddi_dma_cookie(9S)). The ddi_dma_segtocookie(9F) function takes a DMA segment obtained from ddi_dma_nextseg(9F) and passes back a DMA cookie for that segment.

Scatter/Gather

Some DMA engines may be able to accept more than one cookie. Such engines can perform scatter/gather I/O without the help of the system. In this case, it is most efficient if the driver uses ddi_dma_nextseg(9F) and ddi_dma_segtocookie(9F) to get as many cookies as the DMA engine can handle and program them all into the engine. The device can then be programmed to transfer the total number of bytes covered by all these segments combined.
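As a rough illustration of this batching, the following user-space sketch copies as many pending cookies as a hypothetical engine can hold into its scatter/gather list and totals their byte counts. The sg_cookie type and fill_sg_list() are assumptions made for this example; in a driver the cookies would come from ddi_dma_nextseg(9F) and ddi_dma_segtocookie(9F).

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a DMA cookie: address and byte count only. */
struct sg_cookie {
    uint32_t address;
    uint32_t size;
};

/*
 * Copy up to sgl_max cookies from the pending segments into the engine's
 * scatter/gather list and return how many were taken. *total_bytes
 * receives the combined byte count the device should be told to transfer.
 */
static size_t
fill_sg_list(const struct sg_cookie *pending, size_t npending,
    struct sg_cookie *sgl, size_t sgl_max, uint32_t *total_bytes)
{
    size_t n = (npending < sgl_max) ? npending : sgl_max;
    size_t i;

    *total_bytes = 0;
    for (i = 0; i < n; i++) {
        sgl[i] = pending[i];
        *total_bytes += pending[i].size;
    }
    return (n);
}
```

A caller would repeat the call with the remaining cookies after each batch completes, until none are pending.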

DMA Operations

The steps involved in a DMA transfer are similar among the types of DMA.
Bus-master DMA
In general, here are the steps that must be followed to perform bus-master DMA.
  1. Describe the device limitations. This allows the routines to ensure that the device will be able to access the buffer.

  2. Lock the DMA objects in memory (see physio(9F)).


Note - This step is not necessary in block drivers for buffers coming from the file system, as the file system has already locked the data in memory.

  3. Allocate DMA resources for the object.

  4. Retrieve the next DMA window with ddi_dma_nextwin(9F).

  5. Retrieve the next segment in the window with ddi_dma_nextseg(9F).

  6. Get a DMA cookie for the segment with ddi_dma_segtocookie(9F).

  7. Program the DMA engine on the device and start it (this is device-specific).

When the transfer is complete, continue the bus master operation:
  8. Perform any required object synchronizations.

  9. Transfer the rest of the window by repeating from Step 5.

  10. Transfer the rest of the object by repeating from Step 4.

  11. Release the DMA resources.

First-party DMA
In general, here are the steps that must be followed to perform first-party DMA.

  1. Allocate a DMA channel.

  2. Configure the channel with ddi_dmae_1stparty(9F).

  3. Lock the DMA objects in memory. This step is not necessary in block drivers for buffers coming from the file system, as the file system has already locked the data in memory.

  4. Allocate DMA resources for the object.

  5. Retrieve the next DMA window with ddi_dma_nextwin(9F).

  6. Retrieve the next segment in the window with ddi_dma_nextseg(9F).

  7. Get a DMA cookie for the segment with ddi_dma_segtocookie(9F).

  8. Program the DMA engine and start it.

When the transfer is complete, continue the DMA operation:
  9. Perform any required object synchronizations.

  10. Transfer the rest of the window by repeating from Step 6.

  11. Transfer the rest of the object by repeating from Step 5.

  12. Release the DMA resources.

  13. Deallocate the DMA channel.

Third-party DMA
In general, here are the steps that must be followed to perform third-party DMA.
  1. Allocate a DMA channel.

  2. Retrieve the system's DMA engine limitations with ddi_dmae_getlim(9F).

  3. Lock the DMA objects in memory. This step is not necessary in block drivers for buffers coming from the file system, as the file system has already locked the data in memory.

  4. Allocate DMA resources for the object.

  5. Retrieve the next DMA window with ddi_dma_nextwin(9F).

  6. Retrieve the next segment in the window with ddi_dma_nextseg(9F).

  7. Get a DMA cookie for the segment with ddi_dma_segtocookie(9F).

  8. Program the system DMA engine to perform the transfer with ddi_dmae_prog(9F).

  9. Perform any required object synchronizations.

  10. Transfer the rest of the window by repeating from Step 6.

  11. Transfer the rest of the object by repeating from Step 5.

  12. Stop the DMA engine with ddi_dmae_stop(9F).

  13. Release the DMA resources.

  14. Deallocate the DMA channel.

Certain hardware platforms may restrict DMA capabilities in a bus-specific way. Drivers should use ddi_slaveonly(9F) to determine if the device is in a slot in which DMA is possible. For an example, see the attach() section on page 95.

Device limitations

Device limitations describe the built-in restrictions of a DMA engine. These limits include:
  • Limits on addresses the device can access
  • Maximum transfer count
  • Address alignment restrictions
To ensure that DMA resources allocated by the system can be accessed by the device's DMA engine, device drivers must inform the system of their DMA engine limitations using a ddi_dma_lim(9S) structure. The system may impose additional restrictions on the device attributes, but it never removes any of the driver-supplied restrictions.

DMA Limits

All DMA resource-allocation routines take a pointer to a DMA limit structure as an argument (see Code Example 7-1 on page 134). This structure is currently processor-architecture dependent.

ddi_dma_lim_sparc

The SPARC DMA limit structure contains the following members:
typedef struct ddi_dma_lim {
    u_long dlim_addr_lo;      /* lower bound of address range */
    u_long dlim_addr_hi;      /* inclusive upper bound of address range */
    u_int   dlim_cntr_max;    /* inclusive upper bound of address register */
    u_int   dlim_burstsizes;  /* bitmask encoded DMA burst sizes */
    u_int   dlim_minxfer;     /* minimum effective DMA transfer size */
    u_int   dlim_dmaspeed;    /* average DMA data rate (KB/s) */
} ddi_dma_lim_t;

dlim_addr_lo is the lowest address that the DMA engine can access.
dlim_addr_hi is the highest address that the DMA engine can access.
dlim_minxfer is the minimum effective transfer size the device can perform. It also influences alignment and padding restrictions.
dlim_cntr_max is the upper bound of the DMA engine's address register. This is often used where the upper 8 bits of an address register are a latch containing a segment number, and the lower 24 bits are used to address a segment. In this case, dlim_cntr_max would be set to 0x00FFFFFF; this prevents the system from crossing a 24-bit segment boundary when establishing mappings to the object.
dlim_burstsizes specifies the burst sizes that the device supports. A burst size is the amount of data the device can transfer before relinquishing the bus. This member is a bitmask encoding of the burst sizes. For example, if the device is capable of doing 1, 2, 4, and 16 byte bursts, this field should be set to 0x17. The system also uses this field to determine alignment restrictions. If the device is an SBus device and can take advantage of a 64-bit SBus, the lower 16 bits are used to specify the burst size for 32-bit transfers, and the upper 16 bits are used to specify the burst size for 64-bit transfers.
dlim_dmaspeed is the average speed of the DMA engine in KBytes/second. This is intended to be a hint for the resource allocation routines, but is optional and may be zero.
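Because every burst size is a power of two, the dlim_burstsizes encoding is simply the bitwise OR of the supported sizes. The helpers below are a minimal sketch of that arithmetic, including the 64-bit SBus layout described above. The function names and the 0x8000 upper limit (the largest size that fits in a 16-bit half) are assumptions for illustration, not DDI/DKI interfaces.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Build a burst-size bitmask from a list of burst sizes in bytes.
 * A power of two has exactly one bit set, so OR-ing the sizes together
 * yields the encoding. Returns 0 if any size is not a power of two in
 * the range 1..0x8000.
 */
static uint32_t
burst_mask(const unsigned *sizes, size_t n)
{
    uint32_t mask = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned s = sizes[i];

        if (s == 0 || (s & (s - 1)) != 0 || s > 0x8000u)
            return (0);
        mask |= s;
    }
    return (mask);
}

/*
 * Combine two masks for a 64-bit SBus device: the lower 16 bits carry the
 * burst sizes for 32-bit transfers, the upper 16 bits those for 64-bit
 * transfers.
 */
static uint32_t
sbus64_burst_mask(uint32_t mask32, uint32_t mask64)
{
    return (((mask64 & 0xFFFFu) << 16) | (mask32 & 0xFFFFu));
}
```

For the 1, 2, 4, and 16 byte case mentioned above, burst_mask() yields 0x17.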

ddi_dma_lim_x86

The x86 DMA limit structure contains the following members:
typedef struct ddi_dma_lim {
    u_long dlim_addr_lo;      /* lower bound of address range */
    u_long dlim_addr_hi;      /* inclusive upper bound of address range */
    u_int   dlim_cntr_max; /* set to 0 */
    u_int   dlim_burstsizes;/* set to 1 */
    u_int   dlim_minxfer;     /* minimum DMA transfer size */
    u_int   dlim_dmaspeed; /* set to 0 */
    u_int   dlim_version;     /* version number of this structure */
    u_int   dlim_adreg_max;/* inclusive upper bound of incrementing address */
                               /* register */
    u_int   dlim_ctreg_max;/* maximum transfer count - 1 */
    u_int   dlim_granular; /* granularity of transfer count */
    u_int   dlim_sgllen;      /* length of DMA scatter/gather list */
    u_int   dlim_reqsize;     /* maximum transfer size (bytes) of a single I/O */
} ddi_dma_lim_t;

dlim_addr_lo is the lowest address that the DMA engine can access.
dlim_addr_hi is the highest address that the DMA engine can access.
dlim_minxfer is the minimum transfer size the DMA engine can perform. It also influences alignment and padding restrictions. It should be set to DMA_UNIT_8, DMA_UNIT_16, or DMA_UNIT_32 to indicate 1, 2, or 4 byte transfers.
dlim_version specifies the version number of this structure. It should be set to DMALIM_VER0.
dlim_adreg_max is the upper bound of the DMA engine's address register. This is often used where the upper 8 bits of an address register are a latch containing a segment number, and the lower 24 bits are used to address a segment. In this case, dlim_adreg_max would be set to 0x00FFFFFF; this prevents the system from crossing a 24-bit segment boundary when establishing mappings to the object.
dlim_ctreg_max specifies the maximum transfer count that the DMA engine can handle in one segment or cookie. The limit is expressed as the maximum count minus one. This transfer count limitation is a per-segment limitation. It is used as a bit mask, so it must also be one less than a power of two.
dlim_granular describes the granularity of the device's DMA transfer ability, in units of bytes. This value is used to specify, for example, the sector size of a mass storage device. DMA requests will be broken into multiples of this value. If there is no scatter/gather capability, then the size of each DMA transfer will be a multiple of this value. If there is scatter/gather capability, then a single segment will not be smaller than the minimum transfer value, but may be less than the granularity; however, the total transfer length of the scatter/gather list will be a multiple of the granularity value.
dlim_sgllen specifies the maximum number of entries in the scatter/gather list. It is the number of segments or cookies that the DMA engine can consume in one I/O request to the device. If the DMA engine has no scatter/gather list, this field should be set to one.
dlim_reqsize describes the maximum number of bytes that the DMA engine can transmit or receive in one I/O command. This limitation is only significant if it is less than (dlim_ctreg_max +1) * dlim_sgllen. If the DMA engine has no particular limitation, this field should be set to 0xFFFFFFFF.
Here are some examples specifying device limitations:

Example One

A DMA engine on a SPARC SBus device has the following limitations:
  • It can only access addresses ranging from 0xFF000000 to 0xFFFFFFFF.
  • It has a 32-bit address register.
  • It supports 1, 2 and 4-byte burst sizes.
  • It has a minimum effective transfer size of 1 byte.
  • The system should not make optimizations related to transfer speed.
The average speed is not known, so the dlim_dmaspeed field is set to zero. The resulting limit structure is:
static ddi_dma_lim_t limits = {
    0xFF000000,      /* low address */
    0xFFFFFFFF,      /* high address */
    0xFFFFFFFF,      /* address register maximum */
    0x7,             /* burst sizes: 0x1 | 0x2 | 0x4 */
    0x1,             /* minimum transfer size */
    0                /* speed */
};

Example Two

A DMA engine on a SPARC VMEbus device has the following limitations:
  • It can address the full 32-bit range.
  • It has a 24-bit address register.
  • It supports 2 to 256-byte burst sizes and all powers of 2 in between.
  • It has a minimum effective transfer size of 2 bytes.
  • It has an average transfer speed of 10 Mbytes per second.
The resulting limit structure is:
static ddi_dma_lim_t limits = {
    0x00000000,      /* low address */
    0xFFFFFFFF,      /* high address */
    0xFFFFFF,        /* address register maximum */
    0x1FE,           /* burst sizes */
    2,               /* minimum transfer size */
    10240            /* speed */
};

Example Three

A DMA engine on an x86 ISA bus device has the following limitations:
  • It can only access the first 16 megabytes of memory.
  • It can perform transfers to segments up to 32k in size.
  • It can hold up to 17 scatter/gather transfers.
  • It operates on units of 512 bytes.
  • It has a minimum effective transfer size of 2 bytes.
  • It has an average transfer speed of 10 Mbytes per second.
The resulting limit structure is:
static ddi_dma_lim_t limits = {
    0x00000000,      /* low address */
    0x00FFFFFF,      /* high address */
    0,               /* must be 0 */
    1,               /* must be 1 */
    DMA_UNIT_16,     /* minimum transfer size (2 bytes) */
    0,               /* must be 0 */
    DMALIM_VER0,     /* version */
    0xFFFFFF,        /* address register maximum */
    0x007FFF,        /* maximum transfer - 1 */
    512,             /* granularity */
    17,              /* scatter/gather length */
    0xFFFFFFFF       /* request size */
};
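The x86 limit fields interact in ways that are easy to check with a little arithmetic. The following user-space sketch, using the values from Example Three, tests that a count limit is one less than a power of two, computes the largest request implied by dlim_ctreg_max, dlim_sgllen, and dlim_reqsize, and checks whether a transfer stays inside one address-register window. The function names are hypothetical, and the window check assumes the address-register limit is itself one less than a power of two.

```c
#include <stdint.h>

/* Nonzero if v is one less than a power of two (0x7FFF, 0xFFFFFF, ...). */
static int
is_pow2_minus_one(uint32_t v)
{
    return ((v & (v + 1)) == 0);
}

/*
 * Largest number of bytes one I/O command can move: the request-size
 * limit only matters when it is below (ctreg_max + 1) * sgllen.
 * 64-bit arithmetic avoids overflow in the product.
 */
static uint64_t
max_request_bytes(uint32_t ctreg_max, uint32_t sgllen, uint32_t reqsize)
{
    uint64_t by_segments = ((uint64_t)ctreg_max + 1) * sgllen;

    return ((reqsize < by_segments) ? reqsize : by_segments);
}

/*
 * Nonzero if [addr, addr + len) does not cross a boundary of the address
 * register, that is, if the bits above adreg_max are the same for the
 * first and last byte of the transfer.
 */
static int
fits_address_register(uint32_t addr, uint32_t len, uint32_t adreg_max)
{
    uint64_t last;
    uint64_t high = ~(uint64_t)adreg_max;

    if (len == 0)
        return (1);
    last = (uint64_t)addr + len - 1;
    return (((uint64_t)addr & high) == (last & high));
}
```

With Example Three's values, each segment can carry 32768 bytes and the list holds 17 segments, so a single request can move at most 557056 bytes.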

Object Locking

Before allocating the DMA resources for a memory object, the object must be prevented from moving. If it is not, the system may remove the object from memory while the device is writing to it, causing the data transfer to fail and possibly corrupting the system. The process of preventing memory objects from moving during a DMA transfer is known as locking down the object.

Note - Locking objects in memory is not related to the type of locking used to protect data.

The following object types do not require explicit locking:
  • Buffers coming from the file system through strategy(9E). These buffers are already locked by the file system.
  • Kernel memory allocated within the device driver, such as that allocated by ddi_mem_alloc(9F) or ddi_iopb_alloc(9F).
For other objects (such as buffers from user space), physio(9F) must be used to lock down the objects. This is usually performed in the read(9E) or write(9E) routines of a character device driver. See "DMA Transfers" on page 158 for an example.

Allocating DMA Resources

Two interfaces are recommended for allocating DMA resources:
ddi_dma_buf_setup(9F)        Recommended for use with buffer structures.

ddi_dma_addr_setup(9F) Recommended for use with virtual addresses.
Table 7-1 lists the appropriate DMA resource allocation interfaces for different classes of DMA objects.
Table 7-1
Type of Object                                                                      Resource Allocation Interface
Memory allocated within the driver using ddi_mem_alloc(9F) or ddi_iopb_alloc(9F)    ddi_dma_addr_setup(9F)
Requests from the file system through strategy(9E)                                  ddi_dma_buf_setup(9F)
Memory in user space that has been locked down using physio(9F)                     ddi_dma_buf_setup(9F)
All resource allocation routines return a DMA handle for use in subsequent calls to DMA-related functions. DMA resources are usually allocated in the driver's xxstart( ) routine, if it has one. See "Asynchronous Data Transfers" on page 184 for discussion of xxstart( ).
int ddi_dma_addr_setup(dev_info_t *dip,
    struct as *as, caddr_t addr,
    u_int len, u_int flags, int (*waitfp)(caddr_t), caddr_t arg,
    ddi_dma_lim_t *lim, ddi_dma_handle_t *handlep);
int ddi_dma_buf_setup(dev_info_t *dip,
    struct buf *bp,
    u_int flags, int (*waitfp)(caddr_t), caddr_t arg,
    ddi_dma_lim_t *lim, ddi_dma_handle_t *handlep);

ddi_dma_addr_setup(9F) and ddi_dma_buf_setup(9F) take the following arguments:
dip is a pointer to the device's dev_info structure.
The object to allocate resources for is described differently by each routine.
For ddi_dma_addr_setup(9F), the object is described by an address range:
  • as is a pointer to an address space structure (this must be NULL).
  • addr is the base kernel address of the object.
  • len is the length of the object.
For ddi_dma_buf_setup(9F), the object is described by a buf(9S) structure:
  • bp is a pointer to a buf(9S) structure.
flags is a set of flags indicating the transfer direction and other attributes. DDI_DMA_READ indicates a data transfer from device to memory. DDI_DMA_WRITE indicates a data transfer from memory to device. See ddi_dma_req(9S) for a complete discussion of the allowed flags.
waitfp is the address of a callback function for handling resource allocation failures.
arg is the argument to pass to the callback function.
lim is a pointer to a ddi_dma_lim(9S) structure as described in "Device limitations" on page 128.
handlep is a pointer to a DMA handle (to store the returned handle).

Handling Resource Allocation Failures

The resource-allocation routines provide the driver several options when handling allocation failures. The waitfp argument indicates whether the allocation routines will block, return immediately, or schedule a callback.
waitfp              Indicated Action
DDI_DMA_DONTWAIT    Driver does not wish to wait for resources to become available.
DDI_DMA_SLEEP       Driver is willing to wait indefinitely for resources to become available.
Other values        The address of a function to be called when resources are likely to be available.

State Structure

This section adds the following fields to the state structure. See "State Structure" on page 57 for more information.
struct buf            *bp;         /* current transfer */
ddi_dma_handle_t      handle;
struct xxiopb         *iopb_array;/* for I/O Parameter Blocks */
ddi_dma_handle_t      iopb_handle;

Device Register Structure

Devices that do DMA have more registers than have been used in previous examples. This section adds the following fields to the device register structure to support DMA-capable device examples:
volatile caddr_t      dma_addr;    /* starting address for DMA */
volatile u_int        dma_size;    /* amount of data to transfer */
volatile caddr_t      iopb_addr;   /* When written, informs device of the next */
                                   /* command's parameter block address. */
                                   /* When read after an interrupt, contains */
                                   /* the address of the completed command. */

Callback Example

In Code Example 7-1 xxstart( ) is used as the callback function and the per-device state structure is given as its argument. xxstart( ) attempts to start the command. If the command cannot be started because resources are not available, xxstart( ) is scheduled to be called sometime later, when resources might be available.
Since xxstart( ) is used as a DMA callback, it must follow these rules imposed on DMA callbacks:
  • It must not assume that resources are available (it must try to allocate them again).
  • It must indicate to the system whether allocation succeeded, by returning 0 if it fails to allocate resources (and needs to be called again later) or 1 if it succeeds (so no further callback is necessary).
See ddi_dma_req(9S) for a discussion of DMA callback responsibilities.
Code Example 7-1 Allocating DMA resources
static int
xxstart(caddr_t arg)
{

    struct xxstate *xsp = (struct xxstate *) arg;
    struct device_reg *regp;
    int flags;
    mutex_enter(&xsp->mu);
    if (xsp->busy) {
        /* transfer in progress */
        mutex_exit(&xsp->mu);
        return (0);
    }
    xsp->busy = 1;
    mutex_exit(&xsp->mu);
    regp = xsp->regp;
    if (xsp->bp->b_flags & B_READ) {
        flags = DDI_DMA_READ;
    } else {
        flags = DDI_DMA_WRITE;
    }
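    if (ddi_dma_buf_setup(xsp->dip, xsp->bp, flags, xxstart,
        (caddr_t)xsp, &limits, &xsp->handle) != DDI_DMA_MAPPED) {
        /* really should check all return values in a switch */
        /* clear busy so that the rescheduled callback can retry */
        mutex_enter(&xsp->mu);
        xsp->busy = 0;
        mutex_exit(&xsp->mu);
        return (0);
    }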
    ...
    program the DMA engine
    ...
    return (1);
}
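The callback rules can be made concrete with a toy user-space model. Nothing here is a DDI/DKI interface: model_pool stands in for the system's DMA resources, model_alloc() for a setup routine that records a callback on failure, and model_free() for the point at which the system decides resources might be available. The sketch only shows the contract that a callback retries the allocation and returns 0 to be called again or 1 when it is done.

```c
#include <stddef.h>

typedef int (*model_callback_t)(void *arg);

/* Hypothetical resource pool standing in for the system's DMA resources. */
struct model_pool {
    int free_units;
    model_callback_t waiter;   /* callback to run when a unit is freed */
    void *waiter_arg;
};

/* Try to take one unit; on failure remember the callback. 1 = success. */
static int
model_alloc(struct model_pool *p, model_callback_t cb, void *arg)
{
    if (p->free_units > 0) {
        p->free_units--;
        return (1);
    }
    p->waiter = cb;
    p->waiter_arg = arg;
    return (0);
}

/* Release one unit and run the pending callback, if there is one. */
static void
model_free(struct model_pool *p)
{
    p->free_units++;
    if (p->waiter != NULL) {
        model_callback_t cb = p->waiter;
        void *arg = p->waiter_arg;

        p->waiter = NULL;
        /* A return of 0 means "call me again later": keep it queued. */
        if (cb(arg) == 0 && p->waiter == NULL) {
            p->waiter = cb;
            p->waiter_arg = arg;
        }
    }
}

/* Driver side of the model. */
struct model_drv {
    struct model_pool *pool;
    int started;               /* number of transfers started */
};

static int
model_start(void *arg)
{
    struct model_drv *d = arg;

    /* Never assume resources are available: allocate again. */
    if (!model_alloc(d->pool, model_start, d))
        return (0);            /* needs to be called again later */
    d->started++;
    return (1);                /* no further callback necessary */
}
```

In this model the first model_start() call fails and queues itself; a later model_free() reruns it, and the second attempt succeeds.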

Burst Sizes

SPARC device drivers specify the burst sizes their device supports in the dlim_burstsizes field of the ddi_dma_lim(9S) structure. This is a bitmap of the supported burst sizes. However, when DMA resources are allocated, the system might impose further restrictions on the burst sizes that may actually be used by the device. The ddi_dma_burstsizes(9F) routine can be used to obtain the allowed burst sizes. It returns the appropriate burst size bitmap for the device. When DMA resources are allocated, a driver can ask the system for appropriate burst sizes to use for its DMA engine.
#define BEST_BURST_SIZE 0x20 /* 32 bytes */

if (ddi_dma_buf_setup(xsp->dip, xsp->bp, flags, xxstart,
    (caddr_t)xsp, &limits, &xsp->handle) != DDI_DMA_MAPPED) {

        /* error handling */
        return (0);
}
burst = ddi_dma_burstsizes(xsp->handle);

/* check which bit is set and choose one burstsize to program the DMA engine */
if (burst & BEST_BURST_SIZE) {
        program DMA engine to use this burst size
} else {
        other cases
}
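One way to handle the "other cases" branch is to fall back to the largest burst size that both the system and the engine accept. The helper below is a sketch of that selection on plain bitmaps. best_burst() is a hypothetical name, and the caller is assumed to pass only the 16 bits that apply to the transfer width in use.

```c
#include <stdint.h>

/*
 * Return the largest burst size (in bytes) present in both the bitmap the
 * system allows and the bitmap the engine can be programmed for, or 0 if
 * the two have no burst size in common.
 */
static uint32_t
best_burst(uint32_t allowed, uint32_t supported)
{
    uint32_t common = allowed & supported;
    uint32_t bit = 0x80000000u;

    /* Walk down from the top bit until a common burst size is found. */
    while (bit != 0 && (common & bit) == 0)
        bit >>= 1;
    return (bit);
}
```

In a driver, the allowed bitmap would be the value returned by ddi_dma_burstsizes(9F).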

Programming the DMA Engine

When the resources have been successfully allocated, the driver traverses the returned DMA window and finds the first segment. Code Example 7-2 is a simple example of this.
Code Example 7-2 Traversing windows and segments
ddi_dma_win_t    win, nwin;
ddi_dma_seg_t    seg, nseg;
ddi_dma_cookie_t cookie;
off_t   off, len;
int     retw, rets;
for (win = NULL;
    (retw = ddi_dma_nextwin(xsp->handle,win,&nwin))!=DDI_DMA_DONE;
    win = nwin) {
    if (retw != DDI_SUCCESS) {
        /* do error handling */
    } else {
        for (seg = NULL;
             (rets = ddi_dma_nextseg(nwin,seg,&nseg))!=DDI_DMA_DONE;
             seg = nseg) {
             if (rets != DDI_SUCCESS) {
                 /* do error handling */
             } else {
                 ddi_dma_segtocookie(nseg, &off, &len, &cookie);
                 program the DMA engine
             }
        }
    }
}

The device must then be programmed to transfer this segment. Although programming a DMA engine is device specific, all DMA engines require a starting address and a transfer count. Device drivers retrieve these two values from a given segment by calling ddi_dma_segtocookie(9F).
This function takes the segment and fills in a DMA cookie and the offset and length of the segment. A cookie is of type ddi_dma_cookie(9S) and has the following fields:
unsigned long    dmac_address;     /* unsigned 32 bit address */
u_int            dmac_size;        /* unsigned 32 bit size */
u_int            dmac_type;        /* bus-specific type bits */

Upon return from ddi_dma_segtocookie(9F), the dmac_address field of the cookie contains the DMA transfer's starting address and dmac_size contains the transfer count. Depending on the bus architecture, the third field in the cookie may be required by the driver.
The exact shape of dmac_address is device specific--the driver should know how to interpret it. There is an implementation-specific agreement on the shape of the cookie. The driver should not perform any manipulations, such as logical or arithmetical, on the cookie.
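As a rough sketch of the device-specific step, the following user-space model copies the address and count from a cookie into a register block and then starts the engine. The model_cookie and model_reg types mirror the cookie fields and the dma_addr, dma_size, and csr registers used in this chapter's examples, but they are stand-ins, and the MODEL_START_DMA bit is invented for illustration.

```c
#include <stdint.h>

/* Stand-in for ddi_dma_cookie(9S), limited to the fields described above. */
struct model_cookie {
    unsigned long dmac_address;
    unsigned int  dmac_size;
    unsigned int  dmac_type;
};

/* Hypothetical register block modeled on the chapter's example device. */
struct model_reg {
    volatile uint32_t dma_addr;   /* starting address for DMA */
    volatile uint32_t dma_size;   /* amount of data to transfer */
    volatile uint8_t  csr;        /* control/status register */
};

#define MODEL_START_DMA 0x01      /* hypothetical "go" bit */

/*
 * Hand the cookie's address and count to the device unchanged, then start
 * the engine. The start bit is written last so the device never sees a
 * half-programmed transfer.
 */
static void
model_program_dma(struct model_reg *regp, const struct model_cookie *cp)
{
    regp->dma_addr = (uint32_t)cp->dmac_address;
    regp->dma_size = cp->dmac_size;
    regp->csr = MODEL_START_DMA;
}
```

The values are passed through as received, in keeping with the rule that the driver does not manipulate the cookie.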

Freeing the DMA Resources

After a DMA transfer completes (usually in the interrupt routine), the DMA resources may be released by calling ddi_dma_free(9F).
As described in "Synchronizing Memory Objects" on page 142, ddi_dma_free(9F) calls ddi_dma_sync(9F), eliminating the need for any explicit synchronization. After calling ddi_dma_free(9F), the DMA handle becomes invalid, and further references to the handle have undefined results. Code Example 7-3 shows how to use ddi_dma_free(9F).
Code Example 7-3 Freeing DMA resources
static u_int
xxintr(caddr_t arg)
{
    struct xxstate *xsp = (struct xxstate *)arg;
    u_char status, temp;
    mutex_enter(&xsp->mu);
    /* read status */
    status = xsp->regp->csr;

    if (!(status & INTERRUPTING)) {
        mutex_exit(&xsp->mu);
        return (DDI_INTR_UNCLAIMED);
    }
    xsp->regp->csr = CLEAR_INTERRUPT;
    /* for store buffers */
    temp = xsp->regp->csr;
    ddi_dma_free(xsp->handle);
    ...
    check for errors
    ...
    xsp->busy = 0;
    mutex_exit(&xsp->mu);

    if (pending transfers) {
        (void) xxstart((caddr_t) xsp);
    }
    return (DDI_INTR_CLAIMED);
}

The DMA resources should be released and reallocated if a different object is used in the next transfer. However, if the same object is always used, the resources may be allocated once and continually reused as long as there are intervening calls to ddi_dma_sync(9F).

Cancelling DMA Callbacks

DMA callbacks cannot be cancelled. This requires some additional code in the driver's detach(9E) routine, since it must not return DDI_SUCCESS if there are any outstanding callbacks. When DMA callbacks are outstanding, the detach(9E) routine must wait for each callback to run and must prevent it from rescheduling itself. This can be done using additional fields in the state structure:
int     cancel_callbacks;      /* detach(9E) sets this to */
                               /* prevent callbacks from */
                               /* rescheduling themselves */
int     callback_count;        /* number of outstanding callbacks */
kmutex_t   callback_mutex;         /* protects callback_count and */
                               /* cancel_callbacks. */
kcondvar_t callback_cv;        /* condition is that callback_count */
                               /* is zero. detach(9E) waits on it */

Code Example 7-4 Cancelling DMA callbacks
static int
xxdetach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
    ...
    mutex_enter(&xsp->callback_mutex);
    xsp->cancel_callbacks = 1;
    while (xsp->callback_count > 0) {
        cv_wait(&xsp->callback_cv, &xsp->callback_mutex);
    }
    mutex_exit(&xsp->callback_mutex);
    ...
}
static int
xxstrategy(struct buf *bp)
{
    ...
    mutex_enter(&xsp->callback_mutex);
    xsp->bp = bp;
    error = ddi_dma_buf_setup(xsp->dip, xsp->bp,
         flags, xxdmacallback, (caddr_t)xsp, &limits, &xsp->handle);
    if (error == DDI_DMA_NORESOURCES)
        xsp->callback_count++;
    mutex_exit(&xsp->callback_mutex);
    ...
}
static int
xxdmacallback(caddr_t callbackarg)
{
    struct xxstate *xsp = (struct xxstate *)callbackarg;
    ...
    mutex_enter(&xsp->callback_mutex);
    if (xsp->cancel_callbacks) {
        /* do not reschedule, in process of detaching */
        xsp->callback_count--;
        if (xsp->callback_count == 0)
             cv_signal(&xsp->callback_cv);
        mutex_exit(&xsp->callback_mutex);
        return (1);      /* tell the framework not to */
                          /* reschedule this callback routine */
    }
    /*
     * Presumably at this point the device is still active
     * and will not be detached until the DMA has completed.
     *
     * A return of 0 means try again later.
     */
    error = ddi_dma_buf_setup(xsp->dip, xsp->bp,
        flags, DDI_DMA_DONTWAIT, NULL, &limits, &xsp->handle);
    if (error == DDI_DMA_MAPPED) {
        ...
        /* program the DMA engine */
        ...
        xsp->callback_count--;
        mutex_exit(&xsp->callback_mutex);
        return (1);
    }
    if (error != DDI_DMA_NORESOURCES) {
        xsp->callback_count--;
        mutex_exit(&xsp->callback_mutex);
        return (1);
    }
    mutex_exit(&xsp->callback_mutex);
    return (0);
}
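
The return-value convention used by xxdmacallback() can be modeled outside the kernel. The following is a minimal single-threaded sketch, not driver code: toy_state, toy_callback(), and toy_framework_run() are hypothetical names, locking is omitted, and the busy_rounds field merely stands in for a setup call that keeps returning DDI_DMA_NORESOURCES. It illustrates that callback_count is decremented exactly once per outstanding callback, whether the callback starts the transfer or is cancelled.

```c
/* Hypothetical model of the state fields described above (no locking). */
struct toy_state {
    int cancel_callbacks;   /* set when the driver is "detaching" */
    int callback_count;     /* number of outstanding callbacks */
    int busy_rounds;        /* how many more setup attempts will fail */
    int transfers_started;  /* transfers that were actually programmed */
};

/* Returns 1 when finished, 0 to ask to be called again later. */
static int
toy_callback(struct toy_state *sp)
{
    if (sp->cancel_callbacks) {
        /* detaching: give up and do not reschedule */
        sp->callback_count--;
        return (1);
    }
    if (sp->busy_rounds > 0) {
        /* stands in for a setup call that found no resources */
        sp->busy_rounds--;
        return (0);
    }
    /* resources obtained: the transfer would be started here */
    sp->transfers_started++;
    sp->callback_count--;
    return (1);
}

/* Stand-in for the framework: retry until the callback returns 1. */
static int
toy_framework_run(struct toy_state *sp, int max_calls)
{
    int calls = 0;

    while (calls < max_calls) {
        calls++;
        if (toy_callback(sp) == 1)
            break;
    }
    return (calls);
}
```

With busy_rounds set to 2, the callback is invoked three times before the transfer starts. With cancel_callbacks set, it returns on the first call without starting a transfer, and callback_count still drops to zero so a waiting detach(9E) could proceed.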

Synchronizing Memory Objects

At various points when the memory object is accessed (including the time of removal of the DMA resources), the driver may need to synchronize the memory object with respect to various caches. This section gives guidelines on when and how to synchronize memory objects.

Cache

Cache is a very high-speed memory that sits between the CPU and the system's main memory (CPU cache), or between a device and the system's main memory (I/O cache).

Figure 7-2

When an attempt is made to read data from main memory, the associated cache first checks to see if it contains the requested data. If so, it very quickly satisfies the request. If the cache does not have the data, it retrieves the data from main memory, passes the data on to the requestor, and saves the data in case that data is requested again.
Similarly, on a write cycle, the data is stored in the cache very quickly and the CPU or device is allowed to continue executing (transferring). This takes much less time than it otherwise would if the CPU or device had to wait for the data to be written to memory.
An implication of this model is that after a device transfer has completed, the data may still be in the I/O cache but not yet in main memory. If the CPU accesses the memory, it may read the wrong data from the CPU cache. To ensure a consistent view of the memory for the CPU, the driver must call a synchronization routine to write the data from the I/O cache to main memory and update the CPU cache with the new data. Similarly, a synchronization step is required if data modified by the CPU is to be accessed by a device.
There may also be additional caches and buffers in between the device and memory, such as caches associated with bus extenders or bridges. ddi_dma_sync(9F) is provided to synchronize all applicable caches.
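
The stale-data problem described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not a description of any real hardware or of the DDI implementation: the structure and the toy_ functions are hypothetical, the object is a fixed eight bytes, and each cache holds the whole object. It shows a CPU read returning old data until a synchronization step writes the I/O cache back to memory and invalidates the CPU's cached copy.

```c
#include <string.h>

#define OBJ_SIZE 8

/* Hypothetical model: one memory object seen through two caches. */
struct toy_system {
    unsigned char memory[OBJ_SIZE];     /* main memory */
    unsigned char io_cache[OBJ_SIZE];   /* data written by the device */
    int io_dirty;                       /* I/O cache not yet in memory */
    unsigned char cpu_cache[OBJ_SIZE];  /* CPU's cached copy */
    int cpu_valid;                      /* CPU cache holds a copy */
};

/* A device write lands in the I/O cache only. */
static void
toy_device_write(struct toy_system *s, const unsigned char *src)
{
    memcpy(s->io_cache, src, OBJ_SIZE);
    s->io_dirty = 1;
}

/* A CPU read is satisfied from the CPU cache whenever it holds a copy. */
static unsigned char
toy_cpu_read(struct toy_system *s, int i)
{
    if (!s->cpu_valid) {
        memcpy(s->cpu_cache, s->memory, OBJ_SIZE);
        s->cpu_valid = 1;
    }
    return (s->cpu_cache[i]);
}

/* Synchronize for the CPU: flush the I/O cache, invalidate the CPU cache. */
static void
toy_sync_for_cpu(struct toy_system *s)
{
    if (s->io_dirty) {
        memcpy(s->memory, s->io_cache, OBJ_SIZE);
        s->io_dirty = 0;
    }
    s->cpu_valid = 0;
}
```

In this model, a CPU read issued after toy_device_write() but before toy_sync_for_cpu() still returns the old contents; the new data becomes visible only after the synchronization step.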

ddi_dma_sync( )

If a memory object has multiple mappings--such as for a device (through the DMA handle), and for the CPU--and one mapping is used to modify the memory object, the driver needs to call ddi_dma_sync(9F) to ensure that the modification of the memory object is complete before accessing the object through another mapping. ddi_dma_sync(9F) may also inform other mappings of the object that any cached references to the object are now stale. Additionally, ddi_dma_sync(9F) flushes or invalidates stale cache references as necessary.
Generally, the driver has to call ddi_dma_sync(9F) when a DMA transfer completes. The exception to this is that deallocating the DMA resources (ddi_dma_free(9F)) does an implicit ddi_dma_sync(9F) on behalf of the driver.
int ddi_dma_sync(ddi_dma_handle_t handle, off_t off,
    u_int length, u_int type);

If the object is going to be read by the DMA engine of the device, the device's view of the object must be synchronized by setting type to DDI_DMA_SYNC_FORDEV. If the DMA engine of the device has written to the memory object, and the object is going to be read by the CPU, the CPU's view of the object must be synchronized by setting type to DDI_DMA_SYNC_FORCPU.
Here is an example of synchronizing a DMA object for the CPU:
if (ddi_dma_sync(xsp->handle, 0, length, DDI_DMA_SYNC_FORCPU)
    == DDI_SUCCESS) {
    /* the CPU can now access the transferred data */
    ...
} else {
    /* error handling */
}

If the only mapping that concerns the driver is one for the kernel (such as memory allocated by ddi_mem_alloc(9F)), the flag DDI_DMA_SYNC_FORKERNEL can be used. This is a hint to the system that if it can synchronize the kernel's view faster than the CPU's view, it can do so; otherwise, it acts the same as DDI_DMA_SYNC_FORCPU.

Allocating Private DMA Buffers

Some device drivers may need to allocate memory for DMA transfers to or from a device, in addition to doing transfers requested by user threads and the kernel. Examples of this are setting up shared memory for communication with the device and allocating intermediate transfer buffers. Two interfaces are provided for allocating memory for DMA transfers: ddi_iopb_alloc(9F) and ddi_mem_alloc(9F).

ddi_iopb_alloc( )

ddi_iopb_alloc(9F) should be used if the device accesses memory in a non-sequential fashion, or if synchronization steps using ddi_dma_sync(9F) should be as lightweight as possible (due to frequent use on small objects). This type of access is commonly known as consistent access. I/O parameter blocks that are used for communication between a device and the driver are set up using ddi_iopb_alloc(9F). ddi_iopb_free(9F) is used to free the memory allocated by ddi_iopb_alloc(9F).
On x86 systems, ddi_iopb_alloc(9F) can be used to allocate memory that is physically contiguous as well as consistent.
Code Example 7-5 shows how to allocate IOPB memory and the DMA resources needed to access it. DMA resources must still be allocated, and the DDI_DMA_CONSISTENT flag must be passed to the allocation function.
Code Example 7-5 Using ddi_iopb_alloc(9F)
if (ddi_iopb_alloc(xsp->dip, &limits, size, &xsp->iopb_array)
    != DDI_SUCCESS) {
    /* error handling */
    goto failure;
}
if (ddi_dma_addr_setup(xsp->dip, NULL, xsp->iopb_array, size,
    DDI_DMA_READ | DDI_DMA_CONSISTENT, DDI_DMA_SLEEP,
    NULL, &limits, &xsp->iopb_handle) != DDI_DMA_MAPPED) {
    /* error handling */
    ddi_iopb_free(xsp->iopb_array);
    goto failure;
}

ddi_mem_alloc( )

ddi_mem_alloc(9F) should be used if the device is doing sequential, unidirectional, block-sized and block-aligned transfers to or from memory. This type of access is commonly known as streaming access.
On SPARC systems, ddi_mem_alloc(9F) obeys the alignment and padding constraints specified by the dlim_minxfer and dlim_burstsizes fields in the passed DMA limit structure to get the most effective hardware support for large transfers. For example, if an I/O transfer can be sped up by using an I/O cache, which at a minimum transfers (flushes) one cache line, ddi_mem_alloc(9F) will round the size to a multiple of the cache line to avoid data corruption.
On x86 systems, ddi_mem_alloc(9F) obeys the alignment specified by the dlim_minxfer field in the passed DMA limit structure. In addition, the physical address of the allocated memory will be within the dlim_addr_lo and dlim_addr_hi of the DMA limit structure.
ddi_mem_free(9F) is used to free the memory allocated by ddi_mem_alloc(9F).

Note - If the memory is not properly aligned, the transfer will succeed but the system will pick a different (and possibly less efficient) transfer mode that has fewer restrictions. For this reason, ddi_mem_alloc(9F) is preferred over kmem_alloc(9F) when allocating memory for the device to access.

Code Example 7-6 shows how to allocate memory for streaming access.
Code Example 7-6 Using ddi_mem_alloc(9F)
if (ddi_mem_alloc(xsp->dip, &limits, size, 0,
    &memp, &real_length) != DDI_SUCCESS) {
    /* error handling */
    goto failure;
}
if (ddi_dma_addr_setup(xsp->dip, NULL, memp, real_length,
    DDI_DMA_READ, DDI_DMA_SLEEP, NULL, &limits, &mem_handle)
    != DDI_DMA_MAPPED) {
    /* error handling */
    ddi_mem_free(memp);
    goto failure;
}

ddi_mem_alloc(9F) returns the actual size of the allocated memory object. Because of padding and alignment requirements, the actual size might be larger than the requested size. ddi_dma_addr_setup(9F) requires the actual length.
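
The effect of padding on the returned length can be illustrated with simple arithmetic. This is a sketch only: round_up_size() is a hypothetical helper, and the padding rules actually applied by ddi_mem_alloc(9F) are platform-specific. It assumes the allocation is rounded up to a multiple of a cache line size.

```c
#include <stddef.h>

/*
 * Round size up to the next multiple of granule.
 * Hypothetical helper; granule must be non-zero.
 */
static size_t
round_up_size(size_t size, size_t granule)
{
    size_t rem = size % granule;

    return ((rem == 0) ? size : size + (granule - rem));
}
```

For a 100-byte request and an assumed 32-byte cache line, the padded length is 128 bytes, which is why the driver should pass the returned real length, not the requested size, to the setup routine.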

ddi_dma_devalign( )

After allocating DMA resources for private data buffers, ddi_dma_devalign(9F) should be used to determine the minimum required data alignment and minimum effective transfer size.
Although the starting address for the DMA transfer will be aligned properly, the offset passed to ddi_dma_htoc(9F) allows the driver to start a transfer anywhere within the object, potentially bypassing alignment restrictions. The driver should therefore check the alignment restrictions prior to initiating a transfer and align the offset appropriately.
The driver should also check the minimum effective transfer size. The minimum effective transfer size indicates, for writes, how much of the mapped object will be affected by the minimum access. For reads it indicates how much of the mapped object will be accessed.
For memory allocated with ddi_iopb_alloc(9F), the minimum transfer size will usually be one byte. This means that positioning randomly within the mapped object is possible. For memory allocated with ddi_mem_alloc(9F), the minimum transfer size is usually larger, because caches might be active that operate only on entire cache lines (line size granularity).

Example

if (ddi_dma_devalign(xsp->handle, &align, &mineffect) ==
    DDI_FAILURE) {
    /* error handling */
    goto failure;
}
align = max(align, mineffect);
/* adjust offset for ddi_dma_htoc(9F) */
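
The offset adjustment left open in the example above might look like the following. This is a minimal sketch with hypothetical helper names (toy_align_down() and toy_bytes_touched() are not DDI routines), and it assumes that alignment units are counted from the start of the mapped object. The first helper rounds an offset down to the required alignment; the second estimates how much of the object a transfer touches when the device can only access whole units of the minimum effective transfer size.

```c
/* Round off down to a multiple of align (align must be non-zero). */
static unsigned int
toy_align_down(unsigned int off, unsigned int align)
{
    return (off - (off % align));
}

/*
 * Bytes of the object touched when len bytes starting at off are
 * transferred in whole units of mineffect (mineffect must be non-zero).
 */
static unsigned int
toy_bytes_touched(unsigned int off, unsigned int len,
    unsigned int mineffect)
{
    unsigned int start = toy_align_down(off, mineffect);
    unsigned int end = off + len;
    unsigned int rem = end % mineffect;

    if (rem != 0)
        end += mineffect - rem;     /* extend to the end of the unit */
    return (end - start);
}
```

With an assumed unit of 32 bytes, an offset of 100 is adjusted down to 96, and a 10-byte transfer at offset 100 touches the full 32-byte unit from 96 to 128. With a unit of one byte, as is usual for ddi_iopb_alloc(9F) memory, no adjustment is needed.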