Chapter 11 Drivers for Block Devices

This chapter describes the structure of block device drivers. The kernel views a block device as a set of randomly accessible logical blocks. The file system buffers the data blocks between a block device and user space using a list of buf(9S) structures. Only block devices can support a file system.

This chapter provides information on the following subjects:

Block Driver Structure Overview

Figure 11–1 shows the data structures and routines that define the structure of a block device driver. Device drivers typically include the following:
The shaded device access section in Figure 11–1 illustrates block driver entry points.

Figure 11–1 Block Driver Roadmap
Associated with each device driver is a dev_ops(9S) structure, which in turn refers to a cb_ops(9S) structure. See Chapter 5, Driver Autoconfiguration, for details regarding driver data structures.

Note – Some of the entry points can be replaced by nodev(9F) or nulldev(9F) as appropriate.

File I/O

A file system is a tree-structured hierarchy of directories and files. Some file systems, such as the UNIX File System (UFS), reside on block-oriented devices. File systems are created by format(1M) and newfs(1M).

When an application issues a read(2) or write(2) system call to an ordinary file on the UFS file system, the file system can call the device driver strategy(9E) entry point for the block device on which the file system resides. The file system code can call strategy(9E) several times for a single read(2) or write(2) system call.

The file system code determines the logical device address, or logical block number, for each ordinary file block and builds a block I/O request in the form of a buf(9S) structure directed at the block device. The driver strategy(9E) entry point then interprets the buf(9S) structure and completes the request.

Block Device Autoconfiguration

attach(9E) should perform the common initialization tasks for each instance of a device. Typically, these tasks include:
Block device drivers create minor nodes of type S_IFBLK. This causes a block special file representing the node to eventually appear in the /devices hierarchy.

Logical device names for block devices appear in the /dev/dsk directory, and consist of a controller number, bus-address number, disk number, and slice number. These names are created by the devfsadm(1M) program if the node type is set to DDI_NT_BLOCK or DDI_NT_BLOCK_CHAN. DDI_NT_BLOCK_CHAN should be specified if the device communicates on a channel (a bus with an additional level of addressability), such as SCSI disks. It causes a bus-address field (tN) to appear in the logical name. DDI_NT_BLOCK should be used for most other devices.

For each minor device (which corresponds to each partition on the disk), the driver must also create an nblocks or Nblocks property. This is an integer property giving the number of blocks supported by the minor device, expressed in units of DEV_BSIZE (512 bytes). The file system uses the nblocks and Nblocks properties to determine device limits. Nblocks is the 64-bit version of nblocks and should be used with storage devices that have over 1 Tbyte of storage per disk. See Device Properties for more information.

Example 11–1 shows a typical attach(9E) entry point with emphasis on creating the device's minor node and the Nblocks property. Because this example uses Nblocks and not nblocks, it calls ddi_prop_update_int64(9F) instead of ddi_prop_update_int(9F).

As a side note, this example uses makedevice(9F) to create the device number that is passed to ddi_prop_update_int64(9F). The major number given to makedevice(9F) comes from ddi_driver_major(9F), which obtains the major number from a pointer to a dev_info_t structure, much as getmajor(9F) extracts it from a dev_t device number.

Example 11–1 Block Driver attach(9E) Routine

static int
xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
    int instance = ddi_get_instance(dip);
    struct xxstate *xsp;    /* per-instance soft state */
    major_t maj_number;

    switch (cmd) {
    case DDI_ATTACH:
        /*
         * Pseudocode for the common initialization tasks:
         *   allocate a state structure (xsp) and initialize it
         *   map the device's registers
         *   add the device driver's interrupt handler(s)
         *   initialize any mutexes and condition variables
         *   read label information if the device is a disk
         *   create power manageable components
         */
        /*
         * Create the device minor node. Note that the node_type
         * argument is set to DDI_NT_BLOCK.
         */
        if (ddi_create_minor_node(dip, "minor_name", S_IFBLK,
            instance, DDI_NT_BLOCK, 0) == DDI_FAILURE) {
            /* free resources allocated so far */
            /* Remove any previously allocated minor nodes */
            ddi_remove_minor_node(dip, NULL);
            return (DDI_FAILURE);
        }
        /*
         * Create driver properties like "Nblocks". If the device
         * is a disk, the Nblocks property is usually calculated from
         * information in the disk label. Use "Nblocks" instead of
         * "nblocks" to ensure the property works for large disks.
         */
        /* xsp->Nblocks = size of device in 512-byte blocks */
        maj_number = ddi_driver_major(dip);
        if (ddi_prop_update_int64(makedevice(maj_number, instance), dip,
            "Nblocks", xsp->Nblocks) != DDI_PROP_SUCCESS) {
            cmn_err(CE_CONT, "%s: cannot create Nblocks property\n",
                ddi_get_name(dip));
            /* free resources allocated so far */
            return (DDI_FAILURE);
        }
        xsp->open = 0;
        xsp->nlayered = 0;
        ...
        return (DDI_SUCCESS);
    case DDI_RESUME:
        /* For information, see Chapter 9, Power Management */
    default:
        return (DDI_FAILURE);
    }
}
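The relationship between device capacity, DEV_BSIZE, and the choice between nblocks and Nblocks can be checked with a little arithmetic. The following is a minimal user-space sketch, not driver code. The helper names (xx_bytes_to_blocks, xx_fits_nblocks, xx_as_nblocks) are hypothetical, and the sketch assumes that the 32-bit nblocks property holds a signed 32-bit block count.

```c
#include <assert.h>
#include <stdint.h>

#define XX_DEV_BSIZE 512    /* block unit named in the text (DEV_BSIZE) */

/*
 * Hypothetical helper: convert a capacity in bytes to a count of
 * 512-byte blocks. A trailing partial block is dropped.
 */
static uint64_t
xx_bytes_to_blocks(uint64_t capacity_bytes)
{
    return (capacity_bytes / XX_DEV_BSIZE);
}

/*
 * Hypothetical helper: nonzero when the block count fits in a signed
 * 32-bit value (assumption: that is what "nblocks" can hold).
 */
static int
xx_fits_nblocks(uint64_t blocks)
{
    return (blocks <= (uint64_t)INT32_MAX);
}

/* Narrow to 32 bits only after checking that the value fits. */
static int32_t
xx_as_nblocks(uint64_t blocks)
{
    assert(xx_fits_nblocks(blocks));
    return ((int32_t)blocks);
}
```

Under that assumption the 32-bit limit is 2^31 blocks of 512 bytes, which is 1 Tbyte, so a driver that always creates Nblocks does not need to special-case large disks.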
Controlling Device Access

This section describes aspects of the open() and close() entry points that are specific to block device drivers. See Chapter 10, Drivers for Character Devices for more information on open(9E) and close(9E).

open() Entry Point (Block Drivers)

The open(9E) entry point is used to gain access to a given device. The open(9E) routine of a block driver is called when a user thread issues an open(2) or mount(2) system call on a block special file associated with the minor device, or when a layered driver calls open(9E). See File I/O for more information.

The open(9E) entry point should check for the following:
Example 11–2 demonstrates a block driver open(9E) entry point.

Example 11–2 Block Driver open(9E) Routine

static int
xxopen(dev_t *devp, int flags, int otyp, cred_t *credp)
{
    minor_t instance;
    struct xxstate *xsp;

    instance = getminor(*devp);
    xsp = ddi_get_soft_state(statep, instance);
    if (xsp == NULL)
        return (ENXIO);
    mutex_enter(&xsp->mu);
    /*
     * only honor FEXCL. If a regular open or a layered open
     * is still outstanding on the device, the exclusive open
     * must fail.
     */
    if ((flags & FEXCL) && (xsp->open || xsp->nlayered)) {
        mutex_exit(&xsp->mu);
        return (EAGAIN);
    }
    switch (otyp) {
    case OTYP_LYR:
        xsp->nlayered++;
        break;
    case OTYP_BLK:
        xsp->open = 1;
        break;
    default:
        mutex_exit(&xsp->mu);
        return (EINVAL);
    }
    mutex_exit(&xsp->mu);
    return (0);
}
The otyp argument is used to specify the type of open on the device. OTYP_BLK is the typical open type for a block device. A device can be opened several times with otyp set to OTYP_BLK, although close(9E) will be called only once when the final close of type OTYP_BLK has occurred for the device. otyp is set to OTYP_LYR if the device is being used as a layered device. For every open of type OTYP_LYR, the layering driver issues a corresponding close of type OTYP_LYR. The example keeps track of each type of open so the driver can determine when the device is not being used in close(9E).

close() Entry Point (Block Drivers)

The arguments of the close(9E) entry point are identical to arguments of open(9E), except that dev is the device number, as opposed to a pointer to the device number.

The close(9E) routine should verify otyp in the same way as was described for the open(9E) entry point. In Example 11–3, close(9E) must determine when the device can really be closed based on the number of block opens and layered opens.

Example 11–3 Block Device close(9E) Routine

static int
xxclose(dev_t dev, int flag, int otyp, cred_t *credp)
{
    minor_t instance;
    struct xxstate *xsp;

    instance = getminor(dev);
    xsp = ddi_get_soft_state(statep, instance);
    if (xsp == NULL)
        return (ENXIO);
    mutex_enter(&xsp->mu);
    switch (otyp) {
    case OTYP_LYR:
        xsp->nlayered--;
        break;
    case OTYP_BLK:
        xsp->open = 0;
        break;
    default:
        mutex_exit(&xsp->mu);
        return (EINVAL);
    }
    if (xsp->open || xsp->nlayered) {
        /* not done yet */
        mutex_exit(&xsp->mu);
        return (0);
    }
    /* cleanup (rewind tape, free memory, etc.) */
    /* wait for I/O to drain */
    mutex_exit(&xsp->mu);
    return (0);
}
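The open and close bookkeeping in Examples 11–2 and 11–3 can be modeled outside the kernel to see when the device actually becomes idle. The following is a minimal user-space sketch under stated assumptions: the XX_ constants are illustrative stand-ins for FEXCL, OTYP_BLK, and OTYP_LYR (their values are not taken from the kernel headers), struct xx_track and its functions are hypothetical, and locking is omitted.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-ins for the kernel's open flag and open types. */
#define XX_FEXCL     0x1
#define XX_OTYP_BLK  0
#define XX_OTYP_LYR  1

struct xx_track {
    int open;       /* nonzero while any OTYP_BLK open is outstanding */
    int nlayered;   /* number of outstanding OTYP_LYR opens */
};

/* Mirrors the checks in the open example; returns 0 or an errno value. */
static int
xx_track_open(struct xx_track *t, int flags, int otyp)
{
    if ((flags & XX_FEXCL) && (t->open || t->nlayered))
        return (EAGAIN);
    switch (otyp) {
    case XX_OTYP_LYR:
        t->nlayered++;
        break;
    case XX_OTYP_BLK:
        t->open = 1;    /* a flag, because only the final close is seen */
        break;
    default:
        return (EINVAL);
    }
    return (0);
}

/* Mirrors the close example; returns 0 or an errno value. */
static int
xx_track_close(struct xx_track *t, int otyp)
{
    switch (otyp) {
    case XX_OTYP_LYR:
        assert(t->nlayered > 0);    /* every layered close pairs with an open */
        t->nlayered--;
        break;
    case XX_OTYP_BLK:
        t->open = 0;
        break;
    default:
        return (EINVAL);
    }
    return (0);
}

/* Nonzero when no opens of either type remain, so cleanup may run. */
static int
xx_track_idle(const struct xx_track *t)
{
    return (!t->open && t->nlayered == 0);
}
```

Block opens are tracked with a flag while layered opens are counted, which matches the rule that close(9E) is called once for the final OTYP_BLK close but once for every OTYP_LYR open.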
strategy() Entry Point

The strategy(9E) entry point is used to read and write data buffers to and from a block device. The name strategy refers to the fact that this entry point might implement some optimal strategy for ordering requests to the device.

strategy(9E) can be written to process one request at a time (synchronous transfer), or to queue multiple requests to the device (asynchronous transfer). When choosing a method, the abilities and limitations of the device should be taken into account.

The strategy(9E) routine is passed a pointer to a buf(9S) structure. This structure describes the transfer request, and contains status information on return. buf(9S) and strategy(9E) are the focus of block device operations.

buf Structure

The following buf structure members are important to block drivers:

    int           b_flags;      /* Buffer status */
    struct buf    *av_forw;     /* Driver work list link */
    struct buf    *av_back;     /* Driver work list link */
    size_t        b_bcount;     /* # of bytes to transfer */
    union {
        caddr_t   b_addr;       /* Buffer's virtual address */
    } b_un;
    daddr_t       b_blkno;      /* Block number on device */
    diskaddr_t    b_lblkno;     /* Expanded block number on device */
    size_t        b_resid;      /* # of bytes not transferred */
                                /* after error */
    int           b_error;      /* Expanded error field */
    void          *b_private;   /* "opaque" driver private area */
    dev_t         b_edev;       /* Expanded dev field */
b_flags contains status and transfer attributes of the buf structure. If B_READ is set, the buf structure indicates a transfer from the device to memory; otherwise, it indicates a transfer from memory to the device. If the driver encounters an error during data transfer, it should set the B_ERROR flag in the b_flags member and provide a more specific error value in b_error. Drivers should use bioerror(9F) rather than setting B_ERROR directly. Drivers should never clear b_flags.
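The way a strategy routine might use these members can be sketched in user space. This is a simplified model, not the kernel interfaces: struct xx_buf is a hypothetical cut-down stand-in for buf(9S), the XX_B_READ and XX_B_ERROR values are illustrative rather than copied from the kernel header, xx_bioerror() only imitates the error-setting side of bioerror(9F), and the choice of EINVAL for an out-of-range request is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define XX_B_ERROR  0x0004  /* illustrative flag values only */
#define XX_B_READ   0x0040
#define XX_BSIZE    512     /* DEV_BSIZE from the text */

/* Hypothetical, cut-down stand-in for the buf(9S) members listed above. */
struct xx_buf {
    int     b_flags;
    size_t  b_bcount;
    int64_t b_blkno;
    size_t  b_resid;
    int     b_error;
};

/* Imitates setting an error through bioerror(9F): record code and flag. */
static void
xx_bioerror(struct xx_buf *bp, int error)
{
    assert(error != 0);     /* this sketch only sets errors, never clears */
    bp->b_error = error;
    bp->b_flags |= XX_B_ERROR;
}

/* Nonzero for a device-to-memory transfer. */
static int
xx_is_read(const struct xx_buf *bp)
{
    return ((bp->b_flags & XX_B_READ) != 0);
}

/*
 * Check that the request lies inside a device of nblocks blocks.
 * On failure, mark the buffer and report that nothing was transferred.
 */
static int
xx_check_request(struct xx_buf *bp, uint64_t nblocks)
{
    uint64_t count = (bp->b_bcount + XX_BSIZE - 1) / XX_BSIZE;

    if (bp->b_blkno < 0 || (uint64_t)bp->b_blkno >= nblocks ||
        count > nblocks - (uint64_t)bp->b_blkno) {
        xx_bioerror(bp, EINVAL);
        bp->b_resid = bp->b_bcount;
        return (EINVAL);
    }
    return (0);
}
```

The flag is only ever ORed into b_flags, so the other status bits that the caller set are preserved, in line with the rule that drivers should never clear b_flags.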
bp_mapin Structure

When a buf structure pointer is passed into the device driver's strategy(9E) routine, the data buffer referred to by b_un.b_addr is not necessarily mapped in the kernel's address space. This means that the driver cannot directly access the data. Most block-oriented devices have DMA capability, and therefore do not need to access the data buffer directly. Instead, they use the DMA mapping routines to allow the device's DMA engine to do the data transfer. For details about using DMA, see Chapter 8, Direct Memory Access (DMA).

If a driver needs to directly access the data buffer (as opposed to having the device access the data), it must first map the buffer into the kernel's address space using bp_mapin(9F). bp_mapout(9F) should be used when the driver no longer needs to access the data directly.

bp_mapout(9F) should only be called on buffers that have been allocated and are owned by the device driver. It must not be called on buffers passed to the driver through the strategy(9E) entry point (for example, buffers from a file system). Because bp_mapin(9F) does not keep a reference count, bp_mapout(9F) will remove any kernel mapping that a layer above the device driver might rely on.

Synchronous Data Transfers (Block Drivers)

This section presents a simple method for performing synchronous I/O transfers. It assumes that the hardware is a simple disk device that can transfer only one data buffer at a time using DMA, and that the disk can be spun up and spun down by software command. The device driver's strategy(9E) routine waits for the current request to be completed before accepting a new one. The device interrupts when the transfer is complete or when an error occurs.
Asynchronous Data Transfers (Block Drivers)

This section presents a method for performing asynchronous I/O transfers. The driver queues the I/O requests and then returns control to the caller. Again, the assumption is that the hardware is a simple disk device that allows one transfer at a time. The device interrupts when a data transfer has completed or when an error occurs.
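The queueing scheme described above (queue the request, return to the caller, start the next transfer from the interrupt handler) can be modeled in user space. The following is a minimal sketch under stated assumptions: struct xx_req and struct xx_dev are hypothetical, xx_intr() is called by hand to simulate the completion interrupt, and the mutexes that a real driver needs around the queue are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request; av_forw plays the role of the work list link. */
struct xx_req {
    struct xx_req *av_forw;
    int id;
};

/* Hypothetical device state: a FIFO of waiting requests plus one active. */
struct xx_dev {
    struct xx_req *head;
    struct xx_req *tail;
    struct xx_req *active;  /* request the device is working on, or NULL */
    int completed;
};

/* Start the next queued request if the device is not busy. */
static void
xx_start(struct xx_dev *d)
{
    if (d->active != NULL || d->head == NULL)
        return;
    d->active = d->head;
    d->head = d->head->av_forw;
    if (d->head == NULL)
        d->tail = NULL;
    d->active->av_forw = NULL;
    /* a real driver would program the hardware here */
}

/* Queue the request and return at once; the transfer completes later. */
static void
xx_strategy(struct xx_dev *d, struct xx_req *r)
{
    assert(r != NULL);
    r->av_forw = NULL;
    if (d->tail == NULL)
        d->head = r;
    else
        d->tail->av_forw = r;
    d->tail = r;
    xx_start(d);
}

/* Simulated completion interrupt: finish the active request, start next. */
static struct xx_req *
xx_intr(struct xx_dev *d)
{
    struct xx_req *done = d->active;

    if (done == NULL)
        return (NULL);
    d->active = NULL;
    d->completed++;
    xx_start(d);
    return (done);
}
```

Because xx_start() is called from both the strategy path and the interrupt path, the device is kept busy as long as requests are waiting, yet no caller ever blocks on a transfer.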
Miscellaneous Entry Points

This section discusses the dump(9E) and print(9E) entry points.

dump() Entry Point (Block Drivers)

The dump(9E) entry point is used to copy a portion of virtual address space directly to the specified device in the case of a system failure. It is also used to copy the state of the kernel out to disk during a checkpoint operation (see the cpr(7) and dump(9E) man pages). It must be capable of performing this operation without the use of interrupts, since they are disabled during the checkpoint operation.

    int dump(dev_t dev, caddr_t addr, daddr_t blkno, int nblk)

dev is the device number of the device to dump to, addr is the base kernel virtual address at which to start the dump, blkno is the first block to dump to, and nblk is the number of blocks to dump. The dump depends upon the existing driver working properly.

print() Entry Point (Block Drivers)

    int print(dev_t dev, char *str)

The print(9E) entry point is called by the system to display a message about an exception it has detected. print(9E) should call cmn_err(9F) to post the message to the console on behalf of the system. Here is an example:

static int
xxprint(dev_t dev, char *str)
{
    cmn_err(CE_CONT, "xx: %s\n", str);
    return (0);
}
Disk Device Drivers

Disk devices represent an important class of block device drivers.

Disk ioctls

Solaris disk drivers need to support a minimum set of disk-specific ioctl commands. These I/O controls are specified in the dkio(7) manual page. Disk I/O controls transfer disk information to or from the device driver. A Solaris disk device is one that is supported by disk utility commands such as format(1M) and newfs(1M). Table 11–1 lists the mandatory Sun disk I/O controls.

Table 11–1 Mandatory Solaris Disk ioctls
Disk Performance

The Solaris DDI/DKI provides facilities to optimize I/O transfers for improved file system performance. It supports a mechanism to manage the list of I/O requests so as to optimize disk access for a file system. See Asynchronous Data Transfers (Block Drivers) for a description of enqueuing an I/O request.

The diskhd structure is used to manage a linked list of I/O requests.

struct diskhd {
    long        b_flags;            /* not used, needed for consistency */
    struct buf  *b_forw, *b_back;   /* queue of unit queues */
    struct buf  *av_forw, *av_back; /* queue of bufs for this unit */
    long        b_bcount;           /* active flag */
};
The diskhd data structure has two buf pointers that the driver can manipulate. The av_forw pointer points to the first active I/O request. The second pointer, av_back, points to the last active request on the list.

A pointer to this structure is passed as an argument to disksort(9F), along with a pointer to the current buf structure being processed. The disksort(9F) routine sorts the buf requests in a fashion that optimizes disk seeks and then inserts the buf pointer into the diskhd list. disksort(9F) uses the value in the b_resid member of the buf structure as the sort key. The driver is responsible for setting this value. Most Sun disk drivers use the cylinder group as the sort key. This tends to optimize the file system read-ahead accesses.

Once data has been added to the diskhd list, the device needs to transfer the data. If the device is not busy processing a request, the xxstart() routine pulls the first buf structure off the diskhd list and starts a transfer. If the device is busy, the driver should return from the xxstrategy() entry point.

Once the hardware is done with the data transfer, it generates an interrupt. The driver's interrupt routine is then called to service the device. After servicing the interrupt, the driver can then call the xxstart() routine to process the next buf structure in the diskhd list.
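The idea of keeping the request list ordered by a driver-supplied key can be illustrated with a small user-space model. This sketch is not the disksort(9F) algorithm: it is a plain ascending insertion keyed on b_resid, chosen only to show how the av_forw and av_back pointers are maintained. struct xx_sbuf, struct xx_diskhd, xx_sort_insert(), and xx_dequeue() are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request with only the members this sketch needs. */
struct xx_sbuf {
    struct xx_sbuf *av_forw;
    unsigned long b_resid;  /* sort key set by the driver (e.g. cylinder) */
};

/* Hypothetical queue head modeled on the two diskhd list pointers. */
struct xx_diskhd {
    struct xx_sbuf *av_forw;    /* first queued request */
    struct xx_sbuf *av_back;    /* last queued request */
};

/* Insert bp in ascending key order; equal keys stay in arrival order. */
static void
xx_sort_insert(struct xx_diskhd *dp, struct xx_sbuf *bp)
{
    struct xx_sbuf *prev = NULL;
    struct xx_sbuf *cur = dp->av_forw;

    assert(bp != NULL);
    while (cur != NULL && cur->b_resid <= bp->b_resid) {
        prev = cur;
        cur = cur->av_forw;
    }
    bp->av_forw = cur;
    if (prev == NULL)
        dp->av_forw = bp;       /* new first request */
    else
        prev->av_forw = bp;
    if (cur == NULL)
        dp->av_back = bp;       /* new last request */
}

/* Remove and return the first request, as a start routine would. */
static struct xx_sbuf *
xx_dequeue(struct xx_diskhd *dp)
{
    struct xx_sbuf *bp = dp->av_forw;

    if (bp != NULL) {
        dp->av_forw = bp->av_forw;
        if (dp->av_forw == NULL)
            dp->av_back = NULL;
        bp->av_forw = NULL;
    }
    return (bp);
}
```

With keys 40, 10, and 25 inserted in that order, requests come off the list as 10, 25, 40, so the head moves in one direction instead of seeking back and forth.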