Chapter 10 Drivers for Block Devices

This chapter describes the structure of block device drivers. The kernel views a block device as a set of randomly accessible logical blocks. The file system uses a list of buf(9S) structures to buffer the data blocks between a block device and user space. Only block devices can support a file system. For information on writing disk drivers that support SunOS disk commands (such as format(1M)), see Appendix G, Advanced Topics.

Block Driver Structure Overview

Figure 10-1 shows data structures and routines that define the structure of a block device driver. Device drivers typically include the following:
Block Driver Device Access

The shaded device access section in Figure 10-1 illustrates block driver entry points.

Note - For a description of character drivers and character driver device access, see Chapter 9, Drivers for Character Devices.

Figure 10-1 Block Driver Roadmap
File I/O

A file system is a tree-structured hierarchy of directories and files. Some file systems, such as the UNIX File System (UFS), reside on block-oriented devices. File systems are created by mkfs(1M) and newfs(1M).

When an application issues a read(2) or write(2) system call to an ordinary file on a UFS file system, the file system may call the strategy(9E) entry point of the driver for the block device on which the file resides. The file system code may call strategy(9E) several times for a single read(2) or write(2) system call.

The file system code determines the logical device address, or logical block number, for each block and builds a block I/O request in the form of a buf(9S) structure. The driver's strategy(9E) entry point then interprets the buf(9S) structure and completes the request.

Block Driver Additions to the State Structure

This chapter adds the following fields to the state structure. See "Software State Structure" for more information.

int nblocks;            /* size of device */
int open;               /* flag indicating device is open */
int nlayered;           /* count of layered opens */
struct buf *list_head;  /* head of transfer request list */
struct buf *list_tail;  /* tail of transfer request list */

Entry Points

Associated with each device driver is a dev_ops(9S) structure, which in turn refers to a cb_ops(9S) structure. See Chapter 5, Autoconfiguration, for details regarding driver data structures. Table 10-1 lists the block driver entry points.

Table 10-1 Block Driver Entry Points
Note - Some of the entry points listed in Table 10-1 can be replaced by nodev(9F) or nulldev(9F) as appropriate.

Autoconfiguration

attach(9E) should perform the common initialization tasks for each instance of a device. Typically, these tasks include:
Block device drivers create minor nodes of type S_IFBLK. As a result, a block special file representing the node eventually appears in the /devices hierarchy.

Logical device names for block devices appear in the /dev/dsk directory and consist of a controller number, bus-address number, disk number, and slice number. These names are created by the disks(1M) program if the node type is set to DDI_NT_BLOCK or DDI_NT_BLOCK_CHAN. DDI_NT_BLOCK_CHAN should be specified if the device communicates on a channel (a bus with an additional level of addressability), as SCSI disks do. It causes a bus-address field (tN) to appear in the logical name. DDI_NT_BLOCK should be used for most other devices.

For each minor device (which corresponds to each partition on the disk), the driver must also create an nblocks property. This is an integer property giving the number of blocks supported by the minor device, expressed in units of DEV_BSIZE (512 bytes). The file system uses the nblocks property to determine device limits. See "Properties" for details.

Example 10-1 shows a typical attach(9E) entry point. It emphasizes the creation of the device's minor node and the nblocks property.

Example 10-1 Block Driver attach(9E) Routine

static int
xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
	int instance;
	struct xxstate *xsp;

	switch (cmd) {
	case DDI_ATTACH:
		instance = ddi_get_instance(dip);
		/* allocate a state structure and initialize it */
		if (ddi_soft_state_zalloc(statep, instance) != DDI_SUCCESS)
			return (DDI_FAILURE);
		xsp = ddi_get_soft_state(statep, instance);
		/* map the device's registers */
		/* add the device driver's interrupt handler(s) */
		/* initialize any mutexes and condition variables */
		/* read label information if the device is a disk */
		/* create power manageable components */
		/*
		 * Create the device minor node. Note that the node_type
		 * argument is set to DDI_NT_BLOCK. In this example the
		 * minor number is the instance number.
		 */
		if (ddi_create_minor_node(dip, "minor_name", S_IFBLK,
		    instance, DDI_NT_BLOCK, 0) == DDI_FAILURE) {
			/* free other resources allocated so far */
			/* Remove any previously allocated minor nodes */
			ddi_remove_minor_node(dip, NULL);
			ddi_soft_state_free(statep, instance);
			return (DDI_FAILURE);
		}
		/*
		 * Create driver properties like "nblocks". If the device
		 * is a disk, the nblocks property is usually calculated from
		 * information in the disk label.
		 */
		/* set xsp->nblocks to the size of the device in 512-byte blocks */
		if (ddi_prop_update_int(makedevice(DDI_MAJOR_T_UNKNOWN,
		    instance), dip, "nblocks", xsp->nblocks)
		    != DDI_PROP_SUCCESS) {
			cmn_err(CE_CONT, "%s: cannot create nblocks property\n",
			    ddi_get_name(dip));
			/* free other resources allocated so far */
			ddi_remove_minor_node(dip, NULL);
			ddi_soft_state_free(statep, instance);
			return (DDI_FAILURE);
		}
		xsp->open = 0;
		xsp->nlayered = 0;
		/* ... */
		return (DDI_SUCCESS);
	case DDI_PM_RESUME:
		/* For information, see Chapter 8, Power Management */
	case DDI_RESUME:
		/* For information, see Chapter 8, Power Management */
	default:
		return (DDI_FAILURE);
	}
}
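The nblocks property is always expressed in DEV_BSIZE (512-byte) units, even when the underlying device uses a larger native sector size. The following user-space sketch shows one way to convert a size to those units. It assumes a hypothetical label that reports a partition's size as a count of native sectors. The helper name xx_nblocks() and its parameters are illustrative only and are not part of the DDI. Because ddi_prop_update_int(9F) stores an int, the sketch also rejects sizes that would not fit.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#ifndef DEV_BSIZE
#define DEV_BSIZE 512   /* assumption: the 512-byte unit used by nblocks */
#endif

/*
 * Hypothetical helper: convert a partition size given in native sectors
 * into DEV_BSIZE units. Returns -1 if the sector size is not a nonzero
 * multiple of DEV_BSIZE, or if the result does not fit in an int.
 */
static int
xx_nblocks(uint64_t nsectors, uint32_t sector_size)
{
	uint64_t per_sector;

	if (sector_size == 0 || sector_size % DEV_BSIZE != 0)
		return (-1);
	per_sector = sector_size / DEV_BSIZE;
	/* nsectors * per_sector must not exceed INT_MAX */
	if (nsectors > (uint64_t)INT_MAX / per_sector)
		return (-1);
	return ((int)(nsectors * per_sector));
}
```

For example, a partition of 1000 sectors of 4096 bytes would be advertised as 8000 blocks.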
Properties are associated with device numbers. In Example 10-1, attach(9E) builds a device number using makedevice(9F). At this point, however, only the minor number component of the device number is known, so it must use the special major number DDI_MAJOR_T_UNKNOWN to build the device number.

Controlling Device Access

This section describes aspects of the open(9E) and close(9E) entry points that are specific to block device drivers. See Chapter 9, Drivers for Character Devices, for more information on open(9E) and close(9E).

open(9E)

int xxopen(dev_t *devp, int flag, int otyp, cred_t *credp)

The open(9E) entry point is used to gain access to a given device. The open(9E) routine of a block driver is called when a user thread issues an open(2) or mount(2) system call on a block special file associated with the minor device, or when a layered driver calls open(9E). See "File I/O" for more information. The open(9E) entry point should check for the following:
Example 10-2 demonstrates a block driver open(9E) entry point.

Example 10-2 Block Driver open(9E) Routine

static int
xxopen(dev_t *devp, int flags, int otyp, cred_t *credp)
{
	minor_t instance;
	struct xxstate *xsp;

	instance = getminor(*devp);
	xsp = ddi_get_soft_state(statep, instance);
	if (xsp == NULL)
		return (ENXIO);
	mutex_enter(&xsp->mu);
	/*
	 * only honor FEXCL. If a regular open or a layered open
	 * is still outstanding on the device, the exclusive open
	 * must fail.
	 */
	if ((flags & FEXCL) && (xsp->open || xsp->nlayered)) {
		mutex_exit(&xsp->mu);
		return (EAGAIN);
	}
	switch (otyp) {
	case OTYP_LYR:
		xsp->nlayered++;
		break;
	case OTYP_BLK:
		xsp->open = 1;
		break;
	default:
		mutex_exit(&xsp->mu);
		return (EINVAL);
	}
	mutex_exit(&xsp->mu);
	return (0);
}
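The open bookkeeping in Example 10-2 can be exercised outside the kernel. The sketch below is a single-threaded user-space model, not driver code, so the mutex is omitted. XX_FEXCL, enum xx_otyp, struct xx_model, and xx_model_open() are hypothetical stand-ins for FEXCL, the OTYP_* values, and the soft state.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the DDI names used by xxopen(). */
enum xx_otyp { XX_OTYP_BLK, XX_OTYP_LYR, XX_OTYP_CHR };
#define XX_FEXCL 0x1

struct xx_model {
	int open;       /* set by OTYP_BLK opens */
	int nlayered;   /* count of outstanding OTYP_LYR opens */
};

/* Same decisions as Example 10-2, minus locking and soft-state lookup. */
static int
xx_model_open(struct xx_model *xsp, int flags, enum xx_otyp otyp)
{
	/* an exclusive open fails while any other open is outstanding */
	if ((flags & XX_FEXCL) && (xsp->open || xsp->nlayered))
		return (EAGAIN);
	switch (otyp) {
	case XX_OTYP_LYR:
		xsp->nlayered++;
		break;
	case XX_OTYP_BLK:
		xsp->open = 1;
		break;
	default:
		return (EINVAL);   /* this block-only driver rejects other types */
	}
	return (0);
}
```

Like the example, this model only refuses an exclusive open while other opens are outstanding. It does not block later opens after an exclusive open has succeeded. A driver that needs that behavior would have to record the exclusive state as well.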
The otyp argument specifies the type of open on the device. OTYP_BLK is the typical open type for a block device. A device may be opened several times with otyp set to OTYP_BLK, but close(9E) is called only once, when the final close of type OTYP_BLK occurs for the device. otyp is set to OTYP_LYR if the device is being used as a layered device. For every open of type OTYP_LYR, the layering driver issues a corresponding close of type OTYP_LYR. The example keeps track of each type of open so that close(9E) can determine when the device is no longer in use. See the open(9E) manual page for more details about the otyp argument.

close(9E)

int xxclose(dev_t dev, int flag, int otyp, cred_t *credp)

The arguments of the close(9E) entry point are identical to the arguments of open(9E), except that dev is the device number rather than a pointer to the device number. The close(9E) routine should verify otyp in the same way as described for the open(9E) entry point. In Example 10-3, close(9E) must determine when the device can really be closed, based on the number of block opens and layered opens.

Example 10-3 Block Device close(9E) Routine

static int
xxclose(dev_t dev, int flag, int otyp, cred_t *credp)
{
	minor_t instance;
	struct xxstate *xsp;

	instance = getminor(dev);
	xsp = ddi_get_soft_state(statep, instance);
	if (xsp == NULL)
		return (ENXIO);
	mutex_enter(&xsp->mu);
	switch (otyp) {
	case OTYP_LYR:
		xsp->nlayered--;
		break;
	case OTYP_BLK:
		xsp->open = 0;
		break;
	default:
		mutex_exit(&xsp->mu);
		return (EINVAL);
	}
	if (xsp->open || xsp->nlayered) {
		/* not done yet */
		mutex_exit(&xsp->mu);
		return (0);
	}
	/* cleanup (rewind tape, free memory, etc.) */
	/* wait for I/O to drain */
	mutex_exit(&xsp->mu);
	return (0);
}
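The "last close" decision in Example 10-3 can be modeled the same way. This is a single-threaded user-space sketch with hypothetical names: struct xx_model and enum xx_otyp stand in for the soft state and the OTYP_* values. It reports through *lastp when a close leaves no block or layered opens outstanding, which is the point at which the cleanup in Example 10-3 would run. Unlike the example, it also rejects a layered close that has no matching layered open. That guard is an added assumption, not part of the original routine.

```c
#include <assert.h>
#include <errno.h>

enum xx_otyp { XX_OTYP_BLK, XX_OTYP_LYR };   /* hypothetical stand-ins */

struct xx_model {
	int open;       /* set by OTYP_BLK opens */
	int nlayered;   /* count of outstanding OTYP_LYR opens */
};

/*
 * Returns 0 on success and sets *lastp to 1 when this close leaves the
 * device with no block or layered opens, that is, when cleanup may run.
 */
static int
xx_model_close(struct xx_model *xsp, int otyp, int *lastp)
{
	*lastp = 0;
	switch (otyp) {
	case XX_OTYP_LYR:
		if (xsp->nlayered == 0)
			return (EINVAL);   /* assumption: unbalanced layered close */
		xsp->nlayered--;
		break;
	case XX_OTYP_BLK:
		xsp->open = 0;         /* one close covers all OTYP_BLK opens */
		break;
	default:
		return (EINVAL);
	}
	if (xsp->open || xsp->nlayered)
		return (0);            /* not done yet */
	*lastp = 1;
	return (0);
}
```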
Data Transfers

Most block drivers use the strategy(9E) entry point to transfer data.

strategy(9E)

int xxstrategy(struct buf *bp)

The strategy(9E) entry point is used to read and write data buffers to and from a block device. The name strategy refers to the fact that this entry point may implement some optimal strategy for ordering requests to the device.

strategy(9E) can be written to process one request at a time (synchronous transfer) or to queue multiple requests to the device (asynchronous transfer). When choosing a method, take the abilities and limitations of the device into account.

The strategy(9E) routine is passed a pointer to a buf(9S) structure. This structure describes the transfer request and contains status information on return. buf(9S) and strategy(9E) are the focus of block device operations.

buf Structure

The following buf structure members are important to block drivers:

int b_flags; /* Buffer Status */
struct buf *av_forw; /* Driver work list link */
struct buf *av_back; /* Driver work list link */
size_t b_bcount; /* # of bytes to transfer */
union {
caddr_t b_addr; /* Buffer's virtual address */
} b_un;
daddr_t b_blkno; /* Block number on device */
diskaddr_t b_lblkno; /* Expanded block number on device */
size_t b_resid; /* # of bytes not transferred */
/* after error */
int b_error; /* Expanded error field */
void *b_private; /* "opaque" driver private area */
dev_t b_edev; /* expanded dev field */
b_flags contains status and transfer attributes of the buf structure. If B_READ is set, the buf structure indicates a transfer from the device to memory. Otherwise, it indicates a transfer from memory to the device. If the driver encounters an error during data transfer, it should set the B_ERROR flag in the b_flags member and provide a more specific error value in b_error. Drivers should use bioerror(9F) rather than setting B_ERROR directly. Drivers should never clear b_flags.

av_forw and av_back are pointers that the driver can use to manage its own list of buffers. See "Asynchronous Data Transfers" for a discussion of the av_forw and av_back pointers.

b_bcount specifies the number of bytes to be transferred by the device.

b_un.b_addr is the kernel virtual address of the data buffer.

b_blkno is the starting 32-bit logical block number on the device for the data transfer, expressed in DEV_BSIZE (512 bytes) units. The driver should use either b_blkno or b_lblkno, but not both.

b_lblkno is the starting 64-bit logical block number on the device for the data transfer, expressed in DEV_BSIZE (512 bytes) units. The driver should use either b_blkno or b_lblkno, but not both.

b_resid is set by the driver to indicate the number of bytes that were not transferred because of an error. See Example 10-8 for an example of setting b_resid. The b_resid member is overloaded: it is also used by disksort(9F).

b_error is set to an error number by the driver when a transfer error occurs. It is set in conjunction with the b_flags B_ERROR bit. See Intro(9E) for details regarding error values. Drivers should use bioerror(9F) rather than setting b_error directly.

b_private is for exclusive use by the driver to store driver-private data.

b_edev contains the device number of the device involved in the transfer.
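The arithmetic implied by these members can be shown with a small user-space sketch. It uses a hypothetical struct xx_buf that carries only the fields discussed above, plus hypothetical flag values XX_B_READ and XX_B_ERROR. The real B_READ and B_ERROR bits are defined in the system headers and are not reproduced here. The sketch computes the starting byte offset from b_lblkno and records a partial transfer. It sets the error flag and b_error together, which is the pairing that bioerror(9F) handles in a real driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef DEV_BSIZE
#define DEV_BSIZE 512
#endif

/* Hypothetical flag values, not the system's B_READ/B_ERROR bits. */
#define XX_B_READ  0x1
#define XX_B_ERROR 0x2

/* User-space stand-in holding only the buf(9S) members discussed above. */
struct xx_buf {
	int b_flags;
	size_t b_bcount;     /* bytes requested */
	uint64_t b_lblkno;   /* 64-bit block number in DEV_BSIZE units */
	size_t b_resid;      /* bytes not transferred */
	int b_error;
};

/* Byte offset on the device at which the transfer starts. */
static uint64_t
xx_byte_offset(const struct xx_buf *bp)
{
	return (bp->b_lblkno * DEV_BSIZE);
}

/* Record that 'done' bytes were moved and, if nonzero, the error. */
static void
xx_finish(struct xx_buf *bp, size_t done, int error)
{
	assert(done <= bp->b_bcount);
	bp->b_resid = bp->b_bcount - done;
	if (error != 0) {
		bp->b_flags |= XX_B_ERROR;   /* other flag bits are left alone */
		bp->b_error = error;
	}
}
```

For example, a 4096-byte read starting at block 8 begins at byte 4096. If it fails after 1024 bytes, b_resid is 3072.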
bp_mapin(9F)

When a buf structure pointer is passed into the device driver's strategy(9E) routine, the data buffer referred to by b_un.b_addr is not necessarily mapped in the kernel's address space. This means that the driver cannot directly access the data. Most block-oriented devices have DMA capability and therefore do not need to access the data buffer directly. Instead, they use the DMA mapping routines to allow the device's DMA engine to do the data transfer. For details about using DMA, see Chapter 7, DMA.

If a driver needs to access the data buffer directly (as opposed to having the device access the data), it must first map the buffer into the kernel's address space using bp_mapin(9F). bp_mapout(9F) should be used when the driver no longer needs to access the data directly. bp_mapout(9F) should only be called on buffers that have been allocated by, and are owned by, the device driver. It must not be called on buffers passed to the driver through the strategy(9E) entry point (for example, by a file system). Because bp_mapin(9F) does not keep a reference count, bp_mapout(9F) will remove any kernel mapping that a layer above the device driver might rely on.

Synchronous Data Transfers

This section presents a simple method for performing synchronous I/O transfers. It assumes that the hardware is a simple disk device that can transfer only one data buffer at a time using DMA, and that the disk can be spun up and spun down by software command. The device driver's strategy(9E) routine waits for the current request to be completed before accepting a new one. The device interrupts when the transfer is complete or when an error occurs.
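The following user-space sketch shows the request validation and completion bookkeeping that a synchronous strategy(9E) routine performs. The "device" is replaced by an in-memory array, so each request finishes before the function returns. The busy wait on the device, DMA setup, and interrupt handling are left out. Several details are assumptions made for illustration rather than rules from this chapter: requests must be a multiple of DEV_BSIZE, and out-of-range requests fail with ENXIO. struct xx_ramdisk, xx_strategy_sync(), and the XX_ flags are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#ifndef DEV_BSIZE
#define DEV_BSIZE 512
#endif

#define XX_B_READ  0x1   /* hypothetical stand-ins for B_READ/B_ERROR */
#define XX_B_ERROR 0x2

struct xx_buf {
	int b_flags;
	size_t b_bcount;
	uint64_t b_lblkno;   /* start block in DEV_BSIZE units */
	char *b_addr;        /* data buffer, assumed already mapped */
	size_t b_resid;
	int b_error;
};

struct xx_ramdisk {
	unsigned char *data;
	uint64_t nblocks;    /* device size in DEV_BSIZE units */
};

/* Fail the whole request: nothing was transferred. */
static void
xx_fail(struct xx_buf *bp, int error)
{
	bp->b_flags |= XX_B_ERROR;
	bp->b_error = error;
	bp->b_resid = bp->b_bcount;
}

/* Validate, perform, and complete one request before returning. */
static int
xx_strategy_sync(struct xx_ramdisk *rd, struct xx_buf *bp)
{
	uint64_t off, end;

	if (bp->b_bcount % DEV_BSIZE != 0 || bp->b_lblkno > rd->nblocks) {
		xx_fail(bp, EINVAL);
		return (0);
	}
	off = bp->b_lblkno * DEV_BSIZE;
	end = rd->nblocks * DEV_BSIZE;
	if (bp->b_bcount > end - off) {   /* would run past end of device */
		xx_fail(bp, ENXIO);
		return (0);
	}
	if (bp->b_flags & XX_B_READ)
		memcpy(bp->b_addr, rd->data + off, bp->b_bcount);
	else
		memcpy(rd->data + off, bp->b_addr, bp->b_bcount);
	bp->b_resid = 0;
	return (0);   /* strategy(9E) return values are ignored */
}
```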
Asynchronous Data Transfers

This section presents a method for performing asynchronous I/O transfers. The driver queues the I/O requests and then returns control to the caller. Again, the assumption is that the hardware is a simple disk device that allows one transfer at a time. The device interrupts when a data transfer has completed or when an error occurs.
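The queueing described here can be sketched in user space using the av_forw link together with the list_head and list_tail fields that this chapter adds to the state structure. In this sketch, requests are appended in FIFO order. A request is started immediately only if the device is idle, and each simulated interrupt completes the current request and starts the next one. The locking that a real driver needs between strategy(9E) and its interrupt handler is omitted. The struct names, xx_enqueue(), xx_start(), xx_intr(), and the id tag are hypothetical. A driver that wants seek ordering instead of FIFO order could sort the list, for example with disksort(9F), but that variant is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for struct buf: only the driver work-list link. */
struct xx_buf {
	struct xx_buf *av_forw;
	int id;   /* hypothetical tag, used only to observe ordering */
};

struct xx_queue {
	struct xx_buf *list_head;   /* next request to start */
	struct xx_buf *list_tail;   /* most recently queued request */
	struct xx_buf *busy;        /* request currently on the device */
};

/* Take the next request off the list and "hand it to the device". */
static void
xx_start(struct xx_queue *q)
{
	struct xx_buf *bp = q->list_head;

	if (bp == NULL)
		return;
	q->list_head = bp->av_forw;
	if (q->list_head == NULL)
		q->list_tail = NULL;
	q->busy = bp;
}

/* strategy(9E) side: append the request, start it if the device is idle. */
static void
xx_enqueue(struct xx_queue *q, struct xx_buf *bp)
{
	bp->av_forw = NULL;
	if (q->list_head == NULL)
		q->list_head = bp;
	else
		q->list_tail->av_forw = bp;
	q->list_tail = bp;
	if (q->busy == NULL)
		xx_start(q);
}

/* Interrupt side: finish the current request and start the next one. */
static struct xx_buf *
xx_intr(struct xx_queue *q)
{
	struct xx_buf *done = q->busy;

	q->busy = NULL;
	xx_start(q);
	return (done);   /* a real driver would call biodone(9F) on it */
}
```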
Miscellaneous Entry Points

dump(9E)

The dump(9E) entry point is used to copy a portion of virtual address space directly to the specified device in the case of a system failure. It is also used to copy the state of the kernel out to disk during a checkpoint operation (see cpr(7) and dump(9E)). It must be able to perform this operation without using interrupts, because interrupts are disabled during the checkpoint operation.

int xxdump(dev_t dev, caddr_t addr, daddr_t blkno, int nblk)

dev is the device number of the device to dump to, addr is the base kernel virtual address at which to start the dump, blkno is the first block to dump to, and nblk is the number of blocks to dump. The dump depends upon the existing driver working properly. See for more information.

print(9E)

int xxprint(dev_t dev, char *str)

The print(9E) entry point is called by the system to display a message about an exception it has detected. print(9E) should call cmn_err(9F) to post the message to the console on behalf of the system. Here is an example:

static int
xxprint(dev_t dev, char *str)
{
	cmn_err(CE_CONT, "xx: %s\n", str);
	return (0);
}
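The dump(9E) constraints described earlier mean that every block must be written by polling, with no interrupts. The user-space sketch below models this with an in-memory disk. The xxdump() arguments are copied one DEV_BSIZE block at a time, and each write completes synchronously. The range check, the EINVAL return value, the use of long in place of daddr_t, and the names struct xx_ramdisk, xx_poll_write(), and xx_dump() are all assumptions made for illustration. A real driver would program its controller and spin on a status register instead of calling memcpy().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#ifndef DEV_BSIZE
#define DEV_BSIZE 512
#endif

struct xx_ramdisk {
	unsigned char *data;
	long nblocks;   /* device size in DEV_BSIZE units */
};

/*
 * Hypothetical polled write of one block. It must not sleep or wait
 * for an interrupt; here the "hardware" completes immediately.
 */
static int
xx_poll_write(struct xx_ramdisk *rd, long blkno, const char *src)
{
	memcpy(rd->data + blkno * DEV_BSIZE, src, DEV_BSIZE);
	return (0);
}

/* Modeled on xxdump(dev, addr, blkno, nblk): copy nblk blocks in order. */
static int
xx_dump(struct xx_ramdisk *rd, const char *addr, long blkno, int nblk)
{
	int i;

	if (blkno < 0 || nblk < 0 || blkno + nblk > rd->nblocks)
		return (EINVAL);
	for (i = 0; i < nblk; i++) {
		int err = xx_poll_write(rd, blkno + i,
		    addr + (long)i * DEV_BSIZE);
		if (err != 0)
			return (err);
	}
	return (0);
}
```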