-
Use a prefix
based on the name of your driver to give global variables and functions unique
names.
The name of each function, data element, and driver preprocessor
definition must be unique for each driver.
A driver module is linked into the kernel. The name of each symbol unique
to a particular driver must not collide with other kernel symbols. To avoid
such collisions, each function and data element for a particular driver must
be named with a prefix common to that driver. The prefix must be sufficient
to uniquely name each driver symbol. Typically, this prefix is the name of
the driver or an abbreviation for the name of the driver. For example, xx_open() would be the name of the open(9E) routine of driver xx.
A driver necessarily includes a number of system header files when it
is built. The globally visible names within these header files
cannot be predicted. To avoid collisions with these names, give each driver preprocessor
definition a unique name by using an identifying prefix.
A distinguishing driver symbol prefix also is an aid to deciphering
system logs and panics when troubleshooting. Instead of seeing an error related
to an ambiguous attach() function, you see an error message
about xx_attach().
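The naming rule above can be illustrated with a small sketch. It assumes the hypothetical driver prefix xx from the text; XX_MAX_UNITS, xx_open_count, and the simplified int arguments are illustrative only, and the real open(9E) and close(9E) entry points take DDI argument types.

```c
/*
 * Sketch only: "xx" is the hypothetical driver prefix from the text.
 * Plain int arguments are used so the fragment compiles outside the
 * kernel; they are not the real entry point signatures.
 */
#define	XX_MAX_UNITS	4	/* preprocessor names carry the prefix too */

static int xx_open_count;	/* driver-global data uses the xx_ prefix */

/* Stand-in for the driver's open(9E) routine. */
int
xx_open(int unit)
{
	if (unit < 0 || unit >= XX_MAX_UNITS)
		return (-1);
	xx_open_count++;
	return (0);
}

/* Stand-in for the driver's close(9E) routine. */
int
xx_close(int unit)
{
	if (unit < 0 || unit >= XX_MAX_UNITS || xx_open_count == 0)
		return (-1);
	xx_open_count--;
	return (0);
}
```

Every function, variable, and macro in the sketch starts with xx_ or XX_, so none of these names can collide with a symbol from another driver that follows the same rule with a different prefix.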
-
If
you are basing your design on an existing driver, modify the configuration
file before adding the driver.
The -n option in
the add_drv(1M) command
enables you to update the system configuration files for a driver without
loading or attaching the driver.
-
Use the cmn_err() function
to log driver activity.
You can use the cmn_err(9F) function
to display information from your driver similar to the way you might use print
statements to display information from a user program. The cmn_err(9F) function
writes low priority messages to /dev/log. The syslogd(1M) daemon reads
messages from /dev/log and writes low priority messages
to /var/adm/messages. Use the following command to monitor
the output from your cmn_err(9F) messages:
% tail -f /var/adm/messages
Be sure to remove cmn_err() calls that are used for
development or debugging before you compile your production version driver.
You might want to use cmn_err() calls in a production driver
to write error messages that would be useful to a system administrator.
-
Clean up allocations and other initialization
activities when the driver exits.
When the driver exits, whether
intentionally or prematurely, you need to perform such tasks as closing opened
files, freeing allocated memory, releasing mutex locks, and destroying any
mutexes that have been created. In addition, the system must be able to close
all minor devices and detach driver instances even after the hardware fails.
An orderly approach is to reverse _init() actions in the _fini() routine, reverse open() operations in
the close() routine, and reverse attach() operations
in the detach() routine.
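One way to keep setup and cleanup symmetric is sketched below in user-space C. The names xx_state, xx_setup(), and xx_teardown() are hypothetical, and malloc() stands in for kernel allocations; in a driver the same pattern would apply to the attach() and detach() pair.

```c
#include <stdlib.h>

/* Hypothetical per-instance state; the field names are illustrative. */
struct xx_state {
	char	*xx_buf;	/* step 1: data buffer */
	int	*xx_table;	/* step 2: lookup table */
	int	xx_attached;	/* step 3: instance is ready */
};

/* Undo xx_setup() in reverse order; safe on a partially set up state. */
void
xx_teardown(struct xx_state *sp)
{
	sp->xx_attached = 0;
	free(sp->xx_table);
	sp->xx_table = NULL;
	free(sp->xx_buf);
	sp->xx_buf = NULL;
}

/*
 * Acquire resources in order. fail_at simulates a failure at the given
 * step (0 means no failure) so that the unwind path can be exercised.
 */
int
xx_setup(struct xx_state *sp, int fail_at)
{
	sp->xx_buf = NULL;
	sp->xx_table = NULL;
	sp->xx_attached = 0;

	if (fail_at == 1 || (sp->xx_buf = malloc(512)) == NULL)
		goto fail;
	if (fail_at == 2 || (sp->xx_table = calloc(16, sizeof (int))) == NULL)
		goto fail;
	sp->xx_attached = 1;
	return (0);
fail:
	xx_teardown(sp);	/* releases only what was acquired */
	return (-1);
}
```

Because xx_teardown() tolerates a partially initialized state, the same routine serves both the error path in xx_setup() and the normal shutdown path.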
-
Use ASSERT(9F) to catch unexpected error
returns.
ASSERT() is a macro that halts the
kernel execution if a condition that was expected to be true turns out to
be false. To activate ASSERT(), you need to include the sys/debug.h header file and specify the DEBUG preprocessor
symbol during compilation.
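The following user-space sketch imitates this behavior. XX_ASSERT, xx_assfail(), xx_reset_device(), and xx_restart() are hypothetical stand-ins, not the kernel macro or any real driver routine; the point is only that the check exists when DEBUG is defined and disappears otherwise.

```c
#include <stdio.h>
#include <stdlib.h>

/* Report the failed condition and stop, as a stand-in for a kernel halt. */
void
xx_assfail(const char *expr, const char *file, int line)
{
	(void) fprintf(stderr, "assertion failed: %s, file %s, line %d\n",
	    expr, file, line);
	abort();
}

/* Active only when DEBUG is defined at compile time. */
#ifdef DEBUG
#define	XX_ASSERT(cond) \
	((cond) ? (void)0 : xx_assfail(#cond, __FILE__, __LINE__))
#else
#define	XX_ASSERT(cond)	((void)0)
#endif

/* Hypothetical helper that is not expected to fail. */
static int
xx_reset_device(int *reg)
{
	*reg = 0;
	return (0);
}

int
xx_restart(int *reg)
{
	int rv = xx_reset_device(reg);

	XX_ASSERT(rv == 0);	/* an error return here is a driver bug */
	return (rv);
}
```

Compiling the sketch with -DDEBUG turns an unexpected nonzero return into an immediate stop at the point of failure; without it, the macro expands to nothing and the caller simply receives rv.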
-
Use mutex_owned() to validate and document locking requirements.
The mutex_owned(9F) function helps determine whether the current thread
owns a specified mutex. To determine whether a mutex is held by a thread,
use mutex_owned() within ASSERT().
-
Use conditional compilation to toggle “costly”
debugging features.
The Solaris OS provides various debugging
functions, such as ASSERT() and mutex_owned(),
that can be turned on by specifying the DEBUG preprocessor
symbol when the driver is compiled. With conditional compilation, unnecessary
code can be removed from the production driver. This approach can also be
accomplished by using a global variable.
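The two approaches can be compared in a short user-space sketch. All names here (xx_log(), XX_TRACE, XX_DPRINTF, xx_debug, xx_msgs_logged, xx_transfer()) are hypothetical, and xx_log() is only a stand-in for a kernel logging call such as cmn_err(9F).

```c
#include <stdarg.h>
#include <stdio.h>

int xx_debug = 0;	/* run-time switch for debug output */
int xx_msgs_logged;	/* counts emitted messages, for demonstration */

/* Stand-in for a kernel logging call. */
void
xx_log(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	(void) vfprintf(stderr, fmt, ap);
	va_end(ap);
	xx_msgs_logged++;
}

/* Compile-time toggle: the call disappears entirely without DEBUG. */
#ifdef DEBUG
#define	XX_TRACE(...)	xx_log(__VA_ARGS__)
#else
#define	XX_TRACE(...)	((void)0)
#endif

/* Run-time toggle: the test stays in the binary and costs one compare. */
#define	XX_DPRINTF(...) \
	do { if (xx_debug) xx_log(__VA_ARGS__); } while (0)

int
xx_transfer(int nbytes)
{
	XX_TRACE("xx_transfer: enter\n");
	XX_DPRINTF("xx_transfer: %d bytes\n", nbytes);
	return (nbytes);
}
```

The compile-time form removes the code from the production binary. The global-variable form keeps a small cost in every build but lets you switch the output on without recompiling.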
-
Use a separate instance of the driver for each device to be
controlled.
-
Use DDI functions as much as possible in your device drivers.
These interfaces shield the driver from platform-specific dependencies
such as mismatches between processor and device endianness and any other data
order dependencies. With these interfaces, a single-source driver can run
on the SPARC platform, x86 platform, and related processor architectures.
-
Anticipate corrupted data.
Always check the
integrity of data before that data is used. The driver must avoid releasing
bad data to the rest of the system.
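As a minimal sketch of such a check, assume a hypothetical device that returns frames with a length field and a simple additive checksum. The xx_frame layout, XX_MAX_PAYLOAD, and both function names are invented for illustration and do not describe any real device.

```c
#include <stddef.h>
#include <stdint.h>

#define	XX_MAX_PAYLOAD	64

/* Hypothetical frame as written into memory by the device. */
struct xx_frame {
	uint16_t	xf_len;			/* payload bytes in use */
	uint8_t		xf_sum;			/* additive checksum */
	uint8_t		xf_data[XX_MAX_PAYLOAD];
};

/* Sum of n bytes, truncated to 8 bits. */
static uint8_t
xx_sum(const uint8_t *p, size_t n)
{
	uint8_t s = 0;

	while (n-- > 0)
		s = (uint8_t)(s + *p++);
	return (s);
}

/* Return 1 if the frame may be passed on, 0 if it must be dropped. */
int
xx_frame_ok(const struct xx_frame *fp)
{
	/* Check the length first so the checksum loop stays in bounds. */
	if (fp->xf_len == 0 || fp->xf_len > XX_MAX_PAYLOAD)
		return (0);
	return (xx_sum(fp->xf_data, fp->xf_len) == fp->xf_sum);
}
```

The order of the checks matters: the length is validated before it is used as a loop bound, so a corrupted length cannot cause a read past the end of the buffer.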
-
A device should only write to DMA buffers that are controlled
solely by the driver.
This technique prevents a DMA fault from
corrupting an arbitrary part of the system's main memory.
-
Use the ddi_umem_alloc(9F) function when you need to
make DMA transfers.
This function guarantees that only whole,
aligned pages are transferred.
-
Set a fixed number of attempts before taking
alternate action to deal with a stuck interrupt.
The device driver
must not be an unlimited drain on system resources if the device locks up.
The driver should time out if a device claims to be continuously busy. The
driver should also detect a pathological (stuck) interrupt request and take
appropriate action.
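The time-out half of this advice can be sketched as a bounded polling loop. The status bit XX_BUSY, the limit XX_MAX_POLLS, and the simulated status readers are assumptions made for the example; a real driver would read the device through DDI access routines and would normally delay between polls.

```c
#define	XX_BUSY		0x01	/* hypothetical busy bit in the status */
#define	XX_MAX_POLLS	1000	/* fixed number of attempts */

/*
 * Poll until the device reports idle. Returns 0 when the device goes
 * idle and -1 if it is still busy after XX_MAX_POLLS attempts, so the
 * caller can take alternate action instead of spinning forever.
 */
int
xx_wait_idle(unsigned int (*read_status)(void *), void *arg)
{
	int tries;

	for (tries = 0; tries < XX_MAX_POLLS; tries++) {
		if ((read_status(arg) & XX_BUSY) == 0)
			return (0);
	}
	return (-1);
}

/* Simulated device: busy until the countdown reaches zero. */
unsigned int
xx_fake_status(void *arg)
{
	int *countdown = arg;

	if (*countdown > 0) {
		(*countdown)--;
		return (XX_BUSY);
	}
	return (0);
}

/* Simulated locked-up device: always claims to be busy. */
unsigned int
xx_stuck_status(void *arg)
{
	(void) arg;
	return (XX_BUSY);
}
```

Calling xx_wait_idle(xx_stuck_status, NULL) returns -1 after the fixed number of polls, which is the point at which a driver would report the fault and stop using the device.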
-
Use care when setting the sequence for mutex acquisitions
and releases so as to avoid unwanted thread interactions if a device fails.
See Thread Interaction in Writing Device Drivers for more information.
-
Check for malformed ioctl() requests from user applications.
User requests
can be destructive. The design of the driver should take into consideration
the construction of each type of potential ioctl() request.
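A minimal sketch of this kind of checking follows. The command codes, the XX_SPEED_MAX limit, and xx_ioctl_check() are hypothetical, and the argument is passed as a plain pointer; a real ioctl(9E) routine would first copy the argument in from user space.

```c
#include <errno.h>
#include <stddef.h>

#define	XX_IOC_SETSPEED	1	/* hypothetical command codes */
#define	XX_IOC_GETSPEED	2
#define	XX_SPEED_MAX	3	/* highest valid speed setting */

static int xx_speed;		/* current setting */

/* Validate the command and its argument before acting on either. */
int
xx_ioctl_check(int cmd, int *argp)
{
	switch (cmd) {
	case XX_IOC_SETSPEED:
		if (argp == NULL)
			return (EFAULT);
		if (*argp < 0 || *argp > XX_SPEED_MAX)
			return (EINVAL);	/* out-of-range value */
		xx_speed = *argp;
		return (0);
	case XX_IOC_GETSPEED:
		if (argp == NULL)
			return (EFAULT);
		*argp = xx_speed;
		return (0);
	default:
		return (ENOTTY);		/* unknown command */
	}
}
```

Unknown commands and out-of-range values are rejected with an error code before any driver state changes, so a malformed request cannot leave the device half configured.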
-
Try to avoid situations where a driver continues to function
without detecting a device failure.
A driver should switch to
an alternative device rather than try to work around a device failure.
-
All device
drivers in the Solaris OS must support hotplugging.
It must be possible
to install or remove any device without requiring a reboot of the
system.
-
All
device drivers should support power management.
Power management
provides the ability to control and manage the electrical power usage of a
computer system or device. Power management enables systems to conserve energy
by using less power when idle and by shutting down completely when not in
use.
-
Apply the volatile keyword to any variable
that references a device register.
Without the volatile keyword,
the compile-time optimizer can delete important accesses to a register.
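The effect can be sketched in user space with an ordinary variable standing in for a device register. The names xx_fake_csr, XX_CSR_READY, and xx_poll_ready() are hypothetical; in a real driver the pointer would refer to mapped device memory.

```c
#include <stdint.h>

#define	XX_CSR_READY	0x80	/* hypothetical ready bit */

/* Simulated control and status register. */
volatile uint8_t xx_fake_csr;

/*
 * Poll the register up to max_polls times. Because csr points to a
 * volatile object, the compiler must perform a fresh read on every
 * pass instead of reusing the first value it loaded.
 */
int
xx_poll_ready(volatile uint8_t *csr, int max_polls)
{
	while (max_polls-- > 0) {
		if (*csr & XX_CSR_READY)
			return (1);
	}
	return (0);
}
```

Without the volatile qualifier on the pointer target, an optimizer would be free to read the register once, conclude that the value cannot change inside the loop, and never notice the device setting the ready bit.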
-
Perform periodic health checks to detect and report faulty
devices.
A periodic health check should include the following activities:
-
Check any register or memory location on the device whose
value might have been altered since the last poll.
-
Timestamp outgoing requests such as transmit blocks or commands
that are issued by the driver.
-
Initiate a test action on the device that should be completed
before the next scheduled check.
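The timestamp activity in the list above can be sketched as follows. The request table, the XX_STALL_TICKS threshold, and all function names are assumptions for illustration; the current time is passed in as a tick count so the logic stays independent of any particular clock interface.

```c
#define	XX_NREQ		4	/* hypothetical number of request slots */
#define	XX_STALL_TICKS	100	/* age at which a request counts as stalled */

struct xx_req {
	int	xr_busy;	/* nonzero while the request is outstanding */
	long	xr_issued;	/* tick at which the request was issued */
};

static struct xx_req xx_reqs[XX_NREQ];

/* Record the issue time of an outgoing request. */
int
xx_issue(int slot, long now)
{
	if (slot < 0 || slot >= XX_NREQ || xx_reqs[slot].xr_busy)
		return (-1);
	xx_reqs[slot].xr_busy = 1;
	xx_reqs[slot].xr_issued = now;
	return (0);
}

/* Called when the device reports that the request has finished. */
void
xx_complete(int slot)
{
	if (slot >= 0 && slot < XX_NREQ)
		xx_reqs[slot].xr_busy = 0;
}

/* Periodic check: return the number of requests outstanding too long. */
int
xx_health_check(long now)
{
	int i, stalled = 0;

	for (i = 0; i < XX_NREQ; i++) {
		if (xx_reqs[i].xr_busy &&
		    now - xx_reqs[i].xr_issued > XX_STALL_TICKS)
			stalled++;
	}
	return (stalled);
}
```

A nonzero result from xx_health_check() indicates that the device has stopped completing work, which is the condition the driver should report as a fault.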
Testing a device driver can cause the system to panic and can harm the
kernel.
-
Install the driver in a temporary location.
Install drivers in the /tmp directory until you
are finished modifying and testing the _info(), _init(), and attach() routines. Copy the driver binary
to the /tmp directory. Link to the driver from the kernel
driver directory.
If a driver has an error in its _info(), _init(), or attach() function, your machine could get
into a state of infinite panic. The Solaris OS automatically reboots itself
after a panic. The Solaris OS loads any drivers it can during boot. If you
have an error in your attach() function that panics the
system when you load the driver, then the system will panic again when it
tries to reboot after the panic. The system will continue the cycle of panic,
reboot, panic as it attempts to reload the faulty driver every time it reboots
after a panic.
To avoid an infinite panic, keep the driver in the /tmp area
until it is well tested. Link to the driver in the /tmp area
from the kernel driver area. The Solaris OS removes all files from the /tmp area every time the system reboots. If your driver causes
a panic, the Solaris OS reboots successfully because the driver has been removed
automatically from the /tmp area. The link in the kernel
driver area points to nothing. The faulty driver did not get loaded, so the
system does not go back into a panic. You can modify the driver, copy it again
to the /tmp area, and continue testing and developing.
When the driver is well tested, copy it to the /usr/kernel/drv area
so that it will remain available after a reboot.
The following example shows you where to link the driver for a 32-bit
platform. For other architectures, see the instructions in Installing a Driver.
# cp mydriver /tmp
# ln -s /tmp/mydriver /usr/kernel/drv/mydriver
-
Enable
the deadman feature to avoid a hard hang.
If your system is in
a hard hang, then you cannot break into the debugger. If you enable the deadman
feature, the system panics instead of hanging indefinitely. You can then use
the kmdb(1) kernel
debugger to analyze your problem.
The deadman feature checks every second whether the system clock is
updating. If the system clock is not updating, then you are in an indefinite
hang. If the system clock has not been updated for 50 seconds, the deadman
feature induces a panic and puts you in the debugger.
Take the following steps to enable the deadman feature:
-
Make sure you are capturing crash images with dumpadm(1M).
-
Set the snooping variable in the /etc/system file.
set snooping=1
-
Reboot the system so that the /etc/system file
is read again and the snooping setting takes effect.
Note that any zones on your system inherit the deadman
setting as well.
If your system hangs while the deadman feature is enabled, you should
see output similar to the following example on your console:
panic[cpu1]/thread=30018dd6cc0: deadman: timed out after 9 seconds of
clock inactivity
panic: entering debugger (continue to save dump)
Inside the debugger, use the ::cpuinfo command to
investigate why the clock interrupt was not able to fire and advance the system
time.
-
Use a serial connection to control your
test machine from a separate host system.
This technique is explained
in Testing With a Serial Connection in Writing Device Drivers.
-
Use an alternate kernel.
Booting
from a copy of the kernel and the associated binaries rather than from the
default kernel avoids inadvertently rendering the system inoperable.
-
Use an additional kernel module to experiment
with different kernel variable settings.
This approach isolates
experiments with the kernel variable settings. See Setting Up Test Modules in Writing Device Drivers.
-
Make contingency plans for potential
data loss on a test system.
If your test system is set up as a
client of a server, then you can boot from the network if problems occur.
You could also create a special partition to hold a copy of a bootable root
file system. See Avoiding Data Loss on a Test System in Writing Device Drivers.
-
Capture system crash dumps
if your test system panics.
-
Use fsck(1M) to
repair the damaged root file system temporarily if your system crashes during
the attach(9E) process
so that any crash dumps can be salvaged. See Recovering the Device Directory in Writing Device Drivers.
-
Install drivers in the /tmp directory
until you are finished modifying and testing the _info(), _init(), and attach() routines.
Keep
a driver in the /tmp directory until the driver has been
well tested. If a panic occurs, the driver will be removed from the /tmp directory
and the system will reboot successfully.
The Solaris OS provides various tools for debugging and tuning your
device driver:
-
You might receive the following warning message from the add_drv(1M) command:
Warning: Driver (driver_name) successfully added to system but failed to attach
This message might have one of the following causes:
-
The hardware has not been detected properly. The system cannot
find the device.
-
The configuration file is missing. See Writing a Configuration File for information on when you need a configuration
file and what information goes into a configuration file. Be sure to put the
configuration file in /kernel/drv or /usr/kernel/drv and not in the driver directory.
-
Use the kmdb(1) kernel
debugger for runtime debugging.
The kmdb debugger
provides typical runtime debugger facilities, such as breakpoints, watch points,
and single-stepping. For more information, see Solaris Modular Debugger Guide.
-
Use the mdb(1) modular
debugger for postmortem debugging.
Postmortem debugging is performed
on a system crash dump rather than on a live system. With postmortem debugging,
the same crash dump can be analyzed by different people or processes simultaneously.
In addition, mdb enables you to create special debugger modules called dmods to perform rigorous analysis on the dump. For more information,
see Solaris Modular Debugger Guide.
-
Use
the kstat(3KSTAT) facility
to export module-specific kernel statistics for your device driver.
-
Use
the DTrace facility to add instrumentation to your driver dynamically so that
you can perform tasks such as analyzing the system and measuring performance.
For information on DTrace, see the Solaris Dynamic Tracing Guide and the DTrace User Guide.
-
If your driver does not behave as expected on a 64-bit platform,
make sure you are using a 64-bit driver. By default, compilation on the Solaris
OS yields a 32-bit result on every architecture. To obtain a 64-bit result,
follow the instructions in Building a Driver.
Use the file(1) command
to determine whether you have a 64-bit driver.
% file qotd_3
qotd_3: ELF 32-bit LSB relocatable 80386 Version 1
-
If you are using a 64-bit system and you are not certain whether
you are currently running the 64-bit kernel or the 32-bit kernel, use the -k option of the isainfo(1) command.
The -v option reports all instruction set architectures of
the system. The -k option reports the instruction set architecture
that is currently in use.
% isainfo -v
64-bit sparcv9 applications
vis2 vis
32-bit sparc applications
vis2 vis v8plus div32 mul32
% isainfo -kv
64-bit sparcv9 kernel modules
-
If your driver seems to have an error in a function that you
did not write, make sure you have called that function with the correct arguments
and specified the correct include files. Many kernel functions have the same
names as system calls and user functions. For example, read() and write() can be system calls, user library functions, or kernel functions.
Similarly, ioctl() and mmap() can be
system calls or kernel functions. The man mmap command
displays the mmap(2) man page. To see the arguments, description,
and include files for the kernel function, use the man mmap.9e command.
If you do not know whether the function you want is in section 9E or section
9F, use the man -l mmap command, for example.