Chapter 37 Postmortem Tracing

This chapter describes the DTrace facilities for postmortem extraction and processing of the in-kernel data of DTrace consumers. In the event of a system crash, the information that has been recorded with DTrace may provide the crucial clues to root-cause the system failure. DTrace data may be extracted from the system crash dump and processed to aid you in understanding fatal system failures. By coupling these postmortem capabilities with the ring buffer policy (see Chapter 11, Buffers and Buffering), DTrace can be used as an operating system analog to the black box flight data recorder present on commercial aircraft.

To extract DTrace data from a specific crash dump, begin by running the Solaris Modular Debugger, mdb(1), on the crash dump of interest. The MDB module containing the DTrace functionality is loaded automatically. To learn more about MDB, refer to the Solaris Modular Debugger Guide.

Displaying DTrace Consumers

To extract DTrace data from a DTrace consumer, you must first determine the DTrace consumer of interest by running the ::dtrace_state MDB dcmd:
This command displays a table of DTrace state structures. Each row of the table consists of the following information:
To obtain further information about a specific DTrace consumer, specify the address of its process structure to the ::ps dcmd:
Displaying Trace Data

Once you determine the consumer of interest, you can retrieve the data corresponding to any unconsumed buffers by specifying the address of the state structure to the ::dtrace dcmd. The following example shows the output of the ::dtrace dcmd on an anonymous enabling of syscall:::entry with the action trace(execname):
The ::dtrace dcmd handles errors in the same way that dtrace(1M) does: if drops, errors, speculative drops, or the like were encountered while the consumer was executing, ::dtrace emits a message corresponding to the dtrace(1M) message.

Within a given CPU, ::dtrace always displays events from oldest to youngest. The CPU buffers themselves are displayed in numerical order. If you need an ordering for events on different CPUs, trace the timestamp variable. You can display only the data for a specific CPU by specifying the -c option to ::dtrace:
Note that ::dtrace only processes in-kernel DTrace data. Data that has already been consumed from the kernel and processed (through dtrace(1M) or other means) is not available to ::dtrace. To ensure that as much data as possible is available at the time of failure, use a ring buffer policy. See Chapter 11, Buffers and Buffering for more information on buffer policies. The following example creates a very small (16K) ring buffer and records all system calls and the process making them:
Looking at a crash dump taken when the above command was running results in output similar to the following example:
Note that CPU 1's youngest records include a series of write(2) system calls by an mdb -kw process. This result is likely related to the reason for the system failure because a user can modify running kernel data or text with mdb(1) when run with the -k and -w options. In this case, the DTrace data provides at least an interesting avenue of investigation, if not the root cause of the failure.
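
The ordering rules described in this chapter (oldest to youngest within a CPU, CPUs in numerical order, and a traced timestamp when a cross-CPU ordering is needed) can be illustrated with a small Python model. This is only a sketch under stated assumptions, not how DTrace or the ::dtrace dcmd is implemented: the class name `PerCpuRing` and its methods are hypothetical, each ring is bounded by a record count rather than by a byte size such as 16K, and timestamps are assumed to increase within each CPU.

```python
from collections import deque
from heapq import merge


class PerCpuRing:
    """Toy model (hypothetical): one fixed-capacity ring of records per CPU."""

    def __init__(self, ncpus, capacity):
        # Assumption: capacity counts records; real DTrace buffers are sized in bytes.
        self.rings = [deque(maxlen=capacity) for _ in range(ncpus)]

    def record(self, cpu, timestamp, data):
        # A full deque silently discards its oldest entry, so only the
        # youngest records survive, as with a ring buffer policy.
        self.rings[cpu].append((timestamp, data))

    def per_cpu(self):
        # CPUs in numerical order, oldest to youngest within each CPU.
        return [(cpu, list(ring)) for cpu, ring in enumerate(self.rings)]

    def merged(self):
        # Each ring is already sorted by timestamp, so a k-way merge
        # yields one global ordering across CPUs.
        streams = (
            [(ts, cpu, data) for ts, data in ring]
            for cpu, ring in enumerate(self.rings)
        )
        return list(merge(*streams))


if __name__ == "__main__":
    buf = PerCpuRing(ncpus=2, capacity=2)
    buf.record(0, 10, "open")
    buf.record(1, 15, "read")
    buf.record(0, 20, "write")
    buf.record(0, 30, "close")  # evicts the oldest CPU 0 record

    for cpu, records in buf.per_cpu():
        print("CPU", cpu, records)
    for ts, cpu, data in buf.merged():
        print(ts, cpu, data)
```

The merge step works only because each record carries a timestamp. Without one, the per-CPU sequences cannot be interleaved reliably, which is why the chapter recommends tracing the timestamp variable when cross-CPU ordering matters.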