Chapter 28 Managing System Crash Information (Tasks)
This chapter describes how to manage system crash information
in the Solaris environment.
For information on the procedures associated with managing system crash
information, see Managing System Crash Information (Task Map).
Managing System Crash Information (Task Map)
The following task map identifies the procedures needed to manage system
crash information.
| Task | Description | For Instructions |
| 1. Display the current crash dump configuration | Display the current crash dump configuration by using the dumpadm command. | How to Display the Current Crash Dump Configuration |
| 2. Modify the crash dump configuration | Use the dumpadm command to specify the type of data to dump, whether or not the system will use a dedicated dump device, the directory for saving crash dump files, and the amount of space that must remain available after crash dump files are written. | How to Modify a Crash Dump Configuration |
| 3. Examine a crash dump file | Use the mdb command to view crash dump files. | How to Examine a Crash Dump |
| 4. (Optional) Recover from a full crash dump directory | The system crashes but no room is available in the savecore directory, and you want to save some critical system crash dump information. | How to Recover From a Full Crash Dump Directory (Optional) |
| 5. (Optional) Disable or enable the saving of crash dump files | Use the dumpadm command to disable or enable the saving of crash dump files. Saving crash dump files is enabled by default. | How to Disable or Enable Saving Crash Dumps |
System Crashes (Overview)
System crashes can occur due to hardware malfunctions, I/O problems,
and software errors. If the system crashes, it will display an error message
on the console, and then write a copy of its physical memory to the dump device.
The system will then reboot automatically. When the system reboots, the savecore command is executed to retrieve the data from the dump
device and write the saved crash dump to your savecore
directory. The saved crash dump files provide invaluable information to your
support provider to aid in diagnosing the problem.
System Crash Dump Files
The savecore command runs automatically after a system
crash to retrieve the crash dump information from the dump device. It then
writes a pair of files called unix.X
and vmcore.X, where X identifies the dump sequence number. Together, these files
represent the saved system crash dump information.
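Because each saved dump consists of a unix.X file and a matching vmcore.X
file, a quick check of the savecore directory can confirm which sequence
numbers are complete. The following is a minimal sketch, assuming a
POSIX-compatible shell; list_dump_pairs is a hypothetical helper name, not
a Solaris command.

```shell
# Sketch: print the sequence number X of every complete crash dump pair
# (unix.X together with vmcore.X) found in the directory given as $1.
# list_dump_pairs is a hypothetical helper name.
list_dump_pairs() {
    dir=$1
    for f in "$dir"/unix.*; do
        [ -f "$f" ] || continue      # skip the pattern itself when nothing matches
        n=${f##*/unix.}              # keep only the sequence number
        if [ -f "$dir/vmcore.$n" ]; then
            echo "$n"
        fi
    done
}

# Possible use on a live system:
#   list_dump_pairs "/var/crash/$(uname -n)"
```

A unix.X file without its vmcore.X partner is skipped, which makes
incomplete dumps easy to spot by comparing the result with a plain
directory listing.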
Crash dump files are sometimes confused with core
files, which are images of user applications that are written when the application
terminates abnormally.
Crash dump files are saved in a predetermined directory, which, by default,
is /var/crash/hostname. In
previous Solaris releases, crash dump files were overwritten when a system
rebooted, unless you manually enabled the system to save the images of physical
memory in a crash dump file. Now, the saving of crash dump files is enabled
by default.
System crash information is managed with the dumpadm
command. For more information, see The dumpadm Command.
Saving Crash Dumps
You can examine the control structures, active tables, memory images
of a live or crashed system kernel, and other information about the operation
of the kernel by using the mdb utility. Using mdb to its full potential requires a detailed knowledge of the kernel,
and is beyond the scope of this manual. For information on using this utility,
see mdb(1).
Additionally, crash dumps saved by savecore can be
useful to send to a customer service representative for analysis of why the
system is crashing.
The dumpadm Command
Use the dumpadm command to manage system crash dump
information in the Solaris environment.
-
The dumpadm command enables you to configure
crash dumps of the operating system. The dumpadm configuration
parameters include the dump content, dump device, and the directory in which
crash dump files are saved.
-
Dump data is stored in compressed format on the dump device.
Kernel crash dump images can be as big as 4 Gbytes or more. Compressing the
data means faster dumping and less disk space needed for the dump device.
-
The savecore command runs in the background when a dedicated
dump device, not the swap area, is part of the dump configuration. This means
a booting system does not wait for the savecore command
to complete before going to the next step. On large memory systems, the system
can be available before savecore completes.
-
System crash dump files, generated by the savecore command, are saved by default.
-
The savecore -L command is a new feature
which enables you to get a crash dump of the live running Solaris Operating
System. This command is intended for troubleshooting a running system by taking
a snapshot of memory during some bad state, such as a transient performance
problem or service outage. If the system is up and you can still run some
commands, you can execute the savecore -L command to save
a snapshot of the system to the dump device, and then immediately write out
the crash dump files to your savecore directory. Because
the system is still running, you can only use the savecore -L
command if you have configured a dedicated dump device.
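Because savecore -L depends on a dedicated dump device, a script can check
for one before it attempts a live dump. The following is a minimal sketch,
assuming a POSIX-compatible shell and dumpadm output in the form shown in
this chapter's examples; has_dedicated_dump_device is a hypothetical helper
name.

```shell
# Sketch: read dumpadm-style output on standard input and succeed only if
# the "Dump device" line is marked "(dedicated)".
# has_dedicated_dump_device is a hypothetical helper name.
has_dedicated_dump_device() {
    grep '^ *Dump device:.*(dedicated)' >/dev/null
}

# Possible use on a live system (run as superuser):
#   if dumpadm | has_dedicated_dump_device; then
#       savecore -L
#   else
#       echo "No dedicated dump device is configured." >&2
#   fi
```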
The following table describes dumpadm's configuration
parameters.
| Dump Parameter | Description |
| dump device | The device that stores dump data temporarily as the system crashes. When the dump device is not the swap area, savecore runs in the background, which speeds up the boot process. |
| savecore directory | The directory that stores system crash dump files. |
| dump content | Type of memory data to dump. |
| minimum free space | Minimum amount of free space required in the savecore directory after saving crash dump files. If no minimum free space has been configured, the default is one Mbyte. |
For more information, see dumpadm(1M).
The dump configuration parameters managed by the dumpadm
command are stored in the /etc/dumpadm.conf file.
Note –
Do not edit the /etc/dumpadm.conf file manually.
Editing this file manually could result in an inconsistent system dump configuration.
How the dumpadm Command Works
During system startup, the dumpadm command is invoked
by the /etc/init.d/savecore script to configure crash
dump parameters based on information in the /etc/dumpadm.conf
file.
Specifically, dumpadm initializes the dump device
and the dump content through the /dev/dump interface.
After the dump configuration is complete, the savecore
script looks for the location of the crash dump file directory by parsing
the content of the /etc/dumpadm.conf file. Then, savecore is invoked to check for crash dumps and check the content
of the minfree file in the crash dump directory.
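The directory lookup described above can be illustrated with a read-only
sketch. It assumes a POSIX-compatible shell and, importantly, that the
configuration file holds NAME=value lines with the directory stored under a
key named DUMPADM_SAVDIR. This chapter does not document the file format,
so treat that key name as an assumption and check it against your own
file. The helper name savecore_dir_from_conf is hypothetical. The sketch
only reads the file and never modifies it.

```shell
# Sketch: print the savecore directory recorded in a dumpadm.conf-style file.
# ASSUMPTION: the file contains NAME=value lines and the directory is stored
# under the key DUMPADM_SAVDIR. The file is only read, never modified.
# savecore_dir_from_conf is a hypothetical helper name.
savecore_dir_from_conf() {
    sed -n 's/^DUMPADM_SAVDIR=//p' "$1"
}

# Possible use on a live system:
#   savecore_dir_from_conf /etc/dumpadm.conf
```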
Dump Devices and Volume Managers
For accessibility and performance reasons, do not configure a dedicated dump
device that is under the control of a volume management product such as
Solaris Volume Manager. You can keep your swap areas under the control of
Solaris Volume Manager, which is a recommended practice, but keep your dump
device separate.
Managing System Crash Dump Information
Keep the following key points in mind when you are working with system
crash information:
-
You must be superuser to access and manage system crash information.
-
Do not disable the option of saving system crash dumps. System
crash dump files provide an invaluable way to determine what is causing the
system to crash.
-
Do not remove important system crash information until it
has been sent to your customer service representative.
How to Display the Current Crash Dump Configuration
-
Become superuser.
-
Display the current crash dump configuration.
# dumpadm
Dump content: kernel pages
Dump device: /dev/dsk/c0t3d0s1 (swap)
Savecore directory: /var/crash/venus
Savecore enabled: yes
The preceding example output means:
-
The dump content is kernel memory pages.
-
Kernel memory will be dumped on a swap device, /dev/dsk/c0t3d0s1. You can identify all your swap areas with the swap -l command.
-
System crash dump files will be written in the /var/crash/venus directory.
-
Saving crash dump files is enabled.
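When a script needs only one of these values, the label before the colon
can serve as a lookup key. The following is a minimal sketch, assuming a
POSIX-compatible shell and the output layout shown in the preceding
example; dump_field is a hypothetical helper name.

```shell
# Sketch: read dumpadm-style output on standard input and print the value
# that follows the label given as $1, for example "Savecore directory".
# dump_field is a hypothetical helper name.
dump_field() {
    sed -n "s/^ *$1: *//p"
}

# Possible use on a live system:
#   dumpadm | dump_field "Dump device"
```

The value is printed exactly as dumpadm reports it, so a dump device is
returned together with its (swap) or (dedicated) annotation.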
How to Modify a Crash Dump Configuration
-
Become superuser.
-
Identify the current crash dump configuration.
# dumpadm
Dump content: kernel pages
Dump device: /dev/dsk/c0t3d0s1 (swap)
Savecore directory: /var/crash/pluto
Savecore enabled: yes
This output identifies the default dump configuration for a system running
the Solaris 9 release.
-
Modify the crash dump configuration.
# dumpadm -c content -d dump-device -m nnnk | nnnm | nnn% -n -s savecore-dir
-c content
    Specifies the type of data to dump. Use kernel to dump all kernel memory, all to dump all of memory, or curproc to dump kernel memory and the memory pages of the process whose thread was executing when the crash occurred. The default dump content is kernel memory.
-d dump-device
    Specifies the device that stores dump data temporarily as the system crashes. The primary swap device is the default dump device.
-m nnnk | nnnm | nnn%
    Specifies the minimum free disk space for saving crash dump files by creating a minfree file in the current savecore directory. This parameter can be specified in Kbytes (nnnk), Mbytes (nnnm), or file system size percentage (nnn%). The savecore command consults this file prior to writing the crash dump files. If writing the crash dump files, based on their size, would decrease the amount of free space below the minfree threshold, the dump files are not written and an error message is logged. For information on recovering from this scenario, see How to Recover From a Full Crash Dump Directory (Optional).
-n
    Specifies that savecore should not be run when the system reboots. This dump configuration is not recommended. If system crash information is written to the swap device, and savecore is not enabled, the crash dump information is overwritten when the system begins to swap.
-s savecore-dir
    Specifies an alternate directory for storing crash dump files. The default directory is /var/crash/hostname, where hostname is the output of the uname -n command.
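The minfree rule described for the -m option amounts to a single
comparison: the dump is saved only if the free space that would remain
afterward is at least the threshold. The following is a minimal sketch of
that comparison, assuming a POSIX-compatible shell and that all three
values are already expressed in Kbytes. The helper name would_save is
hypothetical, and the sketch models the rule as stated in this chapter
rather than the actual savecore implementation.

```shell
# Sketch: would_save FREE_KB DUMP_KB MINFREE_KB
# Succeeds when writing DUMP_KB Kbytes would still leave at least
# MINFREE_KB Kbytes free. would_save is a hypothetical helper name.
would_save() {
    free_kb=$1
    dump_kb=$2
    minfree_kb=$3
    [ $((free_kb - dump_kb)) -ge "$minfree_kb" ]
}

# Illustrative values only:
#   would_save 500000 400000 77071   -> succeeds (100000 Kbytes would remain)
#   would_save 450000 400000 77071   -> fails    (50000 Kbytes would remain)
```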
Example—Modifying a Crash Dump Configuration
In this example, all of memory is dumped to the dedicated dump device, /dev/dsk/c0t1d0s1, and the minimum free space that must be available
after the crash dump files are saved is 10% of the file system space.
# dumpadm
Dump content: kernel pages
Dump device: /dev/dsk/c0t3d0s1 (swap)
Savecore directory: /var/crash/pluto
Savecore enabled: yes
# dumpadm -c all -d /dev/dsk/c0t1d0s1 -m 10%
Dump content: all pages
Dump device: /dev/dsk/c0t1d0s1 (dedicated)
Savecore directory: /var/crash/pluto (minfree = 77071KB)
Savecore enabled: yes
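In the preceding output, the 10% setting is reported as an absolute value,
minfree = 77071KB. The arithmetic can be sketched as follows, assuming a
POSIX-compatible shell and a plain integer percentage of the file system's
total size. How dumpadm rounds the result is not stated in this chapter,
and the total of 770710 Kbytes is a hypothetical figure chosen only because
it reproduces the value in the example. The helper name percent_of_kb is
also hypothetical.

```shell
# Sketch: percent_of_kb TOTAL_KB PERCENT
# Prints PERCENT percent of TOTAL_KB using integer arithmetic.
# percent_of_kb is a hypothetical helper name.
percent_of_kb() {
    total_kb=$1
    percent=$2
    echo $((total_kb * percent / 100))
}

# Hypothetical file system size that matches the example output:
#   percent_of_kb 770710 10    # prints 77071
```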
How to Examine a Crash Dump
-
Become superuser.
-
Examine a crash dump by using the mdb utility.
# /usr/bin/mdb [-k] crashdump-file
-k
    Specifies kernel debugging mode by assuming the file is an operating system crash dump file.
crashdump-file
    Specifies the operating system crash dump file.
-
Display crash status information.
# /usr/bin/mdb file-name
> ::status
.
.
.
> ::system
.
.
.
Example—Examining a Crash Dump
The following example shows sample output from the mdb
utility, which includes system information and identifies the tunables that
are set in this system's /etc/system file.
# /usr/bin/mdb -k unix.0
Loading modules: [ unix krtld genunix ip nfs ipc ptm ]
> ::status
debugging crash dump /dev/mem (64-bit) from ozlo
operating system: 5.9 Generic (sun4u)
> ::system
set ufs_ninode=0x9c40 [0t40000]
set ncsize=0x4e20 [0t20000]
set pt_cnt=0x400 [0t1024]
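In the ::system output, each tunable is shown in hexadecimal, followed in
brackets by the same value in decimal (mdb marks decimal numbers with the
0t prefix). The following minimal sketch, assuming a POSIX-compatible
shell, converts saved ::system output into name=decimal lines, which is one
way to cross-check the two forms. The helper name system_tunables_decimal
is hypothetical.

```shell
# Sketch: read "set name=0xHEX [0tDEC]" lines on standard input and print
# "name=DEC", converting the hexadecimal value with printf.
# system_tunables_decimal is a hypothetical helper name.
system_tunables_decimal() {
    while read -r keyword assignment rest; do
        [ "$keyword" = set ] || continue     # ignore anything but set lines
        name=${assignment%%=*}               # text before the first "="
        value=${assignment#*=}               # text after the first "="
        printf '%s=%d\n' "$name" "$value"
    done
}

# With the three lines from the example above as input, this prints:
#   ufs_ninode=40000
#   ncsize=20000
#   pt_cnt=1024
```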
How to Recover From a Full Crash Dump Directory (Optional)
In this scenario, the system crashes but no room is left in the savecore directory, and you want to save some critical system crash
dump information.
-
Log in as superuser after the system reboots.
-
Clear out the savecore directory, usually /var/crash/hostname, by removing existing crash dump files that have already
been sent to your service provider. Or, run the savecore
command and specify an alternate directory that has sufficient disk space.
See the next step.
-
Manually run the savecore command and, if necessary,
specify an alternate savecore directory.
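Choosing an alternate directory comes down to finding one with enough free
space. The following is a minimal sketch, assuming a POSIX-compatible shell
and a df command that accepts the POSIX -P and -k options (on some Solaris
releases that may mean /usr/xpg4/bin/df rather than /usr/bin/df). The
helper name first_dir_with_space, the candidate directory /export/crash,
and the 4194304-Kbyte requirement are all hypothetical.

```shell
# Sketch: first_dir_with_space NEED_KB DIR...
# Prints the first existing directory whose file system reports at least
# NEED_KB Kbytes available, and fails if no candidate qualifies.
# first_dir_with_space is a hypothetical helper name.
first_dir_with_space() {
    need_kb=$1
    shift
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # In POSIX df -P output, the fourth column of line 2 is "Available".
        avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
        case $avail_kb in
            ''|*[!0-9]*) continue ;;         # skip unparsable results
        esac
        if [ "$avail_kb" -ge "$need_kb" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Possible use as superuser (directory names and size are hypothetical):
#   dir=$(first_dir_with_space 4194304 "/var/crash/$(uname -n)" /export/crash) &&
#       savecore "$dir"
```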
How to Disable or Enable Saving Crash Dumps
-
Become superuser.
-
Disable or enable the saving of crash dumps on your system.
Example—Disabling the Saving of Crash Dumps
This example illustrates how to disable the saving of crash dumps on
your system.
# dumpadm -n
Dump content: all pages
Dump device: /dev/dsk/c0t1d0s1 (dedicated)
Savecore directory: /var/crash/pluto (minfree = 77071KB)
Savecore enabled: no
Example—Enabling the Saving of Crash Dumps
This example illustrates how to enable the saving of crash dumps on your
system.
# dumpadm -y
Dump content: all pages
Dump device: /dev/dsk/c0t1d0s1 (dedicated)
Savecore directory: /var/crash/pluto (minfree = 77071KB)
Savecore enabled: yes