Solstice SyMON 1.6 User's Guide
CHAPTER 1

Solstice SyMON Overview


This chapter gives an overview of the main features of the Solstice SyMON system monitor, which offers simple yet powerful monitoring capabilities for the following servers:
  • Sun Enterprise 3x00, 4x00, 5x00, and 6x00 servers
  • Sun Enterprise 150 server
  • Sun Enterprise 250 server
  • Sun Enterprise 450 server
  • SPARCserver 1000/1000E servers
  • SPARCcenter 2000/2000E servers
In addition, Solstice SyMON can monitor the Sun Enterprise Network Array storage device.
Solstice SyMON uses a CDE/Motif-based graphical user interface (GUI).

Product Features

Solstice SyMON quickly identifies a range of hardware and system states. For example, it can monitor a major condition such as a CPU failure or a minor condition such as low swap space. You can also monitor hardware performance to detect incipient hardware failures, such as soft read errors on a disk.
To give you this critical performance information, Solstice SyMON analyzes system performance in real time. When performance problems occur, the event system can alert you to the status of most system components.
TABLE 1-1 lists the main features.
TABLE 1-1
Feature | Description
Performance monitoring | Diagnoses and addresses potential problems such as capacity problems or bottlenecks. Solstice SyMON monitors four categories of performance data: CPU, disk, memory, and network.
Configuration monitoring | Displays physical and logical views of exact server configurations; configuration monitoring improves system serviceability.
Remote monitoring | Allows a server within a network running Solstice SyMON to be monitored from any location in the network.
Fault management | Isolates potential problems or failed components. Provides access to the system log file and maintains a log file of conditions for future analysis.
Graphical user interface (GUI) | Ensures that users get the information they need quickly and easily with the configurable GUI. The GUI provides access to SunVTS(TM) diagnostics, which diagnose hardware.

Software Components

Solstice SyMON consists of three interdependent subsystems:
  • Server subsystem--The server subsystem consists of a set of agents on the monitored server, including the Kernel Reader, Config Reader, and Log Scanner (see FIGURE 1-2). The agents continuously monitor the hardware status of the server.
  • Event Manager--The Event Manager consists of the Event Generator and the Event Viewer. The Event Generator compares the conditions on the server against a set of predefined rules. When the conditions defined in a rule exist on the server, the actions specified in that rule are taken.

    You typically configure the Event Generator on a machine other than the one being monitored so that it will continue to run even if the monitored server is down. Each instance of the Event Generator monitors one server.

    For more information on the Event Manager, see "Event Manager" later in this chapter.

  • GUI--You use the GUI, which consists of seven consoles, to view information generated by the Event Generator and the Server subsystem, and to control the execution of SunVTS. You can install the GUI on several machines so that others can monitor the server from multiple locations. The GUI obtains its information by polling the server and the Event Generator. In general, Solstice SyMON uses a polling (pull) architecture rather than an asynchronous push architecture.
FIGURE 1-1 shows the relationship and flow of information between these subsystems.

FIGURE 1-1
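The pull model described above can be illustrated with a minimal Python sketch. On each cycle the client asks every source for its current data, and the sources never send unsolicited updates. The source functions and `poll_sources` are hypothetical stand-ins, not Solstice SyMON interfaces, and the polling interval is an arbitrary parameter.

```python
import time
from typing import Callable, Dict, List

# A source returns its current data only when asked (pull model).
Source = Callable[[], dict]


def poll_sources(sources: Dict[str, Source], cycles: int,
                 interval: float = 0.0) -> List[Dict[str, dict]]:
    """Pull one snapshot from every source per cycle."""
    history: List[Dict[str, dict]] = []
    for cycle in range(cycles):
        if cycle and interval:
            time.sleep(interval)  # wait between polls, not before the first
        history.append({name: fetch() for name, fetch in sources.items()})
    return history


if __name__ == "__main__":
    samples = iter(range(1, 100))

    def server_agents() -> dict:    # hypothetical server-side source
        return {"sample": next(samples)}

    def event_generator() -> dict:  # hypothetical event source
        return {"open_events": 0}

    for snapshot in poll_sources({"server": server_agents,
                                  "events": event_generator}, cycles=3):
        print(snapshot)
```

In a push design, each agent would send updates whenever its data changed. In the pull design, the agents stay passive and each client chooses its own polling rate.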

Server Subsystem

The server subsystem consists of four agents (see FIGURE 1-2):
  • Kernel Reader
  • Config Reader
  • Log Scanner
  • symond daemon
These agents are briefly described in the following sections.

FIGURE 1-2

Kernel Reader

The Kernel Reader periodically extracts summary and per-process data from the Solaris operating system kernel. It determines which processes are CPU- and resource-intensive. It also monitors and reports on items such as:
  • CPU usage
  • Disk I/O
  • Network I/O
  • Memory usage
  • Swap space usage
  • Paging and swapping rates

Config Reader

The Config Reader monitors the server hardware. It reports system configuration data such as the number of CPU boards and the number and type of I/O boards. It monitors and tracks changes to the system configuration and the state of its components.
It reports on items such as:
  • Disk hot-plug
  • Board hot-swap
  • Board temperature
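One way to detect configuration changes such as a board hot-swap is to compare successive configuration snapshots. The following Python sketch assumes that a snapshot maps a hypothetical location name (such as a board slot or disk bay) to the component installed there. This layout is an illustration only and does not reflect the Config Reader's actual data format.

```python
from typing import Dict, List

# Hypothetical snapshot layout: location (board slot, disk bay) -> component.
Snapshot = Dict[str, str]


def diff_config(old: Snapshot, new: Snapshot) -> List[str]:
    """Describe components added, removed, or replaced between snapshots."""
    changes: List[str] = []
    for loc in sorted(old.keys() | new.keys()):
        before, after = old.get(loc), new.get(loc)
        if before is None:
            changes.append(f"added {after} at {loc}")
        elif after is None:
            changes.append(f"removed {before} from {loc}")
        elif before != after:
            changes.append(f"{loc}: {before} replaced by {after}")
    return changes


if __name__ == "__main__":
    old = {"slot0": "CPU/memory board", "slot1": "I/O board"}
    new = {"slot0": "CPU/memory board", "slot3": "I/O board"}
    for change in diff_config(old, new):
        print(change)
    # removed I/O board from slot1
    # added I/O board at slot3
```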

Log Scanner

The Log Scanner monitors designated system log files for entries that match the regular expressions specified in the rules. When the Log Scanner finds a set of messages that matches that list, it notifies the Event Generator and triggers the corresponding event rule.
The Log Scanner can also run in a demand-driven mode from the GUI. It looks for specific entries as indicated by user-specified search parameters.
It searches the /var/adm/messages system log. The Log Scanner can capture panic messages immediately preceding a system crash.
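The pattern matching that the Log Scanner performs can be sketched in Python as shown below. The two patterns are hypothetical examples, not the expressions used in the shipped rules. The function only reports matching lines rather than notifying an Event Generator.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical patterns; real rules define their own list.
PATTERNS = [
    re.compile(r"panic", re.IGNORECASE),
    re.compile(r"SCSI transport failed"),
]


def scan_log(lines: Iterable[str], patterns=PATTERNS) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every line matching any pattern."""
    matches = []
    for lineno, line in enumerate(lines, start=1):
        if any(p.search(line) for p in patterns):
            matches.append((lineno, line.rstrip("\n")))
    return matches


if __name__ == "__main__":
    # Reading /var/adm/messages usually requires suitable permissions.
    try:
        with open("/var/adm/messages", errors="replace") as log:
            for lineno, line in scan_log(log):
                print(f"{lineno}: {line}")
    except OSError as err:
        print(f"cannot read log: {err}")
```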

symond Daemon

The symond daemon starts when the server or the event host is started. symond starts the agents, monitors their activity, and restarts them if they stop. It also provides a central contact point for any clients that want to connect to the agents.
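The start-and-restart behavior described for symond follows a common supervisor pattern. The following minimal Python sketch simulates agents with a `running` flag rather than real processes. The agent names and both classes are hypothetical and do not represent how symond is implemented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """Stand-in for an agent process. A real supervisor would track a
    process ID and detect exit, for example by waiting on the child."""
    name: str
    running: bool = False
    starts: int = 0

    def start(self) -> None:
        self.running = True
        self.starts += 1


@dataclass
class Supervisor:
    agents: List[Agent]
    log: List[str] = field(default_factory=list)

    def start_all(self) -> None:
        for agent in self.agents:
            agent.start()
            self.log.append(f"started {agent.name}")

    def check(self) -> None:
        """One monitoring pass: restart every agent that has stopped."""
        for agent in self.agents:
            if not agent.running:
                agent.start()
                self.log.append(f"restarted {agent.name}")


if __name__ == "__main__":
    sup = Supervisor([Agent("kernel_reader"), Agent("config_reader"),
                      Agent("log_scanner")])
    sup.start_all()
    sup.agents[1].running = False  # simulate the Config Reader exiting
    sup.check()
    print(sup.log[-1])  # restarted config_reader
```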

Event Manager

The Event Manager subsystem consists of the Event Generator and the Event Viewer. The Event Generator evaluates the conditions on the server against a set of rules that define events. You can modify these rules. When an event occurs, the software takes an action, such as highlighting a component in the display or executing a script.
A predefined set of rules, written in the Tcl scripting language, is provided with Solstice SyMON. FIGURE 1-3 illustrates the Event Manager monitoring system by showing how you can modify events or define new events that tell the software what to do when a particular event occurs.

FIGURE 1-3

The Event Generator, running on the monitoring machine, polls the Server subsystem for information about the server. It evaluates the conditions on the server against a set of predefined rules. If the conditions on the server meet the conditions defined in a rule, an event is created and the actions defined in the rule are triggered. At this point, the event is known as an open event.
Point events are events generated by log messages. Because point events are closed immediately, they appear in the All Events display of the Event Viewer but not in the Open Events display.
Once an event is open, the actions defined in the rule can include sending email or running a script that alerts you to the event. Routines are also available so that events can generate an SNMP trap. SNMP traps are directed to Solstice Site Manager(TM), Solstice Domain Manager(TM), Solstice Enterprise Manager(TM), or other SNMP-listening management platforms.
When the condition that generated the event no longer exists, the Event Manager closes the event and the event is referred to as a closed event. The event log file records the opening and closing of events.
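The open/closed life cycle described above, including point events, can be modeled as a small state machine. In the following Python sketch, each rule is a name, a predicate over polled data, and an action. This is a hypothetical structure: actual Solstice SyMON rules are Tcl scripts, and the class and field names are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Rule:
    """Hypothetical rule: a name, a condition over polled data, and an
    action run when the event opens (real rules are Tcl scripts)."""
    name: str
    condition: Callable[[dict], bool]
    action: Callable[[str], None] = lambda rule_name: None


@dataclass
class Event:
    rule: str
    opened_at: int
    closed_at: Optional[int] = None

    @property
    def is_open(self) -> bool:
        return self.closed_at is None


class EventGenerator:
    def __init__(self, rules: List[Rule]) -> None:
        self.rules = rules
        self.open: Dict[str, Event] = {}   # rule name -> currently open event
        self.all_events: List[Event] = []  # every event, open or closed

    def evaluate(self, data: dict, now: int) -> None:
        """Open an event when a rule's condition starts to hold and close
        it when the condition no longer holds."""
        for rule in self.rules:
            holds = rule.condition(data)
            if holds and rule.name not in self.open:
                event = Event(rule.name, opened_at=now)
                self.open[rule.name] = event
                self.all_events.append(event)
                rule.action(rule.name)
            elif not holds and rule.name in self.open:
                self.open.pop(rule.name).closed_at = now

    def point_event(self, rule_name: str, now: int) -> None:
        """Log-message (point) event: opened and closed immediately."""
        self.all_events.append(Event(rule_name, opened_at=now, closed_at=now))


if __name__ == "__main__":
    alerts: List[str] = []
    low_swap = Rule("low_swap", lambda d: d["swap_free_mb"] < 100, alerts.append)
    gen = EventGenerator([low_swap])
    gen.evaluate({"swap_free_mb": 50}, now=1)   # condition holds: event opens
    gen.evaluate({"swap_free_mb": 500}, now=2)  # condition cleared: event closes
    gen.point_event("panic_message", now=3)     # point event from a log message
    print(alerts, [e.is_open for e in gen.all_events])
```

An "Open Events" view would show only events whose `is_open` is true. An "All Events" view would list `all_events`, which is why point events appear in the All Events view but never in the Open Events view.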

Graphical User Interface

When you start Solstice SyMON, the main GUI window, called the Launcher, is displayed (see FIGURE 1-4).

FIGURE 1-4

Launcher and Consoles

The Launcher provides an easy-to-use, windowed display showing vital signs for Solstice SyMON and top-level status information for the monitored server. The Launcher enables you to launch consoles (subwindows) to view data about the conditions of the server in a variety of ways. For example, you can view data on events, log files, physical or logical views, kernel information, or process information. The System box blinks green when Solstice SyMON is operating normally.
The Launcher does the following:
  • Tests for available agents (Kernel Reader, Config Reader, Event Generator, and Log Scanner)
  • Displays the console icons and names; each icon displays an "unavailable" symbol (Ø) until it receives data from the appropriate agent. For more information on the unavailable symbol, see "Launcher Window" in Chapter 3, "Using the Solstice SyMON Consoles."
  • Shows the status of agents; red indicates that the GUI is unable to contact the agent, yellow indicates that the GUI has found the agent but is not yet receiving data
  • Initiates online diagnostics through SunVTS, a powerful diagnostic tool that incorporates multifunctional tests of the system through operating system-level calls
There are seven consoles on the Launcher. TABLE 1-2 describes the information presented by each console.
TABLE 1-2
Console | Feature
Event Viewer | Displays events that have occurred on the server
Log Viewer | Displays log files; for example, syslog
Physical View | Displays a pictorial representation of the server; any component with an open event is highlighted
Logical View | Displays a hierarchical schematic view of the server; any component with an open event is highlighted
Kernel Data Catalog | Displays a hierarchy of metrics showing CPU, disk, memory, and network performance; any metric that causes an event is highlighted
Process Viewer | Displays resource-intensive processes running on the server
Online Diagnostics | Initiates the SunVTS diagnostics GUI
TABLE 1-3 describes menu items that are typical in each of the seven consoles listed in TABLE 1-2.
TABLE 1-3
Menu | Description
File | Gives you the option to close the window, exit the GUI, save a file or settings, or restore a file or settings
Edit | Offers different ways to manipulate windows, objects, or text
View | Lists options for viewing data or for displaying additional information about the data
Help | Provides access to online help
For detailed information on the Launcher and its consoles, see Chapter 3, "Using the Solstice SyMON Consoles."

Information Messages

When a monitored operation is performed incorrectly, the software displays an information message in the footer of the active window.

Error and Warning Messages

Solstice SyMON displays warning messages in dialog boxes that appear above the console you are viewing. When a warning message is displayed, you should acknowledge it by selecting an action button such as OK or Quit.
Some warning messages state that a prerequisite for an operation has not been met. To acknowledge this message, click the Continue button before attempting the operation again.

Monitoring Hardware, System Performance, Alarms, and Events

This section provides an overview of the consoles used to monitor:
  • Hardware status
  • System performance
  • Alarms and events
Chapter 3, "Using the Solstice SyMON Consoles," explains these consoles in more detail.

Monitoring Hardware Status

Use Solstice SyMON to view the configuration of your system, monitor the status of system hardware components, and launch diagnostic tests. The three consoles used to monitor hardware status are:
  • Physical View
  • Logical View
  • Online Diagnostics

Physical View

The Physical View provides a pictorial representation of your system as it is actually configured. It displays information about each component and visually identifies any failed or troubled component.

Logical View

The Logical View, a companion to the Physical View, displays the components in a schematic hierarchy.

Online Diagnostics

To test and validate the CPUs, disks, and network connections of a server, the software launches SunVTS (Sun Validation and Test Suite). For information about running SunVTS, refer to the SunVTS 2.1 User's Guide and the SunVTS 2.1 Test Reference Manual in the Solaris 2.6 on Sun Hardware AnswerBook, which is on the Supplement for Solaris 2.6 for Sun Microsystems Computer Company CD.

Monitoring System Performance

Solstice SyMON monitors system performance to isolate and identify system bottlenecks such as:
  • Overloaded CPU
  • Excessive traffic to a single disk
The three consoles that monitor system performance are:
  • Kernel Data Catalog
  • Log Viewer
  • Process Viewer
System Meters, which are not consoles, also monitor system performance.

Kernel Data Catalog

The Kernel Data Catalog provides a hierarchical diagram of the system performance parameters by category (CPU, memory, disk, and network).

System Meters

You can display the performance parameters that the software captures as graphs in the System Meter. The graphs in the System Meter show the type and quantity of activity of the CPUs, disks, memory, and network interfaces of the server (see TABLE 1-4). System Meters do not appear on the Launcher because they are built using the Kernel Data Catalog and the Logical View.
TABLE 1-4
Component | Performance Parameters
CPU | Performance across all CPUs on a server or individual CPU performance; monitored parameters include user, system, total busy, wait, and idle times, context switches, interrupts, and so on
Memory | Memory available, page scan rate, and total swap space
Disk | Number of active disks, projected fastest service time, number of bytes written, and average disk queue length
Network | Packet load and the number of packet errors occurring on the network interfaces of the server (overflow, underflow, CRC, frame, output, and input errors, as well as collisions)
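CPU figures such as the user, system, wait, and idle times in TABLE 1-4 are commonly derived by sampling cumulative per-state tick counters twice and comparing the differences. This Python sketch assumes such counters are available as a dictionary. The key names are hypothetical, the definition of "total busy" as user plus system time is an assumption, and the document does not state exactly how Solstice SyMON computes its values.

```python
from typing import Dict

STATES = ("user", "system", "wait", "idle")


def cpu_percentages(before: Dict[str, int],
                    after: Dict[str, int]) -> Dict[str, float]:
    """Turn two samples of cumulative per-state tick counters into the
    percentage of the sampling interval spent in each state."""
    deltas = {s: after[s] - before[s] for s in STATES}
    total = sum(deltas.values())
    if total == 0:
        pct = {s: 0.0 for s in STATES}  # no ticks elapsed between samples
    else:
        pct = {s: 100.0 * deltas[s] / total for s in STATES}
    # Assumption: "total busy" is taken here as user + system time.
    pct["busy"] = pct["user"] + pct["system"]
    return pct


if __name__ == "__main__":
    t0 = {"user": 100, "system": 50, "wait": 10, "idle": 840}
    t1 = {"user": 160, "system": 70, "wait": 10, "idle": 960}
    print(cpu_percentages(t0, t1))
    # {'user': 30.0, 'system': 10.0, 'wait': 0.0, 'idle': 60.0, 'busy': 40.0}
```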

Log Viewer

The Log Viewer displays entries from log files such as /var/adm/messages. It searches the log using a list of regular expressions. When a match is found, the corresponding log file entry is sent to the Log Viewer. The Log Viewer can also search the SunVTS error log /var/opt/SUNWvts/logs/sunvts.err, if desired.

Process Viewer

The Process Viewer gathers and displays information about resource-intensive processes. You can configure the columns of data.
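Selecting resource-intensive processes and showing only the columns the user has chosen amounts to a sort followed by a projection. The following Python sketch assumes that each process is a dictionary with hypothetical field names such as `cpu_pct` and `rss_kb`. It is an illustration, not the Process Viewer's implementation.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical per-process record; field names are illustrative only.
Process = Dict[str, Any]


def top_processes(procs: List[Process], columns: Sequence[str],
                  sort_key: str = "cpu_pct", limit: int = 5) -> List[Tuple[Any, ...]]:
    """Return the `limit` processes with the highest `sort_key` value,
    keeping only the requested columns, in the requested order."""
    ranked = sorted(procs, key=lambda p: p[sort_key], reverse=True)
    return [tuple(p[col] for col in columns) for p in ranked[:limit]]


if __name__ == "__main__":
    procs = [
        {"pid": 101, "name": "dbserver", "cpu_pct": 42.0, "rss_kb": 180000},
        {"pid": 202, "name": "httpd", "cpu_pct": 7.5, "rss_kb": 12000},
        {"pid": 303, "name": "backup", "cpu_pct": 18.0, "rss_kb": 40000},
    ]
    for row in top_processes(procs, columns=("pid", "name", "cpu_pct"), limit=2):
        print(row)
```

Changing `columns` alters what is displayed, and changing `sort_key` alters what counts as resource-intensive.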

Monitoring Alarms and Events

Events are system conditions that require the attention of a system administrator. Alarms alert the system administrator to an event; an alarm appears as a message in the Event Viewer and as a visual highlight on a console.

Event Generator

The Event Generator monitors the status and performance of the server. Running on the monitoring machine, it polls the Server subsystem for data and detects any condition that violates the predefined performance parameters specified in the set of event rules.
If the Event Generator detects problems with the system, the affected component can be highlighted in red or yellow (depending on the event severity) or in blue (for capacity-related events). All related components are also highlighted. For example, if a CPU chip is highlighted, the board on which it is mounted and its chassis slot are also highlighted.
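Highlighting related components can be modeled as walking from the affected component up to each of its ancestors. The following Python sketch uses a hypothetical child-to-parent map that mirrors the CPU, board, and slot example above. The document does not say which color a shared ancestor receives, so the precedence red over yellow over blue is an assumption.

```python
from typing import Dict, Optional

# Hypothetical component tree (child -> parent): CPU modules mounted on a
# board that sits in a chassis slot.
PARENT: Dict[str, Optional[str]] = {
    "slot2": None,
    "cpu_board2": "slot2",
    "cpu0": "cpu_board2",
    "cpu1": "cpu_board2",
}

# Assumed precedence when highlights meet on a shared ancestor.
SEVERITY_RANK = {"blue": 0, "yellow": 1, "red": 2}


def propagate(highlights: Dict[str, str],
              parent: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Extend each highlighted component's color to all of its ancestors,
    keeping the higher-ranked color where two highlights overlap."""
    result: Dict[str, str] = {}
    for component, color in highlights.items():
        node: Optional[str] = component
        while node is not None:
            current = result.get(node)
            if current is None or SEVERITY_RANK[color] > SEVERITY_RANK[current]:
                result[node] = color
            node = parent.get(node)
    return result


if __name__ == "__main__":
    print(propagate({"cpu0": "red", "cpu1": "yellow"}, PARENT))
```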
For more information on the Event Generator and the Event Manager, see "Event Manager" earlier in this chapter. Also see Chapter 4, "Understanding and Writing Event Rules."

Event Viewer

The console used to monitor events is the Event Viewer. The Event Viewer alerts you to any problems detected by the Event Manager. Each entry displayed in the Event Viewer window represents an event and gives you the following information:
  • The rule number that generated the event
  • The severity level (red, yellow, or blue)
  • The time and date the event was detected
  • A message indicating the type of event that was captured
  • The node where the event occurred
  • The event priority and severity

Interaction with SunReMon

SunReMon(TM) and the Remote Systems Monitoring service are part of the SunSpectrum(SM) program offering from SunService(SM). The Remote Systems Monitoring service uses the SunReMon software to collect key system data from critical servers. This data is automatically transferred to the local Sun Solution Center on a regular basis. When you report a problem, the support engineer has this data available to assist in problem analysis.
SunReMon also has the ability to generate a "call-home" email alert to the local Sun Solution Center when a critical problem is detected. This requires an Internet email connection at your site.
Solstice SyMON also has the ability to generate a "call-home" alert. Due to support requirements, the default Solstice SyMON rules do not use the "call-home" feature of the Remote Systems Monitoring service. For information on configuring the Remote Systems Monitoring service, contact your local Sun Solution Center.
The Remote Systems Monitoring service is available to SunSpectrum(SM) Gold(SM) and SunSpectrum(SM) Platinum(SM) contract customers throughout the world. For local availability, contact your SunService sales representative.