Understanding and Using Event Rules
4
- This chapter explains how to read and write Solstice SyMON event rules. It describes:
- Event rules terminology
- Tcl rules
- Special characters
- Reserved words
- Event rule attributes
- Rule functions
- Hierarchies
- Simple rules
- Complex rules
- How to write event rules
- How to verify new event rules
- How to activate new or modified rules
- Debugging tips
Event Rules Terminology
Events are hardware or operating system conditions that may require the attention of a system administrator. Examples of events include the loss of a CPU or disk, processor overload, and excessive swapping.
- The Event Handler subsystem of Solstice SyMON alerts you to events. However, you must first write a rule that defines the event. Rules include a condition and other attributes that define the state of the rule and actions to take.
- A condition is an expression that defines when a rule is active. For example, a condition might test whether a board has failed.
- An action tells Solstice SyMON what to do when a condition is true, if the condition changes, or if the system shuts down. Actions notify users of a situation that may require attention.
- In addition to conditions and actions, a rule may also include other attributes, such as the level, priority, and severity of a rule.
- The Event Handler collects data from monitoring agents and evaluates the data against its rules. When a condition is true, the Event Handler logs an event and carries out the appropriate action in the rule. When the condition that generated the event no longer exists, the Event Handler may run a special action such as closing the event. The Event Handler is always running. If it stops and restarts, all previously open events are closed.
Tcl Rules
- Event rules are written in the Tcl scripting language. For a definition of Tcl syntax and complete instructions on how to write Tcl scripts, refer to:
- Tcl and the Tk Toolkit, by John K. Ousterhout, Addison-Wesley Publishing Co., 1994.
- Practical Programming in Tcl and Tk, by Brent B. Welch, Prentice Hall, 1995.
- Tcl Reference Card, available from Specialized Systems Consultants, Inc., P.O. Box 55549, Seattle, WA 98155-0549.
Caution - Do not modify event rules if you do not know Tcl.
- For additional information, contact Sun Microsystems, Inc. software support at 1-800-USA-4SUN (1-800-872-4786).
- Solstice SyMON includes a set of predefined rules in the Tcl variable RULES, located in the /etc/opt/SUNWsymon directory. The following rule files are included:
- rules.tcl: organizer
- cprules.tcl: capacity planning rules
- hwrules.tcl: hardware monitoring rules
- syrules.tcl: Solstice SyMON monitoring rules
- egrules.tcl: Event Handler rules
- pfrules.tcl: predictive failure rules
- swrules.tcl: system monitoring rules
- Solstice SyMON provides a set of Tcl commands that simplify rule writing by telling Solstice SyMON what to do in a given situation. For example, the alarm command makes an event active. The Event Handler reads the rules file.
- You can extend this command set with your own Tcl procedures by editing the event_gen.tcl file.
Special Characters
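- Tcl treats these characters specially:
- " (quotation marks): group words into a single argument; variable and command substitution still take place inside them
- { } (curly braces): group words into a single argument without substitution
- [ ] (square brackets): command substitution; the enclosed command runs and its result replaces the bracketed text
- # (pound sign): starts a comment when it appears where a command would begin
- * (asterisk): a wildcard in pattern matching, such as in findlist hierarchy paths
- For more information on these characters, refer to the Tcl reference material listed in the "Tcl Rules" section.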
Reserved Words
Table 4-1 describes the reserved Tcl variable names. These variables hold values that have special meaning for the Event Handler, and you cannot redefine them.
Table 4-1
| Variable Name | Meaning |
| symond_status | Tells whether the Solstice SyMON daemon is running |
| server_status | Tells whether the monitored system is up |
| LogScanner_status | Tells if the Log Scanner agent is running |
| ConfigReader_status | Tells if the Config Reader agent is running |
| KernelReader_status | Tells if the Kernel Reader agent is running |
| node | Contains the hierarchy that is being evaluated in the rule |
| myroot | Root node for the Event Handler hierarchy; it should not be changed by the user |
Event Rule Attributes
- Each attribute except the condition has a label followed by a value. The condition has no label and is mandatory in every rule.
Table 4-2 describes the rule attributes.
Table 4-2
| Name | Value | Description |
| RULE | Integer | Rule number |
| ON_OPEN | Tcl script | Tcl string to be interpreted when the condition of the rule becomes true |
| ON_CLOSE | Tcl script | Tcl string to be interpreted when the condition of a rule is no longer true |
| ON_CONTINUE | Tcl script | Tcl string to be interpreted when the condition of a rule continues to be true |
| ON_SHUTDOWN | Tcl script | Tcl string to be interpreted when the Event Handler shuts down |
| PARAMETERS | String | User-defined parameters |
| MULTI | Tcl script | Tcl string to provide a list of data node variables for multiple events of a single rule |
| SEVERITY | Integer | Severity of the event (user-defined and interpreted) |
| PRIORITY | Integer | Priority for the event (user-defined and interpreted) |
| RATE | Integer | User-defined rule sampling rate, in seconds |
| COMMENTS or C | String | Comments field |
| LOG_RULES | Log script | Tcl string that defines the log file scanner activity that supports rules |
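The following Python sketch shows how the ON_OPEN, ON_CONTINUE, and ON_CLOSE attributes in Table 4-2 relate to a rule's condition from one data sample to the next. It also tracks a consecutive-active counter like the COUNT field of getfield. This is not Event Handler code: real rules are Tcl evaluated by the Event Handler. The RuleState class, its evaluate method, and the sample dictionaries are hypothetical, and ON_SHUTDOWN is not modeled.

```python
from typing import Callable, Dict, Optional

Sample = Dict[str, str]


class RuleState:
    """Hypothetical model of one rule's state between Event Handler samples."""

    def __init__(self, number: int, condition: Callable[[Sample], bool]) -> None:
        self.number = number
        self.condition = condition
        self.active = False
        self.count = 0  # consecutive samples with the condition true (cf. getfield COUNT)

    def evaluate(self, sample: Sample) -> Optional[str]:
        """Evaluate one sample and return the action attribute that would run, if any."""
        if self.condition(sample):
            self.count += 1
            if self.active:
                return "ON_CONTINUE"  # condition continues to be true
            self.active = True
            return "ON_OPEN"  # condition has just become true
        was_active = self.active
        self.active = False
        self.count = 0
        return "ON_CLOSE" if was_active else None  # condition is no longer true


# Rule 2 from Code Example 4-1; "up" is a hypothetical value other than "dead".
rule2 = RuleState(2, lambda s: s["server_status"] == "dead")
samples = [{"server_status": s} for s in ("up", "dead", "dead", "up")]
print([rule2.evaluate(s) for s in samples])
# prints [None, 'ON_OPEN', 'ON_CONTINUE', 'ON_CLOSE']
```

In this model, an action fires only on a change of state or while the condition persists. That is why a rule needs an explicit ON_CLOSE to end an alarm it opened.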
Rule Functions
- A rule function is a Tcl command used in Tcl scripts associated with the RULES variable. Table 4-3 lists common commands.
Table 4-3
| Command | Description |
| alarm | Makes an event active, highlights the hierarchy node (RED, YELLOW, or BLUE), and creates entries for the event, with the specified message, in both the Event Log and the Event Viewer. The syntax is alarm level node "message" "Tcl command". |
| end_alarm | Closes the log entry in the Event Log; the syntax is end_alarm. |
| findlist | Builds the list of matching nodes. The syntax is findlist hierarchypath striplist. |
| findvalue | Takes the name of any data hierarchy variable and returns the value of the variable. The syntax is findvalue node (where node is the hierarchy endpoint). |
| getfield | Returns an internal data value for a rule or a rule-node combination (from a MULTI list). The syntax is getfield optional_rule_# field_type; for example, { [getfield COUNT] == 1 }. The following field specifiers can be used with getfield: ACTIVE, true or false depending on whether the rule is currently active; COUNT, the number of consecutive iterations for which the rule has been active; PRIORITY, the priority value of the rule; SEVERITY, the severity value of the rule; RULE, the rule number; START_TIME, the time the rule became active. |
| putfield | Takes a field name and data and assigns that data to the field of the current rule. The syntax is putfield optional_rule_# field_type "value". |
| get_parameter | Manages a parameter list; used for MULTI rules that need to maintain historical records. The syntax is get_parameter string. |
| put_parameter | Manages a parameter list; used for MULTI rules that need to maintain historical records. The syntax is put_parameter tag value. |
| gettime | Gets the sample time on the monitored machine as a long integer. The syntax is gettime. |
| dynlink | Takes a shared object and a procedure name, dynamically links the shared object, and calls the named procedure. The syntax is dynlink file functionname. |
| mailto | Sends a message by email to the specified address. The syntax is mailto address "msgstring". |
| syslog | Writes the specified string to the syslog. The syntax is syslog "message". |
| snmp | Generates SNMP traps. snmp takes a string as an argument and sends an SNMP trap to every machine listed in the snmp_hosts variable, which is defined in event_gen.tcl. The syntax is snmp "message". |
Hierarchies
- Hierarchies organize Solstice SyMON data by grouping related pieces of information. A hierarchy has:
- A top-level node representing all data from an agent
- Nodes below the top level for classes of data
- Subnodes for closely related data, and properties containing the actual data
- Each node and subnode organizes the data beneath it. For example, consider:
KernelReader.cpu.cpu1.busy
- The top-level node, KernelReader, indicates that this is Kernel Reader data.
- KernelReader.cpu indicates that this is CPU data.
- KernelReader.cpu.cpu1 indicates that this data is related to CPU1.
- KernelReader.cpu.cpu1.busy is the percentage of time that CPU1 was busy.
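As an illustration only, the Python sketch below models a hierarchy as nested dictionaries and resolves a dotted path one level at a time, in the same way the example path above is read. The HIERARCHY contents, including the busy percentages, are invented. find_value is a hypothetical stand-in for the findvalue command, not its implementation.

```python
from typing import Any, Dict

# Invented snapshot of Kernel Reader data; the busy percentages are made up.
HIERARCHY: Dict[str, Any] = {
    "KernelReader": {                 # top-level node: all data from one agent
        "cpu": {                      # class of data: CPU data
            "cpu0": {"busy": 12.5},   # subnode for one CPU, holding its properties
            "cpu1": {"busy": 87.0},
        },
    },
}


def find_value(tree: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'KernelReader.cpu.cpu1.busy' down the tree."""
    node: Any = tree
    for part in path.split("."):
        node = node[part]  # raises KeyError if a level does not exist
    return node


print(find_value(HIERARCHY, "KernelReader.cpu.cpu1.busy"))  # prints 87.0
```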
- For more information on what data is available, refer to the Solstice SyMON man pages. For detailed information on the Kernel Reader, see Appendix A, "Kernel Reader."
- The following sections present examples of simple and complex rules.
Simple Rules
- A simple rule checks one or more variables in a simple condition and generates one event if the condition is true. Code Example 4-1 is an example of a simple rule.
{
RULE 2
{ expr { "$server_status" == "dead" } }
ON_OPEN { alarm RED "" "Server not responding" "" }
ON_CLOSE { end_alarm }
SEVERITY 1
PRIORITY 1
}
Code Example 4-1 Simple Rule Example
- The first attribute in the rule in Code Example 4-1 is the rule number.
- The second attribute is the condition. This condition checks whether the server_status variable is equal to "dead." If it is, the rule is active. expr is a Tcl command that evaluates an expression and returns its value. $server_status is one of the reserved Event Handler variables listed in Table 4-1.
- The third and fourth attributes are actions that are carried out as the state of the rule changes. When the rule becomes active, it triggers an alarm at the RED level. The predefined alarm command activates an event and writes an entry into the Event Log. When the condition is no longer true (ON_CLOSE), the rule triggers end_alarm, which closes the log entry in the Event Log, removes any highlighting in the GUI, and deletes the event from the open event list.
- The SEVERITY and PRIORITY of the rule are the last two attributes in the rule. These are numeric values defined by the user.
- The following rule in Code Example 4-2 is slightly more complicated. It defines a swap space event:
{
RULE 18
{
set ts [ expr 0.10 * [ findvalue \
KernelReader.mem.swap_total ] ]
expr { [findvalue KernelReader.mem.swap_free ] < $ts }
}
ON_OPEN { alarm RED KernelReader.mem.swap_free "Serious Swap Problem" "" }
ON_CLOSE { end_alarm }
SEVERITY 2
PRIORITY 1
}
Code Example 4-2 Rule 18: Monitoring Swap Space
- The first attribute in this rule is the rule number.
- The next attribute is the condition. The condition does the following:
- Sets the variable ts (the swap threshold) to 10 percent of the total swap space on the machine. KernelReader.mem.swap_total is a performance property in the Solstice SyMON data hierarchy; findvalue is a predefined Tcl command that returns the value of that property
- Finds out how much swap space is free (unused), using findvalue KernelReader.mem.swap_free
- Compares the free swap space with ts; if free swap space is less than 10 percent of total swap space, the condition is true and there is a potential problem
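To make the arithmetic concrete, the following Python sketch reproduces Rule 18's comparison outside the Event Handler. The swap figures are invented. The units are assumed to be whatever KernelReader.mem.swap_total and KernelReader.mem.swap_free report; only the ratio matters.

```python
def swap_rule_active(swap_total: float, swap_free: float, fraction: float = 0.10) -> bool:
    """Return True when free swap is below `fraction` of total swap, as in Rule 18."""
    ts = fraction * swap_total  # threshold, like: set ts [ expr 0.10 * ... ]
    return swap_free < ts       # like: expr { swap_free < $ts }


# Invented figures: with 524288 units of swap, the threshold is 52428.8.
print(swap_rule_active(524288, 40960))  # prints True (40960 < 52428.8)
print(swap_rule_active(524288, 60000))  # prints False
```

The comparison is strict, so free swap exactly at the threshold does not open the event.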
- The ON_OPEN attribute tells the Event Handler to generate a RED alarm by calling the alarm function with the RED argument. This highlights a node in the Kernel Data Catalog and adds the event to the Event Log. All four arguments of the alarm function are mandatory; if you do not use an argument, pass an empty string ("") in its place.
- The ON_CLOSE attribute tells Solstice SyMON what to do when the condition becomes false; end_alarm closes the log entry in the Event Log and the event in the Event Viewer.
Note - Always explicitly close an alarm with the end_alarm function. An alarm does not automatically close when the condition goes away.
- The SEVERITY and PRIORITY of the rule are the last two attributes. Both are user-defined and interpreted.
Complex Rules
- A complex rule evaluates the same condition against several hierarchy nodes and generates a separate event for each node for which the condition is true.
- Complex rules eliminate the need to write many simple rules that check the same condition. For example, if you used simple rules to check the status of each CPU on a server, you would need a separate, nearly identical rule for every CPU.
Code Example 4-3 is an example of a complex rule:
{
RULE 1
MULTI { expr { [ findlist system.*.*.*.status "" ] } }
{
set boardstatus [ findconfigvalue $node ]
expr { "$boardstatus" == "failure detected" }
}
ON_OPEN { alarm RED $node "Board failure detected" "" }
ON_CLOSE { end_alarm }
SEVERITY 1
PRIORITY 1
}
Code Example 4-3 Complex Rule Example
- In Code Example 4-3, Rule 1 examines the status of all boards in a system. The condition of the rule is that a board has failed. If a board fails, the Event Handler opens a RED event for that board and sends a message to the Event Log (ON_OPEN). When the condition is no longer true (ON_CLOSE), the Event Handler executes the end_alarm procedure.
- The MULTI attribute is a Tcl script that evaluates an expression and creates a list of nodes. The condition of the rule is run once for each node in the list, and the Tcl node variable is assigned the value of the node being processed or evaluated. Any other Tcl script associated with the rule can access this value with $node. The normal approach is to use the findlist function to generate a list of nodes and check their values in a rule. A findlist must be present in a MULTI attribute.
- The following statement, taken from Code Example 4-3, generates a list of all data items whose paths start with system and end with status:
MULTI { expr { [ findlist system.*.*.*.status "" ] } }
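The Python sketch below illustrates how a MULTI list and a per-node condition fit together in Code Example 4-3. A path pattern selects nodes, the condition runs once per node, and each node that meets the condition would get its own event. The node paths and status strings are invented, and find_list and failed_boards are hypothetical helpers. The assumption that each * matches exactly one path level is ours, and findlist's striplist argument is not modeled.

```python
from typing import Dict, List

# Invented Config Reader data: node path -> board status string.
CONFIG: Dict[str, str] = {
    "system.io.board0.slot.status": "ok",
    "system.cpu.board1.slot.status": "failure detected",
    "system.cpu.board2.slot.status": "ok",
    "system.cpu.board1.slot.temperature": "41",  # different last level: no match
}


def find_list(paths: List[str], pattern: str) -> List[str]:
    """Return paths whose dot-separated levels match pattern ('*' = any one level)."""
    wanted = pattern.split(".")
    matches = []
    for path in paths:
        levels = path.split(".")
        if len(levels) == len(wanted) and all(
            w == "*" or w == level for w, level in zip(wanted, levels)
        ):
            matches.append(path)
    return matches


def failed_boards(config: Dict[str, str], pattern: str) -> List[str]:
    """Evaluate the Rule 1 condition once per node; return nodes that would open an event."""
    return [node for node in find_list(list(config), pattern)
            if config[node] == "failure detected"]


print(failed_boards(CONFIG, "system.*.*.*.status"))
# prints ['system.cpu.board1.slot.status']
```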
Writing Event Rules
- You can create rules against data from any of the three Solstice SyMON agents: Config Reader, Kernel Reader, and Log File Scanner. The first two agents provide data to the Event Handler continuously; for their data, you only need the path name of the variable.
- For the Log File Scanner, you must pre-define the messages that are sent to the Event Handler for examination.
- Here are a few guidelines to keep in mind when writing event rules:
- A rule is a Tcl variable.
- The Event Handler rule attributes and labels are strings to Tcl.
- A Tcl variable is a string or a list of strings.
- Rules are free-form. They consist only of attributes (label and value pairs), which can appear in any order.
- Attributes are separated by new lines or spaces.
- Each attribute consists of one or two components. The first component is the name (label), and the second component is the associated value. The exception is the condition, which has no label.
- An attribute's components are separated by spaces, tabs, or new lines.
- Only the last instance of each label in a rule is used; earlier instances are ignored.
- An attribute value can contain spaces if it is enclosed in quotes.
- Each label takes a single value.
- Each Tcl command string is enclosed in curly braces.
- A condition is the only mandatory attribute in a rule.
- A rule's condition can look at the time or at other values in the environment. For example, "set current_time [gettime]" sets the user-defined variable current_time to the current time on the monitored machine.
- A rule does not have to include an action.
- A rule can set a value that is used in other rules.
- All rules are combined into a single Tcl variable called RULES.
- Solstice SyMON can execute phone_home scripts.
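Several of the guidelines above can be summarized in a short Python sketch. It takes a rule already broken into words, the way Tcl splits a list, and separates the unlabeled condition from the labeled attributes. A repeated label keeps only its last value. This is a hypothetical helper for illustration, not the Event Handler's parser. The label set comes from Table 4-2. Treating the last unlabeled word as the condition when there are several is our assumption.

```python
from typing import Dict, List, Optional, Tuple

# Labels from Table 4-2; a word in a label position that is not one of these
# is treated as the (unlabeled) condition.
LABELS = {
    "RULE", "ON_OPEN", "ON_CLOSE", "ON_CONTINUE", "ON_SHUTDOWN", "PARAMETERS",
    "MULTI", "SEVERITY", "PRIORITY", "RATE", "COMMENTS", "C", "LOG_RULES",
}


def parse_rule(words: List[str]) -> Tuple[str, Dict[str, str]]:
    """Split rule words into (condition, {label: value})."""
    attrs: Dict[str, str] = {}
    condition: Optional[str] = None
    i = 0
    while i < len(words):
        word = words[i]
        if word in LABELS:
            if i + 1 >= len(words):
                raise ValueError(f"label {word} has no value")
            attrs[word] = words[i + 1]  # a later instance of a label replaces an earlier one
            i += 2
        else:
            condition = word  # assumed: if several unlabeled words appear, the last wins
            i += 1
    if condition is None:
        raise ValueError("a rule must contain a condition")
    return condition, attrs


# Code Example 4-1 split into words, with a duplicate SEVERITY label added.
rule_words = [
    "RULE", "2",
    'expr { "$server_status" == "dead" }',
    "ON_OPEN", 'alarm RED "" "Server not responding" ""',
    "ON_CLOSE", "end_alarm",
    "SEVERITY", "1", "PRIORITY", "1",
    "SEVERITY", "3",
]
condition, attrs = parse_rule(rule_words)
print(attrs["SEVERITY"])  # prints 3
```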
Verifying New Event Rules
- Solstice SyMON includes the special verify_rules command, which checks rule files for Tcl syntax errors. The command takes three optional arguments, described in Table 4-4.
Table 4-4 verify_rules
| Argument | Description |
| -R | Checks the file that contains the EVENTS variable; the default is rules.tcl |
| -I | Checks the file that contains supporting functions; the default is event_gen.tcl |
| -o | Gives more verbose output |
* To verify rules, enter:
-
- The filename entry is optional. The default filename is rules.tcl.
- When you run verify_rules and the rules are syntactically correct, the program responds "GOOD RULES." If the program detects a syntax error, it responds "BAD RULES." A GOOD RULES response does not guarantee that a rule will behave as intended.
Activating New or Modified Events
- To activate new or modified rules, send the signal SIGHUP to the Event Handler or restart the Event Handler. Send SIGHUP by invoking the following command:
kill -HUP pid
- where pid is the process ID number of the Event Generator, sm_egd. For more information, see the kill man page.
Debugging Tips
- Use the following debugging tips for event rules:
- Run verify_rules to make sure the syntax is correct.
- If there is a problem with the rules, temporarily edit the rules.tcl file so that you test only one rule at a time.
- Search the event log file, /var/opt/SUNWsymon/machine_name/event_log, for error messages.
- Use the syslog command to log variable values so that you can confirm them.