Solstice SyMON 1.4 User's Guide
CHAPTER 4

Understanding and Writing Event Rules


This chapter explains how to write or modify event rules that are written in the Tcl scripting language. It describes the categories of event rules, the syntax of the Tcl functions, and how to write or modify event rules for the Config Reader, Kernel Reader, Log Scanner, and Event Generator agents. Examples of simple and complex rules are presented at the end of this chapter.

Overview of Event Rules

This section describes and explains:
  • Terminology used with event rules
  • Categories of event rules
  • Rules files
  • Special characters and reserved words
  • Attributes
Event rules are not required to include an action; a rule can instead set a value that is used in other rules. All rules are combined into a single Tcl variable called Rules.tcl, which is case sensitive.
For a list of books describing the Tcl language, see "Related Documents" in the
Preface.

Caution - You must know Tcl to modify event rules. Do not modify event rules if you do not know Tcl. Incorrectly modifying an event rule causes an error or an incorrect result.

Terminology

Events are hardware or operating system conditions that may require the attention of a system administrator. Examples of events include:
  • Processor overload
  • Excessive swapping
  • Failed power supply
  • Failed fan
  • Soft disk errors
  • Extreme temperature conditions
Before the Event Generator subsystem can alert you to an event, you must first write a rule that defines the event. A rule includes a condition and other attributes that define the state of the rule and the subsequent actions to take.
A condition is an expression that defines when a rule is active. An example of a condition is a failed board. Actions identify how to tell users about a situation that may require attention. For example, an action tells Solstice SyMON what to do when a condition is true, if the condition changes, or if the system shuts down.
In addition to conditions and actions, a rule may also include the attributes listed in TABLE 4-1.
TABLE 4-1 LEVEL, PRIORITY, and SEVERITY
Attribute | Description
LEVEL | Causes the appearance of an appropriately colored icon in the LEVEL column of the event display. By convention, YELLOW is caution, RED is danger, and BLUE is a capacity warning.
PRIORITY | Arbitrary integer value. For the rules that accompany Solstice SyMON, 1 is the highest priority and 4 is the lowest priority. You can swap or customize these values. For example, you can make 4 the highest priority and 1 the lowest priority. You can also customize the priority by assigning other numeric values, such as 99, to PRIORITY.
SEVERITY | Arbitrary integer value. For the rules that accompany Solstice SyMON, 1 is the highest severity and 4 is the lowest severity. You can swap or customize these values. For example, you can make 4 the highest severity and 1 the lowest severity. You can also customize the severity by assigning other numeric values, such as 99, to SEVERITY.

Types of Event Rules

Sun supplies a set of predefined rules with Solstice SyMON (see Appendix D, "Default Solstice SyMON Rules"). They are divided into four major categories (see TABLE 4-2).
TABLE 4-2
Category | Description
Non-hardware | Monitors non-hardware resources, such as CPU usage and swap space
Hardware | Monitors hardware for potential trouble, such as a failed processor or extreme temperature conditions
Capacity Planning | Takes a long-term view of events; monitors CPU, memory, disk, and network processes to identify potential bottlenecks, and provides information about how to upgrade a system
Predictive Failure Analysis | Looks at the soft error rates of SIMMs and disks; issues a warning if a potential component failure is identified

Description of Rules Files

The rules.tcl file defines the location of the rules that tell Solstice SyMON what to do in a given situation. Each type of event rule has a rules file (see TABLE 4-3). To tell the Event Generator where to look for these rules files, set the path in event_gen.servername.tcl. For a list of the default paths, see "Locating Rule Files" later in this chapter.
The rules.tcl file reads the rest of the rules files listed in TABLE 4-3 into the input stream by running psource on each of them.

  psource n swrules.tcl  
  psource n syrules.tcl  
  psource y rultext.tcl  

CODE EXAMPLE 4-1 psource Code Example
where:
n after the psource command in the rules.tcl file means not to issue an error message if the file cannot be found.
y after the psource command in the rules.tcl file means to issue an error message if the file cannot be found.
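To make the n and y flags concrete, the following Python sketch models how rules.tcl processes its psource lines. It is a simulation under stated assumptions, not the Event Generator's implementation. The installed files are a dictionary, "reading" a file only records its name, and the functions psource and load_rules, along with the error wording, are hypothetical.

```python
from __future__ import annotations


def psource(flag: str, name: str, files: dict[str, str],
            loaded: list[str], errors: list[str]) -> None:
    """Hypothetical model of one `psource <flag> <file>` line.

    flag "n": skip a missing file without an error message.
    flag "y": record an error message when the file is missing.
    """
    if flag not in ("y", "n"):
        raise ValueError(f"psource flag must be y or n, not {flag!r}")
    if name in files:
        loaded.append(name)  # stands in for reading the file's rules
    elif flag == "y":
        errors.append(f"psource: cannot find {name}")  # illustrative wording


def load_rules(script: str, files: dict[str, str]) -> tuple[list[str], list[str]]:
    """Run every psource line of a rules.tcl-style script, in order."""
    loaded: list[str] = []
    errors: list[str] = []
    for line in script.splitlines():
        words = line.split()
        if len(words) == 3 and words[0] == "psource":
            psource(words[1], words[2], files, loaded, errors)
    return loaded, errors


RULES_TCL = """\
psource n swrules.tcl
psource n syrules.tcl
psource y rultext.tcl
"""

# Hypothetical installation in which swrules.tcl is not present.
loaded, errors = load_rules(RULES_TCL, {"syrules.tcl": "", "rultext.tcl": ""})
```

With this input, the missing swrules.tcl is skipped silently. If no files are installed at all, only the missing rultext.tcl (flag y) produces a message.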
The message strings for all of the rules files listed in TABLE 4-3 are located in rultext.tcl. There is a different rultext.tcl file for each country in which the software is localized. These rultext.tcl files are located in subdirectories; for example, the English rultext.tcl file is located in the C subdirectory.
The rules files listed in TABLE 4-3 contain generic rules, which apply to any system type. Platform-specific rules (see TABLE 4-4) apply only to one type or one class of system.
TABLE 4-3
Rule File | Description
rules.tcl | Organizer (master file)
cprules.tcl | Capacity planning rules
hwrules.tcl | Hardware monitoring rules
syrules.tcl | Monitoring rules
egrules.tcl | Event Generator rules
pfrules.tcl | Predictive failure rules
swrules.tcl | System monitoring rules
rultext.tcl | Message string definition rules
TABLE 4-4 lists platform-specific rules files and the platforms to which they apply.
TABLE 4-4
Rule File | Platform
SS1000.tcl | SPARCserver 1000/1000E
SC2000.tcl | SPARCcenter 2000/2000E
UEnterprise.tcl | Ultra Enterprise 3000, 4000, 5000, and 6000 servers
UEnterpriseI.tcl | Ultra Enterprise 2 and 150 servers
UE450.tcl | Ultra Enterprise 450 server
For each hardware platform, there is a corresponding rultext_<hardwarename>.tcl file. For example, for the SPARCserver 1000 and 1000E, there is a rultext_SS1000.tcl file and for the SPARCcenter 2000 there is a rultext_SC2000.tcl file. The rultext_<hardwarename>.tcl files contain messages that can be translated into different languages.

Special Characters

The following special characters are used in the Tcl language:
  • " " (double quotes)
  • { } (curly braces)
  • [ ] (square brackets)
  • # (pound sign)
  • * (asterisk)
Follow these guidelines when using these special characters:
  • Enclose each Tcl function string in curly braces { }.
  • If you include spaces in an attribute data item, enclose the data item in double quotes.

Reserved Words

TABLE 4-5 describes the Tcl reserved variable names, which contain certain values that have specific meaning for the Event Generator.

Note - Do not redefine these variable names.

TABLE 4-5
Variable Name | Description
symond_status | Tells whether the symond daemon is running
server_status | Tells whether the monitored system is up
LogScanner_status | Tells if the Log Scanner agent is running
ConfigReader_status | Tells if the Config Reader agent is running
KernelReader_status | Tells if the Kernel Reader agent is running
node | Contains the root hierarchy that is being evaluated in the rule; accessible only from within a MULTI rule (see "MULTI Rules" later in this chapter)
my_root | Contains the root node for the Event Generator hierarchy; this variable should not be changed
value | Contains the value of the node returned by the findlist function in a MULTI rule; accessible only from within a MULTI rule (see "MULTI Rules" later in this chapter)
timestamp | Contains the timestamp of the matched message

Attributes

A rule is a list of attributes. Attributes list conditions under which the rules should activate or close down and what actions to take when the rule is activated. A condition of a rule can look at time or other values in the environment.
Each attribute has a label followed by a value. Separate the components of an attribute with spaces, tabs, or new lines and separate each attribute with a new line or space.
TABLE 4-6 describes the rule attributes.
TABLE 4-6
Label | Value | Attribute Description
RULE | Integer | Rule number displayed by the Event Viewer to show which rule generated the event. The rule number must be unique across all rule files.
ON_OPEN | Tcl script | Tcl string giving actions to take when the condition of the rule becomes true
ON_CONTINUE | Tcl script | Tcl string giving actions to take when the condition of a rule continues to be true
ON_CLOSE | Tcl script | Tcl string giving actions to take when the condition of a rule is no longer true
ON_ACKNOWLEDGE | Tcl script | Interpreted when the Event Generator receives an acknowledgment of an event from the GUI
ON_SHUTDOWN | Tcl script | Tcl string to be interpreted when the Event Generator shuts down
PARAMETERS | String | User-defined parameters
MULTI | Tcl script | Tcl string to provide a list of data node variables for multiple events of a single rule
SEVERITY | Integer | Severity of the event (user-defined and interpreted); 1 = most severe and 4 = least severe; see also TABLE 4-1, earlier in this chapter, which describes this attribute in more detail
PRIORITY | Integer | Priority of the event (user-defined and interpreted); 1 = highest priority and 4 = lowest priority; see also TABLE 4-1, earlier in this chapter, which describes this attribute in more detail
RATE | Integer | How often to execute the rule, in seconds; for example, RATE 60 means the rule is evaluated every 60 seconds; the default is to execute the rule every time the Event Generator receives new data from the server
COMMENTS or C | String | Comments field
LOG_RULES | Log script | Tcl script that defines the log scanner activity that supports rules
ON_FIX | Tcl script | Specifies whether an event can be marked as fixed; the Tcl script is interpreted by the Event Generator when an event is marked as fixed
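The ON_OPEN, ON_CONTINUE, and ON_CLOSE attributes depend on how a rule's condition changes from one evaluation to the next. The following Python sketch is a hypothetical model of that sequencing for a single non-MULTI rule. The class RuleState is illustrative only and does not model ON_ACKNOWLEDGE, ON_SHUTDOWN, or ON_FIX.

```python
from typing import Optional


class RuleState:
    """Hypothetical model: which handler runs each time the condition is evaluated."""

    def __init__(self) -> None:
        self.active = False  # corresponds to getfield ACTIVE

    def evaluate(self, condition: bool) -> Optional[str]:
        if condition and not self.active:
            self.active = True
            return "ON_OPEN"        # condition has just become true
        if condition:
            return "ON_CONTINUE"    # condition continues to be true
        if self.active:
            self.active = False
            return "ON_CLOSE"       # condition is no longer true
        return None                 # still false: no handler runs


rule = RuleState()
trace = [rule.evaluate(c) for c in (False, True, True, True, False, False)]
```

In this model, trace is [None, "ON_OPEN", "ON_CONTINUE", "ON_CONTINUE", "ON_CLOSE", None]. ON_OPEN and ON_CLOSE each run once per episode, and ON_CONTINUE runs on every evaluation in between.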

MULTI Rules

This section describes MULTI rules, which apply the same rule to multiple nodes. The paragraph after the following code example explains how it works:

  MULTI { expr { [ findlist KernelReader.disk.*.busy .busy ] } }  

The MULTI statement tells the Event Generator to repeat the same rule on each node that the findlist function returns. Here, findlist finds the nodes whose names match KernelReader.disk.*.busy. Because the strip string (the subpath to remove, or strip out, from each match) is .busy, the node name that findlist returns in the node variable excludes .busy. For example, if findlist matches a node named KernelReader.disk.sd8.busy, the $node variable contains KernelReader.disk.sd8. Because the strip string is not null, it does not have to be enclosed in double quotes.

Rule Functions

This section describes the Tcl rule functions associated with the Rules.tcl variable.

alarm

The alarm function makes an event active, highlights the hierarchy node (RED, YELLOW, or BLUE), and creates entries in both the Event Log and the Event Viewer with a predefined message for the event. The syntax of the alarm function is:

  alarm level node "message" "Tcl command"  

where:
level can be RED, YELLOW, or BLUE. By convention, these levels mean the following:
--RED is used for a situation or event that requires immediate attention
--YELLOW is used for warning events
--BLUE is used for capacity planning events
node is the path name of the hierarchy node to be highlighted on the display; it names the node that is in question or in error. Using an empty string ("") instead of a node name means you do not want any node to be highlighted.
message is a string that is displayed in the Event Viewer. Messages can be either text strings or variables. For internationalization, the text strings are assigned to variables that reference entries in rultext.tcl; a variable reference in double quotes expands to the expected string. For example, the following line of a Tcl function uses the "r0mess" variable for internationalization. For the complete rule, see CODE EXAMPLE 4-3.

  ON_OPEN { alarm RED "" "$r0mess" "" }  

Tcl command is a Tcl string that is executed by the GUI when it receives the event. If you do not want to execute a Tcl command, use an empty string ("").
To close an open alarm condition, see the end_alarm function.

crnode

The crnode function generates a Config Reader node based on elements returned from the Log Scanner. The syntax is:

  crnode keyword arg1 [arg2]

Examples of keywords are DISK, SIMM, CPU, QLGC, PLN, and SOC.
TABLE 4-7 defines the keywords and their arguments.
TABLE 4-7 crnode
Keyword | Description
DISK | Uses one argument; examples are sd0 or ssd0
SIMM | Uses two arguments (for example, 0 J3201); in this example, 0 is the board number and J3201 is the SIMM reference number
CPU | Uses one argument, the CPU number
QLGC | Uses one argument, the instance number; for more information, see the isp man page
PLN | Uses one argument, the instance number; for more information, see the pln man page
SOC | Uses one argument, the instance number; for more information, see the soc man page
Examples of the crnode function are: crnode DISK sd0 and crnode SIMM 0 J3201.

debug

The debug function writes output that shows a rule writer what is happening in a rule. This function is analogous to puts in Tcl.
The syntax is as follows, where msg is a text string that is written to the debug file and must appear in double quotes (" "):

  debug "msg"  

The debug function writes debug messages to: /var/opt/SUNWsymon/<monitored_machine>/EG/eg_debug.<pid>

dynlink

The dynlink function takes a shared object file (.so) and a procedure name, dynamically links the shared object, and calls the specified procedure. The syntax of the dynlink function is:

  dynlink file name_of_function  

  • file is the full path of the shared object (typically ending in .so).
  • name_of_function is the function name within the dynamically loadable object.
An example of the dynlink function is:

  dynlink /opt/SUNWsymon/lib/eg_pfa.so pfainit  

  • /opt/SUNWsymon/lib/eg_pfa.so is the file name.
  • pfainit is the function name. Only pfainit is needed to do the initialization.

end_alarm

The end_alarm function has no arguments. Use end_alarm to close an open alarm condition. An example of this function is:

  end_alarm  

findlist

The findlist function takes a hierarchy path in which wildcards stand in for nodes and expands it into a list of the valid paths in the hierarchy that match the wildcards. For each matching node, it also supplies the most recent value associated with that path, which the rule can use (for example, set foo $value).
The syntax is:

  findlist hierarchy_path "strip_string"  

An example of a hierarchy_path is KernelReader.cpu.cpu1.busy. The hierarchy path uses the period ( . ) as a path separator.
"strip_string" (strip string means remove or strip out) is any subpath of the full path. The strip string is deleted from each expanded path (see FIGURE 4-1). If the strip string is null, you must use empty double quotes (""). If the strip string is not null, you don't need to enclose it in double quotes.
FIGURE 4-1 KernelReader.cpu.*.busy Hierarchy Example (KernelReader > cpu > cpu1, cpu2, cpu3 > busy)
For example, the following call:

  findlist KernelReader.cpu.*.busy ""  

is expanded to:

  KernelReader.cpu.cpu1.busy  
  KernelReader.cpu.cpu2.busy  
  KernelReader.cpu.cpu3.busy  

Another example of the findlist function is:

  findlist KernelReader.cpu.*.busy ".busy"  

which expands to:

  KernelReader.cpu.cpu1  
  KernelReader.cpu.cpu2  
  KernelReader.cpu.cpu3  

Note that .busy does not appear in the expanded paths because .busy was given as the strip_string, and the strip string is deleted from each expanded path.
Use the findlist function only in a Tcl script related to the MULTI attribute. The entire list that is created is kept internal to the Event Generator and is not available all at once to the script.
Each member of the list is available to the script, one at a time. The action is called once for each node-value pair. If this were the rule:

  MULTI {command}  
  {command2 }  

The equivalent loop in the C shell would look like this:

  foreach i ( `command` )
    command2
  end

For an example of the findlist function, see Rule 10000 later in this chapter.
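The following Python sketch models three things: findlist expansion, the strip string, and how a MULTI rule evaluates its condition once per node-value pair. It illustrates the behavior described above and is not the Event Generator's code. The hierarchy is a dictionary of made-up path/value pairs, and * matches exactly one path component. The strip string is assumed to be a trailing subpath, as in every example in this chapter. Implicit instance matching and ** are not modeled. The Python functions findlist and multi are hypothetical stand-ins, and the threshold of 90 is illustrative.

```python
from __future__ import annotations


def findlist(data: dict[str, float], pattern: str, strip: str = "") -> list[tuple[str, float]]:
    """Hypothetical model: (node, value) pairs for every path matching `pattern`.

    '*' matches exactly one period-separated component. A non-empty `strip`
    string is removed from the end of each matching path.
    """
    want = pattern.split(".")
    pairs = []
    for path, value in data.items():
        parts = path.split(".")
        if len(parts) == len(want) and all(w in ("*", p) for w, p in zip(want, parts)):
            node = path[: -len(strip)] if strip and path.endswith(strip) else path
            pairs.append((node, value))
    return pairs


def multi(data, pattern, strip, condition):
    """Evaluate condition(node, value) once per pair, as MULTI does."""
    return [node for node, value in findlist(data, pattern, strip) if condition(node, value)]


# Hypothetical most-recent values held by the Event Generator.
data = {
    "KernelReader.cpu.cpu1.busy": 12.0,
    "KernelReader.cpu.cpu2.busy": 97.0,
    "KernelReader.cpu.cpu3.busy": 40.0,
    "KernelReader.disk.sd0.busy": 5.0,
}

expanded = [node for node, _ in findlist(data, "KernelReader.cpu.*.busy", "")]
stripped = [node for node, _ in findlist(data, "KernelReader.cpu.*.busy", ".busy")]
busy = multi(data, "KernelReader.cpu.*.busy", ".busy", lambda node, value: value > 90)
```

Here expanded and stripped reproduce the two expansions shown earlier. The disk node is excluded because its second component is not cpu.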

Implicit Instance Matching

The findlist and hotlist functions have the implicit instance matching feature. This feature allows any node entry in a findlist or hotlist hierarchy path that is not a wildcard entry and does not contain an instance number to match nodes that have the same name and an instance number. An example of an entry that meets these criteria is the cpu entry in system.slot(0).board(0).cpu: it is not a wildcard entry, and it does not contain an instance number, such as cpu(1).
For example, a single rule for system.*.*.cpu.temperature matches all CPU nodes with temperature properties. The string system.slot.board.cpu.temperature also matches all CPU nodes with temperature properties, because several entries in one findlist or hotlist path can use implicit instance matching.
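The following Python sketch illustrates the matching rule described above. It assumes that instance numbers are written in parentheses, as in cpu(1). The function names and sample paths are hypothetical, and only the single-level * wildcard is modeled.

```python
import re

# Hypothetical model: a node with an instance number looks like name(n), e.g. cpu(1).
_WITH_INSTANCE = re.compile(r"^([^()]+)\(([^()]*)\)$")


def component_matches(pattern: str, node: str) -> bool:
    if pattern in ("*", node):
        return True
    if "(" in pattern:
        return False  # an explicit instance number must match exactly
    m = _WITH_INSTANCE.match(node)
    # Implicit instance matching: "cpu" matches "cpu(1)", "cpu(3)", and so on.
    return m is not None and m.group(1) == pattern


def path_matches(pattern: str, path: str) -> bool:
    want, parts = pattern.split("."), path.split(".")
    return len(want) == len(parts) and all(
        component_matches(w, p) for w, p in zip(want, parts))


paths = [
    "system.slot(0).board(0).cpu(1).temperature",
    "system.slot(1).board(0).cpu(3).temperature",
    "system.slot(0).board(0).sbi.temperature",
]
by_wildcard = [p for p in paths if path_matches("system.*.*.cpu.temperature", p)]
by_name = [p for p in paths if path_matches("system.slot.board.cpu.temperature", p)]
```

Both patterns select the two CPU paths and skip the sbi path. The second pattern relies on implicit matching three times: slot, board, and cpu.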

Multilevel Wildcard String Support

The findlist and hotlist functions also have multilevel wildcard string support, "**". This feature was first introduced in Solstice SyMON 1.4. Multilevel wildcard string support allows a single rule to match nodes at different levels of the hierarchy. The string ** matches any number of nodes.

Note - You must use only one instance of this wildcard in a hierarchy path expression.

For example, to find all sd nodes in a system, use the expression system.**.sd.
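The following Python sketch shows one way to match a path expression that contains a single **. It assumes that ** matches zero or more whole path components; the text says only "any number of nodes." It also omits implicit instance matching. The function names and sample paths are hypothetical.

```python
def _component(pattern: str, node: str) -> bool:
    return pattern in ("*", node)


def multilevel_match(pattern: str, path: str) -> bool:
    """Hypothetical model of a path expression with at most one '**'.

    Assumption: '**' matches zero or more whole path components.
    """
    want, parts = pattern.split("."), path.split(".")
    if want.count("**") > 1:
        raise ValueError("use only one ** in a hierarchy path expression")
    if "**" not in want:
        return len(want) == len(parts) and all(map(_component, want, parts))
    i = want.index("**")
    head, tail = want[:i], want[i + 1:]
    if len(parts) < len(head) + len(tail):
        return False
    # Components before ** must match the start of the path, those after it the end.
    return (all(map(_component, head, parts[:len(head)]))
            and all(map(_component, tail, parts[len(parts) - len(tail):])))
```

For example, system.**.sd matches both system.slot.board.sd and system.slot.board.io-unit.sbi.dma.esp.sd. It does not match a path that ends in cpu.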

findstatus

The findstatus function finds the status of a node, which can be alive or dead. The syntax is as follows, where node is the hierarchy endpoint:

  findstatus node  

For an example of a rule that uses the findstatus function, see RULE 201 in CODE EXAMPLE 4-6.

findvalue

The findvalue function takes the name of any data hierarchy variable and returns the value of the variable. The syntax is:

  findvalue node  

node is the hierarchy endpoint.
An example of the findvalue function is:

  findvalue $node  

You cannot specify a time for the value; the value returned depends on the data provided by the agents. For an example of a rule using the findvalue function, see RULE 201 in CODE EXAMPLE 4-6.

getfield

The getfield function returns the internal values associated with a rule or a rule-node combination (list from a MULTI string). It accesses publicly available fields of data from within rules.
The syntax is:

  getfield [rule#] field_type  

The optional rule# argument is the rule number, which is assigned when you write the rule (for example, 0, 1, 2, and so on).
The field_types are listed in TABLE 4-8:
TABLE 4-8 getfield
Field Type | Description
PARAMETERS | User-defined string associated with the rule; use this field type to pass data between invocations of the rule
RATE | Frequency of the execution of the rule (seconds)
ACTIVE | Returns a value of either true or false; ACTIVE equals true when the event is active or open (when the ON_OPEN section of the rule has been executed but the ON_CLOSE section has not); the event is closed or inactive once ON_CLOSE is executed
COUNT | Returns the number of consecutive iterations of the rule (read only)
PRIORITY | Returns the priority value of the rule (see TABLE 4-1)
SEVERITY | Returns the severity value of the rule (see TABLE 4-1)
RULE | Returns the rule number (read only)
START_TIME | Returns the time the rule became active (read only)
An example of the getfield function is:

  getfield COUNT  

For a non-MULTI rule, getfield COUNT returns the number of consecutive iterations in which the rule's condition has been true. A MULTI rule is evaluated for more than one node, so for a MULTI rule it returns the number of consecutive iterations in which the condition has been true for the specified rule-node combination. If the optional rule number is not specified (as in the example), the number of the rule you are presently in is used.
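The following Python sketch models COUNT for a MULTI rule by keeping one counter per node. It assumes the count is 1 on the first iteration in which the condition is true and restarts at 0 when the condition is false. The class name and node paths are hypothetical.

```python
from collections import defaultdict


class ConsecutiveCount:
    """Hypothetical model: per rule-node count of consecutive true iterations."""

    def __init__(self) -> None:
        self._counts = defaultdict(int)

    def update(self, node: str, condition: bool) -> int:
        # A false condition breaks the run, so the count starts over.
        self._counts[node] = self._counts[node] + 1 if condition else 0
        return self._counts[node]


counts = ConsecutiveCount()
sd0 = [counts.update("KernelReader.disk.sd0", c) for c in (True, True, False, True)]
sd1 = [counts.update("KernelReader.disk.sd1", c) for c in (True, True, True, True)]
```

In this model, the two nodes keep independent counts: sd0 yields 1, 2, 0, 1 and sd1 yields 1, 2, 3, 4.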
For an explanation of the converse function to getfield, see "putfield" later in
this chapter.

Caution - If you are using get_parameter and/or put_parameter, do not use putfield or getfield with the PARAMETERS field type on the same rule. Your data may become corrupt.

get_parameter

The get_parameter function takes a single argument and gets the value, unique to the rule, that the PARAMETERS attribute associates with that argument. Rules can pass values to each other, and a parameter is one way to pass a value to a different rule. Parameters are placeholders for values that are used at a later time.
The get_parameter function manages a parameter list. One use of
get_parameter is for MULTI rules that require historical records.
The syntax is:

  get_parameter argument  

  • argument is an identification string (without spaces) where a particular value is stored in memory.
You initialize the get_parameter function by using the PARAMETERS attribute to set the initial default value. For example, PARAMETERS {default 1000}. If the get_parameter argument does not have a value currently associated with it, the default value is returned.
An example of the get_parameter function follows:

  {  
       set oldtempdegree [ get_parameter $node ]  
  .  
  .  
  .  

For a description of a rule containing the PARAMETERS attribute and the get_parameter function, see CODE EXAMPLE 4-7.
Additionally, see the put_parameter function in "put_parameter" later in this chapter, which is the complement to get_parameter.

gettime

The gettime function returns the last sample time of the monitored machine as a long integer. There are no arguments. The syntax is:

  gettime  

hotlist

The hotlist function works like findlist, except that it returns only the nodes whose values changed. Do not use the hotlist function in cprules, pfrules, or rules that contain ON_CONTINUE, because in these cases you want every node to be returned. In these instances, use the findlist function instead.
The following is a partial code example for RULE 1201. The complete rule is listed later in this chapter in CODE EXAMPLE 4-6.

  {  
    COMMENTS { for hot plug charge DC status }  
    RULE 1201  
    MULTI {  
      expr { [ hotlist system.hot_plug_charges.*.status "" ] } }  
  {
      set hpustatus  [ findstatus $node ]  
      expr { ("$value" != "OK") && ("$hpustatus" == "alive") }  
  }  
  .  
  .  
  .  


Note - If any component of a hierarchy path contains a period ( . ) in it, you may not be able to use the hotlist function to locate it. For example, if you search for the mount point /export/a0.test reported by the Config Reader via the following path:
system.slot.board.io-unit.sbi.dma.esp.sd./export/a0.test
the node will not be found.  Instead, use the following path to find this node:
system.slot.board.io-unit.sbi.dma.esp.sd.*


For more information on hotlist, refer to "findlist."
There are two new features for both findlist and hotlist in Solstice SyMON 1.4. They are:
  • Implicit instance matching
  • Multilevel wildcard string "**" support
To learn how to use these new features, refer to "Implicit Instance Matching" and "Multilevel Wildcard String Support" in "findlist."
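The difference between hotlist and findlist can be sketched in Python by comparing two samples. This is a hypothetical model. The status value FAILED is made up for illustration, a node absent from the previous sample is assumed to count as changed, and the function name hotlist is a stand-in.

```python
def hotlist(previous: dict, current: dict, pattern: str, strip: str = "") -> list:
    """Hypothetical model: like findlist, but only nodes whose value changed.

    Assumption: a node absent from the previous sample counts as changed.
    """
    want = pattern.split(".")
    pairs = []
    for path, value in current.items():
        parts = path.split(".")
        if len(parts) != len(want) or not all(w in ("*", p) for w, p in zip(want, parts)):
            continue
        if path in previous and previous[path] == value:
            continue  # unchanged: findlist would return it, hotlist does not
        node = path[: -len(strip)] if strip and path.endswith(strip) else path
        pairs.append((node, value))
    return pairs


before = {
    "system.hot_plug_charges.auxiliary_5v.status": "OK",
    "system.hot_plug_charges.peripheral_12v.status": "OK",
}
after = {
    "system.hot_plug_charges.auxiliary_5v.status": "OK",
    "system.hot_plug_charges.peripheral_12v.status": "FAILED",
}
changed = hotlist(before, after, "system.hot_plug_charges.*.status", "")
```

Only peripheral_12v is returned. If its value stayed FAILED in the next sample, hotlist would no longer return it. This is why a rule with ON_CONTINUE needs findlist.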

load

The load function, which is specific to the Log Scanner, takes more than one argument to create a Log Scanner hierarchy that is checked as part of the condition of the rule. An example is:

  LOG_RULES { { grep { Ecc error on board ([0-9]), reference number (J[0-9]*) } syslog }  
                            { load LR10000 "$logword(1)" "$logword(2)"} }  

For a complete listing of this code example, see "Partial Listing of RULE 10000 and Description" later in this chapter.
The load command has three arguments:
  • Rule number LR10000
  • $logword(1)
  • $logword(2)

$logword(1) is replaced by the value that matches the first parenthesized subexpression of the regular expression, which in this case is the board number. $logword(2) is replaced by the value that matches the second parenthesized subexpression, which in this case is the reference number. For example, the following line in the syslog file:
"Ecc error on board 5, reference number J3200" generates the following hierarchy: LogScanner.LR10000.5.J3200.
For more information on how to use the load command, see "Log Scanner" later in this chapter.
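The following Python sketch mimics two steps: how the parenthesized groups of the grep pattern become the $logword values passed to load, and how load builds the Log Scanner node name. It is illustrative only. Python's re module stands in for the Log Scanner's pattern matching, and the reference-number group is written as (J[0-9]*) so that it captures every digit of J3200. The function name load_node is hypothetical.

```python
import re
from typing import Optional

# Hypothetical stand-in for the RULE 10000 grep pattern.
ECC_PATTERN = re.compile(r"Ecc error on board ([0-9]), reference number (J[0-9]*)")


def load_node(rule: str, line: str) -> Optional[str]:
    """Return the node that `load rule $logword(1) $logword(2)` would create."""
    match = ECC_PATTERN.search(line)
    if match is None:
        return None  # the line does not match the grep pattern; nothing is loaded
    # match.group(1) plays the role of $logword(1), match.group(2) of $logword(2).
    return ".".join(["LogScanner", rule, match.group(1), match.group(2)])


node = load_node("LR10000", "Ecc error on board 5, reference number J3200")
```

The result is LogScanner.LR10000.5.J3200, the hierarchy described above.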

mailto

The mailto function mails a message string to the specified address. The syntax of the function is:

  mailto address "msg_string"  

If there are spaces in msg_string, enclose msg_string in double quotes (" "). All existing rule messages are in rultext*.tcl to facilitate translation into other languages.
An example of the mailto function is:

  mailto root "$mess"  

Another example of the mailto function is:

  mailto root "We have a problem."  

putfield

The putfield function is the converse of getfield. It takes a field type and a value and assigns the value to that field of the current rule, or of the rule given by rule#.
The syntax is:

  putfield [rule#] field_type "value"  

The optional rule# argument is the rule number that is assigned when you write the rule (for example, 0, 1, 2, 3, and so on).
The field_types are described in TABLE 4-9.
TABLE 4-9 putfield
Field Type | Description
PARAMETERS | Sets the user-defined string associated with the rule. Use this field type to pass data between invocations of the rule.
RATE | Sets the frequency of the execution of the rule (seconds).
PRIORITY | Sets the priority value of the rule. For meanings of the values of PRIORITY, see "Terminology" earlier in this chapter.
SEVERITY | Sets the severity value of the rule. For meanings of the values of SEVERITY, see "Terminology" earlier in this chapter.
If value contains spaces, enclose it in double quotes (" ").
An example of the putfield function is:

  putfield 1 RATE 50  

This example changes the frequency with which RULE 1 is checked. RULE 1 is checked at 50 second intervals.
For an explanation of the converse function to putfield, see "getfield."

put_parameter

The syntax of the put_parameter function is:

  put_parameter tag value  

The put_parameter function manages a parameter list. It provides a way for rules to pass values to each other (or between iterations of the same rule).
One use of the put_parameter function is for MULTI rules that require historical records. It saves a piece of data under a tag.
The arguments have the following meanings:
  • tag is the key that is used to store data; the legal values can be any string.
  • value is the data to be stored.
You initialize the put_parameter function by using the PARAMETERS attribute to set the initial default value. For example, PARAMETERS {default 1000}.
An example of the put_parameter function follows:

  put_parameter [$node] RED  

Using the $node parameter is optional. Leaving out $node makes the put_parameter function more efficient. For a complete code example that includes the put_parameter function, see "RULE 100 Code Example."
Additionally, see the get_parameter function in "get_parameter," which is related to put_parameter.
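The following Python sketch models the parameter list that get_parameter and put_parameter manage for one rule. It includes the fallback to the default value set with PARAMETERS {default 1000}. The class, its method names, and the tag are hypothetical, and values are kept as strings, as Tcl does.

```python
class ParameterList:
    """Hypothetical model: values a rule stores by tag, with a default for unknown tags."""

    def __init__(self, default: str) -> None:
        self._default = default  # as set by, e.g., PARAMETERS {default 1000}
        self._values = {}

    def get_parameter(self, tag: str) -> str:
        # An unknown tag falls back to the default value.
        return self._values.get(tag, self._default)

    def put_parameter(self, tag: str, value: str) -> None:
        self._values[tag] = value


params = ParameterList(default="1000")
first = params.get_parameter("system.cpu(1).temperature")   # nothing stored yet
params.put_parameter("system.cpu(1).temperature", "RED")
second = params.get_parameter("system.cpu(1).temperature")
```

Keying the list by node name is one way a MULTI rule could keep a separate history for each node. Here, first is "1000" (the default) and second is "RED".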

snmp

The snmp function takes a string as an argument and generates an SNMP trap. A trap message is sent to every machine in the snmp_hosts variable, which is defined in event_gen.tcl. The syntax is as follows, where string is the message sent to programs that monitor the system, such as Solstice Site Manager and Solstice Domain Manager:

  snmp "string"  

An example follows:

  snmp  "$imsg"  

A message can appear explicitly in the snmp call, or you can give the identifier of a message in rultext.tcl. All existing rule messages are in rultext.tcl, which facilitates translation into other languages. The snmp function picks up the message variables from rultext.tcl, runs them through the format function, and then displays the messages (see CODE EXAMPLE 4-3).

syslog

The syslog function takes a string as an argument and places the string in the syslog. Use this function for debugging purposes only, as you would use the printf function in C. The syntax is as follows, where msg is a text string that is placed in syslog and must be in double quotes (" "). All existing event messages are in rultext.tcl, which can be translated into other languages.

  syslog "msg"  

An example of the syslog function is:

  syslog "Show me the value of $variablename"  

This example writes the message, with the value of variablename substituted, to the syslog.
Output from the syslog function is treated as warning messages: the message level for the syslog call is LOG_WARNING. To be sure that the warning messages appear in the system log file, make sure that the /etc/syslog.conf file is set correctly. For more information, see the syslog.conf man page.

Hierarchies

The next part of this chapter explains how hierarchies work and how to write event rules.
Hierarchies are used to structure data and group related information. Each node organizes the data beneath it. For example, the following is true of this hierarchy:

  KernelReader.cpu.cpu1.busy  

  • Top level, KernelReader, indicates that this is KernelReader data.
  • KernelReader.cpu indicates that this is cpu data.
  • KernelReader.cpu.cpu1 indicates that this data is related to cpu1.
  • KernelReader.cpu.cpu1.busy is the percentage of time that cpu1 was busy.
Hierarchies use the period ( . ) as a separator for hierarchy paths. For specifics on how to use the findlist function for locating hierarchy paths, see "findlist" earlier in this chapter. The findlist function cannot find the hierarchy path if any component of the hierarchy path contains a period.
For more information on the Kernel Reader see "Kernel Reader" later in this chapter. For information on the Kernel Reader hierarchy, see Appendix A, "Kernel Reader."

Writing Event Rules

This section provides procedures for writing event rules and explains how to create rules for each of the four agents:
  • Config Reader
  • Kernel Reader
  • Log Scanner
  • Event Generator

To Create New or Modified Rules

  1. As root or superuser, use a text editor to open the appropriate rules file (see TABLE 4-3).

    For example, if you need to add a capacity planning rule, open the cprules.tcl file and add the new rule to this file. Go on to Step 3.

  2. If a rules file does not exist for your rule category, create a new rules file using a text editor and add the rule to this file.

    a. After you create the new rules file, modify the master file rules.tcl by adding a psource command that tells rules.tcl to read the new file. For information on psource, see CODE EXAMPLE 4-1 in "Description of Rules Files" earlier in this chapter.

    b. Add the new rule file name to Tcl variable Rules in the set Rules "file_names" statement.

    Note that within the quotes, the file names should be separated by spaces.

  3. Verify the rule.

    See "Verifying New Event Rules" later in this chapter.

  4. Activate the rule.

    See "Activating New or Modified Event Rules" later in this chapter.

The following sections describe the four agents in more detail, explain how to determine the path for the Config Reader and Kernel Reader hierarchies, explain the LOG_RULES that are used for the Log Scanner, and describe the functions of the Event Generator.

Config Reader

The Config Reader collects the data for the system hardware configuration and status and provides data continuously to the Event Generator. Rules written against the Config Reader describe hardware-related failures, such as power supply or fan failures. The Config Reader data begins with system.<complete_path>. An example is: system.hot_plug_charges.auxillary_5v.status.
The following screen shows a partial display of Rule 1201, which describes hot plug charge DC status. The complete rule is presented in "Complex Rule Example 2: RULE 1201" later in this chapter.

  {  
    COMMENTS { for hot plug charge DC status }  
    RULE 1201  
    MULTI {  
      expr { [ hotlist system.hot_plug_charges.*.status "" ] } }  
    {
      set hpustatus  [ findstatus $node ]  
      expr { ("$value" != "OK") && ("$hpustatus" == "alive") }  
  }  
  .  
  .  
  .  

The MULTI label indicates that this rule might apply to more than one node. Each node found using the hotlist function matches the pattern system.hot_plug_charges.*.status. For example:
· system.hot_plug_charges.auxiliary_5v.status
· system.hot_plug_charges.peripheral_12v.status
· system.hot_plug_charges.peripheral_5v.status

Because the strip string is null (empty double quotes), nothing is removed from the matched names: the $node variable contains the full node name, for example system.hot_plug_charges.auxiliary_5v.status.
The statements in the condition do the following:
  • Obtain the status of the node named in the node variable (using the findstatus function) and assign it to the hpustatus variable
  • Check whether the value of the value variable is not equal to "OK" and the value of the hpustatus variable is equal to "alive"
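To make the pattern matching concrete, the following sketch models the documented behavior of hotlist with a null strip string. hotlist_sketch is a hypothetical stand-in written for this illustration (it uses Tcl's string match for the * wildcard), not the Event Generator's own function, and the system.fans.fan_1.status node is made up to show a name that does not match.

```tcl
# Hypothetical stand-in for hotlist: return the names in nodeNames that match
# the glob pattern, removing the strip string from the end of each name.
proc hotlist_sketch {nodeNames pattern strip} {
    set result {}
    set len [string length $strip]
    foreach name $nodeNames {
        if {![string match $pattern $name]} {
            continue
        }
        if {$len > 0 && [string range $name end-[expr {$len - 1}] end] eq $strip} {
            set name [string range $name 0 end-$len]
        }
        lappend result $name
    }
    return $result
}

# Sample node names; the last one is hypothetical and does not match.
set nodes {
    system.hot_plug_charges.auxiliary_5v.status
    system.hot_plug_charges.peripheral_12v.status
    system.hot_plug_charges.peripheral_5v.status
    system.fans.fan_1.status
}

# With a null strip string, each matched name is returned unchanged.
foreach node [hotlist_sketch $nodes system.hot_plug_charges.*.status ""] {
    puts $node
}
```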

Determining the Path for the Config Reader

To write rules for the Config Reader, use the Logical View console to determine the Config Reader path.

· To Determine the Path for the Config Reader

  1. Click the Logical View icon.

  2. Expand the hierarchy by clicking the + signs in the hierarchy.

    FIGURE 4-2 shows an example of the owey.* hierarchy. The expanded path is as follows:


  owey.hot_plug_charges.peripheral_5v.peripheral_5v_precharge.status  


FIGURE 4-2 ConfigReader

  3. Replace the machine name, owey, with system.

    The path is now:


  system.hot_plug_charges.peripheral_5v.peripheral_5v_precharge.status  
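The substitution in step 3 can be expressed as a small Tcl procedure. rule_path is a hypothetical helper, not part of Solstice SyMON, and it assumes that the machine name is the first dot-separated component of the path and contains no dots itself. The same substitution, with KernelReader as the new prefix, applies to the Kernel Reader procedure later in this chapter.

```tcl
# Hypothetical helper: replace the first dot-separated component of a path
# copied from the console (the machine name) with the prefix rules expect.
proc rule_path {consolePath prefix} {
    set parts [split $consolePath .]
    return [join [lreplace $parts 0 0 $prefix] .]
}

# Config Reader path from the example above.
puts [rule_path owey.hot_plug_charges.peripheral_5v.peripheral_5v_precharge.status system]

# Kernel Reader path, using KernelReader as the prefix.
puts [rule_path owey.disk.ssd93.runtime KernelReader]
```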

For information on the Config Reader hierarchy, see Appendix B, "Config Reader."

Kernel Reader

The Kernel Reader monitors system performance data such as disk queue length, forks, and CPU load. Rules written against the Kernel Reader test these performance values. The Kernel Reader, like the Config Reader, provides data continuously to the Event Generator.
Kernel Reader data path names begin with KernelReader. CODE EXAMPLE 4-2 shows a partial display of RULE 100, which uses a MULTI attribute to define a Kernel Reader path. The complete rule is presented in "Complex Rule Example 3: RULE 100" later in this chapter.

  {  
      RULE 100  
      COMMENTS {  
              This rule generates a YELLOW alarm if it finds any  
              disk with an increasing wait queue while busy.  
  .  
  .  
  .  
              }  
  .  
  .  
  .  
  MULTI { expr { [ hotlist KernelReader.disk.*.busy .busy ] } }  
  .  
  .  
  .  

CODE EXAMPLE 4-2 Partial Rule 100 Display
The MULTI statement tells the Event Generator to repeat the same rule on each node that is returned from the hotlist function. The hotlist function finds the nodes whose names match KernelReader.disk.*.busy. Because the strip string is defined as .busy, hotlist removes .busy from each matched name before returning it in the node variable. (A non-null strip string such as .busy can be written without quotes; a null strip string must be written as empty double quotes.) For example, if the hotlist function matches a node named KernelReader.disk.sd8.busy, the $node variable contains KernelReader.disk.sd8.
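The effect of a non-null strip string can be sketched in plain Tcl. hotlist_sketch is a hypothetical stand-in that models only the documented matching and stripping, not the Event Generator's hotlist, and the node names other than KernelReader.disk.sd8.busy are illustrative assumptions.

```tcl
# Hypothetical stand-in for hotlist: return the names in nodeNames that match
# the glob pattern, removing the strip string from the end of each name.
proc hotlist_sketch {nodeNames pattern strip} {
    set result {}
    set len [string length $strip]
    foreach name $nodeNames {
        if {![string match $pattern $name]} {
            continue
        }
        if {$len > 0 && [string range $name end-[expr {$len - 1}] end] eq $strip} {
            set name [string range $name 0 end-$len]
        }
        lappend result $name
    }
    return $result
}

# Sample Kernel Reader node names (illustrative).
set nodes {
    KernelReader.disk.sd8.busy
    KernelReader.disk.sd8.queuelength
    KernelReader.disk.sd9.busy
}

set disks [hotlist_sketch $nodes KernelReader.disk.*.busy .busy]
foreach node $disks {
    # Other properties of the same disk are addressed as $node.<property>,
    # as RULE 100 does with $node.queuelength.
    puts "$node -> $node.queuelength"
}
```

Stripping .busy is what lets a rule reach sibling properties of the same disk by appending a different property name to $node.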

Determining the Path for the Kernel Reader

To write rules for the Kernel Reader, determine the Kernel Reader path.

· To Determine the Path for the Kernel Reader

  1. Double-click the Kernel Data Catalog.

  2. Expand the hierarchy by clicking the + signs in the hierarchy.

    FIGURE 4-3 shows an example of the owey.disk hierarchy.

  3. Replace the machine name, owey, with KernelReader.

    An example of a path is KernelReader.disk.ssd93.runtime.


FIGURE 4-3

For information on the Kernel Reader hierarchy, see Appendix A, "Kernel Reader."

Log Scanner

Event rules can search the system log files for specific strings or regular expressions. The LOG_RULES attribute specifies the string or regular expression to search for. The Log Scanner parses the system log file and converts any matching message to a hierarchy that is similar to the hierarchies generated by the Kernel Reader and Config Reader.
Rules written against the Log Scanner describe the following types of conditions and messages:
  • Panic conditions (such as a system dying)
  • Conditions that require immediate attention, such as a corrupted database
  • Critical conditions such as hard disk errors
  • Errors and warning messages
  • Conditions that require special handling (these conditions are user-defined)
  • Informational messages (such as hot plug information)
  • Messages that contain information normally used when debugging a program
For more information on these conditions and warnings, see the syslog(3) man page.

Event Generator and Log Scanner Tasks

The Event Generator and the Log Scanner work together to perform the following tasks:
  • The Event Generator reads the rules. When it finds the LOG_RULES statement, it places the rules on its hierarchy. The Log Scanner reads this hierarchy and performs actions.
  • The Log Scanner goes through the system log files and uses the pattern supplied in LOG_RULES to find a match in the log file. If it finds one, it constructs another hierarchy beginning with LogScanner.<completepath>, which is similar to the Config Reader and Kernel Reader hierarchies. By convention, the complete path is LogScanner.LR<rule#>.<what is dynamically assigned by the Log Scanner>. Rules obtain the complete path by reading this hierarchy in a MULTI statement, as shown in the examples that follow. The LS_MONITOR statement in the /etc/opt/SUNWsymon/log_scan.tcl file identifies the file that contains the system log for the Log Scanner.
  • The Event Generator executes the rules.

Partial Listing of RULE 10000 and Description

The following partial listing shows RULE 10000, a deliberately oversimplified rule for memory errors. It illustrates how LOG_RULES, the load command, and MULTI work together:

  {  
       COMMENTS { memory error }  
       RULE 10000  
         LOG_RULES { { grep { Ecc error on board ([0-9]), reference number (J[0-9]) } syslog }  
                     { load LR10000 "$logword(1)" "$logword(2)"} }  
         MULTI { expr { [ findlist LogScanner.LR10000.* "" ] } }  
       {  
        set found 1  
       }  

The LOG_RULES statement tells the Log Scanner what message to look for, and what to do if the message is found in the syslog. In this example, the Log Scanner looks in syslog for a system message that matches the regular expression enclosed in braces { }.
If a matching system message is found, the load command (which is specific to the Log Scanner) is called to construct a hierarchy that begins with LogScanner. The load command has three arguments:
  • The rule identifier, LR10000
  • $logword(1)
  • $logword(2)

$logword(1) is replaced by the text that matches the first subexpression enclosed in parentheses ( ), in this case, the board number. $logword(2) is replaced by the text that matches the second parenthesized subexpression, in this case, the reference number. For example, the syslog line "Ecc error on board 5, reference number J6" generates the following hierarchy: LogScanner.LR10000.5.J6.
The Event Generator executes the findlist function in the MULTI statement, which searches for and returns the list of hierarchies or nodes that match LogScanner.LR10000.*. The condition then assigns the value 1 to the found variable.
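The matching and substitution described above can be reproduced with Tcl's built-in regexp command, which stores each parenthesized subexpression in a separate variable. This sketch only mimics the Log Scanner: the real Log Scanner and load command are not shown, the logword array simply mirrors the names used in the rule, and the leading and trailing spaces inside the braces of the original pattern are omitted.

```tcl
# The regular expression from RULE 10000; each ( ) pair is one subexpression.
set re {Ecc error on board ([0-9]), reference number (J[0-9])}
set line "Ecc error on board 5, reference number J6"

# regexp stores the whole match in the variable named -> (unused here) and
# each subexpression in turn, playing the role of $logword(1) and $logword(2).
if {[regexp $re $line -> logword(1) logword(2)]} {
    # Build a node name the way the text describes for LR10000.
    set node "LogScanner.LR10000.$logword(1).$logword(2)"
    puts $node
}
```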

Guidelines for Using LOG_RULES

When using LOG_RULES, follow these guidelines:
  • The load function takes more than one argument to create a Log Scanner hierarchy that is checked as part of the condition of the rule.
  • Rules that use LOG_RULES are always MULTI rules. For example:

  MULTI {expr {[findlist LogScanner.* ""]}}  

  • To get data loaded by the Log Scanner, use findvalue $node.
  • To parse the logword data, use the split and lindex Tcl functions. The logword data is created by specifying subexpressions in the regular expression. These subexpressions are loaded by the load function on a Log Scanner property. Use the split and lindex Tcl functions in the Event Generator action to parse the string data on that Log Scanner property.
For a complete listing of a rule for the Log Scanner, see CODE EXAMPLE 4-8.
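The following sketch shows the split and lindex pattern from the last guideline. It assumes, for illustration only, that the Log Scanner property holds the loaded subexpressions as one space-separated string, such as the board number and reference number from RULE 10000; check the actual value with findvalue $node before relying on this layout.

```tcl
# Assumed property value: the two loaded subexpressions, separated by a space.
set value "5 J6"

# split with no separator argument splits on whitespace.
set lsdata [split $value]
set board  [lindex $lsdata 0]
set ref    [lindex $lsdata 1]

puts "board=$board ref=$ref"
```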

Event Generator

The Event Generator collects information from the Config Reader, Kernel Reader, and Log Scanner, and evaluates the data against its rules. It also maintains data about itself and the state of the server agents. This allows rules to be written against the status of agents (including itself) as well as data from the server. For example, a rule written against the Event Generator could warn of a high number of open events.
When a condition is true, the Event Generator logs an event and carries out the appropriate action in the rule. When the condition that generated the event is no longer true, the Event Generator may run a special action such as closing the event.

Getting Functions

The Event Generator gets functions from the /opt/SUNWsymon/etc/event_gen.tcl file. The sm_confsymon -e command copies the /opt/SUNWsymon/etc/event_gen.tcl file to the /etc/opt/SUNWsymon/event_gen.servername.tcl file, where servername is the server being monitored.
The event_gen.servername.tcl file defines variables and procedures used by the Event Generator. For example, the psource procedure is defined in event_gen.servername.tcl and is used in rules.tcl.
The Event Generator looks first in /etc/opt/SUNWsymon for rules files. If it cannot find the files, it then looks in /opt/SUNWsymon.

· To Customize Procedures and Commands

· Edit the event_gen.servername.tcl file.
Editing the event_gen.servername.tcl file lets you include your own
procedures and commands. The psource procedure is an example of a procedure
defined in this file; for information on psource, see CODE EXAMPLE 4-1 in
"Description of Rules Files" earlier in this chapter.

CAUTION Caution - Do not modify the default procedures and commands unless you
are confident in your Tcl knowledge. Modifying the defaults incorrectly can
cause widespread errors or malfunctioning rules.

Locating Rule Files

The Event Generator reads a file called rules.tcl. This file tells what files to read
and where the rules files are located. For more information on this file, see
"Description of Rules Files" earlier in this chapter.
The event_gen.servername.tcl file defines the path used to search for the rule
files. By default, the paths are:
· /etc/opt/SUNWsymon
· /opt/SUNWsymon/etc
· /opt/SUNWsymon/etc/lib
· /etc/opt/SUNWsymon/platform
· /opt/SUNWsymon/etc/platform
· /opt/SUNWsymon/etc/locale
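The lookup can be pictured as a search that returns the first directory in the list containing the requested file, which is consistent with the Event Generator looking in /etc/opt/SUNWsymon before /opt/SUNWsymon. find_rules_file is a hypothetical helper for illustration only; the actual search is defined in event_gen.servername.tcl and may differ in detail.

```tcl
# Hypothetical helper: return the full path of the first directory in
# searchPath that contains fileName, or an empty string if none does.
proc find_rules_file {fileName searchPath} {
    foreach dir $searchPath {
        set candidate [file join $dir $fileName]
        if {[file isfile $candidate]} {
            return $candidate
        }
    }
    return ""
}

# Default search order listed above.
set searchPath {
    /etc/opt/SUNWsymon
    /opt/SUNWsymon/etc
    /opt/SUNWsymon/etc/lib
    /etc/opt/SUNWsymon/platform
    /opt/SUNWsymon/etc/platform
    /opt/SUNWsymon/etc/locale
}

# Prints the first match, or an empty line if no directory has rules.tcl.
puts [find_rules_file rules.tcl $searchPath]
```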


Verifying and Activating Rules

This section describes how to verify and activate rules. After you write or modify
an event rule, verify it and then activate it.

Verifying New Event Rules

· To check for the correct syntax, invoke the verify_rules command by typing:

  $ verify_rules [-I filename] [-R filename] [-o]  

The command has three optional arguments (see TABLE 4-10).
TABLE 4-10 verify_rules

  Argument       Description
  -I filename    Checks the file that contains supporting functions; the default is event_gen.tcl
  -R filename    Checks the file that contains the EVENTS variable; the default is ./rules.tcl
  -o             Provides debugging information on rules to the standard error output
When you run verify_rules and the rules are correct, the software responds "GOOD RULES." If a syntax error is detected, it responds "BAD RULES." A GOOD RULES response confirms only that the syntax is correct; it does not guarantee that a rule works as expected.

Activating New or Modified Event Rules

You can activate new or modified rules either by stopping and restarting the Event Generator or by sending a signal to the Event Generator process.

· To Activate New or Modified Event Rules

To Stop and Restart the Event Generator:
  1. Find the process ID (pid) of the Event Generator and kill the process by typing:


  % ps -ef | grep sm_egd
  % kill pid

    For pid, type the process ID returned by the ps -ef | grep sm_egd command.
  2. Restart the Event Generator.

To Send a Signal to the Event Generator:

· Activate the new or modified rule by typing:

  % kill -HUP pid  

This causes the Event Generator to re-read and execute all rules files.

Debugging Tips

Use the following debugging tips to debug event rules:
  • Run verify_rules to make sure the syntax is accurate. See "Verifying and Activating Rules" earlier in this chapter.
  • If you have a problem with the rules, change the rules.tcl file to test only one rule at a time.
  • Look for error messages in the event log file: /var/opt/SUNWsymon/monitored_machine_name/event_log.
  • Use the debug function to help isolate problems. For information on the debug function, see "debug" earlier in this chapter.

Troubleshooting

If you have problems with event rules, such as when the Event Generator fails to start properly, you may have to upgrade to Solstice SyMON 1.1 event rules. For information on troubleshooting event rules problems, see Chapter 6, "Troubleshooting."

Rule Examples

This section presents example rules. Each Tcl code example is followed by a description of the rule.

Simple Rules

The sections that follow present Tcl code and a description of a simple rule.

Simple Rule Example 1: RULE 0


  {  
            RULE 0  
            COMMENTS {  
                   This rule generates an event if the symond  
                   process on the server machine goes down, as
                   reported by symond on the Event Generator's
                   machine to the Event Generator.

                   symond_status is a Tcl variable provided by
                   the Event Generator, and is either "alive"
                   or "dead". It is changed when the Event
                   Generator gets a callback from symond.
                   }
            { expr { "$symond_status" == "dead" } }
            ON_OPEN { alarm RED "" "$r0mess" ""
                   set imsg [ format "$ir0msg" "$target" ]  
                   snmp  "$imsg" }  
            ON_CLOSE { end_alarm }  
            SEVERITY 1  
            PRIORITY 1  
  }  

CODE EXAMPLE 4-3 RULE 0 Code Example
RULE 0 checks whether the value of the symond_status variable is equal to "dead." If so, ON_OPEN does the following:
  • Calls the alarm function with level RED, a null node name, the message $r0mess (defined in rultext.tcl), and a null Tcl function to the GUI (empty double quotes).
  • Formats the message "$ir0msg" (defined in rultext.tcl) with its target, "$target", as the argument and assigns the result to the imsg variable.
  • Passes the imsg string to the snmp function, which sends SNMP traps to every machine listed in the snmp_host variable (defined in event_gen.tcl). The imsg string is sent to Solstice Site Manager, Solstice Domain Manager, Solstice Enterprise Manager, or other SNMP-listening management platforms.
If the condition changes from True to False, ON_CLOSE calls the end_alarm function. For an explanation of SEVERITY and PRIORITY, see TABLE 4-1 earlier in this chapter.
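The open and close behavior can be modeled as a small state machine: ON_OPEN runs only when the condition changes from False to True, and ON_CLOSE only when it changes from True to False. The evaluate_rule0 procedure and the actions list below are hypothetical: they record which action would run instead of calling alarm, snmp, or end_alarm, and the comparison uses Tcl's eq string operator in place of the rule's ==.

```tcl
# Hypothetical model of RULE 0's state changes (not Event Generator code).
set active 0
set actions {}

proc evaluate_rule0 {symond_status} {
    global active actions
    set cond [expr {$symond_status eq "dead"}]
    if {$cond && !$active} {
        lappend actions ON_OPEN    ;# stands in for alarm RED ... and snmp
    } elseif {!$cond && $active} {
        lappend actions ON_CLOSE   ;# stands in for end_alarm
    }
    set active $cond
}

# symond is up, goes down, stays down, then comes back.
foreach status {alive dead dead alive} {
    evaluate_rule0 $status
}
puts $actions   ;# prints: ON_OPEN ON_CLOSE
```

The repeated "dead" reading does not trigger ON_OPEN a second time, because the rule is already active.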

Simple Rule Example 2: RULE 403


  {  
          RULE 403  
          C { Test for meta CPU events }  
          {  
                  set level [ check_cpa_cpu ]  
                  expr { "$level" == "BLUE" }  
          }  
          ON_OPEN { alarm BLUE KernelReader.cpu "$r403mess" ""  
                    set imsg [ format "$ir403msg" "$target" ]  
                    snmp  "$imsg" }  
          ON_CLOSE { end_alarm }  
          RATE 3600  
  }  

CODE EXAMPLE 4-4 RULE 403 Code Example
RULE 403, which tests for meta CPU events, assigns the value returned by the check_cpa_cpu function to the level variable and checks whether the value of level is equal to "BLUE."
The check_cpa_cpu function checks the cpa-related CPU data that is collected by another rule, compares it with a predefined threshold, and returns a capacity planning level. If there is a capacity concern, the returned level is "BLUE." If level is equal to "BLUE," ON_OPEN does the following:
  • Calls the alarm function with level BLUE, node name KernelReader.cpu, message $r403mess, which is defined in rultext.tcl, and a null Tcl function to the GUI (empty double quotes).
  • Formats a message with "$ir403msg" (defined in rultext.tcl) as the base and "$target" as its first argument, and stores the formatted message in the imsg variable.
  • Broadcasts the SNMP message in the imsg variable to the machines listed in the snmp_host variable (defined in event_gen.tcl). The imsg string is sent to Solstice Site Manager, Solstice Domain Manager, Solstice Enterprise Manager, or other SNMP-listening management platforms.
If the condition changes from True to False, ON_CLOSE calls the end_alarm function. The statement RATE 3600 means that the rule is evaluated every 3600 seconds.
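Each ON_OPEN in these examples builds its messages with Tcl's format command, which substitutes its arguments into the %s placeholders of a template string. The template text below is hypothetical (the real $ir403msg text is defined in rultext.tcl), and $target is assumed here to be the example machine name owey.

```tcl
# Hypothetical template; the real ir403msg text is defined in rultext.tcl.
set ir403msg "CPU capacity planning concern on %s"
set target owey

# format replaces %s with the value of target.
set imsg [format $ir403msg $target]
puts $imsg

# RATE 3600 is in seconds: one evaluation per hour.
puts "Evaluations per day: [expr {86400 / 3600}]"
```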

Complex Rules

This section contains examples and explanations of complex rules.

Complex Rule Example 1: RULE 102


  {  
      RULE 102  
      COMMENTS {  
           This generates a YELLOW event if 90% of the swap  
           space is in use. Event stays open until swap space  
           in use is less than 80% of the swap space.
           }  
     MULTI { expr { [ hotlist KernelReader.mem.swap_free ] } }
      {  
           if { [getfield ACTIVE] } { set ts [ expr 0.20 * [ findvalue \  
                                   KernelReader.mem.swap_total ] ] } \  
                   else { set ts [ expr 0.10 * [ findvalue \  
                                   KernelReader.mem.swap_total ] ] }  
           expr { $value < $ts}  
       }  
       ON_OPEN { alarm YELLOW KernelReader.mem.swap_free \
           "$r102mess" ""
           set imsg [ format "$ir102msg" "$target" ]
           snmp "$imsg"  }  
       ON_CLOSE { end_alarm }  
       SEVERITY 3  
       PRIORITY 3  
  }  

CODE EXAMPLE 4-5 RULE 102 Code Example
The MULTI attribute checks the current value of free swap space by getting the value of KernelReader.mem.swap_free using the hotlist function.
There are two statements in the condition for this rule:
  • The first statement checks whether this rule is currently active by getting the value of the ACTIVE field with the getfield function. If the rule is active, it gets the value of total swap space (KernelReader.mem.swap_total) using the findvalue function, calculates twenty percent of it, and assigns the result to the ts variable. If the rule is not active, it calculates ten percent of the total swap space instead and assigns that value to ts. Using two thresholds keeps the event from opening and closing repeatedly: the event opens when less than 10% of swap space is free (more than 90% in use) and stays open until at least 20% is free (less than 80% in use).
  • The second statement compares the current free swap space ($value) with the threshold $ts and determines the current condition of this rule.
If the condition changes from False to True, ON_OPEN does the following:
  • Calls the alarm function with level YELLOW, node name KernelReader.mem.swap_free, message $r102mess (defined in rultext.tcl), and no Tcl function to the GUI (empty quotes).
  • Formats the message "$ir102msg" (defined in rultext.tcl) with its target, "$target", and assigns the result to the imsg variable.
  • Passes the imsg string to the snmp function, which sends SNMP traps to every machine listed in the snmp_host variable (defined in event_gen.tcl). The imsg string is sent to Solstice Site Manager, Solstice Domain Manager, Solstice Enterprise Manager, or other SNMP-listening management platforms.
If the condition changes from True to False, ON_CLOSE calls the end_alarm function.
For an explanation of SEVERITY and PRIORITY, see TABLE 4-1 earlier in this chapter.
The GUI highlights the node in the Physical and Logical Views, or the Kernel Data Catalog, because a node name is passed to the alarm function.
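The following sketch isolates the condition of RULE 102 as an ordinary Tcl procedure so that the effect of the two thresholds can be tried out. swap_condition is hypothetical: it takes the free swap, total swap, and ACTIVE state as arguments instead of reading them with hotlist, findvalue, and getfield.

```tcl
# Hypothetical sketch of RULE 102's condition. swapFree stands in for $value,
# swapTotal for KernelReader.mem.swap_total, and active for [getfield ACTIVE].
proc swap_condition {swapFree swapTotal active} {
    if {$active} {
        set ts [expr {0.20 * $swapTotal}]   ;# already open: stay open below 20% free
    } else {
        set ts [expr {0.10 * $swapTotal}]   ;# not open: open below 10% free
    }
    return [expr {$swapFree < $ts}]
}

# With 1000 units of swap: 150 free does not open the event ...
puts [swap_condition 150 1000 0]
# ... 50 free opens it ...
puts [swap_condition 50 1000 0]
# ... and once open, 150 free keeps it open, because the threshold is now 200.
puts [swap_condition 150 1000 1]
```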
The sections that follow present Tcl code and descriptions of the remaining three complex rules.

Complex Rule Example 2: RULE 1201


  {  
    COMMENTS { for hot plug charge DC status }  
    RULE 1201  
    MULTI {  
      expr { [ hotlist system.hot_plug_charges.*.status "" ] } }  
    {  
      set hpustatus  [ findstatus $node ]  
      expr { ("$value" != "OK") && ("$hpustatus" == "alive") }  
    }  
    ON_OPEN {  
      set mess [ format "$r1201mess" "$value" ]  
      alarm RED $node "$mess" ""  
       set imsg [ format "$ir1201msg" "$target" "$value" ]  
       snmp  "$imsg"  
      }  
    ON_CLOSE { end_alarm }  
    SEVERITY 2  
    PRIORITY 2  
  }  

CODE EXAMPLE 4-6 RULE 1201 Code Example
The MULTI label indicates that this rule might apply to more than one node. Each node found using the hotlist function matches the pattern system.hot_plug_charges.*.status. For example:
· system.hot_plug_charges.auxiliary_5v.status
· system.hot_plug_charges.peripheral_12v.status
· system.hot_plug_charges.peripheral_5v.status

Because the strip string is null (empty double quotes), nothing is removed from the matched names: the $node variable contains the full node name.
The statements in the condition do the following:
  • Obtain the status of the node named in the node variable (using the findstatus function) and assign it to the hpustatus variable.
  • Check whether the value of the value variable is not equal to "OK" and the value of the hpustatus variable is equal to "alive".
If the condition changes from False to True, the following actions in ON_OPEN take place:
  • Formats the message $r1201mess (defined in rultext.tcl) with $value as its argument and assigns the result to the mess variable.
  • Calls the alarm function with the RED level, the $node node name, the formatted $mess message, and no Tcl function to be executed (empty double quotes).
  • Formats the message "$ir1201msg" (defined in rultext.tcl) with its target, "$target", and $value as arguments and assigns the result to the imsg variable.
  • Passes the imsg string to the snmp function, which sends SNMP traps to every machine listed in the snmp_host variable (defined in event_gen.tcl). The imsg string is sent to Solstice Site Manager, Solstice Domain Manager, Solstice Enterprise Manager, or other SNMP-listening management platforms.
If the condition changes from True to False, ON_CLOSE calls the end_alarm function.
For an explanation of SEVERITY and PRIORITY, see TABLE 4-1 earlier in this chapter.
The rule is evaluated in this way for each node returned by the hotlist function in the MULTI statement.
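The per-node evaluation of RULE 1201 can be illustrated with sample data. In the following sketch the values and statuses (FAILED, dead) are hypothetical; they stand in for $value and the result of findstatus, and the foreach loop stands in for the Event Generator's MULTI processing.

```tcl
# Hypothetical sample data: node name followed by {value status}.
set samples {
    system.hot_plug_charges.auxiliary_5v.status   {OK     alive}
    system.hot_plug_charges.peripheral_12v.status {FAILED alive}
    system.hot_plug_charges.peripheral_5v.status  {FAILED dead}
}

set alarms {}
foreach {node data} $samples {
    lassign $data value hpustatus
    # Same test as the RULE 1201 condition.
    if {($value ne "OK") && ($hpustatus eq "alive")} {
        lappend alarms $node
    }
}
puts $alarms
```

Only peripheral_12v is reported: auxiliary_5v is OK, and peripheral_5v is not reported because its status is not "alive".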

Complex Rule Example 3: RULE 100


  {  
      RULE 100  
      COMMENTS {  
              This rule generates a YELLOW alarm if it finds any  
              disk with an increasing wait queue while busy.  
  
              YELLOW flag when disk is over 75% busy and average  
              queue length is over 10 and increasing wait queue.  
              Flag stays on till disk is not over 70 and average  
              queue length is no longer than 8.  
  
              This is a transitory event.  
              }  
      {  
              set oldwait [ get_parameter]  
              set projwait [ expr $oldwait*1.25 ]  
              set diskwait [ findvalue $node.queuelength ]  
              put_parameter $diskwait  
              if { [getfield ACTIVE] } { set th1 0.7; set th2 8 } \  
                          else     { set th1 0.75; set th2 10}  
              expr { (($value>$th1)&&($diskwait>$th2)&& \  
                     ($diskwait>$projwait)) }  
      }  
      ON_OPEN { alarm YELLOW $node.busy "$r100mess" ""  
          set imsg [ format "$ir100msg" "$target" "$node" ]  
          snmp  "$imsg" }  
      MULTI { expr { [ hotlist KernelReader.disk.*.busy .busy ] } }  
      ON_CLOSE {end_alarm }  
      PARAMETERS { default 1000 }  
      SEVERITY 3  
      PRIORITY 2  
  }  

CODE EXAMPLE 4-7 RULE 100 Code Example
The condition of the rule has six statements:
  • The first statement gets a user-defined value using the get_parameter function. The first time the rule is evaluated, the value is the default defined in the PARAMETERS attribute. The oldwait variable (old wait time) stores the value.
  • The second statement multiplies the old wait time, $oldwait, by 1.25 and stores it in the projwait variable (projected wait time).
  • The third statement gets the current disk wait time. It obtains the value of the queuelength field of the node that is currently being evaluated. It stores the value in the diskwait variable.
  • The fourth statement saves the current disk wait time, $diskwait, using the put_parameter function, so that the next evaluation can use it as the old wait time.
  • The fifth statement checks whether the rule is currently active (whether the condition is currently True) and assigns the threshold values to the th1 and th2 variables: 0.7 and 8 if the rule is active, 0.75 and 10 if it is not. The sixth statement compares the busy value of the node ($value) with the th1 threshold, compares the wait time of the node ($diskwait) with the th2 threshold, and checks whether $diskwait exceeds the projected wait time, $projwait.
If the condition changes from False to True, ON_OPEN does the following:
  • Calls the alarm function with the level YELLOW, the node name with its property name ($node.busy), the event message $r100mess (defined in rultext.tcl), and no Tcl function for the GUI (empty double quotes).
  • Formats the message "$ir100msg" (defined in rultext.tcl) with its target, "$target", and "$node" as arguments and assigns the result to the imsg variable.
  • Passes the imsg string to the snmp function, which sends SNMP traps to every machine listed in the snmp_host variable (defined in event_gen.tcl). The imsg string is sent to Solstice Site Manager, Solstice Domain Manager, Solstice Enterprise Manager, or other SNMP-listening management platforms.
The MULTI statement tells the Event Generator to repeat the same rule on each node that is returned from the hotlist function. The hotlist function finds the nodes whose names match KernelReader.disk.*.busy. Because the strip string is defined as .busy, hotlist removes .busy from each matched name before returning it in the node variable. For example, if the hotlist function matches a node named KernelReader.disk.sd8.busy, the $node variable contains KernelReader.disk.sd8.
If the condition changes from True to False, ON_CLOSE calls the end_alarm function.
The PARAMETERS statement sets the default value to 1000. On the first evaluation, the old wait time is therefore 1000 and the projected wait time is 1250, so the condition is not True unless the disk queue length exceeds 1250.
For an explanation of SEVERITY and PRIORITY, see TABLE 4-1 earlier in this chapter.
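The following sketch models one plausible reading of RULE 100's condition so that the effect of the stored parameter can be followed across two evaluations. get_param and put_param are hypothetical stand-ins for get_parameter and put_parameter; they assume one stored value per node, initialized to the PARAMETERS default of 1000.

```tcl
# Hypothetical parameter store: one value per node, default 1000.
array set stored {}
proc get_param {node} {
    global stored
    if {![info exists stored($node)]} { return 1000 }
    return $stored($node)
}
proc put_param {node v} {
    global stored
    set stored($node) $v
}

# RULE 100's condition for one node: busy stands in for $value and
# queue for the value of $node.queuelength.
proc disk_condition {node busy queue active} {
    set oldwait  [get_param $node]
    set projwait [expr {$oldwait * 1.25}]
    put_param $node $queue
    if {$active} { set th1 0.7; set th2 8 } else { set th1 0.75; set th2 10 }
    return [expr {($busy > $th1) && ($queue > $th2) && ($queue > $projwait)}]
}

set n KernelReader.disk.sd8
puts [disk_condition $n 0.9 20 0]   ;# first evaluation: projwait is 1250, prints 0
puts [disk_condition $n 0.9 30 0]   ;# 30 > 20 * 1.25 = 25, prints 1
```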

Complex Rule Example 4: RULE 106


  {  
       COMMENTS { no swap space }  
       RULE 106  
       RATE 300  
       LOG_RULES { {grep { no swap space.*pid ([0-9]+)} syslog }  
                   { load LR106 "$logword(1)" } }  
       MULTI { expr { [ hotlist LogScanner.matches.LR106.* "" ] } }  
       {  
                       set found 1  
       }  
       ON_OPEN {  
              set lsdata      [ split $value ]  
              set pid         [ lindex $lsdata 0 ]  
              set mess [ format "$r106mess" "$pid" ]  
              alarm YELLOW "" "$mess" ""  
              end_alarm  
                 set imsg [ format "$ir106msg" "$target" "$pid" ]  
                 snmp "$imsg"  
              }  
       SEVERITY 2  
       PRIORITY 2  
  }  

CODE EXAMPLE 4-8 Rule 106 Code Example
The LOG_RULES statement tells the Log Scanner what message to look for, and what to do if the message is found in the syslog. In this example, the Log Scanner looks in syslog for a system message that matches the regular expression enclosed in braces { }.
If a matching system message is found, the Log Scanner creates a hierarchy named LogScanner.LR106.*. The value of LogScanner.matches.LR106.* contains the arguments that follow the load function. The element $logword(1) contains the string matched by the subexpression enclosed in parentheses. Pairs of parentheses are counted from left to right; in this example, the string matched by the first pair is stored in $logword with an index of 1, $logword(1). If there were a second pair of parentheses, the string it matched would be stored in $logword(2).
The hotlist function in the MULTI statement searches for and returns the list of hierarchies or nodes created by the Log Scanner whose names match LogScanner.matches.LR106.*. The condition has one statement, which assigns the value 1 to the found variable.
Next, the following seven actions take place in ON_OPEN:
  • Splits the value of the node ($value) into a list of elements and stores the list in the lsdata variable. For more information on the split function, refer to the Tcl books listed in the Preface.
  • Obtains the process ID number from the first list element (index 0) and stores it in the pid variable.
  • Formats the event message and stores it in the mess variable. This is done by passing the message body $r106mess and the process ID number $pid. The $r106mess variable is stored in rultext.tcl.
  • Calls the alarm function with YELLOW level, no node name, formatted message $mess, and no Tcl function for the GUI (empty double quotes).
  • Calls end_alarm to close the event.
  • Formats the message "$ir106msg" (defined in rultext.tcl) with its target, $target, and $pid as arguments and assigns the result to the imsg variable.
  • Passes the imsg string to the snmp function, which sends SNMP traps to every machine listed in the snmp_host variable (defined in event_gen.tcl). The imsg string is sent to Solstice Site Manager, Solstice Domain Manager, Solstice Enterprise Manager, or other SNMP-listening management platforms.
For an explanation of SEVERITY and PRIORITY, see TABLE 4-1 earlier in this chapter.
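The following sketch traces RULE 106 from a log line to the formatted event message using only built-in Tcl commands. The syslog line and the r106mess template are hypothetical, and the sketch assumes that the property value seen in ON_OPEN holds the loaded $logword(1) string; the Log Scanner itself is not modeled.

```tcl
# Regular expression from RULE 106 and a hypothetical syslog line.
set re {no swap space.*pid ([0-9]+)}
set line "WARNING: no swap space to grow stack for pid 1234 (myprog)"

if {[regexp $re $line -> logword(1)]} {
    # Assumption: the property value holds the loaded argument as a string.
    set value $logword(1)

    # ON_OPEN steps: split the value, take the first element as the pid,
    # and format the event message (hypothetical r106mess text).
    set lsdata   [split $value]
    set pid      [lindex $lsdata 0]
    set r106mess "No swap space available for process %s"
    set mess     [format $r106mess $pid]
    puts $mess
}
```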