Sun Management Center 3.6.1 User's Guide

Appendix D Sun Management Center Software Rules

This appendix lists the Sun Management Center rules for the following modules:

  • Kernel Reader

  • Health Monitor

Rules Concepts

A rule is an alarm check mechanism that allows for complex or special-purpose logic in determining the status of a monitored host or node.

There are two types of rules:

  • Simple rules are based on the rCompare rule, which compares a monitored property against a threshold specified in the rule. If the rule condition becomes true, an alarm is generated. For example, a simple rule can check the percentage of disk space used. If the percentage of disk space used is greater than or equal to the percentage specified in the rule, an alarm is generated.

  • Complex rules are based on multiple conditions. For example, one complex rule states that an alert alarm is generated when the following conditions are met:

    • The disk is over 75% busy

    • The average queue length is over 10

    • The wait queue is increasing


    Note –

    Any user-customized Solstice SyMON™ 1.x rules must be ported to the Sun Management Center environment before the rules can be used in Sun Management Center software.
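The difference between the two rule types can be sketched in Python. This is a minimal illustration under stated assumptions, not the Sun Management Center rule API: the function names, the greater-than-or-equal comparison for the simple rule, and the choice to model "the wait queue is increasing" as a comparison with the previous sample are all hypothetical. The simple rule checks one property against one threshold, as in the disk-space example; the complex rule raises an alarm only when all three conditions listed above hold at once.

```python
# Hypothetical sketch; not the Sun Management Center rule API.

def simple_rule(value: float, threshold: float) -> bool:
    """rCompare-style check: alarm when the monitored value reaches the threshold."""
    return value >= threshold


def complex_rule(disk_busy_pct: float, avg_queue_len: float,
                 prev_wait_queue: float, wait_queue: float) -> bool:
    """Alarm only when every condition from the complex-rule example holds."""
    return (disk_busy_pct > 75                 # the disk is over 75% busy
            and avg_queue_len > 10             # the average queue length is over 10
            and wait_queue > prev_wait_queue)  # assumption: "increasing" = grew since last sample


if __name__ == "__main__":
    # Disk-space example: 92% used against a hypothetical 90% threshold.
    print(simple_rule(92.0, 90.0))             # True
    # Busy disk with a long but shrinking wait queue: no alarm.
    print(complex_rule(80.0, 12.0, 5.0, 4.0))  # False
```

The complex rule uses `and` so that a single busy metric on its own never raises the alarm, which is the point of combining conditions.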


Kernel Reader

The following table lists the Kernel Reader simple rules.

Table D–1 Kernel Reader Simple Rules

Property       Description
avg_1min       Load average over the last minute
avg_5min       Load average over the last 5 minutes
avg_15min      Load average over the last 15 minutes
cpu_delta      Difference between the previous and current time
cpu_idle       CPU idle time
cpu_kernel     CPU kernel time
cpu_user       CPU user time
cpu_wait       CPU wait time
ipctused       Percent of inodes used
kpctused       Percent of Kbytes used
mem-inuse      Physical memory in use (Mbytes)
numusers       Number of users
numsessions    Number of user sessions
swap_used      Swap used (Kbytes)
wait_io        CPU wait time breakdown: I/O
wait_pio       CPU wait time breakdown: PIO
wait_swap      CPU wait time breakdown: swap

The following table lists the Kernel Reader complex rules.

Table D–2 Kernel Reader Complex Rules

Rule ID

Description

Type of Alarm

rknrd100

This rule covers a transitory event. The rule generates an alert alarm when the disk is over 75% busy, the average queue length is over 10, and the wait queue is increasing. The alert alarm remains until the disk is less than 70% busy and the average queue length is less than 8.

Alert

rknrd102

This rule covers a transitory event. The rule generates an alert alarm when 90% of swap space is in use. The alarm remains until swap space in use drops below 80% of the total swap space.

Alert

rknrd103

This rule covers a transitory event. The rule generates an alert alarm if swapping and paging are high for a given CPU, which indicates that the CPU might be thrashing. An alert alarm is generated when the CPU exceeds 1 swap-out, 10 page-ins, and 10 page-outs per second. The alert alarm stays on while the CPU continues to exceed 1 swap-out, 8 page-ins, and 8 page-outs per second.

Alert

rknrd105

File system full error. This rule looks for a file system full error message in the syslog file (/var/adm/messages).

Alert alarm that is closed immediately

rknrd106

No swap space error. This rule looks for a no swap space error message in the syslog file (/var/adm/messages).

Alert alarm that is closed immediately

rknrd400

This rule checks for a continuous CPU load over six per CPU for four hours.

Informational

rknrd401

This rule checks for disks that are busy more than 90% of the time for x hours. The parameters field holds the last time that the CPU load was below six. This field is initialized to some date in the year 2001.

Informational

rknrd402

This rule checks if available swap space drops below 10% for x hours. The parameters field indicates the last time that the CPU load was below six. This field is initialized to some date in the year 2001.

Informational

rknrd403

This rule is not currently supported.

Informational

rknrd404

An informational alarm is generated if rule rknrd401 is triggered four times.

Informational

rknrd405

An informational alarm is generated if rule rknrd402 is triggered four times.

Informational
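Several Kernel Reader rules in Table D–2 raise an alarm at one threshold and clear it at a lower one, so the alarm does not flap while a value hovers near the trigger point. The following Python sketch models that raise/clear (hysteresis) behavior for rknrd100 and rknrd102 using the thresholds from the table, and counts triggers in the way rknrd404 and rknrd405 describe. The class, the function names, the dictionary sample format, and the reading of "triggered" as a transition from cleared to raised are assumptions for illustration; this is not how the agent implements the rules.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of a raise/clear (hysteresis) alarm; not the agent's code.


@dataclass
class HysteresisAlarm:
    raise_when: Callable[[dict], bool]  # condition that opens the alarm
    clear_when: Callable[[dict], bool]  # condition that closes it
    active: bool = False
    times_triggered: int = 0            # assumption: counts cleared -> raised transitions

    def update(self, sample: dict) -> bool:
        """Feed one sample and return whether the alarm is now on."""
        if not self.active and self.raise_when(sample):
            self.active = True
            self.times_triggered += 1
        elif self.active and self.clear_when(sample):
            self.active = False
        return self.active


# rknrd100: raise when the disk is over 75% busy, the queue is over 10,
# and the wait queue is growing; clear below 70% busy and a queue below 8.
rknrd100 = HysteresisAlarm(
    raise_when=lambda s: s["busy"] > 75 and s["avg_queue"] > 10 and s["wait_growing"],
    clear_when=lambda s: s["busy"] < 70 and s["avg_queue"] < 8,
)

# rknrd102: raise at 90% of swap in use; clear below 80%.
rknrd102 = HysteresisAlarm(
    raise_when=lambda s: s["swap_pct"] >= 90,
    clear_when=lambda s: s["swap_pct"] < 80,
)


def escalate(rule: HysteresisAlarm, limit: int = 4) -> bool:
    """rknrd404/rknrd405 style: informational alarm once a rule has fired `limit` times."""
    return rule.times_triggered >= limit


if __name__ == "__main__":
    for pct in [85, 91, 85, 79, 92]:
        print(pct, rknrd102.update({"swap_pct": pct}))
    print("triggered:", rknrd102.times_triggered)
```

Run as a script, the swap alarm prints False, True, True, False, True and reports two triggers: it stays on at 85% because it only clears once usage drops below 80%.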

Health Monitor

The following table lists the Health Monitor complex rules.

Table D–3 Health Monitor Complex Rules

Rule ID

Description

Type of Alarm

rhltm000

This rule checks whether there is enough swap space.

Critical, Alert, Caution

rhltm001

This rule monitors lock contention. CPU power is wasted each time a CPU has to wait for a lock to become free. These waits are counted because the kernel uses mutual exclusion (mutex) locks to synchronize its operation and to prevent multiple CPUs from accessing critical code and data regions at the same time.

Critical, Alert, Caution

rhltm002

NFS remote procedure call (RPC) timeouts might be accompanied by duplicate responses after a call is retransmitted. Timeouts with duplicate responses indicate that the network is working but the server is responding slowly.

Critical, Alert, Caution

rhltm003

The run queue length is divided by the number of CPUs because every CPU takes a job off the run queue in each time slice.

Critical, Alert, Caution

rhltm004

A busy or slow disk reduces system throughput and increases user response times. This rule identifies heavily loaded disks so that the load can be rebalanced.

Critical, Alert, Caution

rhltm005

This RAM rule is based on the residence time of an unreferenced page. When the virtual memory system scans for idle pages to reclaim for other uses, the scanning indicates that the system needs more memory.

Critical, Alert, Caution

rhltm006

This rule checks for kernel memory allocation failures, which can cause login attempts or network connections to fail unexpectedly. There are two possible causes: either the kernel has reached the limit of its address space, or the free list does not contain any pages to allocate. Repeated failures signify a problem that might otherwise be overlooked.

Critical, Alert, Caution

rhltm007

The kernel maintains a global cache of directory path name components, called the directory name lookup cache (DNLC). When a name is not found in this cache, directory entries must be read from disk and scanned to locate the right file.

Critical, Alert, Caution
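Two of the Health Monitor descriptions imply a simple calculation: rhltm003 divides the run queue length by the number of CPUs, and rhltm007 is driven by DNLC lookups that miss the cache and go to disk. The Python sketch below computes both values and maps a value onto the three alarm levels in Table D–3. It is a minimal sketch under stated assumptions: the function names are hypothetical, the DNLC metric is expressed here as a miss percentage (so that higher means worse, like the run queue), and the Critical, Alert, and Caution thresholds are supplied by the caller because this guide does not list them.

```python
# Hypothetical sketch; metric definitions and thresholds are assumptions,
# not the Health Monitor's actual formulas or limits.


def run_queue_per_cpu(run_queue_length: float, num_cpus: int) -> float:
    """rhltm003: normalize the run queue by CPU count, since each CPU
    takes a job off the run queue in each time slice."""
    if num_cpus <= 0:
        raise ValueError("num_cpus must be positive")
    return run_queue_length / num_cpus


def dnlc_miss_pct(hits: int, misses: int) -> float:
    """rhltm007: percentage of name lookups not satisfied by the DNLC.
    Each miss means directory entries had to be read from disk and scanned."""
    total = hits + misses
    return 100.0 * misses / total if total else 0.0


def severity(value: float, caution: float, alert: float, critical: float) -> str:
    """Map a value onto the Table D-3 alarm levels using caller-supplied thresholds."""
    if value >= critical:
        return "Critical"
    if value >= alert:
        return "Alert"
    if value >= caution:
        return "Caution"
    return "OK"


if __name__ == "__main__":
    rq = run_queue_per_cpu(12, 4)              # 12 runnable jobs on 4 CPUs
    print(rq, dnlc_miss_pct(950, 50))          # 3.0 5.0
    # Hypothetical thresholds, for illustration only.
    print(severity(rq, caution=2.0, alert=3.0, critical=4.0))  # Alert
```

Normalizing by CPU count lets one set of thresholds apply to hosts of different sizes: a run queue of 12 is heavy on a 4-CPU system but modest on a 24-CPU system.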