This chapter describes how to view information about the hosts that make up
a grid. Information is available about the performance of all hosts, as well as detailed
information about any particular host.
You use the Hosts page both to check how efficiently
the hosts' resources are being used and to access more details about
a particular host.
-
Hostname – The name
you assign to this host. Clicking the hostname displays the
Host Details page.
-
Arch – The host's
processor architecture, such as win32-x86 or sol-sparc64. For a complete list
of supported architectures, see the Host
Details page.
-
Average Load/CPU –
Shows how efficiently the host's CPU is being used. This parameter can be
any non-negative decimal number but is usually between 0 and 2 or 3. Ideally,
this number should be close to 1. A smaller number could mean the host is
underutilized, and a larger number could mean that the host is overutilized.
The ideal value depends on the workload being run, and only the local
administrator can judge the implications for that workload.
-
Used Mem – The percentage
of total memory currently being used to execute jobs. If this value is too
close to the total memory, the host might be in trouble. However,
if the workloads are tuned to fit in the server, it can be perfectly
fine for the used memory to be just under the total memory. This behavior is
tunable: you can set the value at which the difference between these two parameters
triggers an alarm. For example, in one case a difference of less than 100 MB triggers
a warning, while in another case the value could be set at 25 MB.
-
Total Mem – The total
amount of memory on this host.
-
Free Swap – The amount
of free swap space left on this host, measured in MB. In a well-architected
grid, the free swap space should never drop very far below its initial value.
Temporary drops in this value can be tolerated, again
depending on how the grid is architected. If this value approaches zero,
the host is in danger of failing completely.
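The Used Mem alarm described above can be pictured as a simple headroom check. The following Python sketch is illustrative only: the function name `memory_alarm` and its parameters are hypothetical and are not part of the product, and the sketch assumes that both total and used memory are available in MB.

```python
def memory_alarm(total_mb: float, used_mb: float, threshold_mb: float) -> bool:
    """Return True when the memory headroom falls below the threshold.

    Hypothetical helper: headroom is the difference between total and
    used memory, and threshold_mb is the site-specific tunable value.
    """
    if total_mb < 0 or used_mb < 0 or threshold_mb < 0:
        raise ValueError("memory values must be non-negative")
    headroom_mb = total_mb - used_mb
    return headroom_mb < threshold_mb


# The same reading (60 MB of headroom) under two different site policies.
alarm_at_100 = memory_alarm(total_mb=4096, used_mb=4036, threshold_mb=100)
alarm_at_25 = memory_alarm(total_mb=4096, used_mb=4036, threshold_mb=25)
print(alarm_at_100, alarm_at_25)
```

With a 100 MB threshold the reading raises an alarm, while with a 25 MB threshold the same reading does not, which is why the appropriate setting depends on how the workloads are tuned.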
The Host Details page contains detailed information about the host system
that is helping to execute a job and is hosting a queue.
-
Hostname – The name
assigned to this host.
-
Arch – An architecture
string compiled into the sge_execd daemon that describes the operating
system architecture for which the execd is targeted. Possible values are:
-
sol-sparc for Sun Solaris (Sparc) 7 and higher,
32-bit kernel
-
sol-sparc64 for Sun Solaris (Sparc) 7 and
higher, 64-bit kernel
-
sol-x86 for Sun Solaris (x86) 8 and higher
-
lx24-amd64 for Linux 2.4.x (AMD64)
glibc 2.2+ based
-
lx24-x86 for Linux 2.4.x (x86)
glibc 2.2+ based
-
win-x86 for MS-Windows NT
Note –
An sge_execd daemon for a particular architecture
may run on multiple OS versions but the architecture string does not reveal
this level of detail.
-
Num Proc – The number
of processors provided by the execution host. In this case, the host is defined
by a single Internet address. For example, rack-mounted multihost systems
are counted as a cluster rather than as a single multiheaded machine.
-
Load Avg – The same
as Load Medium.
-
Load Short – The
short-term average OS run queue length. This value is the first of the three
values reported by the uptime command. Many implementations
provide a 1-minute average with this value.
-
Load Medium – The
medium-term average OS run queue length. This value is the second of the three
values reported by the uptime command. Many implementations
provide a 5-minute average with this value.
-
Load Long – The
long-term average OS run queue length. This value is the third of the three
values reported by the uptime command. Many implementations provide
a 10- or 15-minute average with this value.
-
NP Load Avg – The
same as NP Load Medium.
-
NP Load Short – The
same as Load Short but divided by the number of processors. This value allows
you to compare the load of single and multiheaded hosts.
-
NP Load Medium –
The same as Load Medium but divided by the number of processors. This value
allows you to compare the load of single and multiheaded hosts.
-
NP Load Long – The
same as Load Long but divided by the number of processors. This value allows
you to compare the load of single and multiheaded hosts.
-
Memory Free – The
amount of free memory.
-
Memory Used – The
amount of used memory.
-
Memory Total – The
total amount of memory (free plus used).
-
Swap Free – The amount
of free swap space.
-
Swap Used – The amount
of used swap space.
-
Swap Total – The
total amount of swap space (free plus used).
-
Virtual Free – The
sum of Memory Free and Swap Free.
-
Virtual Used – The
sum of Memory Used and Swap Used.
-
Virtual Total – The
sum of Memory Total and Swap Total.
-
CPU – The percentage
of CPU busy time when the data was gathered.
-
Date/Time – The timestamp
for when the data was gathered.
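The derived fields in the list above follow from simple arithmetic on the raw readings. The sketch below is illustrative Python, not code from the product: `HostSample` and `derived_values` are hypothetical names, and the sample assumes that all memory and swap figures share the same unit.

```python
from dataclasses import dataclass


@dataclass
class HostSample:
    """Hypothetical container for one set of raw host readings."""
    num_proc: int
    load_short: float
    load_medium: float
    load_long: float
    mem_free: float
    mem_used: float
    swap_free: float
    swap_used: float


def derived_values(s: HostSample) -> dict:
    """Compute the NP load, total, and virtual memory fields from raw readings."""
    if s.num_proc < 1:
        raise ValueError("num_proc must be at least 1")
    mem_total = s.mem_free + s.mem_used        # Memory Total = free plus used
    swap_total = s.swap_free + s.swap_used     # Swap Total = free plus used
    return {
        # NP values: the load averages divided by the number of processors
        "np_load_short": s.load_short / s.num_proc,
        "np_load_medium": s.load_medium / s.num_proc,
        "np_load_long": s.load_long / s.num_proc,
        "mem_total": mem_total,
        "swap_total": swap_total,
        # Virtual values: the memory figure plus the matching swap figure
        "virtual_free": s.mem_free + s.swap_free,
        "virtual_used": s.mem_used + s.swap_used,
        "virtual_total": mem_total + swap_total,
    }


# Example: a four-processor host with a medium load average of 1.0.
sample = HostSample(num_proc=4, load_short=2.0, load_medium=1.0, load_long=0.5,
                    mem_free=1024, mem_used=3072, swap_free=2048, swap_used=0)
values = derived_values(sample)
print(values["np_load_medium"], values["virtual_total"])
```

Dividing by the processor count is what makes the NP values comparable between single and multiheaded hosts: a load of 1.0 on this four-processor host corresponds to an NP load of 0.25. On Unix-like systems, Python's `os.getloadavg()` returns a comparable triple of 1-, 5-, and 15-minute load averages.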