Chapter 1 Configuring Hosts and Clusters
This chapter provides background information about configuring various aspects of the grid engine system. This chapter includes instructions for the following tasks:
About Hosts and Daemons
Grid engine system hosts are classified into four groups, depending on which daemons are running on the system and on how the hosts are registered at sge_qmaster.
Note – A host can belong to more than one class. The master host is by default an administration host and a submit host.
Changing the Master Host
Because the spooling database cannot be located on an NFS-mounted file system, the following procedure requires that the Berkeley DB RPC server be used for spooling. If you configure spooling to a local file system, you must transfer the spooling database to a local file system on the new sge_qmaster host. To change the master host, do the following:
Configuring Shadow Master Hosts
Shadow master hosts are machines in the cluster that can detect a failure of the master daemon and take over its role as master host. When the shadow master daemon detects that the master daemon sge_qmaster has failed abnormally, it starts up a new sge_qmaster on the host where the shadow master daemon is running.
Note – If the master daemon is shut down gracefully, the shadow master daemon does not start up. If you want the shadow master daemon to take over after you shut down the master daemon gracefully, remove the lock file that is located in the sge_qmaster spool directory. The default location of this spool directory is sge-root/cell/spool/qmaster.
The automatic failover start of a sge_qmaster on a shadow master host takes approximately one minute. Meanwhile, you get an error message whenever a grid engine system command is run.
Note – The file sge-root/cell/common/act_qmaster contains the name of the host actually running the sge_qmaster daemon.
Shadow Master Host Requirements
To prepare a host as a shadow master, the following requirements must be met:
As soon as these requirements are met, the shadow-master-host facility is activated for this host. No restart of grid engine system daemons is necessary to activate the feature.
Shadow Master Hosts File
The shadow master host name file, sge-root/cell/common/shadow_masters, contains the following:
The format of the shadow master hostname file is as follows:
The order of the shadow master hosts is significant. The primary master host is the first line in the file. If the primary master host fails, the shadow master defined in the second line takes over. If this shadow master also fails, the shadow master defined in the third line takes over, and so forth.
Starting Shadow Master Hosts
In order to start a shadow sge_qmaster, the system must be sure either that the old sge_qmaster has terminated, or that it will terminate without performing actions that interfere with the newly started shadow sge_qmaster.
In very rare circumstances it might be impossible to determine that the old sge_qmaster has terminated or that it will terminate. In such cases, an error message is logged to the messages log file of the sge_shadowd daemons on the shadow master hosts. See Chapter 8, Fine Tuning, Error Messages, and Troubleshooting. Also, any attempts to open a TCP connection to a sge_qmaster daemon permanently fail. If this occurs, make sure that no master daemon is running, and then restart sge_qmaster manually on any of the shadow master machines. See Restarting Daemons From the Command Line.
Configuring Shadow Master Hosts Environment Variables
Three environment variables affect the takeover time for a shadow master:
These variables interact in the following way.
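As a rough illustration of how the check interval and the active interval combine, the following sketch adds the two values to obtain an upper bound on the takeover time. The additive model and the function name `worst_case_takeover_seconds` are assumptions made for illustration, not a description of how sge_shadowd is implemented, and the third environment variable is not modeled.

```python
def worst_case_takeover_seconds(check_interval, get_active_interval):
    """Upper-bound estimate of the takeover delay, in seconds.

    Assumption: the shadow daemon notices a stale master after at most one
    check interval, then waits the active interval before taking over.
    """
    if check_interval <= 0 or get_active_interval <= 0:
        raise ValueError("intervals must be positive numbers of seconds")
    return check_interval + get_active_interval


if __name__ == "__main__":
    # Values correspond to SGE_CHECK_INTERVAL=45 and SGE_GET_ACTIVE_INTERVAL=90.
    seconds = worst_case_takeover_seconds(45, 90)
    print(f"estimated takeover after about {seconds} s ({seconds / 60:.1f} min)")
```

Under this assumption, a check interval of 45 seconds and an active interval of 90 seconds give an estimate of 135 seconds.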
A reasonable configuration might be to set SGE_CHECK_INTERVAL to 45 seconds and SGE_GET_ACTIVE_INTERVAL to 90 seconds, so that takeover occurs after about 2 minutes. To check the operation of the shadow host after you have configured these environment variables, simulate a failure by disconnecting the master host's network cable.
Configuring Hosts
N1 Grid Engine 6 software (grid engine software) maintains object lists for all types of hosts except for the master host. The lists of administration host objects and submit host objects indicate whether a host has administrative or submit permission. The execution host objects include other parameters. Among these parameters are the load information that is reported by the sge_execd running on the host, and the load parameter scaling factors that are defined by the administrator.
You can configure host objects with QMON or from the command line. QMON provides a set of host configuration dialog boxes that are invoked by clicking the Host Configuration button on the QMON Main Control window. The Host Configuration dialog box has four tabs:
The qconf command provides the command-line interface for managing host objects.
Configuring Execution Hosts With QMON
Before you configure an execution host, you must first install the software on the execution host as described in How to Install Execution Hosts in N1 Grid Engine 6 Installation Guide.
To configure execution hosts, on the QMON Main Control window click the Host Configuration button, and then click the Execution Host tab. The Execution Host tab looks like the following figure:
Figure 1–1 Execution Host Tab
Note – Administrative or submit commands are allowed from execution hosts only if the execution hosts are also declared to be administration or submit hosts. See Configuring Administration Hosts With QMON and Configuring Submit Hosts With QMON.
The Hosts list displays the execution hosts that are already defined.
The Load Scaling list displays the currently configured load-scaling factors for the selected execution host. See Load Parameters for information about load parameters.
The Access Attributes list displays access permissions. See Chapter 4, Managing User Access for information about access permissions.
The Consumables/Fixed Attributes list displays resource availability for consumable and fixed resource attributes associated with the host. See Complex Resource Attributes for information about resource attributes.
The Reporting Variables list displays the variables that are written to the reporting file when a load report is received from an execution host. See Defining Reporting Variables for information about reporting variables.
The Usage Scaling list displays the current scaling factors for the individual usage metrics CPU, memory, and I/O for different machines. Resource usage is reported by sge_execd periodically for each currently running job. The scaling factors indicate the relative cost of resource usage on the particular machine for the user or project running a job. These factors could be used, for instance, to compare the cost of a second of CPU time on a 400 MHz processor to that of a 600 MHz CPU. Metrics that are not displayed in the Usage Scaling window have a scaling factor of 1.
Adding or Modifying an Execution Host
To add or modify an execution host, click Add or Modify. The Add/Modify Exec Host dialog box appears.
The Add/Modify Exec Host dialog box enables you to modify all attributes associated with an execution host. The name of an existing execution host is displayed in the Host field.
If you are adding a new execution host, type its name in the Host field.
Defining Scaling Factors
To define scaling factors, click the Scaling tab.
The Load column of the Load Scaling table lists all available load parameters, and the Scale Factor column lists the corresponding definitions of the scaling. You can edit the Scale Factor column. Valid scaling factors are positive floating-point numbers in fixed-point notation or scientific notation.
The Usage column of the Usage Scaling table lists the current scaling factors for the usage metrics CPU, memory, and I/O. The Scale Factor column lists the corresponding definitions of the scaling. You can edit the Scale Factor column. Valid scaling factors are positive floating-point numbers in fixed-point notation or scientific notation.
Defining Resource Attributes
To define the resource attributes to associate with the host, click the Consumables/Fixed Attributes tab.
The resource attributes associated with the host are listed in the Consumables/Fixed Attributes table. Use the Complex Configuration dialog box if you need more information about the current complex configuration, or if you want to modify it. For details about complex resource attributes, see Complex Resource Attributes.
The Consumables/Fixed Attributes table lists all resource attributes for which a value is currently defined. You can extend the list by clicking either the Name or the Value column name. The Attribute Selection dialog box appears, which includes all resource attributes that are defined in the complex.
Figure 1–2 Attribute Selection Dialog Box
To add an attribute to the Consumables/Fixed Attributes table, select the attribute, and then click OK.
To modify an attribute value, double-click a Value field, and then type a value.
To delete an attribute, select the attribute, and then press Control-D or click mouse button 3. Click OK to confirm that you want to delete the attribute.
Defining Access Permissions
To define user access permissions to the execution host based on previously configured user access lists, click the User Access tab.
To define project access permissions to the execution host based on previously configured projects, click the Project Access tab.
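The rule stated for the Scaling tab (positive floating-point numbers in fixed-point or scientific notation) and the default factor of 1 for unlisted usage metrics can be expressed as a small check. This is a minimal sketch, assuming factors arrive as strings. `parse_scale_factor` and `scale_usage` are hypothetical helper names that are not part of the grid engine software, and the metric names in the usage note are illustrative.

```python
import math


def parse_scale_factor(text):
    """Accept a positive float written in fixed-point or scientific notation."""
    value = float(text)  # raises ValueError on malformed input
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"scaling factor must be a positive number: {text!r}")
    return value


def scale_usage(raw_usage, factors):
    """Multiply each usage metric by its factor; unlisted metrics use 1."""
    return {metric: amount * factors.get(metric, 1.0)
            for metric, amount in raw_usage.items()}
```

For example, `scale_usage({"cpu": 10.0, "mem": 4.0}, {"cpu": parse_scale_factor("1.5")})` scales only the CPU metric and leaves the memory value unchanged.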
Defining Reporting Variables
To define reporting variables, click the Reporting Variables tab.
The Available list displays all the variables that can be written to the reporting file when a load report is received from the execution host.
Select a reporting variable from the Available list, and then click the red right arrow to add the selected variable to the Selected list.
To remove a reporting variable from the Selected list, select the variable, and then click the left red arrow.
Deleting an Execution Host
To delete an execution host, on the QMON Main Control window click the Host Configuration button, and then click the Execution Host tab. In the Execution Host dialog box, select the host that you want to delete, and then click Delete.
Shutting Down an Execution Host Daemon
To shut down an execution host daemon, on the QMON Main Control window click the Host Configuration button, and then click the Execution Host tab. In the Execution Host dialog box, select a host, and then click Shutdown.
Configuring Execution Hosts From the Command Line
To configure execution hosts from the command line, type the following command with appropriate options:
The following options are available:
Configuring Administration Hosts With QMON
On the QMON Main Control window, click the Host Configuration button. The Host Configuration dialog box appears, displaying the Administration Host tab. The Administration Host tab looks like the following figure:
Figure 1–3 Administration Host Tab
Note – The Administration Host tab is displayed by default when you click the Host Configuration button for the first time.
Use the Administration Host tab to configure hosts on which administrative commands are allowed. The Host list displays the hosts that already have administrative permission.
Adding an Administration Host
To add a new administration host, type its name in the Host field, and then click Add, or press the Return key.
Deleting an Administration Host
To delete an administration host from the list, select the host, and then click Delete.
Configuring Administration Hosts From the Command Line
To configure administration hosts from the command line, type the following command with appropriate arguments:
Arguments to the qconf command and their consequences are as follows:
Configuring Submit Hosts With QMON
To configure submit hosts, on the QMON Main Control window click the Host Configuration button, and then click the Submit Host tab. The Submit Host tab is shown in the following figure.
Figure 1–4 Submit Host Tab
Use the Submit Host tab to declare the hosts from which jobs can be submitted, monitored, and controlled. The Host list displays the hosts that already have submit permission.
No administrative commands are allowed from submit hosts unless the hosts are also declared to be administration hosts. See Configuring Administration Hosts With QMON for more information.
Adding a Submit Host
To add a submit host, type its name in the Host field, and then click Add, or press the Return key.
Deleting a Submit Host
To delete a submit host, select it, and then click Delete.
Configuring Submit Hosts From the Command Line
To configure submit hosts from the command line, type the following command with appropriate arguments:
The following options are available:
Configuring Host Groups With QMON
Host groups enable you to use a single name to refer to multiple hosts. You can group similar hosts together in a host group. A host group can include other host groups as well as multiple individual hosts. Host groups that are members of another host group are subgroups of that host group.
For example, you might define a host group called @bigMachines. This host group includes the following members:
The initial @ sign indicates that the name is a host group. The host group @bigMachines includes all hosts that are members of the two subgroups @solaris64 and @solaris32. @bigMachines also includes two individual hosts, fangorn and balrog.
On the QMON Main Control window, click the Host Configuration button. The Host Configuration dialog box appears. Click the Host Groups tab. The Host Groups tab looks like the following figure.
Figure 1–5 Host Groups Tab
Use the Host Groups tab to configure host groups. The Hostgroup list displays the currently configured host groups. The Members list displays all the hosts that are members of the selected host group.
Adding or Modifying a Host Group
To add a host group, click Add. To modify a host group, click Modify. The Add/Modify Host Group dialog box appears.
If you are adding a new host group, type a host group name in the Hostgroup field. The host group name must begin with an @ sign.
If you are modifying an existing host group, the host group name is provided in the Hostgroup field.
To add a host to the host group that you are configuring, type the host name in the Host field, and then click the red arrow to add the name to the Members list.
To add a host group as a subgroup, select a host group name from the Defined Host Groups list, and then click the red arrow to add the name to the Members list.
To remove a host or a host group from the Members list, select it, and then click the trash icon.
Click OK to save your changes and close the dialog box. Click Cancel to close the dialog box without saving your changes.
Deleting a Host Group
To delete a host group, select it from the Hostgroup list, and then click Delete.
Configuring Host Groups From the Command Line
To configure host groups from the command line, type the following command with appropriate options:
The following options are available:
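Whichever interface is used to define them, host groups resolve recursively: a group contributes its individual hosts plus every host of its subgroups. The following minimal sketch illustrates that expansion for the @bigMachines example. The function `resolve_host_group` and the contents of the two subgroups are hypothetical and shown only for illustration; they do not describe how the grid engine software resolves groups internally.

```python
def resolve_host_group(name, groups, _seen=None):
    """Return the set of hosts in `name`, expanding nested @subgroups."""
    seen = set() if _seen is None else _seen
    if name in seen:  # guard against groups that include each other
        return set()
    seen.add(name)
    hosts = set()
    for member in groups[name]:
        if member.startswith("@"):  # the @ prefix marks a host group
            hosts |= resolve_host_group(member, groups, seen)
        else:
            hosts.add(member)
    return hosts


# Subgroup contents are hypothetical; only @bigMachines follows the text.
GROUPS = {
    "@bigMachines": ["@solaris64", "@solaris32", "fangorn", "balrog"],
    "@solaris64": ["host-a", "host-b"],
    "@solaris32": ["host-c"],
}
```

Calling `resolve_host_group("@bigMachines", GROUPS)` yields the two individual hosts together with every host from both subgroups.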
Monitoring Execution Hosts With qhost
Use the qhost command to retrieve a quick overview of the execution host status:
This command produces output that is similar to the following example:
Example 1–1 Sample qhost Output
See the qhost(1) man page for a description of the output format and for more options.
Invalid Host Names
The following is a list of host names that are invalid, reserved, or otherwise not allowed to be used:
Killing Daemons From the Command Line
To kill grid engine system daemons from the command line, use one of the following commands:
You must have manager or operator privileges to use these commands. See Chapter 4, Managing User Access for more information about manager and operator privileges.
If you want to wait for any active jobs to finish before you run the shutdown procedure, use the qmod -dq command for each cluster queue, queue instance, or queue domain before you run the qconf sequence described earlier. For information about cluster queues, queue instances, and queue domains, see Configuring Queues.
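The ordering described here, disabling queues first and killing daemons afterwards, can be sketched as a plan of command lines. This is a minimal illustration that only builds argument lists and executes nothing. `shutdown_plan` is a hypothetical helper, the queue names are placeholders, and the kill commands are passed in by the caller rather than assumed. Waiting until no jobs are running between the two phases is left to the operator.

```python
def shutdown_plan(queues, kill_commands):
    """Return argv lists: one `qmod -dq` per queue, then the kill commands.

    Nothing is executed here. The caller must still wait for running jobs
    to finish after the qmod step and before the kill commands are issued.
    """
    if not queues:
        raise ValueError("at least one queue name is required")
    plan = [["qmod", "-dq", queue] for queue in queues]
    plan.extend(list(command) for command in kill_commands)
    return plan


if __name__ == "__main__":
    # "all.q" is a placeholder queue name; the kill command is site-specific.
    for argv in shutdown_plan(["all.q"], [["qconf", "-km"]]):
        print(" ".join(argv))
```

Keeping the plan as data makes the order easy to review before any command is run with manager or operator privileges.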
The qmod -dq command prevents new jobs from being scheduled to the disabled queue instances. You should then wait until no jobs are running in the queue instances before you kill the daemons.
Restarting Daemons From the Command Line
Log in as root on the machine on which you want to restart grid engine system daemons. Type the following commands to run the startup scripts:
These scripts look for the daemons that normally run on this host and then start the corresponding ones.
Basic Cluster Configuration
The basic cluster configuration is a set of information that is configured to reflect site dependencies and to influence grid engine system behavior. Site dependencies include valid paths for programs such as mail or xterm. A global configuration is provided for the master host as well as for every host in the grid engine system pool. In addition, you can configure the system to use a configuration local to each host to override particular entries in the global configuration.
The cluster administrator should adapt the global configuration and local host configurations to the site's needs immediately after the installation. The configurations should be kept up to date afterwards.
The sge_conf(5) man page contains a detailed description of the configuration entries.
Displaying a Cluster Configuration With QMON
On the QMON Main Control window, click the Cluster Configuration button. The Cluster Configuration dialog box appears.
Figure 1–6 Cluster Configuration Dialog Box
In the Host list, select the name of a host. The current configuration for the selected host is displayed under Configuration.
Displaying the Global Cluster Configuration With QMON
On the QMON Main Control window, click the Cluster Configuration button. In the Host list, select global.
The configuration is displayed in the format that is described in the sge_conf(5) man page.
Adding and Modifying Global and Host Configurations With QMON
In the Cluster Configuration dialog box (Figure 1–6), select a host name or the name global, and then click Add or Modify. The Cluster Settings dialog box appears.
The Cluster Settings dialog box enables you to change all parameters of a global configuration or a local host configuration. All fields of the dialog box are accessible only if you are modifying the global configuration. If you modify a local host, its configuration is reflected in the dialog box. You can modify only those parameters that are feasible for local host changes.
If you are adding a new local host configuration, the dialog box fields are empty.
The Advanced Settings tab shows a corresponding behavior, depending on whether you are modifying a configuration or are adding a new configuration. The Advanced Settings tab provides access to more rarely used cluster configuration parameters.
When you finish making changes, click OK to save your changes and close the dialog box. Click Cancel to close the dialog box without saving changes.
See the sge_conf(5) man page for a complete description of all cluster configuration parameters.
Deleting a Cluster Configuration With QMON
On the QMON Main Control window, click the Cluster Configuration button. In the Host list, select the name of a host whose configuration you want to delete, and then click Delete.
Displaying the Basic Cluster Configurations From the Command Line
To display the current cluster configuration, use the qconf -sconf command. See the qconf(1) man page for a detailed description.
Type one of the following commands:
Modifying the Basic Cluster Configurations From the Command Line
Note – You must be an administrator to use the qconf command to change cluster configurations.
Type one of the following commands:
The qconf commands that are described here are examples of the many available qconf commands. See the qconf(1) man page for others.
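The relationship between the global configuration and a local host configuration, where local entries override particular global entries, can be pictured as a dictionary overlay. The sketch below is an illustration only. `effective_configuration` is a hypothetical helper, and the parameter names and paths in the usage note are example values, not settings taken from a real cluster.

```python
def effective_configuration(global_conf, local_conf=None):
    """Overlay a host's local entries on top of the global configuration."""
    merged = dict(global_conf)       # copy so that the inputs stay untouched
    merged.update(local_conf or {})  # local entries win where both define a key
    return merged
```

For example, overlaying `{"xterm": "/usr/openwin/bin/xterm"}` on `{"mailer": "/bin/mail", "xterm": "/usr/bin/X11/xterm"}` keeps the global mailer path and replaces only the xterm path for that host.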