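Chapter 7 Other Administrative Tasks

This chapter describes how to gather accounting and reporting statistics, how to back up the grid engine system configuration, and how to use files and scripts to add or modify grid engine system objects such as queues, hosts, and environments. This chapter includes the following sections:

- Gathering Accounting and Reporting Statistics
- Backing Up the Grid Engine System Configuration
- Using Files and Scripts for Administration Tasks

Gathering Accounting and Reporting Statistics

The grid engine system provides two kinds of reporting and accounting facilities:

- Report statistics (ARCo)
- Accounting and usage statistics (qacct)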
Report Statistics (ARCo)

You can use the optional Accounting and Reporting Console (ARCo) to generate live accounting and reporting data from the grid engine system and store the data in the reporting database, which is a standard SQL database. ARCo supports the following SQL database systems:
ARCo also provides a web-based tool for generating information queries on the reporting database and for retrieving the results in tabular or graphical form. ARCo enables you to store queries for later use, to run predefined queries, and to run queries in batch mode.

For more information about how to use ARCo, see Chapter 5, Accounting and Reporting, in N1 Grid Engine 6 User’s Guide. For information about how to install ARCo, see Chapter 8, Installing the Accounting and Reporting Console, in N1 Grid Engine 6 Installation Guide.

Raw reporting data is generated by sge_qmaster and stored in a reporting file. The dbwriter program reads the raw data in the reporting file and writes it to the SQL reporting database, where ARCo can access it.

About the dbwriter Program

The dbwriter program performs the following tasks:

- Reads raw data from the reporting file and writes it to the reporting database
- Calculates derived values
- Deletes outdated records
When dbwriter starts up, it calculates derived values and deletes outdated records. If dbwriter runs in continuous mode, it repeats both tasks at hourly intervals, or at whatever interval you specify.

You specify the values to calculate and the records to delete in an XML file. Use the -calculation option of the dbwriter command to specify the path to this XML file. For detailed information about calculating derived values, see Calculating Derived Values With dbwriter. For detailed information about deleting outdated records, see Deleting Outdated Records With dbwriter.

Enabling the Reporting File

The reporting file contains the following types of data:
When the grid engine system is first installed, the reporting file is disabled. To use ARCo, you must enable the reporting file for the cluster. Once the reporting file is enabled, sge_qmaster generates it.

By default, the reporting file is located in sge-root/cell/common. The dbwriter command reads the file from this location unless you specify a different path with its -reporting option. For information about configuring the generation of the reporting file, see the reporting_params parameter of the sge_conf(5) man page and the report_variables parameter of the sge_host(5) man page.

To enable the reporting file with QMON, click the Cluster Configuration button on the Main Control window, select the global host, and then click Modify. In the Cluster Settings dialog box, click the Advanced Settings tab. In the Reporting Parameters field, set the following parameters:
To enable the reporting file from the command line, use the qconf -mconf command to set the reporting_params attributes described in the preceding paragraph. Once the reporting file is enabled, dbwriter can read raw data from the reporting file and write it to the reporting database. For more information about configuring the reporting file, see the reporting(5) man page. For complete details about installing and setting up ARCo, see Chapter 8, Installing the Accounting and Reporting Console, in N1 Grid Engine 6 Installation Guide.

Calculating Derived Values With dbwriter

The rules for calculating derived values are specified in a derived tag, which is a subtag of the DbWriterConfig tag. The following table lists the attributes of the derived tag:
The following table lists the subelements of the derived tag:
The autogenerated SQL statement looks like the following template:
The SQL template parameters are as follows:
Here is an example of an autogenerated SQL statement:
Deleting Outdated Records With dbwriter

To delete outdated records in the reporting database, you must specify a deletion rule in the delete tag. The following table lists the attributes of the delete tag:
The following table lists a subelement of the delete tag:
Here is an example of a delete tag:
Accounting and Usage Statistics (qacct)

You can use the qacct command to generate alphanumeric accounting statistics. If you specify no options, qacct displays the aggregate usage on all machines of the cluster. This aggregate covers all finished jobs that are recorded in the cluster accounting file sge-root/cell/common/accounting. In this case, qacct reports three times, in seconds:

- Wall-clock time (WALLCLOCK): the elapsed real time during which the jobs ran
- User CPU time (UTIME)
- System CPU time (STIME)

Several options are available for reporting accounting information about queues, users, and the like. In particular, you can use the qacct -l command to request information about all finished jobs that match a resource requirement specification.

Use the qacct -j [job-id | job-name] command to get direct access to the complete resource usage information stored by the grid engine system. This information includes the information that is provided by the getrusage system call. The -j option reports the resource usage entry for the jobs with job-id or with job-name. If no argument is given, all jobs contained in the referenced accounting file are displayed. If a job ID is specified and more than one entry is displayed, one of the following is true:

- Job ID numbers have wrapped around. The range for job IDs is 1 through 999999.
- A checkpointing job that has migrated is displayed.
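To cross-check the aggregate figures, you can sum the three times directly from the accounting file. The sketch below does this. It is not part of the grid engine distribution. It assumes the colon-separated record layout described in accounting(5), where fields 14, 15, and 16 hold ru_wallclock, ru_utime, and ru_stime. The function name sum_acct_times is hypothetical.

```shell
#!/bin/sh
# sum_acct_times [FILE]: print the summed wall-clock, user and system
# times (in seconds) over all records of a grid engine accounting file.
# Assumption: accounting(5) layout, fields separated by ':' with
# field 14 = ru_wallclock, 15 = ru_utime, 16 = ru_stime.
sum_acct_times() {
    acct_file=${1:-$SGE_ROOT/${SGE_CELL:-default}/common/accounting}
    awk -F: '
        /^#/ { next }              # skip comment lines
        NF >= 16 {                 # ignore short or malformed lines
            wall  += $14
            utime += $15
            stime += $16
        }
        END { print wall + 0, utime + 0, stime + 0 }
    ' "$acct_file"
}

# Example: sum_acct_times "$SGE_ROOT/default/common/accounting"
```

For anything beyond this quick check, use qacct itself, which also handles filtering by owner, queue, and resource request.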
See the qacct(1) man page for more information.

Backing Up the Grid Engine System Configuration

You can back up your grid engine system configuration files automatically. The automatic backup process uses a configuration file called backup_template.conf, which is located by default in sge-root/util/install_modules/backup_template.conf. The backup configuration file must define the following elements:
The backup template file looks like the following example:
To start the automatic backup process, type the following command on the sge_qmaster host:

    inst_sge -bup -auto backup-conf
backup-conf is the full path to the backup configuration file.

Note: You do not need to shut down any of the grid engine system daemons before you back up your configuration files.

Your backup is created in the directory specified by BACKUP_FILE. A backup log file called install.pid is also created in this directory, where pid is the process ID number.

Using Files and Scripts for Administration Tasks

This section describes how to use files and scripts to add or modify grid engine system objects such as queues, hosts, and environments.

You can use the QMON graphical user interface to perform all administrative tasks in the grid engine system. You can also administer a grid engine system through commands that you type at a shell prompt or call from within shell scripts. Many experienced administrators find that files and scripts give them a more flexible, quicker, and more powerful way to change settings.

Using Files to Add or Modify Objects

Use the qconf command with the following options to add objects according to specifications you create in a file:

    qconf -Ae filename
    qconf -Aq filename
    qconf -Au filename
    qconf -Ackpt filename
    qconf -Ap filename
Use the qconf command with the following options to modify objects according to specifications you create in a file:

    qconf -Me filename
    qconf -Mq filename
    qconf -Mu filename
    qconf -Mckpt filename
    qconf -Mp filename
The -Ae and -Me options add or modify execution hosts. The -Aq and -Mq options add or modify queues. The -Au and -Mu options add or modify usersets. The -Ackpt and -Mckpt options add or modify checkpointing environments. The -Ap and -Mp options add or modify parallel environments.

To modify an existing object, first use the matching qconf -s... option, such as qconf -sq or qconf -sckpt, to display its configuration. Edit that output, and then pass it back with the -A or -M option. You can then update the existing object or create a new object.

Example 7–1 Modifying the Migration Command of a Checkpoint Environment

```shell
#!/bin/sh
# ckptmod.sh: modify the migration command
# of a checkpointing environment
# Usage: ckptmod.sh <checkpoint-env-name> <full-path-to-command>
TMPFILE=/tmp/ckptmod.$$
CKPT=$1
MIGMETHOD=$2
# Copy the current configuration, minus the old migr_command line
qconf -sckpt "$CKPT" | grep -v '^migr_command' > "$TMPFILE"
# Append the new migration command
echo "migr_command $MIGMETHOD" >> "$TMPFILE"
# Load the modified configuration back into the system
qconf -Mckpt "$TMPFILE"
rm "$TMPFILE"
```

Using Files to Modify Queues, Hosts, and Environments

You can modify individual queues, hosts, parallel environments, and checkpointing environments from the command line. Use the qconf command in combination with other commands.
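The pattern of Example 7–1 applies to any object that qconf can display with a -s... option and reload with the matching -M... option. The sketch below shows that generalization under a few assumptions. set_attr is a hypothetical function name. QCONF is a hypothetical variable that lets you substitute a dry-run or test double for qconf. The approach handles only attributes whose value fits on one line, because qconf may wrap long values onto continuation lines.

```shell
#!/bin/sh
# QCONF defaults to qconf; override it to substitute another command.
QCONF=${QCONF:-qconf}

# set_attr TYPE NAME ATTRIBUTE VALUE...
# TYPE is the suffix of the -s/-M options: q (queue), e (execution
# host), ckpt (checkpointing environment), or p (parallel environment).
set_attr() {
    obj_type=$1
    obj_name=$2
    attr=$3
    shift 3
    value=$*    # remaining arguments, joined by spaces
    tmp_file=$(mktemp "${TMPDIR:-/tmp}/set_attr.XXXXXX") || return 1
    # Keep every line except the old setting, then append the new one.
    $QCONF "-s$obj_type" "$obj_name" | grep -v "^$attr[[:space:]]" > "$tmp_file"
    echo "$attr $value" >> "$tmp_file"
    # Load the edited configuration back from the file.
    $QCONF "-M$obj_type" "$tmp_file"
    status=$?
    rm -f "$tmp_file"
    return $status
}

# Example: set_attr ckpt sph migr_command /usr/local/bin/migrate.sh
```

The migration command path in the usage line is a placeholder.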
The -Me and -me options modify execution hosts. The -Mq and -mq options modify queues. The -Mckpt and -mckpt options modify checkpointing environments. The -Mp and -mp options modify parallel environments.

The uppercase -M options and the lowercase -m options differ in where the new configuration comes from. The uppercase -M options read the modifications from an existing file. The lowercase -m options instead open a temporary file in an editor. When you save your changes to this file and exit the editor, the system applies them immediately.

However, when you want to change many objects at once, or you want to change object configuration noninteractively, use the qconf command with the options that modify object attributes (such as -Aattr, -Mattr, and so forth). The following commands make modifications according to specifications in a file:

    qconf -Aattr {queue | exechost | pe | ckpt} filename {queue-list | host-list | pe-name | ckpt-name}
    qconf -Mattr {queue | exechost | pe | ckpt} filename {queue-list | host-list | pe-name | ckpt-name}
    qconf -Rattr {queue | exechost | pe | ckpt} filename {queue-list | host-list | pe-name | ckpt-name}
    qconf -Dattr {queue | exechost | pe | ckpt} filename {queue-list | host-list | pe-name | ckpt-name}

The following commands make modifications according to specifications on the command line:

    qconf -aattr {queue | exechost | pe | ckpt} attribute value {queue-list | host-list | pe-name | ckpt-name}
    qconf -mattr {queue | exechost | pe | ckpt} attribute value {queue-list | host-list | pe-name | ckpt-name}
    qconf -rattr {queue | exechost | pe | ckpt} attribute value {queue-list | host-list | pe-name | ckpt-name}
    qconf -dattr {queue | exechost | pe | ckpt} attribute value {queue-list | host-list | pe-name | ckpt-name}

The -Aattr and -aattr options add attributes. The -Mattr and -mattr options modify attributes. The -Rattr and -rattr options replace attributes. The -Dattr and -dattr options delete attributes. filename is the name of a file that contains attribute-value pairs. attribute is the queue or host attribute that you want to change. value is the value of the attribute you want to change. queue-list, host-list, pe-name, and ckpt-name identify the objects to modify.

The -aattr, -mattr, and -dattr options enable you to operate on individual values in a list of values. The -rattr and -Rattr options replace the entire list of values with the new one that you specify, either on the command line or in the file.

Example 7–2 Changing the Queue Type

The following command changes the queue type of tcf27-e019.q to batch only:
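Here is a minimal sketch of this change. It assumes the -rattr form with the queue object type. The function name batch_only is hypothetical, and so is the QCONF override. Setting QCONF=echo prints the command instead of running it.

```shell
#!/bin/sh
# QCONF can be overridden (for example, QCONF=echo) for a dry run.
QCONF=${QCONF:-qconf}

# batch_only QUEUE: -rattr replaces the whole qtype list of QUEUE,
# so any existing value (such as "BATCH INTERACTIVE") becomes "batch".
batch_only() {
    $QCONF -rattr queue qtype batch "$1"
}

# Example: batch_only tcf27-e019.q
```

The sketch uses -rattr rather than -mattr because the goal is to discard the interactive type, not to edit one entry of the list.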
Example 7–3 Modifying the Queue Type and the Shell Start Behavior

The following command uses the file new.cfg to modify the queue type and the shell start behavior of tcf27-e019.q:
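The sketch below writes a new.cfg file and applies it with -Rattr. The attribute values are assumptions chosen for illustration: a queue type of batch plus interactive, and unix_behavior as the shell start mode. unix_behavior is one of the modes listed in queue_conf(5). The function names and the QCONF override are hypothetical.

```shell
#!/bin/sh
# QCONF can be overridden (for example, QCONF=echo) for a dry run.
QCONF=${QCONF:-qconf}

# write_new_cfg FILE: one "attribute value" pair per line.
# The values are illustrative, not recommendations.
write_new_cfg() {
    cat > "$1" <<'EOF'
qtype batch interactive
shell_start_mode unix_behavior
EOF
}

# apply_new_cfg FILE QUEUE: write FILE, then replace the attributes
# it lists on QUEUE with the values from FILE.
apply_new_cfg() {
    write_new_cfg "$1" || return 1
    $QCONF -Rattr queue "$1" "$2"
}

# Example: apply_new_cfg new.cfg tcf27-e019.q
```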
Example 7–4 Adding Resource Attributes

The following command adds the resource attribute scratch1 with a value of 1000M and the resource attribute long with a value of 2:
Example 7–5 Attaching a Resource Attribute to a Host

The following command attaches the resource attribute short to the host with a value of 4:
Example 7–6 Changing a Resource Value

The following command changes the value of scratch1 to 500M, leaving other values unchanged:
Example 7–7 Deleting a Resource Attribute

The following command deletes the resource attribute long:
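The four resource-attribute examples above (Examples 7–4 through 7–7) can be sketched as follows, under several assumptions. The resources scratch1, long, and short already exist in the complex configuration. The lists being edited are the complex_values attributes of the queue and execution host configurations. Each entry is added with its own qconf call. -dattr is given the full name=value entry to remove. The function names and the QCONF override are hypothetical.

```shell
#!/bin/sh
# QCONF can be overridden (for example, QCONF=echo) for a dry run.
QCONF=${QCONF:-qconf}

# Example 7-4: add scratch1=1000M and long=2 to a queue's complex_values.
add_resources() {
    $QCONF -aattr queue complex_values scratch1=1000M "$1" &&
        $QCONF -aattr queue complex_values long=2 "$1"
}

# Example 7-5: attach short=4 to an execution host's complex_values.
attach_short() {
    $QCONF -aattr exechost complex_values short=4 "$1"
}

# Example 7-6: -mattr changes only the scratch1 entry; others are kept.
shrink_scratch() {
    $QCONF -mattr queue complex_values scratch1=500M "$1"
}

# Example 7-7: -dattr removes just the long entry from the list.
drop_long() {
    $QCONF -dattr queue complex_values long=2 "$1"
}
```

Because -aattr, -mattr, and -dattr each work on a single list entry, every other entry in complex_values stays in place.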
Example 7–8 Adding a Queue to the List of Queues for a Checkpointing Environment

The following command adds tcf27-b011.q to the list of queues for the checkpointing environment sph:
Example 7–9 Changing the Number of Slots in a Parallel Environment

The following command changes the number of slots in the parallel environment make to 50:
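Here is a sketch of this change. It assumes that slots is the single-valued slots attribute of the parallel environment configuration (sge_pe(5)), so -mattr sets it directly. The function name set_pe_slots and the QCONF override are hypothetical.

```shell
#!/bin/sh
# QCONF can be overridden (for example, QCONF=echo) for a dry run.
QCONF=${QCONF:-qconf}

# set_pe_slots PE N: set the slots attribute of parallel environment PE to N.
set_pe_slots() {
    $QCONF -mattr pe slots "$2" "$1"
}

# Example: set_pe_slots make 50
```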
Targeting Queue Instances with the qselect Command

The qselect command outputs a list of queue instances. If you specify options, qselect lists only the queue instances that match the criteria you specify. You can use qselect in combination with the qconf command to target specific queue instances that you want to modify.

Example 7–10 Listing Queues

The following command lists all queue instances on Linux machines:
The following command lists all queue instances on machines with two CPUs:
The following command lists all queue instances on all four-CPU 64-bit Solaris machines:
The following command lists the queue instances that were previously configured to provide an application license.
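The four listings above differ only in the -l resource request passed to qselect. The sketch below wraps that request in a small function. Several names in it are assumptions. The arch values lx24-x86 and sol-sparc64 are example architecture strings, so check the strings your hosts report with qhost. app_lic is a hypothetical name for the license resource. num_proc is the processor-count resource. select_queues and the QSELECT override are hypothetical.

```shell
#!/bin/sh
# QSELECT can be overridden (for example, QSELECT=echo) for a dry run.
QSELECT=${QSELECT:-qselect}

# select_queues NAME=VALUE...: join the requests with commas into one
# -l resource request and print the matching queue instances.
select_queues() {
    request=$(printf '%s,' "$@")
    request=${request%,}    # drop the trailing comma
    $QSELECT -l "$request"
}

# Examples (resource values are placeholders):
#   select_queues arch=lx24-x86                  # Linux machines
#   select_queues num_proc=2                     # two-CPU machines
#   select_queues arch=sol-sparc64 num_proc=4    # four-CPU 64-bit Solaris
#   select_queues app_lic=1                      # hypothetical license resource
```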
You can combine qselect with qconf to make wide-reaching changes with a single command line. To do this, enclose the entire qselect command in backquotes (`) and use it in place of the queue-list variable on the qconf command line.

Example 7–11 Using qselect in qconf Commands

The following command sets the prolog script to sol_prolog.sh on all queue instances on Solaris machines:
The following command sets the attribute fluent_license to two on all queue instances on two-processor systems:
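Both Example 7–11 changes can also be written as a loop that calls qconf once for each queue instance that qselect returns. This is slower than the single command line described above, but each qconf call then receives exactly one queue instance. The prolog path, the arch value, and the names set_on_selected, QCONF, and QSELECT are hypothetical.

```shell
#!/bin/sh
# QCONF and QSELECT can be overridden for a dry run or a test.
QCONF=${QCONF:-qconf}
QSELECT=${QSELECT:-qselect}

# set_on_selected REQUEST ATTRIBUTE VALUE: set ATTRIBUTE to VALUE on
# every queue instance that matches the qselect resource REQUEST.
set_on_selected() {
    request=$1
    attr=$2
    value=$3
    for qi in $($QSELECT -l "$request"); do
        $QCONF -mattr queue "$attr" "$value" "$qi" || return 1
    done
}

# Examples (values are placeholders):
#   set_on_selected arch=sol-sparc64 prolog /usr/local/scripts/sol_prolog.sh
#   set_on_selected num_proc=2 complex_values fluent_license=2
```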
The most flexible way to automate the configuration of queue instances is to use the qconf command together with the qselect command. By combining these commands, you can build your own custom administration scripts.

Using Files to Modify a Global Configuration or the Scheduler

To change a global configuration, use the qconf -mconf command. To change the scheduler, use the qconf -msconf command. Both of these commands open a temporary file in an editor. When you exit the editor, the system processes any changes that you saved to this temporary file, and they take effect immediately.

The editor used to open the temporary file is the editor specified by the EDITOR environment variable. If this variable is undefined, the vi editor is used by default.

You can use the EDITOR environment variable to automate the behavior of the qconf command. Set this variable to an editor program that modifies the file whose name is given as its first argument. After the editor modifies the temporary file and exits, the system reads in the modifications, which take effect immediately.

Note: If the modification time of the file does not change after the edit operation, the system sometimes incorrectly assumes that the file was not modified. Therefore, insert a sleep 1 instruction before writing the file, to ensure a different modification time.

You can use this technique with any qconf -m... command. However, the technique is especially useful for administration of the scheduler and the global configuration, because you cannot automate the procedure in any other way.

Example 7–12 Modifying the Schedule Interval

The following example modifies the schedule interval of the scheduler:
This script sets the EDITOR environment variable to point to itself and then calls the qconf -msconf command. This second, nested invocation of the script modifies the temporary file specified by the first argument and then exits. The grid engine system automatically reads in the changes, and the first invocation of the script terminates.
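Here is a minimal POSIX sh sketch of the script just described. It is an illustration written for this section, not the original example, and it rests on three assumptions. MOD_SGE_SCHED_INT is a hypothetical environment variable that marks the nested invocation. The new value is written in the h:m:s form of the schedule_interval parameter (sched_conf(5)). The script is invoked by a path that qconf can execute, such as an absolute path, because it passes $0 as EDITOR.

```shell
#!/bin/sh
# sched_int.sh: set the scheduler's schedule_interval non-interactively.
# Usage: sched_int.sh SECONDS   (SECONDS < 60; written as 0:0:SECONDS)

# edit_sched_conf FILE: rewrite the schedule_interval line of FILE.
edit_sched_conf() {
    conf_file=$1
    tmp_file="$conf_file.tmp.$$"
    # Wait first so that the rewritten file gets a newer modification time.
    sleep 1
    # Drop the old schedule_interval line and append the new value.
    grep -v '^schedule_interval' "$conf_file" > "$tmp_file"
    echo "schedule_interval 0:0:$MOD_SGE_SCHED_INT" >> "$tmp_file"
    mv "$tmp_file" "$conf_file"
}

if [ -n "${MOD_SGE_SCHED_INT:-}" ]; then
    # Nested invocation: qconf -msconf passes the temporary file as $1.
    edit_sched_conf "$1"
elif [ $# -eq 1 ]; then
    # First invocation: make qconf run this script as its editor.
    EDITOR=$0 MOD_SGE_SCHED_INT=$1 qconf -msconf
fi
```

The sleep comes before the temporary file is written. That way, the file that replaces the original carries a modification time at least one second later than the file qconf created.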