Chapter 8 Other Administrative Tasks
This chapter describes how to use files and scripts to add or modify grid engine system objects such as queues, hosts, and environments. This chapter includes the following sections:
Gathering Accounting and Reporting Statistics
The grid engine system provides two kinds of reporting and accounting facilities:
Report Statistics (ARCo)
You can use the optional Accounting and Reporting Console (ARCo) to generate live accounting and reporting data from the grid engine system and store the data in the reporting database, which is a standard SQL database. ARCo supports the following SQL database systems:
ARCo also provides a web-based tool for generating information queries on the reporting database and for retrieving the results in tabular or graphical form. ARCo enables you to store queries for later use, to run predefined queries, and to run queries in batch mode.
For more information about how to use ARCo, see Chapter 5, Accounting and Reporting, in Sun N1 Grid Engine 6.1 User’s Guide. For information about how to install ARCo, see Chapter 8, Installing the Accounting and Reporting Console, in Sun N1 Grid Engine 6.1 Installation Guide.
Raw reporting data is generated by sge_qmaster. This raw data is stored in a reporting file. The dbwriter program reads the raw data in the reporting file and writes it to the SQL reporting database, where it can be accessed by ARCo.
About the dbwriter Program
The dbwriter program performs the following tasks:
When dbwriter starts up, it calculates derived values. dbwriter also deletes outdated records at startup. If dbwriter runs in continuous mode, dbwriter continues to calculate derived values and to delete outdated records at hourly intervals, or at whatever interval you specify.
You can specify in an XML file the values that you want to calculate and the records that you want to delete. Use the -calculation option of the dbwriter command to specify the path to this XML file.
For detailed information about calculating derived values, see Calculating Derived Values With dbwriter. For detailed information about deleting outdated records, see Deleting Outdated Records With dbwriter.
Enabling the Reporting File
The reporting file contains the following types of data:
When the grid engine system is first installed, the reporting file is disabled. To use ARCo, you must enable the reporting file for the cluster. Once enabled, the reporting file is generated by sge_qmaster. By default, the reporting file is located in sge-root/cell/common. You can change the default with the -reporting option of the dbwriter command. For information about configuring the generation of the reporting file, see the reporting_params parameter of the sge_conf(5) man page, and the report_variables parameter of the host_conf(5) man page.
To enable the reporting file with QMON, on the Main Control window click the Cluster Configuration button, select the global host, and then click Modify. On the Cluster Settings dialog box, click the Advanced Settings tab. In the Reporting Parameters field, set the following parameters:
To enable the reporting file from the command line, use the qconf -mconf command to set the reporting_params attributes, as described in the preceding paragraph.
Once the reporting file is enabled, the dbwriter can read raw data from the reporting file and write it to the reporting database.
For more information about configuring the reporting file, see the reporting(5) man page. For complete details about installing and setting up ARCo, see Chapter 8, Installing the Accounting and Reporting Console, in Sun N1 Grid Engine 6.1 Installation Guide.
Calculating Derived Values With dbwriter
The rules for calculating derived values are specified in a derived tag, which is a sub tag of the DbWriterConfig tag. The following table lists the attributes of the derived tag:
The following table lists the subelements of the derived tag:
The autogenerated SQL statement looks like the following template:
The SQL template parameters are as follows:
Here is an example of an autogenerated SQL statement:
Deleting Outdated Records With dbwriter
To delete outdated records in the reporting database, you must specify a deletion rule in the delete tag. The following table lists the attributes of the delete tag:
The following table lists a subelement of the delete tag:
Here is an example of a delete tag:
Accounting and Usage Statistics (qacct)
You can use the qacct command to generate alphanumeric accounting statistics. If you specify no options, qacct displays the aggregate usage on all machines of the cluster, as generated by all jobs that have finished and that are contained in the cluster accounting file sge-root/cell/common/accounting. In this case, qacct reports three times, in seconds:
Several options are available for reporting accounting information about queues, users, and the like. In particular, you can use the qacct -l command to request information about all jobs that have finished and that match a resource requirement specification.
Use the qacct -j [job-id | job-name] command to get direct access to the complete resource usage information stored by the grid engine system. This information includes the information that is provided by the getrusage system call.
The -j option reports the resource usage entry for the jobs with job-id or with job-name. If no argument is given, all jobs contained in the referenced accounting file are displayed. If a job ID is specified, and if more than one entry is displayed, one of the following is true:
See the qacct(1) man page for more information.
Backing Up the Grid Engine System Configuration
You can back up your grid engine system configuration files automatically. The automatic backup process uses a configuration file called backup_template.conf. The backup configuration file is located by default in sge-root/util/install_modules/backup_template.conf. The backup configuration file must define the following elements:
The backup template file looks like the following example:
To start the automatic backup process, type the following command on the sge_qmaster host:
backup-conf is the full path to the backup configuration file.
Note – You do not need to shut down any of the grid engine system daemons before you back up your configuration files.
Your backup is created in the directory specified by BACKUP_FILE. A backup log file called install.pid is also created in this directory, where pid is the process ID number.
inst_sge -bup
Enter the <sge-root> directory or use the default.
SGE Configuration Backup
------------------------
This feature does a backup of all configuration you made within your cluster.
Please enter your SGE_ROOT directory. Default: [/home/user/ts/u10]
Enter the <sge-cell> name or use the default.
Please enter your SGE_CELL name. Default: [default]
Enter the backup destination directory or use the default.
Where do you want to save the backupfiles? Default: [/home/user/ts/u10/backup]
Choose whether to create a compressed tar backup file.
If you pack and unpack the backup with different tar implementations (GNU tar and Solaris tar, for example), this option can cause problems, and in some cases the tar package might be unreadable. Using the same tar binary for packing and unpacking avoids the problem.
Shall the backup function create a compressed tarpackage with your files? (y/n) [y] >>
Enter the file name of the backup file or use the default.
... starting with backup
Please enter a filename for your backupfile. Default: [backup.tar] >>
The backup is performed, and informational output is printed.
2007-01-11_22_43_22.dump bootstrap qtask settings.sh act_qmaster sgemaster settings.csh sgeexecd jobseqnum
... backup completed
All information is saved in [/home/user/ts/u10/backup/backup.tar.gz[Z]]
Shut down the qmaster daemon before you start the restore process. The spooling database is changed during the restore, which might lead to data loss if the restore process and qmaster try to access the same data.
Type the following command to start the restore process:
inst_sge -rst
Read the messages on the screen and respond.
SGE Configuration Restore
-------------------------
This feature restores the configuration from a backup you made previously.
Hit, <ENTER> to continue!
Enter the <sge-root> directory or use the default.
Please enter your SGE_ROOT directory. Default: [/home/user/ts/u10]
Enter the <sge-cell> name or use the default.
Please enter your SGE_CELL name. Default: [default]
Confirm the backup file format.
The backup file can be in a format other than a compressed tar file.
Is your backupfile in tar.gz[Z] format? (y/n) [y]
Enter the full path to the backup file.
Please enter the full path and name of your backup file. Default: [/home/user/ts/u10/backup/backup.tar.gz]
Verify the spooling database information.
The restore feature unpacks the backup file and reads system information. To prevent data loss, you need to confirm that the correct spooling database is detected.
Copying backupfile to /tmp/bup_tmp_22_51_40
/home/user/ts/u10/backup/backup.tar.gz
2007-01-11_22_43_22.dump bootstrap qtask settings.sh act_qmaster sgemaster settings.csh sgeexecd jobseqnum
Spooling Method: berkeleydb detected!
The path to your spooling db is [/tmp/dom/spooldb]
If this is correct hit <ENTER> to continue, else enter the path. >>
Restart qmaster.
This section describes how to use files and scripts to add or modify grid engine system objects such as queues, hosts, and environments.
You can use the QMON graphical user interface to perform all administrative tasks in the grid engine system. You can also administer a grid engine system through commands you type at a shell prompt and call from within shell scripts. Many experienced administrators find that using files and scripts is a more flexible, quicker, and more powerful way to change settings.
Use the qconf command with the following options to add objects according to specifications you create in a file:
qconf -Ae
qconf -Aq
qconf -Au
qconf -Ackpt
qconf -Ap
Use the qconf command with the following options to modify objects according to specifications you create in a file:
qconf -Me
qconf -Mq
qconf -Mu
qconf -Mckpt
qconf -Mp
The -Ae and -Me options add or modify execution hosts.
The -Aq and -Mq options add or modify queues.
The -Au and -Mu options add or modify usersets.
The -Ackpt and -Mckpt options add or modify checkpointing environments.
The -Ap and -Mp options add or modify parallel environments.
Use these options in combination with the qconf -s command to take an existing object and modify it. You can then update the existing object or create a new object.
#!/bin/sh
# ckptmod.sh: modify the migration command
# of a checkpointing environment
# Usage: ckptmod.sh <checkpoint-env-name> <full-path-to-command>
TMPFILE=/tmp/ckptmod.$$
CKPT=$1
MIGMETHOD=$2
# Dump the current configuration without its migr_command line
qconf -sckpt "$CKPT" | grep -v '^migr_command' > "$TMPFILE"
# Append the new migration command and load the modified file
echo "migr_command $MIGMETHOD" >> "$TMPFILE"
qconf -Mckpt "$TMPFILE"
rm "$TMPFILE"
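The same dump, edit, and reload pattern can also create a new object from an existing one. The following sketch copies a queue under a new name by using qconf -sq and qconf -Aq. The function name clone_queue and the temporary file location are illustrative choices, and the sketch assumes that the queue name appears on a line that starts with qname in the qconf -sq output.

```shell
#!/bin/sh
# clone_queue: create a new queue from the configuration of an existing one.
# Usage: clone_queue <existing-queue> <new-queue>
# Assumption: "qconf -sq" prints the queue name on a line starting with "qname".
clone_queue() {
    src=$1
    dst=$2
    tmpfile=/tmp/clone_queue.$$

    # Dump the existing queue; stop if it cannot be shown.
    if ! qconf -sq "$src" > "$tmpfile"; then
        rm -f "$tmpfile"
        return 1
    fi

    # Replace the qname line and keep every other attribute unchanged.
    sed "s/^qname[[:space:]].*/qname $dst/" "$tmpfile" > "$tmpfile.new"

    # Add the new queue from the edited file, then clean up.
    qconf -Aq "$tmpfile.new"
    status=$?
    rm -f "$tmpfile" "$tmpfile.new"
    return $status
}
```

For example, clone_queue all.q test.q would add a queue named test.q with the settings of all.q, provided that a queue named all.q exists in your cluster.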
You can modify individual queues, hosts, parallel environments, and checkpointing environments from the command line. Use the qconf command in combination with other commands.
If you have already prepared a file, type the qconf command with appropriate options:
qconf -Me
qconf -Mq
qconf -Mckpt
qconf -Mp
If you have not prepared a file, type the qconf command with appropriate options:
qconf -me
qconf -mq
qconf -mckpt
qconf -mp
The -Me and -me options modify execution hosts.
The -Mq and -mq options modify queues.
The -Mckpt and -mckpt options modify checkpointing environments.
The -Mp and -mp options modify parallel environments.
The difference between the uppercase -M options and the lowercase -m options controls the qconf command's result. Both -M and -m mean modify, but the uppercase -M denotes modification from an existing file, whereas the lowercase -m does not. Instead, the lowercase -m opens a temporary file in an editor. When you save any changes you make to this file and exit the editor, the system immediately reflects those changes.
However, when you want to change many objects at once, or you want to change object configuration noninteractively, use the qconf command with the options that modify object attributes (such as -Aattr, -Mattr, and so forth).
The following commands make modifications according to specifications in a file:
qconf -Aattr {queue | exechost | pe | ckpt} filename
qconf -Mattr {queue | exechost | pe | ckpt} filename
qconf -Rattr {queue | exechost | pe | ckpt} filename
qconf -Dattr {queue | exechost | pe | ckpt} filename
The following commands make modifications according to specifications on the command line:
qconf -aattr {queue | exechost | pe | ckpt} attribute value {queue-list | host-list}
qconf -mattr {queue | exechost | pe | ckpt} attribute value {queue-list | host-list}
qconf -rattr {queue | exechost | pe | ckpt} attribute value {queue-list | host-list}
qconf -dattr {queue | exechost | pe | ckpt} attribute value {queue-list | host-list}
The -Aattr and -aattr options add attributes.
The -Mattr and -mattr options modify attributes.
The -Rattr and -rattr options replace attributes.
The -Dattr and -dattr options delete attributes.
filename is the name of a file that contains attribute-value pairs.
attribute is the queue or host attribute that you want to change.
value is the value of the attribute you want to change.
The -aattr, -mattr, and -dattr options enable you to operate on individual values in a list of values. The -rattr option replaces the entire list of values with the new one that you specify, either on the command line or in the file.
The following command changes the queue type of tcf27-e019.q to batch only:
% qconf -rattr queue qtype batch tcf27-e019.q
The following command uses the file new.cfg to modify the queue type and the shell start behavior of tcf27-e019.q:
% cat new.cfg
qtype batch interactive checkpointing
shell_start_mode unix_behavior
% qconf -Rattr queue new.cfg tcf27-e019.q
The following command adds the resource attribute scratch1 with a value of 1000M and the resource attribute long with a value of 2:
% qconf -rattr exechost complex_values scratch1=1000M,long=2 tcf27-e019
The following command attaches the resource attribute short to the host with a value of 4:
% qconf -aattr exechost complex_values short=4 tcf27-e019
The following command changes the value of scratch1 to 500M, leaving other values unchanged:
% qconf -mattr exechost complex_values scratch1=500M tcf27-e019
The following command deletes the resource attribute long:
% qconf -dattr exechost complex_values long tcf27-e019
The following command adds tcf27-b011.q to the list of queues for the checkpointing environment sph:
% qconf -aattr ckpt queue_list tcf27-b011.q sph
The following command changes the number of slots in the parallel environment make to 50:
% qconf -mattr pe slots 50 make
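In a script, it can be useful to read an attribute back after you change it, so that a failed or mistyped change does not go unnoticed. The following sketch wraps the slots example in two hypothetical helper functions, get_pe_slots and set_pe_slots. It assumes that qconf -sp prints one attribute and its value per line, with the attribute name in the first column.

```shell
#!/bin/sh
# get_pe_slots: print the slots value of a parallel environment.
# Assumption: "qconf -sp" prints one "attribute value" pair per line.
get_pe_slots() {
    qconf -sp "$1" | awk '$1 == "slots" { print $2 }'
}

# set_pe_slots: change the slots value, then read it back to confirm.
# Usage: set_pe_slots <pe-name> <slots>
set_pe_slots() {
    pe=$1
    slots=$2
    qconf -mattr pe slots "$slots" "$pe" || return 1
    current=$(get_pe_slots "$pe")
    if [ "$current" != "$slots" ]; then
        echo "slots for $pe is $current, expected $slots" >&2
        return 1
    fi
}
```

With these functions defined, set_pe_slots make 50 performs the same change as the preceding command and returns a nonzero status if the value that is read back differs.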
The qselect command outputs a list of queue instances. If you specify options, qselect lists only the queue instances that match the criteria you specify. You can use qselect in combination with the qconf command to target specific queue instances that you want to modify.
The following command lists all queue instances on Linux machines:
% qselect -l arch=glinux
The following command lists all queue instances on machines with two CPUs:
% qselect -l num_proc=2
The following command lists all queue instances on all four-CPU 64-bit Solaris machines:
% qselect -l arch=solaris64,num_proc=4
The following command lists queue instances that provide an application license. The queue instances were previously configured.
% qselect -l app_lic=TRUE
You can combine qselect with qconf to do wide-reaching changes with a single command line. To do this, put the entire qselect command inside backward quotation marks (` `) and use it in place of the queue-list variable on the qconf command line.
The following command sets the prolog script to sol_prolog.sh on all queue instances on Solaris machines:
% qconf -mattr queue prolog /usr/local/scripts/sol_prolog.sh `qselect -l arch=solaris`
The following command sets the attribute fluent_license to two on all queue instances on two-processor systems:
% qconf -mattr queue complex_values fluent_license=2 `qselect -l num_proc=2`
The most flexible way to automate the configuration of queue instances is to use the qconf command with the qselect command. With the combination of these commands, you can build up your own custom administration scripts.
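As a starting point for such a script, the following sketch wraps the qselect and qconf combination in a function. The function name set_queue_attr_where and the DRY_RUN variable are hypothetical and are not part of the grid engine system. The sketch adds one safeguard to the one-line form: if qselect matches no queue instances, the function stops instead of running qconf without an object list.

```shell
#!/bin/sh
# set_queue_attr_where: set a queue attribute on every queue instance
# that matches a resource request.
# Usage: set_queue_attr_where <attribute> <value> <resource-request>
# Set DRY_RUN to any non-empty value to print the command instead of running it.
set_queue_attr_where() {
    attr=$1
    value=$2
    request=$3

    # Ask qselect for the matching queue instances.
    targets=$(qselect -l "$request")
    if [ -z "$targets" ]; then
        echo "no queue instances match: $request" >&2
        return 1
    fi

    # $targets is left unquoted on purpose so that each queue instance
    # becomes a separate argument, as with the backquoted form.
    if [ -n "$DRY_RUN" ]; then
        echo qconf -mattr queue "$attr" "$value" $targets
    else
        qconf -mattr queue "$attr" "$value" $targets
    fi
}
```

For example, set_queue_attr_where complex_values fluent_license=2 num_proc=2 corresponds to the fluent_license command shown earlier in this section. Running it first with DRY_RUN=1 shows which queue instances would be changed.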
To change the global configuration, use the qconf -mconf command. To change the scheduler configuration, use the qconf -msconf command.
Both of these commands open a temporary file in an editor. When you exit the editor, any changes that you save to this temporary file are processed by the system and take effect immediately. The editor used to open the temporary file is the editor specified by the EDITOR environment variable. If this variable is undefined, the vi editor is used by default.
You can use the EDITOR environment variable to automate the behavior of the qconf command. Change the value of this variable to point to an editor program that modifies a file whose name is given by the first argument. After the editor modifies the temporary file and exits, the system reads in the modifications, which take effect immediately.
If the modification time of the file does not change after the edit operation, the system sometimes incorrectly assumes that the file was not modified. Therefore you should insert a sleep 1 instruction before writing the file, to ensure a different modification time.
You can use this technique with any qconf -m... command. However, the technique is especially useful for administration of the scheduler and the global configuration, as you cannot automate the procedure in any other way.
The following example modifies the schedule interval of the scheduler:
#!/bin/ksh
# sched_int.sh: modify the schedule interval
# usage: sched_int.sh <n>, where <n> is
# the new interval, in seconds. n < 60
TMPFILE=/tmp/sched_int.$$
if [ -n "$MOD_SGE_SCHED_INT" ]; then
    # Nested invocation as the editor: $1 is the temporary file from qconf
    grep -v schedule_interval "$1" > "$TMPFILE"
    echo "schedule_interval 0:0:$MOD_SGE_SCHED_INT" >> "$TMPFILE"
    # sleep to ensure modification time changes
    sleep 1
    mv "$TMPFILE" "$1"
else
    # First invocation: $1 is the new interval
    export EDITOR=$0
    export MOD_SGE_SCHED_INT=$1
    qconf -msconf
fi
This script sets the EDITOR environment variable to point to itself. The script then calls the qconf -msconf command, which starts the script a second time as the editor. This second nested invocation of the script modifies the temporary file specified by the first argument and then exits. The grid engine system automatically reads in the changes, and the first invocation of the script terminates.
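The same technique can be applied to the global configuration with qconf -mconf. The following sketch generalizes the preceding script to set one parameter, named on the command line. It assumes that the temporary file opened by qconf -mconf holds one parameter and its value per line. The script name set_conf_param.sh, the function replace_param, and the variables MOD_SGE_CONF_KEY and MOD_SGE_CONF_VALUE are made up for this example.

```shell
#!/bin/sh
# set_conf_param.sh: set one parameter of the global configuration.
# Usage: set_conf_param.sh <parameter> <value>
# Assumption: the temporary file that qconf -mconf opens holds one
# "parameter value" pair per line.

# replace_param FILE KEY VALUE: drop any line for KEY, append the new one.
replace_param() {
    file=$1
    key=$2
    value=$3
    tmpfile=/tmp/set_conf_param.$$
    grep -v "^$key[[:space:]]" "$file" > "$tmpfile"
    printf '%s %s\n' "$key" "$value" >> "$tmpfile"
    # sleep to ensure the modification time changes
    sleep 1
    mv "$tmpfile" "$file"
}

if [ -n "$MOD_SGE_CONF_KEY" ] && [ $# -eq 1 ]; then
    # Editor role: qconf passes the temporary file as the first argument.
    replace_param "$1" "$MOD_SGE_CONF_KEY" "$MOD_SGE_CONF_VALUE"
elif [ $# -eq 2 ]; then
    # Caller role: point EDITOR at this script and start qconf.
    EDITOR=$0
    MOD_SGE_CONF_KEY=$1
    MOD_SGE_CONF_VALUE=$2
    export EDITOR MOD_SGE_CONF_KEY MOD_SGE_CONF_VALUE
    qconf -mconf
fi
```

For example, /usr/local/scripts/set_conf_param.sh loglevel log_info would set the loglevel parameter, assuming the script is installed at that path. Call the script by its full path so that the value of EDITOR can be resolved by qconf. Because the whole line for the parameter is replaced, supply the complete value for parameters that hold several settings.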