N1 Grid Engine 6 Installation Guide

Chapter 1 Before You Install the Software

This chapter describes the steps to take before you install the N1™ Grid Engine 6 software (grid engine software).

This chapter includes the following topics:

Plan the Installation

Whether you have installed previous versions of the grid engine software or this is your first time, you will need to do some planning before you extract and install the software. This section describes the decisions you must make, and, wherever possible, gives you criteria on which you can base your decisions.

Decisions That You Must Make

You must make several decisions before you can plan the installation:

  • Decide whether your system of networked computer hosts that run N1 Grid Engine 6 software (grid engine system) is to be a single cluster or a collection of sub-clusters, called cells. Cells allow you to install separate instances of the grid engine software but share the binary files across those instances.

  • Select the machines that are to be grid engine system hosts. Determine the host type of each machine: master host, shadow master host, administration host, submit host, execution host, or a combination.

  • Ensure that all users of the grid engine system have the same user names on all submit and execution hosts.


    Note –

    Hosts that run the Windows operating system cannot serve as a master host or a shadow master host.


  • Decide how to order grid engine software directories. For example, you could organize directories as a complete tree on each workstation, or you could cross-mount directories, or you could set up a partial directory tree on some workstations. You must also decide where to locate each grid engine software installation directory, sge-root.

  • Decide on the site's queue structure.

  • Determine whether to define network services as an NIS file or as local to each workstation in /etc/services.

  • Use the information in this chapter to gather the information necessary to complete the installation worksheet.

Gather the Necessary Information

Before you install the grid engine software, you must plan how to achieve the results that fit your environment. This section helps you make the decisions that affect the rest of the procedure. Write down your installation plan in a table similar to the following example.

| Parameter | Value |
|---|---|
| sge-root directory | |
| Cell name | |
| Administrative user | |
| sge_qmaster port number | |
| sge_execd port number | |
| Master host | |
| Shadow master hosts | |
| Execution hosts | |
| Administration hosts | |
| Submit hosts | |
| Group ID range for jobs | |
| Spooling mechanism (Berkeley DB or Classic spooling) | |
| Berkeley DB server host (the master or another host) | |
| Berkeley DB spooling directory on the database server | |
| Scheduler tuning profile (Normal, High, Max) | |
| Installation method (interactive, secure, automated, or upgrade) | |
| If you are going to install N1GE 6 on a Windows system, acquire and install Microsoft Services For UNIX. See Appendix A for more information. | |
| If you are going to install N1GE 6 on a Windows system, create the required CSP certificates before installing N1GE. See the section called “How to Install a CSP-secured System” in Chapter 4 for information about CSP certificates. | |
| Check the Other Installation Issues appendix for applicability. | |

Disk Space Requirements

The grid engine software directory tree has the following fixed disk space requirements:

  • 40 Mbytes for the installation files (including documentation) without any binaries

  • Between 10 and 15 Mbytes for each set of binaries

The ideal disk space for grid engine system spool directories is as follows:

  • 10-200 Mbytes for the master host spool directories

  • 10-200 Mbytes for the Berkeley DB spool directories
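
The fixed requirements above lend themselves to a quick pre-installation check. The following is a minimal sketch, not part of the product: the function names estimate_install_mb and has_free_mb are hypothetical, the estimate uses the upper bound of 15 Mbytes for each set of binaries, and the free-space test assumes a df command that accepts the POSIX -P and -k options.

```shell
# Upper-bound estimate, in Mbytes, of the fixed space needed under sge-root:
# 40 Mbytes of installation files plus up to 15 Mbytes per set of binaries.
# $1 = number of binary sets (hypothetical helper, not shipped with the product)
estimate_install_mb() {
    echo $((40 + 15 * $1))
}

# Succeeds if the file system that holds directory $1 has at least $2 Mbytes free.
# Assumes POSIX "df -Pk" output, where column 4 is the available space in Kbytes.
has_free_mb() {
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(($2 * 1024)) ]
}

# Example: three sets of binaries need at most 40 + 3 * 15 = 85 Mbytes.
estimate_install_mb 3
```

You might combine the two helpers, for example `has_free_mb /opt/n1ge6 "$(estimate_install_mb 3)"`, before you create the installation directory. Spool directory space is not included in this estimate.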


Note –

The spool directories of the master host and of the execution hosts are configurable and need not reside under the default location, sge-root.



Note –

You must satisfy several Windows-specific prerequisites before you can install N1GE on hosts that run the Windows operating system. You might need to install additional software on your computer, which might require additional disk space. See Appendix A.


sge-root Installation Directory

Create a directory into which you will load the contents of the distribution media. This directory is called the root directory, or sge-root. When the grid engine system is running, this directory stores the current cluster configuration and all other data that must be spooled to disk.


Note –

Spool areas are not required to reside under sge-root. In fact, you might want to place them elsewhere for efficiency reasons.


Use a valid path name for the directory that is network accessible on all hosts. For example, if the file system is mounted using automounter, set sge-root to /usr/N1GE6, not to /tmp_mnt/usr/N1GE6. Throughout this document, the sge-root variable is used to refer to the installation directory.

sge-root is the top level of the grid engine software directory tree. Each grid engine system component in a cell needs read access to the sge-root/cell/common directory, on startup. When grid engine software is installed as a single cluster, the value of cell is default.

For ease of installation and administration, this directory should be readable on all hosts on which you intend to run the grid engine software installation procedure. For example, you can select a directory available across a network file system, such as NFS. If you choose to select file systems that are local to the hosts, you must copy the installation directory to each host before you start the installation procedure for the particular machine. See File Access Permissions for a description of required permissions.

Directory Organization

When determining the directory organization, you must decide the following:

  • The directory organization (for example, whether you will install a complete software tree on each workstation, cross-mount directories, or install a partial directory tree on some workstations)

  • Where to locate each root directory, sge-root


Note –

Because changing the installation directory or the spool directories requires a new installation of the system, use extra care to select a suitable installation directory up front. Note that all important information from a previous installation can be preserved.


By default, the installation procedure installs the grid engine software, manuals, spool areas, and the configuration files in a directory hierarchy under the installation directory (see Figure 1–1). If you accept this default behavior, you should install or select a directory with the access permissions that are described in File Access Permissions.

During the primary installation, you can choose to put the spool areas in other locations. See Configuring Queues in N1 Grid Engine 6 Administration Guide for instructions.

Figure 1–1 Sample Directory Hierarchy

[Figure: browser window that displays the directory hierarchy of the sge-root installation directory.]

Cells

You can set up the grid engine system as a single cluster or as a collection of loosely coupled clusters called cells. The SGE_CELL environment variable indicates the cluster being referenced. When the grid engine system is installed as a single cluster, $SGE_CELL is not set, and the value default is assumed for the cell value.
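
As an illustration of the default cell rule, the following sketch resolves the cell name and the cell's common directory in shell. The function names current_cell and common_dir are hypothetical, and /opt/n1ge6 is only a fallback value chosen for this example when SGE_ROOT is not set.

```shell
# Print the cell name in effect: the value of SGE_CELL if it is set and
# non-empty, otherwise "default".
current_cell() {
    echo "${SGE_CELL:-default}"
}

# Print the sge-root/cell/common directory for the cell in effect.
# /opt/n1ge6 is an assumed fallback for this sketch, not a product default.
common_dir() {
    echo "${SGE_ROOT:-/opt/n1ge6}/$(current_cell)/common"
}
```

With SGE_ROOT set to /opt/n1ge6 and SGE_CELL unset, common_dir prints /opt/n1ge6/default/common.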

User Names

In order for the grid engine system to verify that users submitting jobs have permission to submit them on the desired execution hosts, users' names must be identical on the submit and execution hosts involved. You may therefore have to change user names on some machines, because grid engine system users map directly to system user accounts.


Note –

User names on the master host are not relevant for permission checking. These user names do not have to match or even exist.
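
One way to spot mismatched user names is to compare the account lists of a submit host and an execution host. The sketch below is an assumption-laden example: the function name users_missing_on is hypothetical, the two input files are copies of each host's account data in passwd format that you collect yourself, and the mktemp command is assumed to be available.

```shell
# List user names that appear in passwd-format file $1 but not in file $2.
# Both files are copies of account data that you gathered from the two hosts.
users_missing_on() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    # Keep only the user name field and sort it, because comm needs sorted input.
    cut -d: -f1 "$1" | sort -u > "$tmp_a"
    cut -d: -f1 "$2" | sort -u > "$tmp_b"
    # Column 1 of comm output: lines that are only in the first file.
    comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

For example, `users_missing_on submit.passwd exec.passwd` prints the users that exist on the submit host but not on the execution host. An empty result in both directions suggests that the names match. Sites that keep accounts in NIS would need to collect the lists from the NIS map instead.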


Installation Accounts

You can install the grid engine software either as the root user or as an unprivileged user (for example, your own user account). However, if you install the software while logged in as an unprivileged user, the installation allows only that user to run grid engine system jobs. Access is denied to all other accounts. Installing the software while logged in as root removes this restriction, but root permission is then required for the complete installation procedure. Also, if you install as an unprivileged user, you cannot use the qrsh, qtcsh, or qmake commands, nor can you run tightly integrated parallel jobs.


Note –

If you decide to install grid engine as an unprivileged user, you must set the sge_commd port number higher than or equal to 1024.


File Access Permissions

If you install the software logged in as root, you might have a problem configuring root read/write access for all hosts on a shared file system. Therefore, you might have problems putting sge-root onto a network-wide file system.

You can force grid engine software to run all grid engine system components through a non-root administrative user account (called sgeadmin, for example). With this setup, you need only read/write access to the shared sge-root file system for this particular user.

The installation procedure asks whether files should be created and owned by an administrative user account. If you answer “Yes” and provide a valid user name, files are created by this user. Otherwise, the user name under which you run the installation procedure is used. It is recommended that you create an administrative user, and answer “Yes” to this question.

Make sure in all cases that the account used for file handling on all hosts has read/write access to the sge-root directory. Also, the installation procedure assumes that the host from which you access the grid engine software distribution media can write to the sge-root directory.
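
A simple way to confirm this access is to run a check on each host as the account that handles files. The following is a minimal sketch. The function name can_read_write is hypothetical, and the test reflects only what the shell reports for the current account, which on some network file systems might differ from what an actual write would do.

```shell
# Succeeds if the current account can read and write directory $1.
# Run this as the account used for file handling (for example, sgeadmin).
can_read_write() {
    [ -d "$1" ] && [ -r "$1" ] && [ -w "$1" ]
}
```

For example, `can_read_write "$SGE_ROOT" || echo "no read/write access to $SGE_ROOT"` reports a problem before the installation procedure runs into it.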


Note –

The name of the root user on Windows hosts depends on the system language of the Windows operating system. The name of the root user can also be changed. The default name for many languages is Administrator.


Network Services

Determine whether your site's network services are defined in an NIS database or in an /etc/services file that is local to each workstation. If your site uses NIS, find out the host name of your NIS server so that you can add entries to the NIS “services” map.

The grid engine system services are sge_execd and sge_qmaster. To add the services to your NIS map, choose reserved, unused port numbers. The following examples show sge_qmaster and sge_execd entries.


sge_qmaster 536/tcp

sge_execd 537/tcp
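
To see whether these entries are already present in a local services file, you could use a lookup such as the following sketch. The function name service_port is hypothetical, and the awk command is assumed to support the POSIX -v option (on some systems this means nawk). At NIS sites, the same lookup would have to run against the contents of the NIS services map rather than the local file.

```shell
# Print the "port/protocol" field for service $1 from the services-format
# file $2 (default: /etc/services). Prints nothing if the service is absent.
service_port() {
    awk -v svc="$1" '$1 == svc { print $2; exit }' "${2:-/etc/services}"
}
```

For example, `service_port sge_qmaster` prints 536/tcp if the sample entry shown above has been added to /etc/services.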

Master Host

The master host controls the grid engine system. This host runs the master daemon sge_qmaster, and the scheduling daemon, sge_schedd.

The master host must comply with the following requirements:

  • The host must be a stable platform.

  • The host must not be excessively busy with other processing.

  • At least 60 – 120 Mbytes of unused main memory must be available to run the grid engine system daemons. For very large clusters that include many hundreds or thousands of hosts and tens of thousands of jobs in the system at any time, 1 GByte or more of unused main memory might be required and two CPUs might be beneficial.

  • The master host must be installed before shadow master, execution, administration, or submit hosts.

  • (Optional) The grid engine software directory, sge-root, should be installed locally, to cut down on network traffic.


Note –

Windows hosts cannot act as master hosts.


Shadow Master Hosts

These hosts back up the functionality of sge_qmaster in case the master host or the master daemon fails. To be a shadow master host, a machine must have the following characteristics:

  • It must run sge_shadowd.

  • It must share sge_qmaster status, job information, and queue configuration information that is logged to disk. In particular, the shadow master hosts need read/write root or administration user access to the sge_qmaster spool directory and to the sge-root/cell/common directory.

  • The sge-root/cell/common/shadow_masters file must contain a line defining the host as a shadow master host.


Note –

If no cell name is specified during installation, the value of cell is default.


The shadow master host facility is activated for a host as soon as these conditions are met. You do not need to restart the grid engine system daemons to make a host into a shadow master host.
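
The third condition can be checked with a short lookup in the shadow_masters file. The sketch below assumes that the file holds one host name per line and that the name is written exactly as you pass it in. The function name is_shadow_master is hypothetical.

```shell
# Succeeds if host $1 has a line of its own in shadow_masters file $2.
# -x matches whole lines only, -F treats the name as a fixed string,
# and -q suppresses output so that only the exit status is used.
is_shadow_master() {
    grep -qxF "$1" "$2"
}
```

For example, `is_shadow_master "$(hostname)" "$SGE_ROOT/default/common/shadow_masters"` tests the local host in a single-cluster installation. This check does not cover the other two conditions, such as access to the sge_qmaster spool directory.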


Note –

Windows hosts cannot act as shadow master hosts.


Spool Directories Under the Root Directory

During the installation of the master host, you must specify the location of a spooling directory. This directory is used to spool jobs from execution hosts that do not have a local spooling directory.

  • On the master host, spool directories are maintained under qmaster-spool-dir. The location of qmaster-spool-dir is defined during the master host installation process. The default value of qmaster-spool-dir is sge-root/cell/spool/qmaster.

  • On each execution host, a spool directory called execd-spool-dir is defined during the execution host installation processes. The default value of execd-spool-dir is sge-root/cell/spool/exec-host. You will get better performance from execution hosts with local spooling directories than from execution hosts that have NFS mounted the master host's spooling directory.


Note –

If no cell name is specified during installation, the value of cell is default.


You do not need to export these directories to other machines. However, exporting the entire sge-root tree and making it write-accessible for the master host and all execution hosts makes administration easier.

Choosing Between Classic Spooling and Database Spooling

During the installation, you are given the option to choose between classic spooling and Berkeley DB spooling. If you choose Berkeley DB spooling, you are then given the option to spool to a local directory or to a separate host, known as a Berkeley DB spooling server.

While classic spooling is an option, you should see better performance using a Berkeley DB spooling server. Part of this performance increase is because the master host can make non-blocking writes to the database, but has to make blocking writes to the text file used by classic spooling. Other factors that may influence your decision are file format and data integrity. There is a greater level of data integrity when you write to the Berkeley DB, rather than to a text file. However, a text file stores data in a format that humans can read and edit. Normally, you do not need to read these files, but the spooling directory contains the messages from the system daemons, which can be useful during debugging.

Database Server and Spooling Host

The master host can store its configuration and state to a Berkeley DB spooling database. The spooling database can be installed on the master host or on a separate host. When the Berkeley DB spools into a local directory on the master host, the performance is better, but if you want to set up a shadow master host, you need to use a separate Berkeley DB spooling server (host). In this case, you have to choose a host with a configured RPC service. The master host connects through RPC to the Berkeley DB.


Note –

Using a shadow master host is more reliable, but using a separate Berkeley DB spooling host results in a potential security hole. RPC communication as used by the Berkeley DB can be easily compromised. Only use this alternative if your site is secure and if users can be trusted to access the Berkeley DB spooling host by means of TCP/IP communication.


If you choose to use Berkeley DB spooling without a shadow master, you don't need to set up a separate spooling server. Likewise, if you choose not to use Berkeley DB spooling, you can set up a shadow master host without setting up a separate spooling server.

Once you determine whether you need a separate spooling server, you will also need to determine the location for the spooling directory. The spooling directory must be local to the spooling server. A default value for the location of the spooling directory is recommended during installation but this default value is not suitable when the file server is different from the master host.

The requirements for the Berkeley DB spooling host are similar to the requirements for the master host:

  • The host must be a stable platform.

  • The host must not be excessively busy with other processing.

  • At least 60 – 120 Mbytes of unused main memory must be available to run the grid engine system daemons. For very large clusters that include many hundreds or thousands of hosts and tens of thousands of jobs in the system at any time, 1 GByte or more of unused main memory may be required and two CPUs may be beneficial.

  • (Optional) A separate spooling host must be installed before the master host.

  • (Optional) The directory, sge-root, should be installed locally, to cut down on network traffic.

Execution Hosts

Execution hosts run the jobs that users submit to the grid engine system. An execution host must first be set up as an administration host. You run an installation script on each execution host.

Group IDs

You need to provide a range of IDs that will be assigned dynamically for jobs. The range must be big enough to provide enough numbers for the maximum number of grid engine system jobs running at a single moment on a single host.

A group ID is assigned to each grid engine system job to monitor the resource utilization of the job. Each job will be assigned a unique ID during the time it is running. For example, a range of 20000-20100 allows 100 jobs to run concurrently on a single host. You can change the group ID range for your cluster configuration at any time, but the values in the UNIX group ID range must be unused on your system.
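
Before you settle on a range, you can check that none of its values is already taken. The following sketch scans a file in /etc/group format. The function name gids_in_use is hypothetical, the awk command is assumed to support the POSIX -v option, and groups that are defined only in a naming service such as NIS are not covered by a local file check.

```shell
# Print "name:gid" for every group in group-format file $3 (default:
# /etc/group) whose GID lies within the range $1..$2, bounds included.
# No output means that the range is unused in that file.
gids_in_use() {
    awk -F: -v lo="$1" -v hi="$2" \
        '($3 + 0) >= (lo + 0) && ($3 + 0) <= (hi + 0) { print $1 ":" $3 }' \
        "${3:-/etc/group}"
}
```

For example, `gids_in_use 20000 20100` lists any local groups that would conflict with the sample range. Run the check on every execution host, because the range must be unused on each of them.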

Administration Hosts

Operators and managers of the grid engine system use administration hosts to perform administrative tasks such as reconfiguring queues or adding grid engine system users.

The master host installation script automatically makes the master host an administration host. During the master host installation process, you can add other administration hosts. You can also manually add administration hosts on the master host at any time after installation.

Submit Hosts

Jobs can be submitted and controlled from submit hosts. The master host installation script automatically makes the master host a submit host.

Cluster Queues

The installation procedure creates a default cluster queue structure, which is suitable for getting acquainted with the system. The default queue can be removed after installation.


Note –

No matter what directory is used for the installation of the software, the administrator can change most settings that were created by the installation procedure. This change can be made while the system is running.


There are several factors you need to consider when determining a queue structure.

  • Whether you need cluster queues for sequential, interactive, parallel, and other job types

  • Which queue instances to put on which execution hosts

  • How many job slots are needed in each queue

For more detailed information on administering cluster queues, see Configuring Queues in N1 Grid Engine 6 Administration Guide.

Scheduler Profiles

You can choose from three scheduler profiles during the installation process: normal, high, and max. You can use these predefined profiles as a starting point for grid engine tuning.

Using these profiles, you can optimize the scheduler for one or more of the following:

  • The amount of information about a scheduling run

  • The load adjustment during a scheduling run

  • Interval scheduling (the default) or immediate scheduling

The three profiles differ as follows:

  • normal – This profile uses load adaptation and interval scheduling, and reports all the information that the scheduler gathers during the dispatch cycle. This profile is the starting point for most grids. Use this profile if your highest priority is gathering and reporting information about a scheduling run.

  • high – This profile is more appropriate for a large cluster, where throughput is more important than gathering and reporting all the information from the scheduler. This profile also uses interval scheduling. Use this profile if you want to get better performance at the cost of getting less information about your scheduling runs.

  • max – This profile disables all information gathering and reporting, enables immediate scheduling, and disables load adaptation. Immediate scheduling is very useful for sites with high throughput and very short running jobs. The advantage of immediate scheduling decreases as the run time of the jobs increases. This profile can be used in clusters of any size where only throughput is important and everything else is a lower priority.

For more information on how to configure scheduling, see Administering the Scheduler in N1 Grid Engine 6 Administration Guide.

Installation Method

Several methods are available for installing the grid engine software:

  • Interactive

  • Interactive, with increased security

  • Automated, using the inst_sge script and a configuration file

  • Upgrade

To decide which installation method you should use, consider the following factors.

Check the Other Installation Issues Appendix

If you are installing N1GE on a Linux system or on a system with IPMP, see the Other Installation Issues appendix for some important information.

Loading the Distribution Files On a Workstation

Grid Engine is distributed on CD-ROM and through electronic download. For information on how to access CD-ROMs, ask your system administrator or refer to your local system documentation. The CD-ROM distribution contains a directory named N1_Grid_Engine_6u4. The product distribution is in this directory, in both tar.gz format and the pkgadd format. The pkgadd format is provided for the Solaris™ Operating System (Solaris OS). For all supported operating systems, the software is distributed in tar.gz format.

Procedure: How to Load the Distribution Files On a Workstation

Before You Begin

Ensure that the file systems and directories that are to contain the grid engine software distribution and the spool and configuration files are set up properly by setting the access permissions as defined in File Access Permissions.

Steps
  1. Provide access to the distribution media.

    • If you downloaded the software, rather than getting it on CD-ROM, unzip the five files into a directory. This directory must be located on a file system that has at least 350 Mbytes of free space.

  2. Log in to a system, preferably a system that has a direct connection to a file server.

  3. Create the installation directory as described in sge-root Installation Directory.


    # mkdir  /opt/n1ge6
    

    In these instructions, the installation directory is abbreviated as sge-root.

  4. Install the binaries for all binary architectures that are to be used by any of your master, execution, and submit hosts in your grid engine system cluster.

    • The pkgadd Method

      The pkgadd format is provided for the Solaris Operating System. To facilitate remote installation, the pkgadd directories are also provided in zip files.

      You can install the following packages:

      • SUNWsgeec – Architecture independent files

      • SUNWsgeed – Documentation

      • SUNWsgee – Solaris (SPARC® platform) 32-bit binaries for Solaris 7, Solaris 8, and Solaris 9 Operating Systems

      • SUNWsgeex – Solaris (SPARC platform) 64-bit binaries for Solaris 7, Solaris 8, and Solaris 9 Operating Systems

      • SUNWsgeei – Solaris (x86 platform) binaries for Solaris 8 and Solaris 9 Operating Systems

      • SUNWsgeeax – Solaris (x64 platform) binaries for Solaris 10 Operating System

      • SUNWsgeea – Accounting and Reporting Console (ARCo) packages for the Solaris and Linux Operating systems.

      As you type the following commands, you must be prepared to respond to script questions about your base directory, sge-root and the administrative user. The script requests the choices you made during the planning steps of this installation. See Decisions That You Must Make.

      At the command prompt, type the following commands, responding to the script questions.


      # cd cdrom_mount_point/N1_Grid_Engine_6u4
      # pkgadd -d ./Common/Packages/SUNWsgeec
      # pkgadd -d ./Docs/Packages/SUNWsgeed
      # pkgadd -d ./Solaris_sparc/Packages/SUNWsgee (This is optional; at least one binary set is required)
      # pkgadd -d ./Solaris_sparc/Packages/SUNWsgeex (This is optional; at least one binary set is required)
      # pkgadd -d ./Solaris_x86/Packages/SUNWsgeei (This is optional; at least one binary set is required)
      # pkgadd -d ./Solaris_x64/Packages/SUNWsgeeax (This is optional; at least one binary set is required)
      
    • The tar Method

      For all supported operating systems, the software is distributed in tar.gz format.

      The following table contains files that you will need to install, regardless of what platform you are on.

      | File | Description |
      |---|---|
      | Common/tar/n1ge-6_0u4-common.tar.gz | Architecture independent files |
      | Docs/tar/n1ge-6_0u4-doc.tar.gz | Release notes, installation guide, user guide and administration guide |

      The tar files that contain platform-specific binaries use the naming convention of n1ge-6_0u4-bin-architecture.tar.gz. The following table lists the platform-specific binaries. You need to install the file for each platform that you need to support. Note that each platform has its own directory under N1_Grid_Engine_6u4.

      | Platform-Specific File | Platform |
      |---|---|
      | Solaris_sparc/tar/n1ge-6_0u4-bin-solaris.tar.gz | Solaris (SPARC platform) 32-bit binaries for Solaris 7, Solaris 8, and Solaris 9 Operating Systems |
      | Solaris_sparc/tar/n1ge-6_0u4-bin-solaris-sparcv9.tar.gz | Solaris (SPARC platform) 64-bit binaries for Solaris 7, Solaris 8, and Solaris 9 Operating Systems |
      | Solaris_x86/tar/n1ge-6_0u4-bin-solaris-i586.tar.gz | Solaris (x86 platform) binaries for Solaris 8 and Solaris 9 Operating Systems |
      | Solaris_x64/tar/n1ge-6_0u4-bin-solaris-x64.tar.gz | Solaris (x64 platform) 64-bit binaries for Solaris 10 |
      | Gemm/tar/n1ge-6_0u4-gemm.tar.gz | Grid Engine Management Module, to be installed with SCS 2.2.1 |
      | Windows/tar/n1ge-6_0u4-bin-win32-x86.tar.gz | Microsoft Windows (x86 platform) 32-bit binaries for Windows 2000, XP, and Windows Server 2003 |
      | Linux24_i586/tar/n1ge-6_0u4-bin-linux24-i586.tar.gz | Linux (x86 platform) binaries for the 2.4 kernel |
      | Linux24_amd64/tar/n1ge-6_0u4-bin-linux24-amd64.tar.gz | Linux (AMD platform) binaries for the 2.4 kernel |
      | MacOSX/tar/n1ge-6_0u4-bin-darwin.tar.gz | Apple Mac OS X |
      | HPUX11/tar/n1ge-6_0u4-bin-hp11.tar.gz | Hewlett-Packard HP-UX 11 |
      | Irix65/tar/n1ge-6_0u4-bin-irix65.tar.gz | SGI Irix 6.5 |
      | Aix51/tar/n1ge-6_0u4-bin-aix51.tar.gz | IBM AIX 5.1 |
      | Aix43/tar/n1ge-6_0u4-bin-aix43.tar.gz | IBM AIX 4.3 |

      Type the following commands at the command prompt. In the example, basedir is the abbreviation for the full directory, cdrom_mount_point/N1_Grid_Engine_6u4.


      % su 
      # cd sge-root
      # gzip -dc basedir/Common/tar/n1ge-6_0u4-common.tar.gz | tar xvpf -
      # gzip -dc basedir/Docs/tar/n1ge-6_0u4-doc.tar.gz | tar xvpf -
      # gzip -dc basedir/Solaris_sparc/tar/n1ge-6_0u4-bin-solaris.tar.gz | tar xvpf -
      # gzip -dc basedir/Solaris_sparc/tar/n1ge-6_0u4-bin-solaris-sparcv9.tar.gz | tar xvpf -
      # SGE_ROOT=sge-root; export SGE_ROOT
      # util/setfileperm.sh $SGE_ROOT
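
If you support several platforms, the gzip and tar step can be wrapped in a small helper so that each additional set of binaries takes one command. The following is a sketch only. The function names bin_tarball and extract_bin are hypothetical, and the path is built from the n1ge-6_0u4-bin-architecture.tar.gz naming convention described earlier. Run the extraction from sge-root, before you run util/setfileperm.sh.

```shell
# Print the path, relative to basedir, of a platform-specific tar file.
# $1 = platform directory on the distribution media (for example, Solaris_x64)
# $2 = architecture string used in the file name (for example, solaris-x64)
bin_tarball() {
    echo "$1/tar/n1ge-6_0u4-bin-$2.tar.gz"
}

# Extract one set of binaries into the current directory.
# $1 = basedir, $2 = platform directory, $3 = architecture string
extract_bin() {
    gzip -dc "$1/$(bin_tarball "$2" "$3")" | tar xvpf -
}

# Usage, as root and from within sge-root:
#   extract_bin basedir Solaris_x64 solaris-x64
#   extract_bin basedir Linux24_amd64 linux24-amd64
```

Take the platform directory and architecture string from the table of platform-specific files, because the two do not always share the same spelling.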