To install Grid Engine and verify that it works correctly,
complete the following tasks:
General Overview
Create a Grid Engine administrator account and set up a service port
An administrator account must be specified. The administrator can be an
existing user or a new user may be created for this task. This account
will own all of the files and it is used to configure and maintain the
cluster once the software is installed.
The administrator account must exist prior to installation. We recommend
'sgeadmin' as the administrator account belonging to the 'adm'
group.
The software uses a TCP port for communication. All hosts in the cluster
must use the same port number. The port number can be set in the following
places:
-
NIS (Yellow Pages) services or NIS+ database
Add the following entry to the services database. The specific port number
does not matter, but it must be unused on your system and should be a
reserved port:
sge_commd 536/tcp # communication port for Grid Engine
-
Or, add the above manually to the /etc/services file on each machine
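Before adding the entry, it can help to confirm that the chosen port is not already assigned. The following is a minimal Bourne shell sketch, assuming a flat services file in the usual "name port/protocol" layout. The function name port_is_free is hypothetical and not part of Grid Engine, and the check does not consult NIS or NIS+ maps.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): succeed if no entry in
# the given services file already uses PORT/tcp.
# Usage: port_is_free PORT [SERVICES_FILE]
port_is_free() {
    port=$1
    file=${2:-/etc/services}
    # Strip comments, then compare the second field against "PORT/tcp".
    if sed 's/#.*//' "$file" |
        awk -v p="$port/tcp" '$2 == p { found = 1 } END { exit !found }'
    then
        return 1    # an existing entry already uses this port
    fi
    return 0
}
```

For example, "port_is_free 536 && echo free" prints "free" only when no line in /etc/services assigns 536/tcp.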
Create a directory and unpack the distribution
As the Grid Engine administrator, do the following:
If you received the distribution in "pkgadd" format:
% mkdir <your_gridengine_root_directory>
Install the packages with "pkgadd" on your file server (all
files will have the correct permissions and ownership).
Or, if you received the distribution in "tar.gz" format:
-
Create a directory for Grid Engine. This directory must be accessible to
all Grid Engine clients and execution hosts.
(e.g. /share/gridengine )
% mkdir <sge_root>
% cd <sge_root>
-
Unpack the distribution to this directory.
% gzip -dc sge_<version>_common.tar.gz | tar xvf -
% gzip -dc sge_<version>_<arch>.tar.gz | tar xvf -
(repeat for all architectures you need)
-
Verify the file permissions with the script
<sge_root>/util/setfileperm.sh
(all Grid Engine directories and files should be owned by the administrator,
and some files need to be installed suid root).
This script must run on a machine where user root has appropriate permissions
to chown/chmod files.
The script does not necessarily need to run on the qmaster machine.
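The unpack step for the "tar.gz" format, repeated for the common archive and each architecture, can be sketched as a small loop. This is only an illustration: the function name unpack_distribution and the idea of collecting all archives in one download directory are assumptions, not something the distribution provides.

```shell
#!/bin/sh
# Hypothetical helper: unpack every sge_*.tar.gz archive found in
# DIST_DIR into SGE_ROOT_DIR, using the same gzip | tar pipeline as above.
# Usage: unpack_distribution DIST_DIR SGE_ROOT_DIR
unpack_distribution() {
    dist_dir=$1
    sge_root_dir=$2
    mkdir -p "$sge_root_dir" || return 1
    for archive in "$dist_dir"/sge_*.tar.gz; do
        [ -f "$archive" ] || continue    # the glob matched nothing
        echo "Unpacking $archive"
        gzip -dc "$archive" | (cd "$sge_root_dir" && tar xf -) || return 1
    done
}
```

Run it as the Grid Engine administrator, for example "unpack_distribution /tmp/download /share/gridengine", where both paths are placeholders for your own locations.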
Additional information before installing
-
Grid Engine must be installed as root
The Grid Engine installation program must be run as root in order
to start the daemons. Root does NOT need write permission on the file server.
Once Grid Engine is installed, the administrator can handle all day-to-day
operations.
-
Machine rebooting
The machines DO NOT need to be rebooted as part of the Grid Engine
installation.
-
It may be convenient to prepare a file listing the hosts on which Grid
Engine will be installed. The format for this file is one hostname per line.
The names may also be typed in manually when the installation prompts.
-
If any stty commands exist in the users' startup scripts, jobs submitted
to Grid Engine may fail as there is no terminal associated with a Grid
Engine batch job. If there are stty commands, one of the following must
be done:
-
Remove all stty commands (and other commands that access a tty, such as "biff")
from the login files
-
Bracket the stty commands with an 'if' statement which checks for a terminal
before executing. For example:
#!/bin/csh
# tty -s exits with status 0 only if a terminal is present
tty -s
if ($status == 0) then
<place all stty commands in here>
endif
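For users whose login files are read by a Bourne-type shell, the same terminal check can be sketched as follows. The function name if_terminal is made up for this example, and the stty setting in the comment is only a placeholder.

```shell
#!/bin/sh
# Run the given command only when standard input is a terminal.
# "tty -s" prints nothing and exits with status 0 only if stdin is a tty.
if_terminal() {
    if tty -s; then
        "$@"
    fi
}

# Example for a login file such as ~/.profile (placeholder setting):
# if_terminal stty erase '^H'
```

In a Grid Engine batch job there is no terminal, so the wrapped command is skipped and the login file continues without error.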
Install Grid Engine
The installation is a two-step process. First, the Grid Engine files are
installed and configured on the master. Then, a small installation is
done on each execution host to configure and start the daemons, and to
add automatic daemon startup to the init area. This requires logging on
to each execution host as root and manually running the install program.
Alternatively, if there is a secure machine with root rsh access to all
machines, the execution host install can be done from a single machine.
-
Step One - Install the master host
As root, on the master host, run:
% ./install_qmaster
(This is a shortcut for ./inst-sge -fast -m)
This will install the Grid Engine master.
-
Step Two - Install execution hosts
As root on the execution host machines, run:
% ./install_execd
(This is a shortcut for ./inst-sge -fast -x)
This will install the Grid Engine execution daemon.
The installation programs start the Grid Engine daemons, so once an install
completes successfully, Grid Engine is up and running. If the
master host will also be an execution host, run Step Two on the
master machine as well.
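If the execution hosts are installed from a single machine with root rsh access, a host file (one hostname per line) can drive the process. The sketch below is a dry run that only prints the rsh commands; nothing is executed. The function name and the assumption that <sge_root> is mounted at the same path on every host are illustrative, and whether install_execd can run unattended over rsh depends on your site, so review the printed commands before using them.

```shell
#!/bin/sh
# Hypothetical dry run: print one rsh command per host listed in HOST_FILE.
# Assumes SGE_ROOT_DIR is the same path on every execution host.
# Usage: print_execd_installs HOST_FILE SGE_ROOT_DIR
print_execd_installs() {
    host_file=$1
    sge_root_dir=$2
    # The "|| [ -n ... ]" part handles a last line without a newline.
    while read -r host rest || [ -n "$host" ]; do
        [ -n "$host" ] || continue    # skip blank lines
        printf 'rsh %s "cd %s && ./install_execd"\n' "$host" "$sge_root_dir"
    done < "$host_file"
}
```

For example, "print_execd_installs hosts.txt /share/gridengine" prints one line per host in hosts.txt, where both arguments are placeholders.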
Verify installation
Once the installation is complete, you can verify it.
Sample job scripts are provided in $SGE_ROOT/examples/jobs.
First, source the appropriate settings file to set up the Grid Engine environment:
-
C-shell
% source $SGE_ROOT/default/common/settings.csh
-
Bourne shell
$ . $SGE_ROOT/default/common/settings.sh
Then, to verify Grid Engine is accepting jobs, execute the following:
% qsub $SGE_ROOT/examples/jobs/sleeper.sh
You should see output similar to the following:
% qsub $SGE_ROOT/examples/jobs/sleeper.sh
your job 1 ("Sleeper") has been submitted
Verify that all of the queues have been installed properly by running the
following:
% qstat -f (full listing of the queues)
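When scripting this check, the job number can be pulled out of the qsub message. The sketch below assumes the message has the shape shown above ("your job N (...) has been submitted"); the exact wording may differ between versions, and the function name is hypothetical.

```shell
#!/bin/sh
# Extract the job number from a line such as:
#   your job 1 ("Sleeper") has been submitted
# Prints nothing if the line does not match that pattern.
job_id_from_qsub() {
    printf '%s\n' "$1" |
        sed -n 's/^[Yy]our job \([0-9][0-9]*\) .*has been submitted.*$/\1/p'
}
```

A possible use is job_id=$(job_id_from_qsub "$(qsub $SGE_ROOT/examples/jobs/sleeper.sh)"), after which an empty job_id indicates that the submission message was not recognized.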
Using Grid Engine
The main submit commands are qsub, qrsh and qtcsh. See the man pages for
submit(1) and qtcsh(1) for more details.
-
qsub
In general, qsub is used for traditional batch submission, that is, where
I/O is directed to a file. Note that qsub accepts only shell scripts, not
executable files. An application script, qs, allows
qsub to accept executable files directly.
-
qrsh
qrsh acts similarly to the rsh command, except that a host name is not
given. Instead, a shell script or an executable file is run, potentially
on any node in the cluster. I/O is directed back to the submitter's terminal
window. By default, if the job cannot be run immediately, qrsh will not
queue the job. Using the '-now no' flag to qrsh will allow jobs to queue.
Note that I/O can be redirected with the shell redirect operators. For
example, to run the uname -a command:
% qrsh uname -a
The uname output of whichever machine the scheduler selects in the cluster
will then be displayed on the submitting terminal. To redirect the output:
% qrsh uname -a > /tmp/myfile
The output from uname will be written to /tmp/myfile on the submitting
host. To allow the command to queue:
% qrsh -now no uname -a
If a suitable host is not immediately available, the command will block
until one becomes available. At that time, the command output will
be displayed on the submitting terminal. See the qrsh(1) man page for more
details.
-
qtcsh
Grid Engine contains a modified tcsh, qtcsh, which will automatically
submit jobs listed in a task file to the cluster. See the qtcsh(1) man
page for more details.
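Because qsub accepts only shell scripts, one simple way to submit an executable is to wrap it in a script. The sketch below is not the qs script mentioned above; it is a hypothetical helper that writes such a wrapper. Arguments containing spaces or shell metacharacters are not quoted, so it is suitable only for simple command lines.

```shell
#!/bin/sh
# Hypothetical helper: write a small shell script that runs an executable
# with fixed arguments, so that the script can be handed to qsub.
# Usage: make_job_script OUTPUT_SCRIPT EXECUTABLE [ARGS...]
make_job_script() {
    out=$1
    shift
    {
        echo '#!/bin/sh'
        echo "exec $*"    # arguments are written unquoted
    } > "$out"
    chmod +x "$out"
}
```

For example, "make_job_script uname_job.sh uname -a" creates a two-line script that could then be submitted with "qsub uname_job.sh".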