Installing a Grid Engine 6.1 Update Release ------------------------------------------- 1) Who needs to read this document 2) Prerequisites 3) Stopping the Grid Engine cluster to prevent start of jobs 4) Shutting down the Grid Engine qmaster, scheduler and execution daemons 5) Installing the patch and restarting the software 6) Restarting the software 7) New functionality delivered with SGE 6.1 update releases 7.1) New "qsub -terse" option (6.1u1) 7.2) New reporting parameter for consumable resources control (6.1u3) 1) Who needs to read this document ---------------------------------- This document describes how to install a Grid Engine 6.1 update release from a previous 6.1 revision. If you make a new installation of Grid Engine 6.1 or a newer update release please follow the directions in the N1 Grid Engine manuals which can be found at http://docs.sun.com. The terms "update release", "patch", "patch release", and "distribution" in this document are used interchangeably and refer to the most recent courtesy distribution of Grid Engine which includes binaries, documentation and architecture independent files (the "common" package) available on the binary download page. 2) Prerequisites ----------------- The courtesy binary distribution always contains a full set of all binaries. These installation instructions assume that you are running a homogenous Grid Engine cluster (called "the software") where all hosts share the same directory for the binaries. If you are running the software in a heterogenous environment (mix of different binary architectures), you need to apply the patch installation for all binary architectures as well as the "common" and "arco" packages. If you installed the software on local filesystems, you need to install all relevant patches on all hosts where you installed the software locally. By default, there should by no running jobs when the patch is installed. There may pending batch jobs, but no pending interactive jobs (qrsh, qmake, qsh, qtcsh). It is possible to install the patch with running batch jobs. To avoid a failure (and possible loss of your jobs) of a running 'sge_shepherd' process, it is necessary to move the old sge_shepherd binary (and copy it back prior to the installation of the patch). You cannot install the patch with running interactive jobs, 'qmake' jobs or with running parallel jobs which use the tight integration support (control_slaves=true in PE configuration is set). It is required to update all binaries and the "common" package to the same revision level. A mix if different versions of Grid Engine daemons and commands is not supported. 3. Stopping the Grid Engine cluster to prevent start of jobs ------------------------------------------------------------ Disable all queues so that no new jobs are started: # qmod -d '*' Optional (only needed if there are running jobs which should continue to run when the patch is installed): # cd $SGE_ROOT/bin # mv /sge_shepherd /sge_shepherd.sge61 It is important that the binary is moved with the "mv" command. It should not be copied because this could cause the crash of an active shepherd process which is currently running job when the patch is installed. 4. Shutting down the Grid Engine qmaster, scheduler and execution daemons ------------------------------------------------------------------------- You need to shutdown (and restart) the qmaster and scheduler daemon and all running execution daemons. Shutdown all your execution hosts. Login to all your execution hosts and stop the execution daemons: # /etc/init.d/sgeexecd softstop Then login to your qmaster machine and stop qmaster and scheduler: # /etc/init.d/sgemaster stop Now verify with the 'ps' command that all Grid Engine daemons on all hosts are stopped. If you decided to rename the 'sge_shepherd' binary so that running jobs can continue to run during the patch installation, you must not kill the 'sge_shepherd' binary (process). 5. Installing the patch and restarting the software --------------------------------------------------- Now install the update release by unpacking all 'tar.gz' files (all binary packages and the "common" package). 6. Restarting the software -------------------------- Please login to your qmaster machine and execution hosts and enter: # /etc/init.d/sgemaster etc/init.d/sgeexecd After restarting the software, you may again enable your queues: # qmod -e '*' If you renamed the shepherd binary, you may safely delete the old binary when all jobs which where running prior the patch installation have finished. 7. New functionality delivered with SGE 6.1 update releases ----------------------------------------------------------- 7.1. New "qsub -terse" option (6.1u1) ------------------------------------- The new qsub option "-terse" outputs only the job id (and task id for array jobs) while submitting a job using the qsub interface. This is helpful for those who need just the job id returned by qsub. 7.2. New reporting parameter for consumable resources control ------------------------------------------------------------- The new reporting parameter "log_consumables" controls writing of consumable resources to the reporting file. Default (log_consumables=true) is to write information about all consumable resources (their current usage and their capacity) to the reporting file, whenever a consumable resource changes either in definition, or in capacity, or when the usage of a consumable resource changes. When log_consumables is set to false, only those variables will be written to the reporting file, that are configured in the report_variables in the exec host configuration, see host_conf(5) for further information about report_variables. The default (log_consumables=true) has been chosen to be backward compatible to 6.1u2, but it is recommended to switch to log_consumables=false, and add the required consumables to the report_variables in the global host (qconf -me global).