Announcing Grid Engine 6.0 Scalability Update 2 Snapshot 1
Regensburg, Germany -- Sep 19, 2005
Grid Engine 6.0 Scalability Update 2 Snapshot 1 is now ready and courtesy binaries are
available for download. The SGE 6.0s2snapshot1 release includes performance improvements
and bug fixes for the software, the installation procedure, and the man pages. This release is
intended for testing purposes only, not for production clusters.
The major scalability improvements are:
- qmaster GDI GET request handling (e.g. qstat) is now faster and consumes less memory.
- asynchronous communication protocol between scheduler and master, which leads to a more
efficient communication flow.
- better commlib thread concurrency through introduction of RW-locks.
- significantly enhanced DRMAA submission rate.
Some important fixes included in this release are:
- Issue 1628 - Problem with large job output files on 32-bit Linux nodes
- Issue 1665 - delivery of queue based signals to execd repeated endlessly
- Issue 1780 - qconf -mq disallows 2057 hostspecific profiles in slots configuration
- Issue 1787 - calendar syntax "week mon=0-21" corrupts SGE and may crash qmaster
A complete list of fixed problems is available at
This snapshot does not yet include all bug fixes planned for the final scalability update
release. However, development tasks for the second scalability update are finished and
have been properly tested.
The following remaining bugs are expected to be fixed in the final "6.0s2" version:
- Issue 1750 - accounting(5) record can't be made available immediately after job finish
- Issue 1798 - qconf -mattr can crash qmaster
- Issue 1640 - qconf -[dm]attr gets confused by shortcuts
- Issue 1615 - sge_qmaster abort with "lGetList(): got NULL element for SME_message_list"
- Issue 1760 - unable to delete a configuration of a non existing host
- Issue 1799 - qmaster messages error logging upon subordinated queue is removed
- Issue 1652 - getting many E messages "failed building category string for job N"
- Issue 1761 - Releasing consumable increases consumable count
- Issue 1768 - Quotes in native specification can result in memory corruption
- Issue 1709 - drmaa_synchronize() returns DRMAA_ERRNO_SUCCESS for jobs outside the current session
- Issue 1485 - drmaa_job_ps() does not work for jobs submitted outside of the current session
- Issue 1800 - qstat -s p doesn't show pending array tasks while there are tasks of this job running
- Issue 1686 - qacct -o -D output hard to parse
- Issue 1801 - confusing execd startup messages and delays in case of problems
- Issue 1802 - CSP consolidate error output if cert CA on client and server don't match
- Issue 1517 - qmaster is not accepting connections if number of execd's exceed number of file descriptors
- Issue 1772 - shepherd doesn't handle qrlogin/qrsh jobs correctly
- Issue 1803 - Binary jobs are problematic for starter and epilog scripts
- Issue 1804 - queues are wrongly in error state
- Issue 1634 - Suspend/Resume Problems on RedHat 3.0
- Issue 146 - Failed migrate command leaves job running
- Issue 1406 - Cannot submit to HPUX client using qrsh with no command
- Issue 1695 - default PATH variable set for job insufficient for non-login shell jobs
- Issue 1681 - killed master task with tight integration does not kill slave jobs in special case
- Issue 1679 - tight integration - qrsh_exit_code file not written
- Issue 1680 - admin mail information is incorrect or queue error state setting does not work
- Issue 1751 - use of the same paths for input/output stream must be dealt with
The courtesy binaries are available at:
The patch installation notes are available at
Please test this release and send support questions and feedback to the "users" mailing list of
the Grid Engine open source project.
The corresponding source code tag in the CVS repository has the name
V60s2snapshot1_TAG
A snapshot of the sources is available at the
Documents & files page: