Rotating and truncating Sun Grid Engine Log Files

Contents

  1. Overview

  2. Variables

  3. Command line parameters

  4. Examples


Overview

Sun Grid Engine daemons create log files called "messages" in their respective spool directories. An 'accounting' file and a 'statistics' file are also created. A script for truncating log files is found in the following directory:

$SGE_ROOT/util/logchecker.sh

The script is not activated automatically by any of the Sun Grid Engine daemons. It is intended to be edited according to the needs of your site. After customizing the script, you can add an entry to your crontab. The script can run in verbose mode or completely silently. It can also run in a mode where it only prints what would be done. The script accepts only two command line parameters: one overrides the ACTION_ON parameter, the other sets the location of the exec daemon spool directory (see below).

Sun Grid Engine daemons create log files in the qmaster_spool_dir and execd_spool_dir, which are defined in the global cluster configuration. They can be overridden in the local cluster configuration of each execution host (usually this is not done). The cell directory is usually called 'default'; the name differs only if the $SGE_CELL variable is used.

Default location of Sun Grid Engine log files:

<qmaster_spool_dir>/messages
<qmaster_spool_dir>/schedd/messages
<execd_spool_dir>/<hostname>/messages
<sge_root>/<sge_cell>/common/accounting
<sge_root>/<sge_cell>/common/statistics
  

These directories may all be located in the same directory hierarchy on a shared NFS filesystem, or the execd spool directories may point to a local directory. The ACTION_ON parameter (see below) therefore specifies which 'messages' files are rotated when the script is called.


Variables

The following variables need to be configured in the script. The "|" character separates alternative values. All variables in the script must be entered in Bourne shell syntax, so there must be no white space before or after the equal sign "=".

  • UNCONFIGURED=yes|no

    After the script is configured, set this value to "no". If it is set to "yes" (or any other value), the script only prints what would be done.

    Default: UNCONFIGURED=yes

  • ECHO=:|echo

    The colon ":" is the null command in the shell. If you set the variable to this value, the script will work silently (only error messages are printed). If you set the value to "echo" the script will print what it is currently doing.

    Default: ECHO=echo

  • SGE_ROOT=

    Enter the path of your SGE_ROOT directory here.

    Default: SGE_ROOT=undefined

  • SGE_CELL=default|

    Enter the name of your cell if it is not 'default'.

    Default: SGE_CELL=default

  • ACTION_ON=1|2|3|4

    1 = work on qmaster and scheduler "messages" files only
    2 = work on "messages" file on current host only
    3 = work on all accessible execd "messages" files of global config
    4 = work on qmaster "messages" and all accessible execd "messages" files

    Default: ACTION_ON=4

  • ACTIONSIZE=

    Rotate/delete only if the file size exceeds ACTIONSIZE in kilobytes. If ACTIONSIZE is set to 0, the "messages" file is rotated each time the script is called.

    Default: ACTIONSIZE=0

  • KEEPOLD=

    Defines the number of old messages files to be preserved. E.g. "30" means that "messages.0" to "messages.29" are saved. A value of "0" means no backup is done. The most recent messages file has the extension ".0".

    Default: KEEPOLD=30

  • GZIP=yes|no

    yes = compress rotated "messages.0" file with gzip
    no = leave rotated "messages.0" file uncompressed

    Default: GZIP=yes

  • ACCT=yes|no

    yes = rotate accounting file when rotating qmaster 'messages' file
    no = don't rotate accounting file

    Default: ACCT=no

  • STAT=delete|yes|no

    delete = delete statistics file
    yes = rotate statistics file
    no = don't rotate statistics file

    The 'statistics' file is not used in this release, so you can safely delete it. You can also set the parameter stat_log_time in your global cluster configuration to a very long interval (the default is 48:00:00, i.e. 48 hours).

    Default: STAT=delete
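
To make the interplay of ECHO, ACTIONSIZE and KEEPOLD more concrete, here is a minimal sketch of a rotation step. It is not the code of logchecker.sh: the function name rotate_log and the values chosen are assumptions for illustration only, and GZIP, ACCT and STAT handling is left out.

```shell
#!/bin/sh
# Illustrative sketch only; this is not the code of logchecker.sh.

# Settings in Bourne shell syntax: no white space around "=".
ECHO=echo       # ECHO=: would silence the progress messages
ACTIONSIZE=0    # kilobytes; 0 rotates on every call
KEEPOLD=3       # keeps messages.0 .. messages.2

# rotate_log FILE (hypothetical helper name)
rotate_log() {
    file=$1
    [ -f "$file" ] || return 0

    # Skip files that do not exceed ACTIONSIZE kilobytes.
    size=$(( $(wc -c < "$file") ))
    limit=$(( ACTIONSIZE * 1024 ))
    if [ "$ACTIONSIZE" -gt 0 ] && [ "$size" -le "$limit" ]; then
        $ECHO "skipping $file ($size bytes)"
        return 0
    fi

    if [ "$KEEPOLD" -gt 0 ]; then
        # Drop the oldest copy, then shift FILE.n to FILE.n+1.
        i=$(( KEEPOLD - 1 ))
        rm -f "$file.$i"
        while [ "$i" -gt 0 ]; do
            prev=$(( i - 1 ))
            if [ -f "$file.$prev" ]; then
                mv "$file.$prev" "$file.$i"
            fi
            i=$prev
        done
        $ECHO "saving $file as $file.0"
        cp -p "$file" "$file.0"
    fi

    # Truncate in place so the file keeps its owner and permissions.
    $ECHO "truncating $file"
    : > "$file"
}
```

Because ECHO holds a command name, every `$ECHO "..."` line either prints its message (ECHO=echo) or expands to the null command and prints nothing (ECHO=:). With KEEPOLD=0 the sketch keeps no backup and only truncates the file.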


Command line parameters

The script accepts the following command line parameters:

  • -execd_spool

    Define the base directory of the execd spool directory. Do not add the unqualified hostname in the command line. The hostname is added automatically by the script.

  • -action_on 1|2|3|4

    Override the ACTION_ON variable in the script.
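
The way the unqualified hostname is appended to the execd spool base directory can be sketched as follows. This is an assumption about the resulting path layout, based on the default locations listed in the overview. The function name execd_messages_file and the example paths are hypothetical and not taken from logchecker.sh.

```shell
#!/bin/sh
# Illustrative sketch only; not taken from logchecker.sh.

# execd_messages_file BASE [HOST]
# Print the "messages" path for HOST (default: this host) below BASE,
# using the unqualified hostname (everything before the first dot).
execd_messages_file() {
    base=$1
    host=${2:-$(hostname)}
    short=${host%%.*}
    echo "$base/$short/messages"
}
```

For example, `execd_messages_file /var/spool/sge node1.example.com` prints `/var/spool/sge/node1/messages`, which is why only the base directory is given on the command line.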


Examples

  1. All Sun Grid Engine spool directories are shared. You can call the script on any one of your Sun Grid Engine hosts or on your file server.

    Set ACTION_ON to "4" in the script. Set the other values according to your needs and add the script to the crontab of one of the above machines.

  2. Sun Grid Engine execd spool directories are defined only through the global cluster configuration, but point to a local directory.

    Set ACTION_ON="3" in the script. Add a call of the script to the crontab of every execution host in your cluster. On your qmaster machine (or on your file server) add the following call of the script to your crontab:
       <path_to_script>/logchecker.sh -action_on 1

  3. Sun Grid Engine spool directories of execds are defined in the local configuration.

    Set ACTION_ON="2" in the script.

    On your qmaster machine (or on your file server) add the following call of the script to your crontab:
       <path_to_script>/logchecker.sh -action_on 1
    On your exec hosts add the following line:
       <path_to_script>/logchecker.sh -execd_spool
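
A crontab entry for the qmaster call used in the examples might look like the fragment below. The schedule (daily at 03:00) is only an assumption; choose one that suits your site, and replace <path_to_script> with the real location of the script.

```
# minute hour day-of-month month day-of-week command
0 3 * * * <path_to_script>/logchecker.sh -action_on 1
```

Since cron typically mails any output of a job to the crontab owner, setting ECHO=: in the script keeps the job quiet except for error messages.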