Grid Engine Troubleshooting

Problem with pending jobs not being dispatched

Sometimes a pending job is obviously runnable, but does not get dispatched. Grid Engine can be asked for the reason:
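As a minimal sketch of such a query, the helper below wraps `qstat -j <jobid>` and prints only the scheduler's explanation. The function name `why_pending` is hypothetical, and the sketch assumes that the explanation starts on a line beginning with `scheduling info:` and that the scheduler is configured to collect this information (the `schedd_job_info` setting in the scheduler configuration); output formats may differ between Grid Engine versions.

```shell
# Hypothetical helper: print the scheduler's reasons for a pending job.
# Assumption: `qstat -j <jobid>` ends with a "scheduling info:" section.
why_pending() {
    job_id="$1"
    # Print everything from the "scheduling info:" line to the end of output.
    qstat -j "$job_id" | sed -n '/^scheduling info:/,$p'
}
```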
This information is generated directly by Schedd and takes the current utilization of the cluster into account. Sometimes this is not exactly what you are interested in: for example, if all queue slots are already occupied by jobs of other users, no detailed message is generated for the job you are interested in.
Job or queue goes into error state "E"

Job or queue errors are indicated by an uppercase "E" in the qstat output. A job enters the error state when Grid Engine tried to execute the job in a queue, but execution failed for a reason that is specific to the job. A queue enters the error state when Grid Engine tried to execute a job in the queue, but execution failed for a reason that is specific to the queue. Grid Engine offers a set of possibilities for users and administrators to get diagnosis information in case of job execution errors. Since both the queue and the job error state result from a failed job execution, the diagnosis possibilities are applicable to both types of error states:
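Before diagnosing, it can help to list which jobs currently show the "E" state. The filter below is a rough sketch, not part of Grid Engine: the name `jobs_in_error` is hypothetical, and it assumes the default plain `qstat` column layout, in which the job ID is the first column and the state is the fifth. That layout may vary between versions.

```shell
# Hypothetical filter: read plain `qstat` output on stdin and print the IDs
# of jobs whose state column contains an uppercase "E" (for example "Eqw").
# Assumption: column 1 is the job ID and column 5 is the state.
jobs_in_error() {
    # Header and separator lines are skipped because column 1 is not numeric.
    awk '$1 ~ /^[0-9]+$/ && $5 ~ /E/ { print $1 }'
}
```

A typical call would be `qstat | jobs_in_error`.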
Additional information can sometimes be found in the messages file of the Execd where the job was started. Use qacct -j <jobid> to find the host where the job was started, then search $SGE_ROOT/default/spool/<host>/messages for the job ID.
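The two steps just described can be combined in a small script. This is a sketch under stated assumptions: the function name `execd_messages_for_job` is hypothetical, the accounting record is assumed to contain a line of the form `hostname <host>`, and the spool directory is assumed to be named exactly like that host under the `default` cell, as in the path above. If the site uses a different cell name or short host names, the path needs adjusting.

```shell
# Hypothetical helper: find the execution host of a job via qacct, then
# search that host's execd messages file for the job ID.
# Assumptions: `qacct -j` prints a "hostname <host>" line, and the spool
# directory is $SGE_ROOT/default/spool/<host>.
execd_messages_for_job() {
    job_id="$1"
    # Take the host from the first "hostname" line of the accounting record.
    host=$(qacct -j "$job_id" | awk '$1 == "hostname" { print $2; exit }')
    if [ -z "$host" ]; then
        echo "no accounting record found for job $job_id" >&2
        return 1
    fi
    # -w matches the job ID only as a whole word, so 42 does not match 4200.
    grep -w -- "$job_id" "$SGE_ROOT/default/spool/$host/messages"
}
```

For a job with ID 4711, `execd_messages_for_job 4711` would print the matching lines from that host's messages file, if any exist.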