Tag: Usage

Throttling execution of array job tasks

Posted by – December 2, 2009

I’ve long found that SGE users are perfectly willing to do the right thing when it comes to sharing a computing infrastructure among multiple competing workgroups. What has often been lacking are SGE features, accessible to non-admin users, that give them more control over how their jobs run and are prioritized.

A very common example of this is a situation where a user will say:

“I need to submit 100,000 jobs but I don’t want to totally take over the cluster and upset my coworkers – can I limit how many of my jobs run at any given time so that resources are left free for others?”

As a Grid Engine consultant, trainer and administrator I’ve personally felt that working with people wanting to be “good citizens” has sometimes been a challenge. Most of the common SGE methods for limiting or controlling job execution and policies are available only to users with SGE Administrator privileges. As nice as it is to handle one-off cluster resource allocation situations, these sorts of requests can consume lots of admin time and can occasionally cause problems if people make SGE quota or scheduler changes without tight coordination and planning.

Well, it was undocumented in the initial release but ever since SGE version 6.2u4 people have had the ability to limit concurrent execution of tasks within array jobs that they submit. The syntax looks like:

$ qsub -t 1-20 -tc 5 test.sh
… where the “-tc” argument is new. The example above shows a 20-task array job being submitted with a request to run no more than 5 at any one time.

This feature is now documented as of SGE 6.2u5:


-tc max_running_tasks
    allow users to limit concurrent array job task execution.
    Parameter max_running_tasks specifies maximum number of simultaneously
    running tasks. For example we have running SGE with 10 free slots. We
    call qsub -t 1-100 -tc 2 jobscript. Then only 2 tasks will be
    scheduled to run even when 8 slots are free.

This is a very welcome addition to Grid Engine, and I suspect it will be popular and well received by the user community.
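The same limits can also be embedded in the job script itself, since qsub reads options from lines beginning with #$. What follows is only a minimal sketch: the input naming scheme (input.N.txt) and the pick_input helper are assumptions made for illustration and are not part of SGE.

```shell
#!/bin/sh
# Hypothetical array job script; submit it with a plain "qsub test.sh".
# Run as a 20-task array job:
#$ -t 1-20
# Run at most 5 tasks at any one time (needs SGE 6.2u4 or later):
#$ -tc 5
#$ -S /bin/sh
#$ -cwd

# pick_input maps a task ID to an input file name. The "input.N.txt"
# naming scheme is an assumption made for this sketch.
pick_input() {
    echo "input.$1.txt"
}

# SGE sets SGE_TASK_ID for each array task. Default to 1 so the script
# can be dry-run outside the scheduler.
task_id="${SGE_TASK_ID:-1}"
echo "task $task_id would process $(pick_input "$task_id")"
```

Options given on the qsub command line still apply, so the throttle can be changed per submission without editing the script.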

one-liner for listing idle nodes

Posted by – April 5, 2009

Following up on a previous utility script posting, the following one-liner from Tim Cera will also do the trick.

This script (and quite a few others) has been loaded up onto the Gridengine.info Wiki “Utilities” page:
http://wiki.gridengine.info/wiki/index.php/Utilities

#!/bin/sh
qstat -g c -l arch=lx24-amd64 -q all.q | awk 'NR > 2 {sum = sum + $4} END {print sum}'
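To try the awk step without a live cluster, it can be pulled out into a small function and fed saved output. This is a sketch only: sum_column is a hypothetical helper name, and the column number is left as a parameter because the position of the AVAIL column may differ between SGE versions, so check the header line of your own "qstat -g c" output.

```shell
# sum_column N: read "qstat -g c" style text on stdin, skip the two
# header lines, and print the total of column N (0 if no data rows).
sum_column() {
    awk -v col="$1" 'NR > 2 { sum += $col } END { print sum + 0 }'
}

# On a cluster this would be used as (not run here):
#   qstat -g c -l arch=lx24-amd64 -q all.q | sum_column 4
```

Printing "sum + 0" makes the function report 0 rather than an empty line when no queues match.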

Directing jobs to particular machines

Posted by – February 14, 2008

Andreas posted today a short usage tip for people who need to direct their jobs to particular named execution hosts. It covers the syntax for referencing an execution host name after a queue name or wildcard character:

You can do this with the -q option:

   -q "*@comp28,*@comp29"

or even shorter

  -q "*@comp28|comp29"

with the -l option you can specify the host(s) like this

  -l h=comp28|comp29

if you have a large cluster with high throughput I recommend
the -q being used since -q matchmaking is generally faster.

Note: One more syntax tip. Remember the difference between "@" and "@@":

  • -q all.q@comp28 -- refer to a specific host
  • -q all.q@@LinuxNodes -- refer to a specific hostgroup
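When the host list comes from a script, the -q value can be assembled rather than typed by hand. The helper below is a sketch, and host_queue_arg is a hypothetical name rather than an SGE command. Note that values containing "*" or "|" need quoting on the command line so that the shell does not expand or interpret them.

```shell
# host_queue_arg QUEUE HOST...: print a comma-separated -q value such
# as "*@comp28,*@comp29". Pass "@group" as a host to get "queue@@group".
host_queue_arg() {
    queue="$1"
    shift
    arg=""
    for host in "$@"; do
        # Prefix a comma only when arg already holds an entry.
        arg="${arg:+$arg,}${queue}@${host}"
    done
    printf '%s\n' "$arg"
}

# On a cluster this would be used as (not run here):
#   qsub -q "$(host_queue_arg '*' comp28 comp29)" test.sh
```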

Understanding queue error state ‘E’

Posted by – January 20, 2008

Working at my day job I usually handle SGE-related questions from our customers and clients. This morning, after responding to a support request concerning an SGE queue in state “E”, I got curious and started trying to learn how often we had been asked this. It turns out that I’ve probably sent ~25-30 unique responses on this specific subject and each time my written response was different. This post is an attempt to create a single article that I can point people at as needed …

Seeing “E” in the state column of qstat?

E state errors usually mean that an attempt to start a job failed in a spectacular manner and the Grid Engine qmaster decided to close off the queue instance to new jobs.

This is an important Grid Engine protective measure designed to keep your remaining pending jobs from a “black hole” draining effect in which they all successively get dispatched to the “bad” node and die instantly with errors.

State E has several possible causes. In most cases the root cause is some large, systemic hardware or OS-level error or misconfiguration. Typical examples include:

  • The username of the job submitter does not exist on the execution host (extremely common)
  • Shared filesystem failure
  • Parallel jobs: syntax errors or bad commands in “start_proc_args” or “stop_proc_args” as defined within the parallel environment (PE)
  • Serial jobs: syntax errors in a “prolog” or “epilog” script, or a “prolog” or “epilog” script that does not exit with status code 0
  • Serious path or path_alias problems (paths that exist on the submit host are different on the remote execution host or have been improperly aliased)
  • Network, routing or DNS errors that are interfering with LDAP, NIS or DNS

I have seen a few cases of actual jobs crashing and causing queue instance state “E”. Usually this seems to occur when the job itself has crashed and taken out its parent process (the ‘sge_shepherd’ daemon). If your job is bombing badly enough to wipe out the parent sge_shepherd process then SGE will usually toggle the queue instance into “E” state. This is still a fairly rare occurrence, so if you are trying to debug this situation I’d recommend first looking at hardware and OS-level issues before looking too closely at the job as a root cause.

State “E” does not go away automatically

One big message to impart is that E states are persistent and never go away on their own (unlike many SGE queue and job states, which clear automatically). State “E” will persist through hardware reboots and Grid Engine restart efforts. The state has to be cleared manually by a Grid Engine administrator. Again, the reason for this is that SGE wants a human to investigate the root cause first in case there is potential for the “black hole” effect mentioned above.

If you think this was a transient problem you can clear the queues and see what happens with your pending jobs — the command is “qmod -c (queue instance)”.

To globally clear all E states in your SGE cluster:

qmod -c '*'

Troubleshooting and Diagnosing

  • qstat -explain E
  • Examine the node itself and OS logs with an eye towards entries relating to permissions, failures or access errors
  • Try to log in to the node in question using a username associated with a failed job. This will help diagnose any username, authentication or access issues
  • Look in the job output directory if it is available. Output from failed jobs can be extremely useful, especially if there is a path, ENV or permission problem
  • Examine the SGE logs with particular focus on the messages file created by the sge_execd on the execution host in question
  • If all else fails, SGE daemons will write log files to /tmp when they can’t write to their normal spool location. Seeing recent SGE event data in /tmp instead of your normal spool location is a good indication of filesystem or permission errors
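As a starting point for the checks above, the queue instances currently in state E can be pulled out of "qstat -f" output with a short filter. This is a sketch under assumptions: error_queues is a hypothetical helper name, and the filter assumes the six-column "qstat -f" layout in which the states column is the last field of each queue instance line. Verify this against the header line of your own output.

```shell
# error_queues: read "qstat -f" output on stdin and print the name of
# every queue instance whose states column contains "E".
# Queue instance lines are recognised by the "@" in their first field;
# job lines start with a numeric job ID and are ignored.
error_queues() {
    awk '$1 ~ /@/ && NF == 6 && $6 ~ /E/ { print $1 }'
}

# On a cluster this would be used as (not run here):
#   qstat -f | error_queues
# and, once the root cause is understood, each instance cleared with:
#   qmod -c all.q@comp28
```

Clearing instances one at a time, rather than with a global wildcard, makes it easier to see whether a particular node drops back into state E.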

I’ll try to keep this page updated in the future with new information and troubleshooting hints.

New user contributed accounting script

Posted by – May 15, 2007

A new “pull statistics from the SGE accounting log file” script has been posted to the SGE community. Olivier Blondel took Joe Landman’s “usage.pl” script and modified it to suit his own needs. The script can be found embedded inline with Olivier’s post to the users mailing list.

Simple perl reporting tool for SGE accounting data

Posted by – October 11, 2006

Joe at Scalable Informatics is offering up a “quick -n- simple” reporting script for Grid Engine accounting and usage data.

Usage examples:

[landman@minicc ~]$ ./usage.pl
Total usage: (in units of second(s))
        wallclock  :       46733.000 second(s)
        user time  :        1600.000 second(s) [3.42%]
        system time:          17.000 second(s) [0.04%]
        cpu time   :       70379.000 second(s) [150.60%]

user            wallclock       user time       system time     cpu time        memory          percent of total time
landman         46733.000       1600.000        17.000          70379.000       0.000           100.000
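For readers who only need a rough per-user total, similar numbers can be pulled straight from the accounting file with awk. This is not Joe's script, only a sketch: wallclock_by_user is a hypothetical name, and the field positions (owner in field 4, ru_wallclock in field 14) are taken from the colon-separated layout described in the accounting(5) man page, so check them against your own SGE version.

```shell
# wallclock_by_user: read SGE accounting records on stdin and print
# "owner total_wallclock_seconds" for each user, sorted by owner.
# Comment lines (starting with "#") and short records are skipped.
wallclock_by_user() {
    awk -F: '!/^#/ && NF >= 16 { wall[$4] += $14 }
             END { for (u in wall) printf "%s %d\n", u, wall[u] }' | sort
}

# On a cluster this would typically be used as (not run here; the path
# is the usual default and may differ on your installation):
#   wallclock_by_user < "$SGE_ROOT/$SGE_CELL/common/accounting"
```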

One day grid engine training seminar in Boston

Posted by – May 11, 2006

Time for a brief commercial announcement. This is part of an ongoing personal experiment to see if there actually is demand for user and usage-centric Grid Engine training.

A 1-day seminar on “Grid Engine 6 Intro & Usage” will be offered on June 2, 2006 in the Boston area.

Intended Audience:

Anyone interested in a user-centric view of distributed computing with Grid Engine. Note: this is not an advanced operator/admin course.

Our goal is to help users, application integrators and developers understand Grid Engine features and capabilities in a way that allows them to become more productive at home. New SGE cluster operators or administrators also may benefit from the user and usage-centric perspective.

The seminar will be taught with a life science informatics focus, using bioinformatics workflows and applications as examples. As Grid Engine usage and configuration patterns can differ significantly between disciplines and industries, interested non-life-science attendees should contact us in advance to determine if this seminar will be a good fit.

The full announcement is at http://bioteam.net/dag/gridengine-training/

DRMAA picks up Python language bindings

Posted by – October 27, 2005

Complementing the DRMAA Perl binding from Tim Harsch is a new DRMAA Python module by Enrico Sirola. This has been discussed on the lists (1, 2) and in a post on Dan’s blog.

DanT writes about DRMAA

Posted by – October 25, 2005

Fresh from his move back to the USA, Dan has posted a couple of Sun blog entries on DRMAA and Grid Engine. DRMAA is a GGF API specification for “the submission and control of jobs to one or more Distributed Resource Management (DRM) systems“. It is currently well supported with Grid Engine 6 and it seems that folks are busy getting other systems to support DRMAA 1.0.

The first of two recent DRMAA posts is titled “Porting the DRMAA Java Language Binding“:

Dan says: “There’s been quite a bit of talk on various aliases (and over private email) recently about what’s required to port the Grid Engine DRMAA Java language binding to another DRM. Since that is an interesting topic, I figured I’d assemble all of the answers here for easy reference…(more) “

The second entry is titled “Running Job Scripts With DRMAA“. This topic has been popping up quite a bit on the Grid Engine lists recently (1, 2):

Dan says: DRMAA is intended as a general purpose API, which means it has to assume as little as possible about the jobs it runs. Grid Engine recognizes two broad classes of jobs: scripts and binaries. A script is a text file that is to be run by a shell and which may have embedded SGE options in it. (Lines starting with #$ are parsed by Grid Engine at job submission time for embedded options. See the man page for more info.) A binary is anything else. The user controls whether a job is treated as a binary or a script with the -b (for binary) qsub option. The Grid Engine default is to assume that all jobs are scripts.

DRMAA, however, makes a different assumption. The minimum assumption that DRMAA can make is that jobs are opaque and cannot be parsed, i.e. that all jobs are binary. This assumption is exactly the opposite of the one Grid Engine makes. Because jobs aren’t assumed to be scripts, there are a few extra steps required to running scripts through DRMAA…(more)

Appreciating grid engine ‘man’ pages

Posted by – September 22, 2005

As is the case with many open source efforts, a rapid pace of development can often outpace the formal documentation efforts. This makes the SGE “man” pages critical resources for savvy users and administrators.

Why SGE man pages are so important

  • The man page entries are written by the developers who wrote the code

  • The man pages are maintained in the same CVS repository as the active gridengine codebase, making updates, additions and corrections a simple matter.

  • Nobody really knows how to update the official documentation or when a tech writer will be hired to revise it. Check out the amazing list of open documentation issues to see this for yourself.

  • What this means is that although the formal documentation is of high quality it will certainly lag behind the man pages when it comes to documenting new features, fixes and behaviors.

Links to the most current Grid Engine manpages

The following list of man page links point directly to the current Grid Engine source repository.

Click on any of them to see the latest and greatest documentation for the manpage in question.

Section 1

gethostbyaddr  
gethostbyname  
gethostname  
getservbyname  
hostnameutils  
qacct  
qalter  
qconf  
qdel  
qhold  
qhost  
qlogin  
qmake  
qmod  
qmon  
qping  
qresub  
qrls  
qrsh  
qselect  
qsh  
qstat  
qsub  
qtcsh  
sge_ckpt  
sge_intro  
sge_types  
sgepasswd  
submit  



Section 3

drmaa_allocate_job_template  
drmaa_attributes  
drmaa_control  
drmaa_delete_job_template  
drmaa_exit  
drmaa_get_DRM_system  
drmaa_get_attribute  
drmaa_get_attribute_names  
drmaa_get_contact  
drmaa_get_next_attr_name  
drmaa_get_next_attr_value  
drmaa_get_next_job_id  
drmaa_get_vector_attribute  
drmaa_get_vector_attribute_names  
drmaa_init  
drmaa_job_ps  
drmaa_jobcontrol  
drmaa_jobtemplate  
drmaa_misc  
drmaa_release_attr_names  
drmaa_release_attr_values  
drmaa_release_job_ids  
drmaa_run_bulk_jobs  
drmaa_run_job  
drmaa_session  
drmaa_set_attribute  
drmaa_set_vector_attribute  
drmaa_strerror  
drmaa_submit  
drmaa_synchronize  
drmaa_version  
drmaa_wait  
drmaa_wcoredump  
drmaa_wexitstatus  
drmaa_wifaborted  
drmaa_wifexited  
drmaa_wifsignaled  
drmaa_wtermsig  



Section 5

access_list  
accounting  
bootstrap  
calendar_conf  
checkpoint  
complex  
host_aliases  
host_conf  
hostgroup  
project  
qtask  
queue_conf  
reporting  
sched_conf  
sge_aliases  
sge_conf  
sge_pe  
sge_priority  
sge_qstat  
sge_request  
sgepasswd  
share_tree  
user  
usermapping  



Section 8

sge_execd  
sge_qmaster  
sge_schedd  
sge_shadowd  
sge_shepherd