Clever job prioritization tip
Grid Engine has a built-in priority mechanism that is useful for allowing end users to sort and prioritize their own personal pending tasks -- this gives the users the ability to submit many jobs but still dictate which of those jobs need to be run more urgently than the rest.
In practice, though, this is actually fairly clunky to implement. By default the following conditions exist:
- SGE will accept a priority range of -1023 to 1024
- By default all jobs get assigned a value of 0
- Only SGE managers can assign priority values higher than 0
- Normal users can only assign negative priority values
This is, ummmm, awkward to say the least and works in a way that is 100% opposite from what a sensible user or SGE Admin would expect. Users can only decrease the relative priority of their job in the default environment.
A recent mailing list post from Jeff highlights a nice little workaround. Jeff describes creating an entry in the sge_request file that automatically assigns a value of -p -100 to all submitted jobs that don't override the default with their own use of the -p switch.
This is a nice approach because by default it harms nobody (all jobs get -p -100), yet it gives headroom for a non-privileged user to use the priority range -99 to 0 to designate some of her jobs as more personally important than others.
Background reference: manpage for sge_request.
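To make the headroom idea concrete, here is a minimal Python sketch of the rules listed above. It is a toy model, not SGE code: the function name effective_priority and the SITE_DEFAULT constant are made up for illustration, and it assumes the cluster-wide sge_request file carries the -p -100 line that Jeff describes.

```python
# Simplified model of the qsub -p rules described above; not SGE code.
PRIORITY_MIN, PRIORITY_MAX = -1023, 1024
SITE_DEFAULT = -100  # assumption: the "-p -100" entry placed in sge_request


def effective_priority(requested=None, is_manager=False, site_default=SITE_DEFAULT):
    """Return the priority a job ends up with under this toy model."""
    # An explicit -p on the command line overrides the sge_request default.
    value = site_default if requested is None else requested
    if not PRIORITY_MIN <= value <= PRIORITY_MAX:
        raise ValueError(f"priority {value} outside {PRIORITY_MIN}..{PRIORITY_MAX}")
    if value > 0 and not is_manager:
        raise ValueError("only managers may request a priority above 0")
    return value


# A normal user marks one job as more urgent than her other pending jobs.
jobs = {"routine": effective_priority(), "urgent": effective_priority(-10)}
ordered = sorted(jobs, key=jobs.get, reverse=True)
print(ordered)
```

With the default lowered to -100, the "urgent" job sorts ahead of the "routine" one even though the user never needed manager rights.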
Open Grid Forum (OGF22) Meeting Discount
The 22nd Open Grid Forum -- OGF22
Hyatt Regency Cambridge
Cambridge, MA USA
February 25-28, 2008
Website: http://www.ogf.org/OGF22/
Coworker Chris Dwan and I will be attending this event and one of us will likely end up speaking. Both of us are known to be somewhat cynical of the "big G" Grid Computing world so we'll be bringing our industry-centric views and bias towards practical solutions into the forum.
(I swear, when you speak to some of these "big G grid" supercomputing or academic folks you get the sense that they think that everyone has 100 million in government funding and a petabyte-scale single namespace storage solution to apply to the problems at hand....)
Among the other attendees that I know about, Chris Smith from Platform Computing will also be there -- he handles Platform's involvement with standards bodies and is another person on my "smart people that I learn a lot from" list. Should be an interesting event.
And finally, some discount registration offers for readers of this blog:
- "Buy 1 pass and get a 2nd for free"
- "$150 discount off the purchase of the full day pass"
Enter pharma when registering to get the special prices.
Clever urgency policy usage
It's mailing list posts like this that generate "aha!" moments for me where I realize that I've learned how to tweak SGE behavior in a new way.
Mark answered the original poster with a good suggestion for solving the particular issue at hand -- using qalter to change priority values so that a pending parallel job can rise to the top of the waitlist.
Then Mark offhandedly dropped this little comment:
... If you always want parallel jobs to go first, you can try increasing the urgency of the 'slots' complex.
I'm familiar with the Urgency Policy mechanism in Grid Engine. I've used it many times to address specific problems from a resource allocation perspective. Typically this involves something like using the urgency policy to prioritize the dispatch of pending jobs that consume expensive flexlm software license entitlements. I'm also aware from creating and modifying requestable and/or consumable resources that all of the resource attributes listed in the SGE complex have an urgency parameter associated with them that defaults to 0.
I just hadn't really put it all together until Mark's offhand aside. It's not complicated at all, just ... elegant. Associating urgency entitlements with the "slots" complex means that jobs that need more slots will gain additional entitlements and thus rise up through the pending list. Since parallel jobs naturally consume more slots than serial tasks, the end result is that parallel jobs become "more important" in the scheduler mechanism than non-parallel jobs.
I'm guessing not many people have a global "parallel jobs are always more important than serial jobs" use case requirement but for those that do this could be a neat trick.
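A rough Python sketch of why this works. It is a deliberately simplified model in which a job's urgency contribution from slots is just the urgency value of the 'slots' complex multiplied by the number of slots requested; the real scheduler blends this with other policy terms. The Job class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    slots: int  # 1 for a serial job, more for a parallel job


def slots_urgency(job, urgency_per_slot):
    """Urgency contribution from the 'slots' complex in this toy model."""
    return job.slots * urgency_per_slot


def pending_order(jobs, urgency_per_slot):
    """Sort the pending list by slots urgency, highest first.

    sorted() is stable, so jobs with equal urgency keep submission order.
    """
    return sorted(jobs, key=lambda j: slots_urgency(j, urgency_per_slot), reverse=True)


pending = [Job("serial-a", 1), Job("mpi-16", 16), Job("serial-b", 1), Job("mpi-4", 4)]

# With urgency 0 on 'slots' nothing moves; with a positive value the
# parallel jobs float to the top of the pending list.
print([j.name for j in pending_order(pending, 0)])
print([j.name for j in pending_order(pending, 1000)])
```

The value 1000 is arbitrary here; only the fact that it is greater than zero matters for the ordering.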
Extending job dependency scheduling to array job sub-tasks
More Rising Sun news ...
Rising Sun Pictures, an Australian visual effects house (previous mention) has released a specification document entitled "Grid Engine Array Task Dependency Specification"
The spec is well written and backwards compatibility is assured. The use cases come from digital film and frame rendering. The main goal is to extend the ability of the SGE scheduler to handle array job tasks that themselves may be dependent on the successful completion of other array jobs or even sub-tasks of other jobs.
The full specification is here and well worth a read:
http://open.rsp.com.au/?page_id=11
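For readers who have not hit this problem, the following Python sketch contrasts job-level and task-level dependencies. It is only an illustration of the general idea, not the semantics defined in the Rising Sun spec: the one-to-one pairing of task IDs (frame N of the second job waits on frame N of the first) and both function names are assumptions of mine.

```python
def runnable_with_job_dependency(dependent_tasks, upstream_tasks, upstream_done):
    """Job-level dependency: nothing starts until every upstream task is done."""
    if set(upstream_tasks) <= set(upstream_done):
        return list(dependent_tasks)
    return []


def runnable_with_task_dependency(dependent_tasks, upstream_done):
    """Task-level dependency (assumed pairing): task i waits only on upstream task i."""
    done = set(upstream_done)
    return [t for t in dependent_tasks if t in done]


frames = [1, 2, 3, 4, 5]        # e.g. render frames 1-5, then composite frames 1-5
rendered_so_far = [1, 2, 4]     # render tasks that have already finished

print(runnable_with_job_dependency(frames, frames, rendered_so_far))
print(runnable_with_task_dependency(frames, rendered_so_far))
```

In the job-level case the compositing job sits idle until the last frame is rendered; in the task-level case three of its five tasks could already be dispatched.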
Project Hedeby documentation draft now available
How Hedeby is being introduced:
In large enterprises, hosts are often divided among different services (e.g. N1GE), and the services themselves are seen as assigned pools of resources (e.g. hosts). When a service is overwhelmed with work one solution may be to remove resources from a service which is not overburdened or less important and assign those resources to the overloaded service. The Hedeby project was established to provide this functionality automatically... (http://hedeby.sunsource.net/)
As reported in this mailing list thread, a first draft version of a Hedeby documentation book has been committed to the project's CVS repository. The book has been transformed and made available as a PDF by an interested member of the SGE community.
Fred Youhanaie found the book and was able to successfully transform the Docbook XML into PDF form. The transformed PDF is available at http://www.anydata.co.uk/gridengine/HedebyBook.pdf
The Hedeby developers may not be incredibly pleased to see a first-draft, first-commit documentation effort grabbed from CVS and instantly made available as PDF, so some standard warnings and caveats should apply. The only people who should check this PDF out are people interested in what Hedeby is, how it is being architected and what some of the initial use cases are envisioned to be. Everyone who is uninterested or only semi-interested should just relax, sit back and let Hedeby development continue until something is actually officially released.
Dan's video intro to Grid Engine Service Domain Management
Rayson pointed out the following Blog post this morning:
http://blogs.sun.com/HPC/entry/video_sun_grid_engine_demo
Which contains the following great YouTube video of DanT:
If the embedded link does not work, try this:
http://www.youtube.com/watch?v=8QB96lALa5I
Detailed docs on Service Domains and Grid Engine are hard to find. The topic is mentioned a bit in this prior blog post: http://gridengine.info/articles/2006/12/13/its-official-project-hedeby-and-arco-join-the-sge-codebase
Parallel Environment Queue Sort API
Is anyone using this?
While trying to prune down an overflowing email inbox, I stumbled upon a mailing list post from back in May 2006 that I had tagged as something to follow up upon. The post to the developers mailing list asked about a scheduling API for Grid Engine. One of the replies mentioned that the "Parallel Environment Queue Sort (PQS) API" had been checked into the CVS maintrunk but was not on by default.
This API exists and is apparently only documented in the following SGE source file:
source/libs/sched/sge_pqs_api.h
The API seems to provide the hooks necessary for someone to compile his or her own loadable module that can be installed in the $SGE_ROOT/lib/<arch>/ directory. Once loaded, the custom code can make the final decision (based on a list of supplied candidates) as to the hosts and queue instances used for a particular parallel job.
People interested in this should read the sge_pqs_api.h file carefully as there are many caveats and warnings. I'd be interested in hearing from anyone using this API as well.
Help shape Advance Reservation functionality for SGE-6.2
If you are at all interested in the topic of Advance Reservation scheduling within Grid Engine, then please take the time to look at (and comment upon) the following draft functional specification document:
Functional Specification Document for 6.2 Advance Reservation
Comments and feedback should be sent to the Developer mailing list. A thread has already been started.
Two new qmon enhancements coming in 6.0u10
With patches supplied by Hin-Tak Leung (more on Hin-Tak in a later article), the following useful enhancements to the X11 'qmon' binary have been added to the CVS repository for inclusion in the next 6.0u10 release:
- Issue #:721 -- Custom column widths for qmon job control pane
- Issue #:2126 -- New 'qhost'-like details in qmon cluster queue pane [screenshot]
Custom Widths
The first screenshot shows the default layout for the Qmon Job Control pane. The second screenshot shows the new column sizing and layout customized by altering options listed in a personal ~/Qmon preference file. In the customized layout, the job name field has been greatly expanded and the Job ID column width has been slightly decreased. A new sliding bar allows access to the columns that cannot be displayed within the pane.
Adding Host details to the Cluster Queue pane
The first screenshot shows the default layout for the Qmon Cluster Queue pane. Note that there are only two tabs available within the pane: "Cluster Queues" and "Queue Instances". The second screenshot shows the activation of a third tab named "Hosts".
Read the full article for details on how to activate these changes which are disabled by default ...
Details: How to enable these changes
Both enhancements are controlled by per-user ~/Qmon preference files. To customize the column widths in the Job Control pane, use and adjust the following settings:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! job configuration settings
!! Qmon*job_form*columnWidths: nr of characters per column for
!! the first 6 cols
!! Qmon*job_form*visibleColumns: nr of visible columns (without scrollbar)
!! if the column sizes shall be bigger this can
!! be lowered to show only the first n cols and
!! the rest can be reached with the horizontal
!! scrollbar
!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Qmon*job_form*columnWidths: 12,8,10,10,7,16
Qmon*job_form*visibleColumns: 6
To enable the additional Host tab within the Cluster Queue pane, add the following details to your ~/Qmon preference file, changing the values from FALSE to TRUE:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! show the Host tab in Queue Configuration
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Qmon*showHostTab: FALSE
Qmon*automaticUpdateHostTab: FALSE
Enhanced dynamic limits in the new resource quota system
A bit of interesting news came via the GE issues mailing list recently concerning the newly announced "Resource Quota" feature that will be part of the upcoming Grid Engine 6.1 release. The specification document for the new Resource Quota facility makes specific mention of "dynamical limits". The example it gives of a "dynamical limit" is the following:
limit hosts {@linux_hosts} to slots=$num_proc*5
... that limit would change from machine to machine depending on the number of CPUs resident in each machine. Useful.
Roland filed (and then fixed!) a new issue asking for this functionality to be extended to allow the following types of usage:
'slots=$num_proc*2-1' or 'slots=$num_proc*2+2'
The new enhancements extend the operators that can be used for defining these new limits. Because the codebase is shared, the enhancement also applies to load_formula syntax. The new syntax definition looks like this:
{w1|$complex1[*w1]}[{+|-}{w2|$complex2[*w2]}[{+|-}...]]
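To see how such an expression resolves differently on each host, here is a small Python sketch that evaluates expressions of the shape shown in the syntax definition above. It is an illustration only, not the parser used inside Grid Engine; the function name evaluate_limit and the host_values dictionary are hypothetical, and whitespace handling and decimal weights are my own assumptions.

```python
import re

# One term: either "$complex" with an optional "*weight", or a bare number.
_TERM = re.compile(
    r"\s*(?:\$(?P<name>\w+)(?:\*(?P<weight>\d+(?:\.\d+)?))?"
    r"|(?P<const>\d+(?:\.\d+)?))\s*"
)


def evaluate_limit(expr, host_values):
    """Evaluate e.g. '$num_proc*2-1' against one host's complex values."""
    pos, total, sign = 0, 0.0, 1
    while True:
        match = _TERM.match(expr, pos)
        if match is None:
            raise ValueError(f"cannot parse term at position {pos} in {expr!r}")
        if match.group("name") is not None:
            weight = float(match.group("weight") or 1)
            value = host_values[match.group("name")] * weight
        else:
            value = float(match.group("const"))
        total += sign * value
        pos = match.end()
        if pos == len(expr):
            return total
        if expr[pos] not in "+-":
            raise ValueError(f"expected '+' or '-' at position {pos} in {expr!r}")
        sign = 1 if expr[pos] == "+" else -1
        pos += 1


# The same rule yields a different slot limit on a 2-CPU and an 8-CPU host.
for cpus in (2, 8):
    print(cpus, evaluate_limit("$num_proc*2-1", {"num_proc": cpus}))
```

Terms are summed left to right with their signs, which matches the flat "term, then + or -, then term" shape of the syntax definition; there is no operator precedence to worry about because multiplication only ever appears inside a single term.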
It's official: 6.1 snapshot is out; major new enhancements
Highlights:
- Preview release only, test carefully before even remotely considering production use
- A tentative beta release of SGE 6.1 is scheduled for February 2007
- No official date for the full 6.1 release; official release may have additional features or components
- A HUGE milestone with major new functionality
The most exciting new feature is a MAJOR step forward for the project and the product - a flexible system for implementing Resource Quotas. This feature is being developed to address at a minimum some of the biggest and most vexing configuration limitations encountered by the user community:
- Issue #: 74: -- Supporting maxujobs on a per host level
- Issue #: 1532: -- Allowing "max jobs per user" limits on a per queue basis
- Issue #: 1644: -- Allowing per-user slot limits to be set within parallel environments (PE's).
Other additions to the 6.1 snapshot include:
- Official support for Mac OS X on Intel and Linux on Itanium
- ARCo joins the codebase (as reported previously)
- The PDC patches supplied by the user community were accepted and now allow for better usage data collection on Apple Mac OS X, IBM AIX and HP-UX
- Helpful scripts and documentation for Solaris 10 users wishing to use the amazing DTrace tool for bottleneck identification and tuning
Advance Reservation plugin for Grid Engine
Yoshio Tanaka posts the following:
... We are pleased to announce that advance-reservation plugin module called PluS version 1.0.0 RC 1 is now available for download at the PluS home page at: http://www.g-lambda.net/plus/ . PluS (Plug-in Advance Reservation Manager for Torque and Grid Engine) adds an advance-reservation function to Torque and Grid Engine. For SGE, one of the following operations will be performed based on the startup option. (1) SGE queue base version - The SGE schedule is not replaced, and the reservation function is realized simply by managing the reservation queues. (2) SGE self scheduling version - The original SGE scheduler is replaced by the PluS SGE scheduler which realizes the reservation management function and the job scheduling function. ...
The package is released under the Apache 2 License. It appears that the system has mainly been developed and tested on the following configuration: Linux 2.6.x, Intel x86, glibc 2.3.3, SGE 6.0u8
The HTML version of the PluS Manual is online here:
http://www.g-lambda.net/plus/wp-content/uploads/2006/10/manual.html.
The http://www.g-lambda.net/plus/ site contains a link to a PDF of an IEEE conference paper covering the system in more technical detail.
Resource Reservation vs Backfilling
A list message posted by Andreas back in June has a link to an overlooked yet quite interesting Grid Engine Design document. It includes the following definition of terms:
Resource Reservation
A job-specific reservation created by the scheduler for pending
jobs. During the reservation the resources are blocked for lower
priority jobs.
Backfilling
The process of starting jobs of the job priority list despite of
higher priority pending jobs that might own a future reservation
with the same resource. Thus backfilling has a meaning only in the
context of Resource Reservation or Advance Reservation.
Advance Reservation
A reservation (possibly independent of a particular job) that can
be requested by a user or administrator and gets created by the
scheduler. The reservation causes the associated resources be blocked
for other jobs.
Preemption
The process of interrupting job executions in order to free resources
for particular jobs.
… good terms to know, especially when reading through the SGE docs and mailing list messages. The entire document makes for interesting reading.
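The relationship between a reservation and backfilling is easier to see with numbers. The Python sketch below is my own toy model, not scheduler code: it assumes every job comes with a runtime estimate, treats time as plain integers, and uses a hypothetical function name can_backfill.

```python
def can_backfill(now, job_runtime, reservation_start, shares_resource=True):
    """Decide whether a lower priority job may start ahead of a reservation.

    The job may run if it does not touch the reserved resource at all, or if
    its estimated runtime lets it finish before the reservation begins.
    """
    if not shares_resource:
        return True
    return now + job_runtime <= reservation_start


# A high priority job holds a reservation on some slots starting at t=100.
now, reservation_start = 40, 100

print(can_backfill(now, 30, reservation_start))   # short job fits in the gap
print(can_backfill(now, 90, reservation_start))   # long job would collide
print(can_backfill(now, 90, reservation_start, shares_resource=False))
```

The first and third calls return True and the second returns False, which is the whole point of the definition quoted above: backfilling only has meaning when there is a future reservation to work around.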