SGE 6.1u5 update is out
Update release SGE 6.1u5 is out today. The announcement can be read at http://gridengine.sunsource.net/news/GE61u5-announce.html and the list of bugs fixed since the last release in the 6.1 series can be found here: http://gridengine.sunsource.net/project/gridengine/61patches.txt
Screencast showing online upgrade to SGE 6.2
Lubomir Petrik has posted a screencast recording showing the SGE 6.x to SGE 6.2 upgrade process. Thanks to Andy for finding and reporting this.
T-Shirt Contest
Want a T-shirt? Be quick and email Andy. Details below.
Do you want to win a truly nice open source T-Shirt?
There are T-shirts to win in three categories:
1. Among the first 50 of you who reply *directly* to me (andy.schwierskott@sun.com) and tell us what is the single most important or interesting feature in SGE 6.2 for you, we'll draw three T-shirts.
2. Three T-Shirts go to the first people who report that they have upgraded their production cluster to SGE 6.2. Test-beds, eval clusters and private use don't count.
3. Three T-Shirts go to people who will be using SGE for the first time, whether you are replacing another DRM system or starting to use a DRM system for the first time. Requirements: it must be SGE 6.2 and it must be production use, not just a test-bed, private use or eval cluster.
We'll respect your privacy and only make your name public if you agree to it! Sun Microsystems employees may not participate.
Please feel free to forward this announcement and 'lottery' to mailing lists that care about the SGE technology.
Regards, Andy
6.2 Officially Out
Grid Engine 6.2 is officially out. Follow the links in the blog post below to read DanT's excellent set of articles on "why upgrade to 6.2?".
Get it here:
http://www.sun.com/software/gridware/
This also marks the official transition to having all of the Sun SGE documentation and manuals in wiki form:
http://wikis.sun.com/display/GridEngine/Grid+Engine
SGE 6.2 Coming August 5th
Unofficial word is that the official release of SGE 6.2 is coming on Tuesday, August 5th.
To read up on why this is news, check out Dan's excellent essays:
Why upgrade? DanT explains SGE from 5.x through 6.2 and beyond
Dan has posted a great overview of how Grid Engine has changed since the version 5.x days, couched in the context of answering the "Why should I upgrade SGE?" questions that often come up.
I won't even excerpt it; the full article is well worth a read:
http://blogs.sun.com/templedf/entry/why_upgrade
Feedback needed: Obsolete options and parameters considered for removal
Grid Engine developers posted a list today of SGE configuration parameters and client arguments that are being considered for removal from the product because they are either obsolete or they duplicate settings found elsewhere.
The developers are seeking feedback and comments on their plans - if you have any please drop a line to the users@gridengine.sunsource.net mailing list. The current roadmap calls for these methods to be marked as 'deprecated' in the SGE 6.2 release with total removal planned for a future post-6.2 release.
The message can be found here:
http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=25045
The full list of items being considered for removal can also be found after the jump ...
The parameters planned to be made obsolete are:

host_conf(5)
- processors: obsolete, same as num_proc from the complex list

sched_conf(5)
- algorithm: only default is allowed, no additional algorithms are planned
- params JC_FILTER: huge performance impact, plus may lead to wrong scheduling decisions

sge_conf(5)
- reprioritize: redundant because hard bound to reprioritize_interval in sched_conf(5)
- shell_start_mode: obsolete, value from queue_conf(5) is used
- set_token_cmd: no known AFS support
- pag_cmd: no known AFS support
- token_extend_time: no known AFS support
- qmaster_params DISABLE_AUTO_RESCHEDULING: equivalent to default reschedule_unknown=0:0:0
- qmaster_params: merge ACCT_RESERVED_USAGE and SHARETREE_RESERVED_USAGE. We can't imagine a use case for keeping these values separate
- finished_jobs: qstat -j does not work with successfully finished jobs. Code seems to work only with jobs going into error state.

user(5)
- delete_time: change to internal, not changeable/visible field. Implicitly set by auto_user_delete_time

qconf(1)
- -sep option: obsolete, same as num_proc
- -ks option: obsolete, same as -kt scheduler

qmod(1)
- -c option: deprecated, use -cj or -cq
- -r option: deprecated, use -rj or -rq
- -s option: deprecated, use -sj or -sq
- -us option: deprecated, use -usj or -usq
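If you want a rough idea of how much the qmod part of that list would affect you, something like the sketch below can scan your own wrapper scripts for the bare -c, -r, -s and -us options. The function name find_deprecated_qmod is my own invention, not an SGE tool, and the pattern is deliberately simple: it assumes qmod and its option sit on the same line, so treat the output as a starting point for a manual review.

```shell
# Hypothetical helper (not part of SGE): print "line-number:line" for every
# line in the given files that calls qmod with a bare -c, -r, -s or -us.
# The job/queue-specific forms (-cj, -cq, -rj, ...) are intentionally not matched.
find_deprecated_qmod() {
    grep -nE 'qmod[[:space:]]+(.*[[:space:]])?-(c|r|s|us)([[:space:]]|$)' "$@"
}
```

Call it with one or more script paths, for example find_deprecated_qmod followed by the names of your admin scripts. Since the replacement depends on whether the argument is a job or a queue, the rewrite itself still has to be done by hand.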
SGE 6.2 beta 2 is out
6.2b2 came out yesterday:
http://gridengine.sunsource.net/news/GE62beta2-announce.html
The list of bug fixes made since SGE 6.2 Beta 1 is online at http://gridengine.sunsource.net/project/gridengine/62patches.txt.
This is the latest beta release of SGE 6.2 and we really need more eyeballs and testers on this release to flush out any remaining issues before 6.2 goes officially out the door.
There are some differences in 6.2 in both the install procedure and the daemons (sge_schedd is gone! -- It's now a thread within sge_qmaster). I posted a screencast recording of the SGE 6.2 Beta 1 installation a while back: http://gridengine.info/articles/2008/05/16/screencast-live-install-of-sge6-2-beta for those who may be interested in watching what the new install process looks like.
June 2008 SGE Workshops
Consider this post a plug for the upcoming June 2008 SGE User and SGE Admin workshops that are being held in the Boston, MA USA area.
More details here:
http://blog.bioteam.net/2008/03/22/sge-training/
SGE 6.2 beta binaries are available for testing
I'm not going to waste time copying the release announcement into a blog post. The full announcement can be read here:
http://gridengine.sunsource.net/servlets/ReadMsg?list=announce&msgNo=94
Lots of significant changes in the product itself. I also love the migration of manuals and docs to the new http://wikis.sun.com/display/GridEngine site.
Please remember that the reason for this beta release is to allow you to test 6.2 before it officially goes out the door in final form. The more people we have working on and stress-testing 6.2 the less chance there will be an inconvenient or unexpected upgrade issue, bug or glitch. The developers have good testbed environments and testsuites but they can't simulate all the different ways and methods that we use (and abuse!) SGE to get work done. Help make the 6.2 release a big success by testing now and providing feedback.
SGE 6.2 goes beta next week (your help needed)
SGE 6.2 is being released in Beta form next week and the developers are asking for people to make some time if possible to fully test out the beta snapshot of the latest major SGE point release.
Andy's full note can be found here (well worth reading in full ...):
http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=24426
In my mind, I'm most excited about the following:
- Advance Reservations & array job inter-dependencies
- The scheduler is now a thread within the qmaster!
- The JVM running within the qmaster
- SGE moving all docs into wiki form!
RHEL5.2/Centos5 kernel update may cause problems
This is a heads up for Red Hat Enterprise Linux (RHEL) users as well as for users (like myself) of the various CentOS variants.
There is a recent patch for RHEL that changes the inode data structure exposed to NFS clients from 32 bits to 64 bits in size. The basic summary of this issue is that many applications may not handle this change gracefully (there is already one such report involving the SGE Linux binaries).
RHEL and modern CentOS users should probably keep an eye on this issue (by subscribing as CC: contacts):
http://gridengine.sunsource.net/issues/show_bug.cgi?id=2543
A RedHat bug report discussing the issue in more detail is here:
"Large inode number patch breaks applications"
https://bugzilla.redhat.com/show_bug.cgi?id=241348
6.1 leak found; schedd_job_info is not your friend
Anyone interested in the memory leak that has been bothering some 6.1 users should check out the comments associated with Issue #2464:
http://gridengine.sunsource.net/issues/show_bug.cgi?id=2464
Among the interesting things you'll see are:
- A great example of motivated SGE users and developers working together to track down a hard to find problem
- Interesting comments on the potentially "unfixable" (my word) nature of the schedd_job_info messages
- A really cool workaround for getting job scheduler messages with schedd_job_info=FALSE
In a nutshell, there is a problem in the schedd_job_info framework that can cause massive resource utilization on the qmaster machine. This happens in particular on larger systems or places with large numbers of queue instances. This can also pop up on systems with jobs that are pending due to un-fulfillable resource requests. This explains why I saw the memory leak on my small testbed cluster -- I have a number of "pend forever" jobs in the queue for demonstration purposes.
The fix is to disable schedd_job_info. This is potentially problematic though, as that feature is pretty much my go-to first action for troubleshooting job dispatch problems.
However, in a recent update comment to this issue, andreas added a possible tip for getting scheduling messages about a job in a way that puts far less load on the system AND does not require schedd_job_info=TRUE:
qalter -w v
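To make that a bit more concrete, here is a minimal sketch of how the two pieces might be wrapped up for day-to-day use. Both function names (schedd_job_info_setting and why_pending) are my own and not part of SGE. The first assumes that qconf -ssconf prints the scheduler configuration as one "name value" pair per line; the second just passes a single job ID to the qalter command from the tip above.

```shell
# Hypothetical helpers; the function names are mine, not part of SGE.

# Print the current schedd_job_info value from the scheduler configuration
# (assumes "qconf -ssconf" prints one "name value" pair per line).
schedd_job_info_setting() {
    qconf -ssconf | awk '$1 == "schedd_job_info" { print $2 }'
}

# Ask SGE to verify a single job and print the resulting validation messages.
why_pending() {
    if [ "$#" -ne 1 ]; then
        echo "usage: why_pending <job_id>" >&2
        return 2
    fi
    qalter -w v "$1"
}
```

Usage would be something like schedd_job_info_setting to see what the scheduler is currently set to, then why_pending followed by the ID of a stuck job. To actually change the setting, qconf -msconf opens the scheduler configuration in an editor.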
Remember though that comments found in a bug report are not "gospel" so don't read this as news that schedd_job_info is forever broken or going away. Expect to see this and other issues discussed as part of the SGE Roadmap. You are attending the May 2008 SGE Workshop, right?
Release 6.1u4 is out
Congratulations to the SGE developer team!
Big news today -- 6.1u4 was just announced, hopefully addressing some persistent issues people have been having with the previous releases. The plaintext list of fixed issues can be found here:
http://gridengine.sunsource.net/project/gridengine/61patches.txt
The full announcement is here:
http://gridengine.sunsource.net/news/GE61u4-announce.html
I've been unable to keep 6.1u3 running consistently on a small test system, probably due to the same memory leak others have been reporting. There is a chance that a subtle leak still exists or at least has not been fully tracked down in 6.1u4 but multiple people are working diligently on this. Best bet is to monitor the users mailing list to see the feedback.


