SGE 6.1u5 update is out

Posted by chris Thu, 21 Aug 2008 21:48:10 GMT

Update release SGE 6.1u5 is out today. The announcement can be read at http://gridengine.sunsource.net/news/GE61u5-announce.html and the list of bugs fixed since the last release in the 6.1 series can be found here: http://gridengine.sunsource.net/project/gridengine/61patches.txt.

Screencast showing online upgrade to SGE 6.2

Posted by chris Thu, 21 Aug 2008 14:48:19 GMT

Lubomir Petrik has posted a screencast recording showing the SGE 6.x to SGE 6.2 upgrade process. Thanks to Andy for finding and reporting this.

T-Shirt Contest

Posted by chris Wed, 06 Aug 2008 17:16:07 GMT

Snazzy new logo!

Want a T-shirt? Be quick and email Andy. Details below.


Do you want to win a truly nice open source T-Shirt?

There are T-shirts to win in three categories:

1. Among the first 50 of you who reply *directly* to me (andy.schwierskott@sun.com) and tell us what is the single most important or interesting feature in SGE 6.2 for you, we'll draw three T-shirts.

2. Three T-shirts go to the first people to report that they have upgraded their production cluster to SGE 6.2. Test beds, eval clusters and private use don't count.

3. Three T-shirts go to people who will be using SGE for the first time, whether because they are replacing another DRM system or because they are starting to use a DRM system for the first time. Requirements: it must be SGE 6.2 and it must be production use, not just a test bed, private use or eval cluster.

We'll respect your privacy and only make your name public if you agree to it! Sun Microsystems employees may not participate.

Please feel free to propagate this announcement and 'lottery' to mailing lists that care about SGE technology.

Regards, Andy

6.2 Officially Out

Posted by chris Tue, 05 Aug 2008 16:13:26 GMT

Grid Engine 6.2 is officially out. Follow the links in the blog post below to read DanT's excellent set of articles on "why upgrade to 6.2?".

Get it here:
http://www.sun.com/software/gridware/

This also marks the official transition to having all of the Sun SGE documentation and manuals in wiki form:
http://wikis.sun.com/display/GridEngine/Grid+Engine

SGE 6.2 Coming August 5th

Posted by chris Sun, 03 Aug 2008 16:28:10 GMT

Unofficial word is that the official release of SGE 6.2 is coming on Tuesday, August 5th.

To read up on why this is news, check out Dan's excellent essays:

Why upgrade? DanT explains SGE from 5.x through 6.2 and beyond

Posted by chris Fri, 18 Jul 2008 18:58:52 GMT

Dan has posted a great overview of how Grid Engine has changed since the version 5.x days, couched in the context of answering the "Why should I upgrade SGE?" questions that often come up.

I won't even excerpt it, the full article is well worth a read:
http://blogs.sun.com/templedf/entry/why_upgrade

Feedback needed: Obsolete options and parameters considered for removal

Posted by chris Tue, 24 Jun 2008 12:22:41 GMT

Grid Engine developers posted a list today of SGE configuration parameters and client arguments that are being considered for removal from the product because they are either obsolete or they duplicate settings found elsewhere.

The developers are seeking feedback and comments on their plans - if you have any please drop a line to the users@gridengine.sunsource.net mailing list. The current roadmap calls for these methods to be marked as 'deprecated' in the SGE 6.2 release with total removal planned for a future post-6.2 release.

The message can be found here:
http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=25045

The full list of items being considered for removal can also be found after the jump ...

The parameters planned to be made obsolete are:

host_conf(5)
- processors
   obsolete, same as num_proc from the complex list

sched_conf(5)
- algorithm
   just default is allowed, no additional algorithms are planned
- params JC_FILTER
   huge performance impact plus may lead to wrong scheduling decisions

sge_conf(5)
- reprioritize
   redundant because hard bound to reprioritize_interval in sched_conf(5)
- shell_start_mode
   obsolete, value from queue_conf(5) is used
- set_token_cmd
   no known AFS support
- pag_cmd
   no known AFS support
- token_extend_time
   no known AFS support
- qmaster_params DISABLE_AUTO_RESCHEDULING
   equivalent to default reschedule_unknown=0:0:0
- qmaster_params merge ACCT_RESERVED_USAGE and SHARETREE_RESERVED_USAGE
   We can't imagine a use case for keeping these values separate
- finished_jobs
   qstat -j does not work with successfully finished jobs. Code seems
   to work only with jobs going into error state.

user(5)
- delete_time
   change to an internal field that is neither changeable nor visible
   Implicitly set by auto_user_delete_time

qconf(1)
- sep option
   obsolete, same as num_proc
- ks option
   obsolete, same as -kt scheduler

qmod(1)
- c option
   deprecated, use -cj or -cq
- r option
   deprecated, use -rj or -rq
- s option
   deprecated, use -sj or -sq
- us option
   deprecated, use -usj or -usq
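
For anyone with wrapper scripts that still use the old flags, the qconf and qmod entries in the list above can be turned into a small lookup table. What follows is only a rough sketch: the table contents come from the list, but the function name and structure are made up for illustration and are not part of SGE. The -sep entry has no replacement command in the list, so the sketch only notes that it duplicates num_proc.

```python
# Deprecated options and their suggested replacements, copied from the list above.
# The checker itself (names, structure) is a hypothetical illustration, not an SGE tool.
DEPRECATED = {
    "qconf": {
        "-sep": "nothing listed (duplicates num_proc)",
        "-ks": "-kt scheduler",
    },
    "qmod": {
        "-c": "-cj or -cq",
        "-r": "-rj or -rq",
        "-s": "-sj or -sq",
        "-us": "-usj or -usq",
    },
}


def find_deprecated(argv):
    """Return (option, replacement) pairs for deprecated options in argv.

    argv is a command line already split into words,
    for example ["qmod", "-s", "all.q"].
    """
    if not argv:
        return []
    table = DEPRECATED.get(argv[0], {})
    return [(arg, table[arg]) for arg in argv[1:] if arg in table]


if __name__ == "__main__":
    # "all.q" is just an example queue name.
    for cmd in (["qmod", "-s", "all.q"], ["qconf", "-ks"], ["qmod", "-sq", "all.q"]):
        for option, replacement in find_deprecated(cmd):
            print(f"{' '.join(cmd)}: {option} is deprecated, use {replacement}")
```

The lookup is keyed on the command name first because the same short flag can mean different things to different SGE clients.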

SGE 6.2 beta 2 is out

Posted by chris Thu, 19 Jun 2008 11:04:12 GMT

6.2b2 came out yesterday:

http://gridengine.sunsource.net/news/GE62beta2-announce.html

The list of bug fixes made since SGE 6.2 Beta 1 is online at http://gridengine.sunsource.net/project/gridengine/62patches.txt.

This is the latest beta release of SGE 6.2 and we really need more eyeballs and testers on this release to flush out any remaining issues before 6.2 goes officially out the door.

There are some differences in 6.2, both in the install procedure and in the daemons (sge_schedd is gone! -- It's now a thread within sge_qmaster). For those who may be interested in watching what the new install process looks like, I posted a screencast recording of the SGE 6.2 Beta 1 installation a while back: http://gridengine.info/articles/2008/05/16/screencast-live-install-of-sge6-2-beta

June 2008 SGE Workshops

Posted by chris Fri, 23 May 2008 13:53:49 GMT

Consider this post a plug for the upcoming June 2008 SGE User and SGE Admin workshops that are being held in the Boston, MA USA area.

More details here:
http://blog.bioteam.net/2008/03/22/sge-training/

SGE 6.2 beta binaries are available for testing

Posted by chris Tue, 13 May 2008 14:24:12 GMT

I'm not going to waste time copying the release announcement into a blog post. The full announcement can be read here:

http://gridengine.sunsource.net/servlets/ReadMsg?list=announce&msgNo=94

Lots of significant changes in the product itself. I also love the migration of manuals and docs to the new http://wikis.sun.com/display/GridEngine site.

Please remember that the reason for this beta release is to allow you to test 6.2 before it officially goes out the door in final form. The more people we have working on and stress-testing 6.2, the less chance there is of an inconvenient or unexpected upgrade issue, bug or glitch. The developers have good testbed environments and test suites but they can't simulate all the different ways and methods that we use (and abuse!) SGE to get work done. Help make the 6.2 release a big success by testing now and providing feedback.

SGE 6.2 goes beta next week (your help needed)

Posted by chris Mon, 05 May 2008 14:00:43 GMT

SGE 6.2 is being released in Beta form next week and the developers are asking for people to make some time if possible to fully test out the beta snapshot of the latest major SGE point release.

Andy's full note can be found here (well worth reading in full ...):
http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=24426

In my mind, I'm most excited about the following:

  • Advance Reservations & array job inter-dependencies
  • The scheduler is now a thread within the qmaster!
  • The JVM running within the qmaster
  • SGE moving all docs into wiki form!

RHEL5.2/Centos5 kernel update may cause problems

Posted by chris Mon, 21 Apr 2008 16:20:25 GMT

This is a heads-up for Red Hat Enterprise Linux (RHEL) users as well as for users (like myself) of the various CentOS variants.

There is a recent patch for RHEL that changes the inode data structure exposed to NFS clients from 32 bits to 64 bits in size. The short version is that many applications may not handle this change gracefully, and there is already one report of problems with the SGE Linux binaries.

RHEL and modern CentOS users should probably follow this issue (by subscribing as CC: contacts):
http://gridengine.sunsource.net/issues/show_bug.cgi?id=2543

A Red Hat bug report discussing the issue in more detail is here:
"Large inode number patch breaks applications"
https://bugzilla.redhat.com/show_bug.cgi?id=241348
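
If you want a rough idea of whether a mounted filesystem is already handing out inode numbers that no longer fit in 32 bits, a quick scan like the one below may help. This is only a sketch under my own assumptions: it checks the entries directly under one directory, the directory you pass in is up to you, and it does not reproduce or diagnose the SGE problem itself.

```python
import os

MAX_32BIT = 0xFFFFFFFF  # largest value an unsigned 32-bit inode field can hold


def exceeds_32_bits(inode_number):
    """Return True if the inode number cannot be stored in 32 bits."""
    return inode_number > MAX_32BIT


def find_large_inodes(directory):
    """Yield (path, inode) for entries directly under directory
    whose inode number needs more than 32 bits."""
    with os.scandir(directory) as entries:
        for entry in entries:
            inode = entry.inode()
            if exceeds_32_bits(inode):
                yield entry.path, inode


if __name__ == "__main__":
    # "." is a placeholder; point this at the NFS mount you care about.
    for path, inode in find_large_inodes("."):
        print(f"{path}: inode {inode} does not fit in 32 bits")
```

An empty result only means that nothing in that one directory has a large inode number right now, so treat it as a hint and not as proof that you are unaffected.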

6.1 leak found; schedd_job_info is not your friend

Posted by chris Thu, 10 Apr 2008 15:07:00 GMT

Anyone interested in the memory leak that has been bothering some 6.1 users should check out the comments associated with Issue #2464:
http://gridengine.sunsource.net/issues/show_bug.cgi?id=2464

Among the interesting things you'll see are:

  • A great example of motivated SGE users and developers working together to track down a hard to find problem
  • Interesting comments on the potentially "unfixable" (my words) nature of the schedd_job_info messages
  • A really cool workaround for getting job scheduler messages with schedd_job_info=FALSE

In a nutshell, there is a problem in the schedd_job_info framework that can cause massive resource utilization on the qmaster machine. This happens in particular on larger systems or places with large numbers of queue instances. This can also pop up on systems with jobs that are pending due to un-fulfillable resource requests. This explains why I saw the memory leak on my small testbed cluster -- I have a number of "pend forever" jobs in the queue for demonstration purposes.

The fix is to disable schedd_job_info. This is potentially problematic though, as that feature is pretty much my go-to first step for troubleshooting job dispatch problems.
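
To see what your cluster is currently set to, you can look at the scheduler configuration, which `qconf -ssconf` prints as one "name value" pair per line (changing it is done with `qconf -msconf`). The snippet below is a minimal sketch of reading that output from a script. The helper names are my own and the sample text in the demo is illustrative, not captured from a real cluster.

```python
import subprocess


def parse_sched_conf(text):
    """Parse 'name value' lines, as printed by 'qconf -ssconf', into a dict."""
    settings = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def current_schedd_job_info():
    """Return the live schedd_job_info value, or None if it is not listed.

    Needs the SGE client tools on PATH and a reachable qmaster.
    """
    result = subprocess.run(
        ["qconf", "-ssconf"], capture_output=True, text=True, check=True
    )
    return parse_sched_conf(result.stdout).get("schedd_job_info")


if __name__ == "__main__":
    # Illustrative sample only; real output contains many more parameters.
    sample = "algorithm            default\nschedd_job_info      false\n"
    print(parse_sched_conf(sample)["schedd_job_info"])
```

Keeping the parsing separate from the subprocess call means the parser can be tried out on saved output without a running cluster.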

However, in a recent update comment to this issue, andreas added a possible tip for getting scheduling messages about a job in a way that puts far less load on the system AND does not require schedd_job_info=TRUE:

qalter -w v  
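
If you want to call this from a troubleshooting script, a thin wrapper along these lines is one option. It is a sketch only: the command line above shows no argument, so passing the job ID as the final argument is my assumption, the function names are invented, and 12345 is a made-up job ID.

```python
import subprocess


def build_verify_command(job_id):
    """Build the 'qalter -w v' command line for a single job.

    Assumption: the job ID goes last, after the -w v option.
    """
    return ["qalter", "-w", "v", str(job_id)]


def verify_job(job_id):
    """Run the command and return whatever it printed (stdout plus stderr).

    Needs the SGE client tools on PATH and a reachable qmaster.
    """
    result = subprocess.run(
        build_verify_command(job_id), capture_output=True, text=True
    )
    return result.stdout + result.stderr


if __name__ == "__main__":
    # 12345 is a hypothetical job ID; this only prints the command line.
    print(" ".join(build_verify_command(12345)))
```

The command is built as a list and passed to subprocess.run without a shell, so a job ID taken from user input cannot be interpreted as shell syntax.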

Remember though that comments found in a bug report are not "gospel" so don't read this as news that schedd_job_info is forever broken or going away. Expect to see this and other issues discussed as part of the SGE Roadmap. You are attending the May 2008 SGE Workshop, right?

Release 6.1u4 is out

Posted by chris Fri, 04 Apr 2008 14:06:11 GMT

Congratulations to the SGE developer team!

Big news today -- 6.1u4 was just announced, hopefully addressing some persistent issues people have been having with the previous releases. The plaintext list of fixed issues can be found here:
http://gridengine.sunsource.net/project/gridengine/61patches.txt

The full announcement is here:
http://gridengine.sunsource.net/news/GE61u4-announce.html

I've been unable to keep 6.1u3 running consistently on a small test system, probably due to the same memory leak others have been reporting. There is a chance that a subtle leak still exists or at least has not been fully tracked down in 6.1u4 but multiple people are working diligently on this. Best bet is to monitor the users mailing list to see the feedback.
