LSF to SGE Migration Workshop at SC08

Posted by chris Wed, 12 Nov 2008 21:45:39 GMT

For people attending the SuperComputing 2008 conference next week in Austin, TX, there will be an interesting full-day workshop on Monday, November 17th entitled "How to migrate from LSF to UniCluster with SGE".

Sure, this workshop talks about UniCluster, but the foundation of that product is Sun Grid Engine. Much of what will be discussed will apply to both Univa UD customers and the community at large.

Some of the technical information, including an LSF to SGE quick reference guide, is coming via the Open HPC Management Interoperability (OHMI) project.

Click below to download the invitation:
LSF-SGE-Migration-Invite.pdf

My flight lands in Austin at noon on the 18th so I'll be present for the 2nd half of the workshop.

Fixing a Berkeley DB spool database

Posted by chris Tue, 11 Nov 2008 17:43:00 GMT

Per this thread on the users list, a recipe for rebuilding and re-verifying a Berkeley DB based binary SGE spool:

service sgemaster stop   # on failover server
service sgemaster stop   # on master server
cd $SGE_ROOT/default/spool
cp -a spooldb spooldb.bak
cd spooldb
$SGE_ROOT/utilbin/lx24-amd64/db_verify sge
$SGE_ROOT/utilbin/lx24-amd64/db_recover
$SGE_ROOT/utilbin/lx24-amd64/db_dump -f sge.out sge
mv sge sge.old
$SGE_ROOT/utilbin/lx24-amd64/db_load -f sge.out sge
$SGE_ROOT/utilbin/lx24-amd64/db_verify sge
service sgemaster start  # on master server
service sgemaster start  # on failover server
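
For anyone who wants to script the recipe, here is a minimal sketch of the backup, dump, and reload steps as a shell function that stops at the first failed step. The function names bdb_tool and rebuild_spool are made up for illustration, and treating every failure after the first db_verify as fatal is my assumption, not something from the thread. It assumes sgemaster has already been stopped on both the master and the failover server.

```shell
#!/bin/sh
# Sketch only: the spool rebuild steps wrapped in a function.
# bdb_tool and rebuild_spool are hypothetical helper names.

# Print the full path of a Berkeley DB utility under <sge_root>/utilbin/<arch>.
# The default arch "lx24-amd64" is the one used in the recipe.
bdb_tool() {
    printf '%s/utilbin/%s/%s\n' "$1" "${3:-lx24-amd64}" "$2"
}

# Runs in a subshell so the caller's working directory is left alone.
rebuild_spool() (
    sge_root=$1
    arch=${2:-lx24-amd64}
    cd "$sge_root/default/spool" || exit 1
    cp -a spooldb spooldb.bak || exit 1
    cd spooldb || exit 1
    # The first verify is informational: a damaged database is why we are here.
    "$(bdb_tool "$sge_root" db_verify "$arch")" sge ||
        echo "db_verify reported problems; continuing" >&2
    "$(bdb_tool "$sge_root" db_recover "$arch")" || exit 1
    "$(bdb_tool "$sge_root" db_dump "$arch")" -f sge.out sge || exit 1
    mv sge sge.old || exit 1
    "$(bdb_tool "$sge_root" db_load "$arch")" -f sge.out sge || exit 1
    # The exit status of the final verify is the status of the whole function.
    "$(bdb_tool "$sge_root" db_verify "$arch")" sge
)

# Usage, once the daemons are stopped:
#   rebuild_spool "$SGE_ROOT" && echo "spool rebuilt"
```

The original spooldb directory is kept as spooldb.bak, so a failed run can be rolled back by hand before restarting the daemons.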

Grid Engine & power saving

Posted by chris Fri, 07 Nov 2008 17:02:17 GMT

I'd guess that most people don't follow the SGE developer list all that closely. Sometimes the developer discussions cross over into areas that all users may be interested in.

There has been an interesting discussion about ways to let SGE directly trigger, or otherwise interact with, systems that switch nodes into lower power states or power them down and up entirely as needed (Project Hedeby / SDM, etc.).

Automatic methods for powering portions of a cluster up and down based on workload have been in use for years, but the topic seems to be getting more interest and more backing. A few years ago I saw a neat solution built by some people at Cornell Medical College -- they used PBS/Torque with IPMI scripts that powered nodes on or off depending on the size of the pending job list.
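
To make the idea concrete, here is a rough sketch of the decision step such a script might use. It is not the Cornell implementation, which I have not seen. The function name power_action, the thresholds, the node name, and the password file path are all hypothetical. The commented-out wiring uses SGE's qstat rather than PBS/Torque.

```shell
#!/bin/sh
# Sketch only: decide whether to power nodes on or off from the pending-job count.
# power_action and both thresholds are hypothetical.

# Print "on", "off", or "hold" for a given number of pending jobs.
power_action() {
    pending=$1
    on_above=${2:-10}   # power nodes on when more jobs than this are waiting
    off_at=${3:-0}      # power nodes off when at most this many are waiting
    if [ "$pending" -gt "$on_above" ]; then
        echo on
    elif [ "$pending" -le "$off_at" ]; then
        echo off
    else
        echo hold
    fi
}

# Example wiring, commented out because it needs a live cluster and BMC access.
# qstat prints two header lines before the job list; node01-bmc is a made-up host.
#   pending=$(qstat -s p -u '*' | tail -n +3 | wc -l)
#   action=$(power_action "$pending")
#   [ "$action" = hold ] ||
#       ipmitool -H node01-bmc -U admin -f /root/.ipmi_pass chassis power "$action"
```

An empty pending list does not mean a node is idle, so a real script would also have to confirm that nothing is running on a node before powering it off.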

The developer thread (via MarkMail) is here. The CollabNet "Forum View" is here.

Beginner Guide to SGE 6.2 Whitepaper

Posted by chris Mon, 29 Sep 2008 18:21:41 GMT

Dan T has a new whitepaper online entitled "Beginners Guide to Sun Grid Engine 6.2 Installation & Configuration". Direct link is here (registration required).

Also available by going to:
http://www.sun.com/software/gridware/support.xml
... and looking under the Whitepaper section.

Intermediate SGE Config & Admin Training Class

Posted by chris Wed, 24 Sep 2008 16:04:19 GMT

ARC at Georgetown University in Washington, DC has announced an upcoming training class entitled "Intermediate Sun Grid Engine Configuration and Administration".

Dates:

21-23 October 2008

Location:

Georgetown University
Harris Building Room 4200
3300 Whitehaven St, NW
Washington, DC 20007

Full announcement & class overview here:
Training Announcement

Fixing SGE email issues on Apple OS X

Posted by chris Tue, 23 Sep 2008 14:57:31 GMT

Are you in the following situation?

  1. /usr/bin/mail works perfectly from the command line
  2. /usr/bin/mail configured as the SGE mailer produces no email
  3. substituting a wrapper with extra logging also produces no logs or email

The only clue is in the spool logs:

09/10/2008 16:22:07|execd|xxx-fs01|E|mailer had timeout - killing
09/10/2008 16:22:07|execd|xxx-fs01|E|mailer exited with exit status= 1
09/10/2008 16:22:19|execd|xxx-fs01|E|mailer had timeout - killing
09/10/2008 16:22:19|execd|xxx-fs01|E|mailer exited with exit status= 1

Thanks to Valerio Luccio we have a workaround. Apparently one of the SGE-supplied libraries conflicts with the mail MTA on OS X when SGE tries to invoke it. A trivial wrapper script that overrides the DYLD_LIBRARY_PATH environment variable is the fix:

#!/bin/sh
# SGE calls the mailer as: mailer -s <subject> <recipient>
export DYLD_LIBRARY_PATH=/usr/lib
/usr/bin/mail -s "$2" "$3"

This solved a problem that had been bothering me for days. Thanks, Valerio - I owe you a beer if we ever end up at the same meeting or conference!
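
If you want to check what the wrapper above passes along without sending real mail, here is a small sketch of the same fix with the mail program made overridable. The MAIL_BIN variable and the sge_mail function name are my own inventions, not SGE settings.

```shell
#!/bin/sh
# Sketch only: the DYLD_LIBRARY_PATH fix with an overridable mail program.
# MAIL_BIN and sge_mail are hypothetical names used for dry runs.

# SGE calls the mailer as: mailer -s <subject> <recipient>, message body on stdin.
sge_mail() {
    (
        # Point the dynamic loader at the system libraries, as in the fix above.
        DYLD_LIBRARY_PATH=/usr/lib
        export DYLD_LIBRARY_PATH
        "${MAIL_BIN:-/usr/bin/mail}" -s "$2" "$3"
    )
}

# When installed as the mailer script, pass the arguments straight through:
#   sge_mail "$@"
```

Setting MAIL_BIN to a harmless command that prints its arguments shows exactly which subject and recipient would be handed to /usr/bin/mail.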

Screencast showing online upgrade to SGE 6.2

Posted by chris Thu, 21 Aug 2008 14:48:19 GMT

Lubomir Petrik has posted a screencast recording showing the SGE 6.x to SGE 6.2 upgrade process. Thanks to Andy for finding and reporting this.

Why upgrade? DanT explains SGE from 5.x through 6.2 and beyond

Posted by chris Fri, 18 Jul 2008 18:58:52 GMT

Dan has posted a great overview of how Grid Engine has changed since the version 5.x days, couched in the context of answering the "Why should I upgrade SGE?" questions that often come up.

I won't even excerpt it; the full article is well worth a read:
http://blogs.sun.com/templedf/entry/why_upgrade

SGE and MPICH2 On Windows/Linux Heterogeneous Systems

Posted by chris Mon, 14 Jul 2008 23:03:29 GMT

Thanks to Jacek Strzelczyk for the new Wiki page entitled "Install and configure Grid Engine in heterogenic environment on Linux and Windows with MPICH2" that was posted earlier this week.

Creating Hadoop PE under Grid Engine

Posted by chris Fri, 23 May 2008 14:13:24 GMT

Dan has found a great Sun blog article by Ravi Chandra Nallan on integrating Hadoop into SGE via a parallel environment.
