LSF to SGE Migration Workshop at SC08
For people attending the SuperComputing 2008 conference next week in Austin, TX, there will be an interesting full-day workshop on Monday, November 17th entitled "How to migrate from LSF to Unicluster with SGE".
Sure, this workshop talks about UniCluster, but the foundation of that product is Sun Grid Engine. Much of what will be discussed will apply to both Univa UD customers and the community at large.
Some of the technical information, including an LSF to SGE quick reference guide, is coming via the Open HPC Management Interoperability (OHMI) project.
Click below to download the invitation:
LSF-SGE-Migration-Invite.pdf
My flight lands in Austin at noon on the 18th so I'll be present for the 2nd half of the workshop.
Fixing a Berkeley DB spool database
Per this thread on the users list, a recipe for rebuilding and re-verifying a Berkeley DB-based binary SGE spool:
service sgemaster stop # on failover server
service sgemaster stop # on master server
cd $SGE_ROOT/default/spool
cp -a spooldb spooldb.bak
cd spooldb
$SGE_ROOT/utilbin/lx24-amd64/db_verify sge
$SGE_ROOT/utilbin/lx24-amd64/db_recover
$SGE_ROOT/utilbin/lx24-amd64/db_dump -f sge.out sge
mv sge sge.old
$SGE_ROOT/utilbin/lx24-amd64/db_load -f sge.out sge
$SGE_ROOT/utilbin/lx24-amd64/db_verify sge
service sgemaster start # on master server
service sgemaster start # on failover server
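If you would rather not paste those commands one at a time, the middle of the recipe can be wrapped in a small function that stops as soon as a step fails. This is only a sketch and not part of the original thread: the function name rebuild_spooldb and its two arguments are made up for illustration, and it assumes sgemaster has already been stopped on both the failover and master servers.

```shell
#!/bin/sh
# Sketch only: rebuild_spooldb is a hypothetical helper, not an SGE tool.
# $1 = directory holding db_verify, db_recover, db_dump and db_load
#      (for example $SGE_ROOT/utilbin/lx24-amd64)
# $2 = directory that contains spooldb (for example $SGE_ROOT/default/spool)
# The body runs in a subshell so the caller's working directory is untouched.
rebuild_spooldb() (
    db_bin=$1
    spool=$2

    cd "$spool" || exit 1
    cp -a spooldb spooldb.bak || exit 1      # keep a copy before touching anything
    cd spooldb || exit 1

    # A failed verify is expected on a damaged database, so only warn here.
    "$db_bin/db_verify" sge || echo "db_verify failed before rebuild, continuing" >&2

    "$db_bin/db_recover" || exit 1
    "$db_bin/db_dump" -f sge.out sge || exit 1
    mv sge sge.old || exit 1
    "$db_bin/db_load" -f sge.out sge || exit 1

    # The exit status of the final verify becomes the function's status.
    "$db_bin/db_verify" sge
)
```

Usage would look like: rebuild_spooldb $SGE_ROOT/utilbin/lx24-amd64 $SGE_ROOT/default/spool, followed by starting sgemaster on the master and then the failover server only if it returned 0.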
Grid Engine & power saving
I'd guess that most people don't follow the SGE developer list all that closely. Sometimes the developer discussions cross over into areas that all users may be interested in.
There has been an interesting discussion on ways to give SGE the ability to directly trigger, or otherwise interact with, systems that switch nodes into lower power states or even power them down and up completely as needed (Project Hedeby / SDM, etc.)
Automatic methods for powering portions of clusters up and down based on workload have been used for years now, but the topic seems to be getting more interest and more backing. A few years ago I saw a neat solution that some people at Cornell Medical College had done -- they used PBS/Torque and had various IPMI scripts that powered nodes on or off depending on the size of the pending job list.
The developer thread (via MarkMail) is here. The CollabNet "Forum View" is here.
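To make the "power nodes on based on the pending job list" idea concrete, here is a rough sketch of how it might look with SGE instead of PBS/Torque. It is not what the Cornell group or the developer thread actually used. The function names, the jobs-per-node ratio, the BMC user name and the password file path are all assumptions for illustration. Counting qstat lines is also only an approximation, since a pending array job shows up as a single line.

```shell
#!/bin/sh
# Sketch only: all function names and the IPMI credentials are hypothetical.

# Count pending jobs for all users. qstat prints two header lines
# when there is at least one job and nothing at all otherwise.
pending_jobs() {
    qstat -s p -u '*' | awk 'NR > 2' | wc -l
}

# Decide how many nodes to power on.
# $1 = pending jobs, $2 = jobs one node can absorb, $3 = nodes available to wake
nodes_to_wake() {
    pending=$1
    per_node=$2
    max_nodes=$3
    # Integer ceiling of pending / per_node
    want=$(( (pending + per_node - 1) / per_node ))
    if [ "$want" -gt "$max_nodes" ]; then
        want=$max_nodes
    fi
    echo "$want"
}

# Power on one node through its BMC. $1 = BMC hostname or address.
# The user name and password file are placeholders.
wake_node() {
    ipmitool -I lanplus -H "$1" -U admin -f /etc/ipmi.pass chassis power on
}
```

A cron job could then call nodes_to_wake "$(pending_jobs)" 4 8 and run wake_node for that many powered-off hosts. Powering idle nodes back down is the harder half and is left out here.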
Beginner Guide to SGE 6.2 Whitepaper
Dan T has a new whitepaper entitled "Beginners Guide to Sun Grid Engine 6.2 Installation & Configuration" up online. Direct link is here (registration required).
Also available by going to:
http://www.sun.com/software/gridware/support.xml
... and looking under the Whitepaper section.
Intermediate SGE Config & Admin Training Class
ARC at Georgetown University in Washington, DC has announced an upcoming training class entitled "Intermediate Sun Grid Engine Configuration and Administration".
Dates:
21-23 October 2008
Location:
Georgetown University
Harris Building Room 4200
3300 Whitehaven St, NW
Washington, DC 20007
Full announcement & class overview here:
Training Announcement
Fixing SGE email issues on Apple OS X
Are you in the following situation?
- /usr/bin/mail works perfectly from the command line
- /usr/bin/mail configured as the SGE mailer produces no email
- substituting a wrapper with extra logging also produces no logs or email
The only clue is in the spool logs:
09/10/2008 16:22:07|execd|xxx-fs01|E|mailer had timeout - killing
09/10/2008 16:22:07|execd|xxx-fs01|E|mailer exited with exit status= 1
09/10/2008 16:22:19|execd|xxx-fs01|E|mailer had timeout - killing
09/10/2008 16:22:19|execd|xxx-fs01|E|mailer exited with exit status= 1
Thanks to Valerio Luccio we have a workaround. The issue is apparently that one of the SGE-supplied libraries conflicts with the mail MTA on OS X when SGE tries to invoke it. A trivial wrapper script that overrides the DYLD_LIBRARY_PATH environment variable is the fix:
#!/bin/sh
export DYLD_LIBRARY_PATH=/usr/lib
/usr/bin/mail -s "$2" $3
This solved a problem that had been bothering me for days, thanks Valerio - I owe you a beer if we ever end up at the same meeting or conference!
Screencast showing online upgrade to SGE 6.2
Lubomir Petrik has posted a screencast recording showing the SGE 6.x to SGE 6.2 upgrade process. Thanks to Andy for finding and reporting this.
Why upgrade? DanT explains SGE from 5.x through 6.2 and beyond
Dan has posted a great overview of how Grid Engine has changed since the version 5.x days, couched in the context of answering the "Why should I upgrade SGE?" questions that often come up.
I won't even excerpt it; the full article is well worth a read:
http://blogs.sun.com/templedf/entry/why_upgrade
SGE and MPICH2 On Windows/Linux Heterogeneous Systems
Thanks to Jacek Strzelczyk for the new Wiki page entitled "Install and configure Grid Engine in heterogenic environment on Linux and Windows with MPICH2" that was posted earlier this week.
Creating Hadoop PE under Grid Engine
Dan has found a great Sun blog article by Ravi Chandra Nallan on integrating Hadoop into SGE via the use of a parallel environment.
