Posted by – December 22, 2009
Another quick link to an informative mailing list conversation, this one dealing with how to integrate Gaussian09 and its Linda-based parallel operating environment into Grid Engine. Several people posted different solutions, so it’s worth reading the entire thread and following the links. Dealing with Linda-based applications, supporting Linda under SGE, and handling CPU allocation on one or more nodes are all non-trivial. The thread is a good starting point if you need to use or support Gaussian.
Discussion thread on Gaussian09 with Linda 8.2:
Posted by – December 22, 2009
Job Submission Verifiers are expected to be a huge win for Grid Engine users and administrators, but the feature is new enough that there are not yet many best practices or much working code “in the wild” for the community to copy and learn from …
In this mailing list thread, however, we get an actual JSV code snippet showing how one might intercept user “-pe ” requests and seamlessly alter the parallel environment request to one that makes use of the wildcard ‘*’ selector:
In the latest SGE, you can use the JSV(1) mechanism to do arbitrary
re-writes of the qsub options. I don’t remember seeing real examples of
this posted, so one that re-writes something like `-pe openmpi’ to
`-pe openmpi-*’ to hide the fact that there are multiple PEs for nodes
with different core counts, and you normally don’t want the parallel job
scheduled across such node groups.
    # $pe holds the parallel environment name requested with -pe
    case "$pe" in
        openmpi | fluent)
            jsv_set_param pe_name "$pe-*"
            jsv_correct "Job was modified"
            ;;
        *)
            jsv_accept "Job OK"
            ;;
    esac
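The rewrite rule in that snippet can be tried outside Grid Engine. What follows is a minimal sketch, not the code from the thread: `rewrite_pe` is a hypothetical helper name, and `smp` is just a sample PE name that the rule leaves alone.

```shell
#!/bin/sh
# Hypothetical helper (not part of the JSV API): maps a requested PE name
# to the name the job would be submitted with.
rewrite_pe() {
    case "$1" in
        openmpi | fluent)
            # Append the wildcard so any PE named openmpi-* or fluent-* qualifies
            printf '%s-*\n' "$1"
            ;;
        *)
            # Every other PE name passes through unchanged
            printf '%s\n' "$1"
            ;;
    esac
}

# Try the rule on a few sample requests
for pe in openmpi fluent smp; do
    printf '%s -> %s\n' "$pe" "$(rewrite_pe "$pe")"
done
```

In a real JSV script the same case statement would sit inside the verification callback, with the requested name read from the pe_name parameter and the result written back with jsv_set_param.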
Posted by – December 2, 2009
Quick mailing list bit that has been in the “to-blog” queue for a long time now…
In this email list thread there is a brief discussion of how setting h_vmem can lead to MATLAB application crashes. The short solution is to increase the value of “h_stack” as well.
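As a rough illustration, both limits can be requested together at submission time. The values and the script name below are placeholders chosen for the example, not numbers recommended in the thread, and the command is only printed so the sketch runs on a machine without Grid Engine.

```shell
#!/bin/sh
# Placeholder values: tune both limits for the actual workload
H_VMEM=4G
H_STACK=256M
# Hypothetical job script name
JOB_SCRIPT=run_matlab.sh

# Request the virtual memory limit and the stack limit in one -l list
cmd="qsub -l h_vmem=$H_VMEM,h_stack=$H_STACK $JOB_SCRIPT"

# Dry run: show the command instead of submitting it
echo "$cmd"
```

Remove the dry run and execute the command directly once the values suit your site.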
Posted by – October 22, 2009
Quick hit from the mailing list: in this thread, a user coming from a Platform LSF environment is having trouble with an application (NCSim) that lets execution be suspended and resumed with Ctrl-C.
The short answer apparently is to invoke ‘qrsh’ with the ‘-pty yes’ argument.
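A sketch of what that invocation might look like follows. The bare `ncsim` command stands in for whatever command line the user actually runs, and the command is printed rather than executed so the example works without a cluster.

```shell
#!/bin/sh
# Placeholder: the real application command line goes here
APP_CMD="ncsim"

# -pty yes asks qrsh to run the job inside a pseudo-terminal
cmd="qrsh -pty yes $APP_CMD"

# Dry run: show the command instead of launching it
echo "$cmd"
```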
Posted by – October 19, 2009
Mark has updated his code for making Grid Engine aware of FlexLM license servers. Read the full announcement here:
Without a doubt, this is currently the industry best practice for dealing with SGE/FlexLM integration issues. Kudos to Mark O. for open-sourcing his work.
Posted by – April 24, 2009
Mark Olesen has updated his qlicserver package. For background on why his method is better (avoiding race conditions, etc.) than the other license integration/tracking methods, read here: http://wiki.gridengine.info/wiki/index.php/Olesen-FLEXlm-Integration
Mark’s announcement can be read here. The updated code is hosted at http://gridengine.sunsource.net/files/documents/7/199/qlicserver-20090421.tar.gz.
Posted by – February 6, 2009
Enrico Sirola reports that an updated Python module for interacting with DRMAA-compliant distributed resource management (“DRM”) systems has been released.
The DRMAA working group website is http://www.drmaa.org/ for those looking for additional information.
Posted by – November 12, 2008
For people who will be attending the SuperComputing 2008 conference next week in Austin, TX, there will be an interesting full-day workshop on Monday, November 17th entitled “How to migrate from LSF to Unicluster with SGE”.
Sure, this workshop talks about UniCluster, but the foundation of that product is Sun Grid Engine. Much of what will be discussed will be applicable to both Univa UD customers and the community at large.
Some of the technical information, including an LSF to SGE quick reference guide, is coming via the Open HPC Management Interoperability (OHMI) project.
Click below to download the invitation:
My flight lands in Austin at noon on the 18th so I’ll be present for the 2nd half of the workshop.
Gerhard Venter asked the users list for assistance in getting Dytran to run under Grid Engine. Once his issues were resolved, Gerhard was kind enough to write up a wiki entry on Dytran/SGE integration.
The wiki page is here:
% jrunscript -cp $SGE_ROOT/lib/drmaa.jar -f drmaa.js
Job 2 submitted
Job 2 has ended
Job terminated abnormally
The post is here: