SGE 6.2 Coming August 5th

Posted by chris Sun, 03 Aug 2008 16:28:10 GMT

Unofficial word is that the official release of SGE 6.2 is coming on Tuesday, August 5th.

To read up on why this is news, check out Dan's excellent essays:

Dan reviews Grid Engine queues

Posted by chris Wed, 28 May 2008 15:33:34 GMT

Dan is back and blogging up a storm; he recently posted a nice overview of Grid Engine queue basics. An excerpt is here:

... So, aside from governing the number of free slots on a host, what does a queue do? It controls the execution context of jobs that run in it. It determines what parallel environments are available, what file, memory, and CPU time limits should be applied, how the job should be started, stopped, suspended, and resumed, what the job's process' nice value is, etc. ...
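
For readers who want to connect the excerpt to an actual queue definition, here is a minimal sketch that pairs each behaviour Dan lists with the queue attribute names I believe are documented in queue_conf(5). The topic labels and the helper function are my own invention for illustration; check the man page on your own installation before relying on any of it.

```python
# Behaviours named in the excerpt, paired with queue configuration
# attribute names as documented in queue_conf(5).
# The topic labels on the left are illustrative, not SGE terminology.
QUEUE_CONTROLS = {
    "free slots per host": ["slots"],
    "parallel environments": ["pe_list"],
    "file size limits": ["s_fsize", "h_fsize"],
    "memory limits": ["s_vmem", "h_vmem"],
    "CPU time limits": ["s_cpu", "h_cpu"],
    "start/stop/suspend/resume": [
        "starter_method",
        "terminate_method",
        "suspend_method",
        "resume_method",
    ],
    "nice value": ["priority"],
}


def attributes_for(topic: str) -> list[str]:
    """Return the queue attributes associated with a topic, or an empty list."""
    return QUEUE_CONTROLS.get(topic, [])


if __name__ == "__main__":
    for topic, attrs in QUEUE_CONTROLS.items():
        print(f"{topic:28s} {', '.join(attrs)}")
```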

Read the full article here:
http://blogs.sun.com/templedf/entry/intro_to_grid_engine_queues

SGE Quick Reference

Posted by chris Wed, 06 Feb 2008 16:26:16 GMT

...in which Chris continues his habit of making incremental modifications to the work of people far smarter than him ...

There are two things I really like about Platform LSF product documentation that Grid Engine currently does not offer, and I mentioned both at the 2007 SGE Workshop in Regensburg.

The first really cool thing that LSF does is create a custom HTML-formatted file after installation that combines a basic "how to administer your cluster" page with commands, links and paths that are specific to how the installation was performed on your local system. The path information is captured during install time and inserted into the custom "about your cluster" HTML page as one of the final installation steps. Very easy to do and incredibly useful. This is something that I may want to take a stab at contributing to SGE myself because the installation shell scripts are easier for me to get my head around than the C codebase that is Grid Engine itself. If you want to see an example for yourself, try a Google Search for "LSF + about_cluster" and see if anything comes up.
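
To make the idea concrete, here is a rough sketch of what such a generated page could look like for Grid Engine. It is not how LSF does it and nothing like it ships with SGE; the function name and page layout are hypothetical. The only SGE-specific assumptions are the SGE_ROOT and SGE_CELL settings and the usual location of the settings file under the cell's common directory.

```python
import html
import os
from string import Template

# Hypothetical page layout; a real installer would add many more entries.
PAGE = Template("""<html><body>
<h1>About this Grid Engine cluster</h1>
<ul>
<li>SGE_ROOT: $root</li>
<li>SGE_CELL: $cell</li>
<li>Settings file to source: $settings</li>
</ul>
</body></html>
""")


def render_about_page(sge_root: str, sge_cell: str) -> str:
    """Fill the template with paths captured at install time."""
    settings = f"{sge_root}/{sge_cell}/common/settings.sh"
    return PAGE.substitute(
        root=html.escape(sge_root),
        cell=html.escape(sge_cell),
        settings=html.escape(settings),
    )


if __name__ == "__main__":
    # The fallback values are placeholders, not recommended defaults.
    print(render_about_page(
        os.environ.get("SGE_ROOT", "/opt/sge"),
        os.environ.get("SGE_CELL", "default"),
    ))
```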

The second thing I really like about LSF's initial documentation is the excellent tri-fold brochure style "Quick Reference" PDF that they provide with each update to their product. It is one of the better end user and administrator cheat sheets I've ever seen. I'd post a link to an example but I'm pretty sure that the PDF is only supposed to ship with the product. You can, however, do a Google Search for "LSF Quick Reference" and see what I'm talking about.

Because I do Grid Engine support for my employer I was able to justify some time spent hacking on DanT's "SGE Cheat Sheet" as well as a bit of borrowing from his Georgetown training slides in order to make a first-draft version of an SGE-specific tri-fold formatted PDF. Because I did this work on company time it's posted over at the corporate blog site.

Comments and feedback appreciated. Looking at the document, I think it would be a good approach to split this into 2 documents -- one aimed at getting SGE Admins up to speed quickly and a second document crafted specifically for end-users who just want to use an SGE system. I may take a stab at a user-specific version of this doc over the next few weeks.

The PDF download link is here:
http://blog.bioteam.net/2008/02/06/grid-engine-quick-reference-guide/

4000 nodes and 62,976 cores in one Grid Engine cluster

Posted by Rayson Wed, 23 Jan 2008 05:31:00 GMT

The Texas Advanced Computing Center (TACC) will be bringing the Ranger Supercomputer online soon. And, SGE will be the batch system for the cluster.

As Ranger has close to 4000 nodes and 62,976 cores (each Sun Blade X6420 node has 4 quad-core Opteron processors), SGE 6.2 adds a number of scalability features to support this huge cluster:

  • running the scheduler as a thread of the qmaster
  • reducing the load report overhead

At the SC07 conference, DanT talked about TACC, the scalability improvements and other features in SGE 6.2.
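
As a quick sanity check on the figures quoted above, the node count can be derived from the core count. This only uses the numbers stated in the post; the variable names are mine.

```python
SOCKETS_PER_NODE = 4   # "4 quad-core Opteron processors" per blade
CORES_PER_SOCKET = 4   # quad-core
TOTAL_CORES = 62_976   # figure quoted in the post

cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
nodes, leftover = divmod(TOTAL_CORES, cores_per_node)

if __name__ == "__main__":
    print(f"{cores_per_node} cores per node, {nodes} nodes, "
          f"{leftover} cores left over")
```

That works out to 3,936 nodes, which matches the "close to 4000" in the post.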

    Video introduction to Grid Engine

    Posted by chris Fri, 30 Mar 2007 02:16:00 GMT

    Daniel has posted the following “Introduction to Grid Engine” video to YouTube and is threatening to put up similar content in the future. It takes a non-trivial amount of effort to create and freely distribute things like this so if you find this type of content useful, head over here (or here) and tell Daniel so.

    Description:

    “Introduction to grid computing and the Grid Engine product with Fritz Ferstl and Miha Ahronovitz. Includes demo.”

    Notes About Grid Engine on Windows

    Posted by DanT Tue, 06 Feb 2007 19:23:00 GMT

    A little while ago I emailed Harald, the lead engineer on the Grid Engine Windows port, to ask for the full background story on Grid Engine, Interix, and Windows. What follows is an English translation of Harald’s response, which I thought was worth sharing with the community:

    In general Windows offers the same or similar functionality to UNIX. Even so, the system calls have different names, and there is often no exact equivalent for many system calls. For this reason, one can’t simply use #define or a simple adapter library to translate the UNIX function calls into Windows function calls.

    To get around this problem, we use Interix for the Windows port. Interix is a POSIX subsystem and part of the "Microsoft Services for Unix" (SFU).

    Windows consists of the actual Windows kernel and the various subsystems that enable the applications to talk to the kernel in a specific way. In Windows, the WIN32 subsystem is used most. Occasionally the WIN16 subsystem also turns up. WIN16 allows old Windows 3.11 applications to run on Windows NT/2000/XP.

    Interix makes it possible for UNIX applications to run on Windows with relatively minimal adjustments. All of the standard Interix libraries were taken over from HP-UX – to the extent possible. Interix has therefore also inherited some peculiarities from HP-UX.

    Interix cannot be 100% UNIX. For things that are managed by the Windows kernel and not by the subsystem, Interix can translate, but it can’t change the behavior. For example, Interix cannot translate the Windows superuser "Administrator" into the UNIX superuser "root" because that user is managed by the kernel.

    The user UID and GID are "security identifiers" (SIDs) in Windows, of the form "S-1-5-21-1844237615-1606980848-1060284298-500". These SIDs are translated by Interix into 6-digit numbers for local users and 7-digit numbers for Windows Domain Users. The user ids therefore cannot be specified, but are automatically generated by Windows, and in turn automatically translated into UIDs by Interix.
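
To illustrate the SID format only, here is a small sketch that splits a SID string into its parts. It does not attempt to reproduce how Interix derives its 6- or 7-digit UIDs, which the text above does not describe. The class and function names are hypothetical; the field names follow the usual description of a SID as a revision, an identifier authority and a list of sub-authorities, with the last sub-authority being the relative identifier (RID).

```python
from typing import NamedTuple


class Sid(NamedTuple):
    revision: int
    authority: int
    sub_authorities: tuple[int, ...]

    @property
    def rid(self) -> int:
        """The relative identifier is the last sub-authority."""
        return self.sub_authorities[-1]


def parse_sid(text: str) -> Sid:
    """Split a string such as 'S-1-5-21-...-500' into its numeric fields."""
    parts = text.split("-")
    if len(parts) < 4 or parts[0] != "S":
        raise ValueError(f"not a SID string: {text!r}")
    revision, authority, *subs = (int(p) for p in parts[1:])
    return Sid(revision, authority, tuple(subs))


if __name__ == "__main__":
    sid = parse_sid("S-1-5-21-1844237615-1606980848-1060284298-500")
    # RID 500 is the well-known RID of the built-in Administrator account.
    print(sid.revision, sid.authority, sid.rid)
```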

    Network access – only fully authenticated users have network access rights, so the user password must be registered. One does that for "rlogin" with "regpwd". Our execution daemon has its own "sgepasswd" for that purpose. Regardless, it is not possible with Interix to use the setuid-root functionality (when a binary has the permissions, -r-s------) over a network filesystem. That functionality only works over local filesystems. "sgepasswd" is a setuid-root binary. When SGE is installed on a network filesystem, as is common practice, sgepasswd cannot be used from Interix. It can only be used from a UNIX machine.
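
For anyone unsure what a setuid-root binary looks like from the UNIX side, here is a minimal sketch using Python's standard stat module. It shows how a permission string such as -r-s------ corresponds to mode bits, and how one might test a file for setuid-root. The function names are mine, and this says nothing about how Interix itself evaluates the bit.

```python
import os
import stat


def has_setuid(mode: int) -> bool:
    """True if the setuid bit is set in a file mode."""
    return bool(mode & stat.S_ISUID)


def mode_string(mode: int) -> str:
    """Render a mode the way 'ls -l' does, e.g. '-r-s------'."""
    return stat.filemode(mode)


def is_setuid_root(path: str) -> bool:
    """True if the file at path is owned by UID 0 and has the setuid bit."""
    st = os.stat(path)
    return has_setuid(st.st_mode) and st.st_uid == 0


if __name__ == "__main__":
    # Regular file, setuid, owner read + execute only (octal 4500).
    example = stat.S_IFREG | 0o4500
    print(mode_string(example), has_setuid(example))
```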

    In addition, Interix does not offer an API that one can use to talk directly to the kernel. WIN32, on the other hand, has such an API. For this reason, one must port everything that needs direct access to the kernel to WIN32. For us, that includes the load sensor "qloadsensor.exe" and the "N1 Grid Engine Helper Service" (whose executable is called SGE_Helper_Service.exe).

    The helper service makes it possible to start Windows jobs that have to display their GUIs on a visible desktop to function. Actually, Windows applications don’t necessarily have to work that way. Windows also offers virtual desktops that can be run in the background. The problem is that many Windows applications display their error messages only in dialog windows, and the only way to determine, for example, that an environment variable is missing, is to be able to read the messages displayed in the dialog window.

    One further limitation on Interix is the group permissions for Windows users. This has the greatest effect on the administrator. In Windows, one can have as many administrators as one pleases. Every user in the "Administrators" group is an administrator. Because Interix is a POSIX subsystem, and POSIX only allows exactly one superuser, Interix only allows the user "Administrator" as superuser, independent from group membership. (Don’t ask me how that can be done internally when Interix isn’t supposed to be able to have any influence over the users.)

    In Windows one normally uses a domain to manage the computer. The users are managed from the domain and authenticated against the domain. There are very few special local users on the individual machines. In Interix these users can be explicitly referenced by their "fully qualified names," in the form <domain>+<user>. <domain> is included whether it’s the Windows domain or the name of the local machine, as every machine has its own small, local domain. There is a default domain or principal domain that can be requested with "pdomain". If one enters the short form of the user name (only <user>), Interix automatically prepends the principal domain. The principal domain normally corresponds to the Windows domain.
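
The naming rule described above can be sketched in a few lines. This is only an illustration of the <domain>+<user> convention as Harald describes it; the function name and the example domain, host and user names are made up, and the real lookup of the principal domain is left to "pdomain".

```python
def qualify(user: str, principal_domain: str) -> str:
    """Return the fully qualified <domain>+<user> form of a user name.

    Names that already contain a domain part are returned unchanged;
    short names get the principal domain prepended.
    """
    if "+" in user:
        return user
    return f"{principal_domain}+{user}"


if __name__ == "__main__":
    # ENGINEERING and BUILDHOST are hypothetical domain and machine names.
    print(qualify("alice", "ENGINEERING"))
    print(qualify("BUILDHOST+Administrator", "ENGINEERING"))
```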

    So that one can access NFS filesystems, an NFS client with User Name Mapping is included in SFU. The User Name Mapping creates an association between the Windows user and the UNIX user on the NFS server. More specifically, it makes sure that a Windows user who has an SID that Interix would translate, for example, into "1001" can access data that the UNIX user with the UID, "1001", owns.

    This NFS client is, however, not part of Interix and need not be used. One could, for example, use Samba instead; we’ve never tested it, but we’ve heard of customers trying it.

    History of Grid Engine

    Posted by chris Thu, 22 Jun 2006 17:49:00 GMT

    A fascinating thread on the sge-users mailing list today concerning the history of Grid Engine (and its relation to CODINE, DQS and Raytheon Systems). Well worth a read.

    Highlights include the posting of this 10-year-old ‘queuing system ancestry’ image (original source not stated):

    Fritz also contributes a nice history timeline for the grid engine codebase and project:

    
    1992: Genias acquires rights to commercialize DQS from FSU.
    1993: Genias productizes and extends DQS and releases it as CODINE.
    1993-1995: Further evolution of CODINE until version 3.3.
    1995-1997:
             - Rewrite of v3.3 into v4.0
             - Addition of "GRD" policy module in response to DoD Mod
               project; Raytheon was primary contractor; Genias and
               Instrumental were subcontractors
                 --> Raytheon paid for that development at the time
                 --> Raytheon was an important co-developer of the module;
                     Genias was the main contributor, though.
    1997-2000:
             - GRD (=Codine+GRD-Module) and CODINE are co-marketed as
               separate products; Raytheon sells GRD to gov customers; Genias
               sells both tools to commercial accounts.
             - Genias and Raytheon continue to co-develop the GRD module,
               with Raytheon contributing an important but comparatively
               smaller part.
    2000: Genias merges with Chord into Gridware (no change regarding
             relationship w/ Raytheon)
    2000: Sun acquires Gridware; as part of acquisition, Sun receives all
             rights to GRD in exchange for a compensation to Raytheon.
             Raytheon retains right to sell the now renamed Grid Engine into
             own accounts.
    since 2000: Raytheon continues to sell and support Grid Engine and to
             contribute to Grid Engine developments, although on a more
             reduced scale.
    
    

    choosing: SGE vs LSF vs Torque

    Posted by chris Sun, 12 Feb 2006 18:22:56 GMT

    In this thread, Mark Olesen provides a bit of detail explaining why his group chose to deploy Grid Engine. His comments about the true depths of what "turnkey" vendors can provide are spot on and should be kept in mind by anyone researching or considering deploying a distributed resource management software layer:

    ... Unfortunately, nobody could offer us a complete turn-key solution. They could install the system, set up queues in accordance with our specifications, and include any job submissions scripts that we would provide them. We were most certainly left with the impression that we would essentially need to specify how 90% of everything should be implemented, and they would implement it for us.

    We thus took exactly the opposite approach and decided to try and learn the remaining 10% ourselves and GridEngine appeared to be the best option. In case it didn't pan out with GridEngine, we figured that we could always invest in a commercial solution or get commercial support from Sun. In either case, we'd have gained a good idea of job submission scripts and how queuing should or should not work.

    As you may guess, we haven't found a reason to move away from GridEngine. With the version 6, any doubts that may have remained have been removed.

    This is valuable advice. A quick trawl through the SGE users mailing list will show a vast array of different usage, configuration and deployment requirements. Even in my day job, where I've spent a lot of time deploying SGE for use in particular industries, I still see SGE used in many different ways.

    As a general rule, people looking to get the most out of Grid Engine (or any other similar product) should plan on developing and maintaining at least a small amount of in-house expertise. How else can you ensure that your "turnkey" vendor did a suitable job?

    Meanwhile...

    Over on the bioclusters mailing list, Bonnie started a similar thread about choosing distributed resource management software. Tim Cutts mentions a post I had made on "SGE and LSF and which is Best" -- the post he refers to is here:

    http://bioinformatics.org/pipermail/bioclusters/2005-August/002671.html

    I still think that summary of "SGE vs LSF" is correct. In 2006 everyone has the core functions down now so the main comparative differences have to do with cost, support and the various sets of layered features and add-ons offered. The one addendum I should add is that I think in all of 2005 I never found a need or requirement to swap out SGE on a project in favor of Platform LSF.

    This will change in 2006 as I'm working with at least one very large client who will likely be best served by going with Platform LSF. I'm actually looking forward to this; it will be a nice change and a good way to re-polish my LSF knowledge.

    I'm also looking forward to finding the time to re-evaluate PBS Pro; it's been a long time since I've been hands-on with that offering.


    Grid Engine 6.0u7_1 released

    Posted by chris Mon, 23 Jan 2006 16:11:05 GMT

    The announcement is here:

    http://gridengine.sunsource.net/project/gridengine/news/SGE60u7_1-announce.html

    The list of patches and issues resolved is fairly large; it can be read here:


    http://gridengine.sunsource.net/project/gridengine/60patches.txt


    I'll do the usual thing of marking up the patch list with HTML links that point to the actual issue reports sometime on Tuesday.

    Sun N1 Grid Engine is now free

    Posted by chris Thu, 01 Dec 2005 16:34:03 GMT

    Rayson was the first person on the SGE users list to notice this big announcement from Sun Microsystems.

    The key bits from the release:

    Included at no cost in the new Solaris Enterprise System are:
    • The award winning and open sourced Solaris 10 OS, with the recently announced PostgreSQL database;
    • The entire Sun Java Enterprise System infrastructure software platform, including the Sun Java Identity Management Suite, Sun Java Integration Suite, Sun Java Communications Suite, Sun Java Application Platform Suite, Sun Java Availability Suite and Sun Java Web Infrastructure Suite;
    • The N1 Management Software including the Sun N1 System Manager, the Sun N1 Service Provisioning System, the Sun N1 Grid Engine;
    • All tools for C, C++ and Java development, including Sun Studio 11, Sun Java Studio Enterprise 8 and Sun Java Studio Creator;
    • SunRay ultra-thin client software;
    • Sun Secure Global Desktop Software.

    What does this mean for you?

    The biggest change is that now people will have access to Grid Engine components that were previously only available to paying Sun N1 Grid customers. The biggest items include:

    • The ARCO job and resource usage monitoring and reporting subsystem
    • Windows Grid Engine exec host client code

    If I had to summarize industry coverage it would be something like this from the Reuters article:

    "...Although more of Sun's software will now be free, customers will have to pay for service and support, which is how Sun aims to boost revenue."

    Pretty pictures explain Functional vs Sharetree scheduling

    Posted by chris Fri, 30 Sep 2005 21:21:00 GMT

    I saw versions of these images in Charu’s presentation slide deck a long time ago. They did a good job visually explaining the scheduling behavior differences in Grid Engine Sharetree vs Functional share policies. Now that they appear in a publicly accessible PDF file [1] I can shamelessly excerpt them:

    [1] Source: http://www.sun.com/products-n-solutions/edu/whitepapers/pdf/web_services_for_HPC.pdf

    Sharetree behavior

    The key bit of information here is to note how the entitlement shares allowed to Project B actually dip BELOW the 50% threshold in the later stages of the time series. This is because the SGE Scheduler “remembers” past usage (see earlier in the graphic where Project B is using WAY MORE than 50% of available cluster resources) and is compensating Project A for the previous excess usage of Project B. Over time, as the graph shows, the SGE Scheduler works to bring harmony to the assigned 50-50 split of cluster resources between two projects.

    Functional policy behavior

    The key bit of information here concerning the functional share policy is that there is no “memory” of past usage by Project B. Early on in the time series, Project B is allowed to take advantage of “extra” available idle resources. As soon as Project A starts wanting to do work again, the Grid Engine scheduler starts enforcing the 50-50 entitlement split. Project A never gets “compensated” for letting Project B use more than its allocated share because the Grid Engine scheduler does not consider past usage within the Functional policy.

    Summary

    The Sharetree Policy “remembers” past usage and works to enforce the configured resource allocation entitlements as averaged over time. This may include compensating some users/groups/projects temporarily with “extra” entitlements to make up for times when other users/groups/projects were using more than their configured entitlements.

    The Functional Policy will also allow “extra” entitlements if cluster resources are idle or otherwise available. It will not, however, penalize or compensate anyone for prior usage. When things are busy, the scheduler will attempt to enforce its allocation policies exactly as they have been configured.
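
The difference between the two policies can be shown with a toy simulation. To be clear, this is not the SGE scheduler's algorithm: the real share tree works with tickets and decayed usage, while this sketch hands out one unit of capacity per time step and uses a simple "catch up to your target" rule of my own. The scenario (Project B alone for four steps, then both projects busy) is also made up. It only reproduces the qualitative shape described above.

```python
SHARES = {"A": 50, "B": 50}  # configured 50-50 entitlement


def functional(active, usage):
    """Split capacity by configured shares among active projects.

    Past usage is accepted but deliberately ignored.
    """
    total = sum(SHARES[p] for p in active)
    return {p: (SHARES[p] / total if p in active else 0.0) for p in SHARES}


def sharetree(active, usage):
    """Toy compensation rule: favour projects behind their long-run target."""
    horizon = sum(usage.values()) + 1.0  # total usage after this step
    all_shares = sum(SHARES.values())
    deficit = {
        p: max(SHARES[p] / all_shares * horizon - usage[p], 0.0)
        if p in active else 0.0
        for p in SHARES
    }
    total = sum(deficit.values())
    if total == 0.0:  # nobody active is behind target
        return functional(active, usage)
    return {p: deficit[p] / total for p in SHARES}


def simulate(policy, steps=10, a_starts_at=4):
    """Return Project B's share of the cluster at each time step."""
    usage = {p: 0.0 for p in SHARES}
    history = []
    for t in range(steps):
        active = {"B"} if t < a_starts_at else {"A", "B"}
        alloc = policy(active, usage)
        for p in SHARES:
            usage[p] += alloc[p]
        history.append(alloc["B"])
    return history


if __name__ == "__main__":
    print("functional:", simulate(functional))
    print("sharetree: ", simulate(sharetree))
```

In the functional run, Project B drops straight to 50% once Project A shows up. In the toy sharetree run, Project B falls below 50% for a while so that Project A can catch up, and only then settles at the configured split.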

    Related article

    I wrote a mini-Howto showing how to do percentage-based resource allocation between different Department groups on a Grid Engine cluster. You can find it online at http://bioteam.net/dag/sge6-funct-share-dept.html. There is some additional information there about the different scheduling policies that may or may not be of some use.