"Loose" vs. "Tight" Integration of Grid Engine Parallel Environments:
August 2008 Update: This post has been popular enough that it is time to update it with some fresh information. The biggest news to report is that OpenMPI from www.open-mpi.org now ships source code that makes tight integration with Grid Engine trivial. For this reason alone, OpenMPI is now my default first choice whenever I get asked to provide an MPI environment on a Grid Engine system. All that is required is that OpenMPI be compiled with the "--with-sge" switch enabled. With that done, OpenMPI can automatically determine that it is running under SGE and it will "do the right thing".
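One way to confirm that an OpenMPI build picked up SGE support is to look for a "gridengine" component in the output of the ompi_info tool. The following is a minimal sketch of that check, not a definitive test: the function names are hypothetical, and the sample output line is only illustrative, since the exact format varies between OpenMPI versions.

```python
import subprocess


def has_gridengine_support(ompi_info_output: str) -> bool:
    """Return True if any line of ompi_info output mentions a gridengine component."""
    return any("gridengine" in line.lower()
               for line in ompi_info_output.splitlines())


def check_local_install(ompi_info_cmd: str = "ompi_info") -> bool:
    """Run ompi_info (assumed to be on PATH) and inspect its output."""
    result = subprocess.run([ompi_info_cmd], capture_output=True,
                            text=True, check=True)
    return has_gridengine_support(result.stdout)


# Illustrative sample only; real output depends on the OpenMPI version.
SAMPLE_WITH_SGE = "MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3)\n"
SAMPLE_WITHOUT_SGE = "MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3)\n"

if __name__ == "__main__":
    print(has_gridengine_support(SAMPLE_WITH_SGE))
    print(has_gridengine_support(SAMPLE_WITHOUT_SGE))
```

If no gridengine component shows up, the build was most likely configured without the switch and needs to be recompiled.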
The term "loose integration" is used to describe an integration approach in which the Grid Engine scheduler is only responsible for finding available parallel job slots within the cluster and dispatching pending jobs at the appropriate time. If a parallel job requests 8 CPUs, the scheduler will hold the job until 8 free slots are available within the cluster. Once the resources are available, the scheduler will dispatch the job along with a unique machine file that designates which machines and/or CPUs the parallel job is allowed to run on.
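To make the machine file idea concrete, here is a small sketch of how a PE start script might turn the host list that Grid Engine hands to the job into an MPICH-style machine file. It assumes the usual $PE_HOSTFILE layout of one line per host with the hostname in the first column and the granted slot count in the second; the function name and the sample host names are hypothetical.

```python
from typing import List


def pe_hostfile_to_machines(pe_hostfile_text: str) -> List[str]:
    """Expand 'host slots queue processors' lines into one hostname per slot."""
    machines: List[str] = []
    for line in pe_hostfile_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, slots = fields[0], int(fields[1])
        machines.extend([host] * slots)
    return machines


# Hypothetical contents of $PE_HOSTFILE for an 8-slot job on two nodes.
SAMPLE_PE_HOSTFILE = (
    "node01 4 all.q@node01 UNDEFINED\n"
    "node02 4 all.q@node02 UNDEFINED\n"
)

if __name__ == "__main__":
    # Print one hostname per granted slot, ready to be written to a machine file.
    print("\n".join(pe_hostfile_to_machines(SAMPLE_PE_HOSTFILE)))
```

In a real job the text would be read from the file named by the PE_HOSTFILE environment variable, and the resulting list written out for the MPI launcher to consume.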
The advantage of "loose" integration is primarily its simplicity. Because the scheduler does not have to do much more than match available job slots with requested CPUs (and decide when pending jobs get launched!), it is fairly easy to quickly add Grid Engine support for all sorts of parallel application environments, including PVM, MPICH, LAM-MPI, LINDA, etc.
There are several disadvantages to "loose" integration. The primary downside is that the parallel tasks are not running under the control and direction of an sge_shepherd daemon. This means that Grid Engine may not be able to accurately account for resource utilization or clean up tasks left over from a runaway job. With "loose" integration you must also trust the parallel application itself to honor the customized machine file being provided. There are no technical barriers to prevent the job from ignoring the provided machine file and just launching parallel tasks at will on every cluster node.
"Tight Integration" approaches solve these sorts of problems by binding Grid Engine more directly into the parallel application environment. With "tight" integration, Grid Engine does far more than just kick out a custom machine file – it also takes over the responsibility for launching and managing the parallel tasks themselves. There is far more control and monitoring of the parallel jobs.
The primary problem with "tight" integration is that it tends to be highly specific to the parallel environment being used. In some cases, it may not be enough to build tightly integrated PEs for MPICH or LAM-MPI – you may be forced to integrate on an application-by-application basis.
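The task launching that "tight" integration takes over is commonly done by starting each remote task through "qrsh -inherit" rather than plain rsh or ssh, so that every task runs under Grid Engine's control. The sketch below only builds the command lines that such a wrapper would run; the function name, host names, and task command are hypothetical, and it assumes the PE is configured to allow Grid Engine to control the slave tasks.

```python
from typing import List, Sequence


def qrsh_task_commands(hosts: Sequence[str],
                       task_argv: Sequence[str]) -> List[List[str]]:
    """Build one 'qrsh -inherit <host> <command...>' invocation per host."""
    return [["qrsh", "-inherit", host, *task_argv] for host in hosts]


if __name__ == "__main__":
    # Hypothetical hosts and task command, for illustration only.
    for cmd in qrsh_task_commands(["node01", "node02"], ["./worker", "--rank-from-env"]):
        print(" ".join(cmd))
```

How the parallel library is persuaded to call such a wrapper differs for each environment, which is exactly why tight integration tends to be environment-specific.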
As a general rule, I recommend starting first with loosely integrated parallel environments. Then, if needed, tight integration can be explored for critical applications or highly popular parallel environments.
SGE Parallel Environment Integration Resources
- Reuti’s HOWTO – Loose and tight integration of the LAM/MPI library into SGE
- Reuti’s HOWTO – Setup MPICH to get all child-processes killed on the slave nodes
- Reuti’s HOWTO – Tight Integration of the MPICH2 library into SGE
- Reuti’s HOWTO – Loose and tight integration of the PVM library into SGE
- SGE 6 Admin Guide – Configuring Parallel Environments
Self plagiarism note: I took this text from a document I wrote for a FAQ system used by my company.