If you are reading this post, you should also be familiar with the official Grid Engine 6 Installation Guide.
In the official install guide, there is a section called “Before you install the Software”. It gives a nice tabular view of the decisions you’ll have to make during the install and explains each option briefly.
Disclaimer: This is shaping up to be one of those “Chris is injecting lots of his personal opinions into what should be a straightforward technical document…” posts. This is not an official document; it’s just something you found on the internet, written by some guy you probably don’t even know, OK? Drop me a line to correct any mistakes I’ve made or to make me aware of something that I’ve totally missed.
Things you need to care about
hostnames and DNS
Grid Engine is pretty sensitive to hostnames and name resolution issues. Grid Engine likes DNS and it likes to do both forward (query by hostname) and reverse (query by IP address) DNS queries. Bad things will happen if the reverse lookup does not sync with the forward lookup. Even worse things happen when your /etc/hosts file(s) contain mistakes or typos.
Tip for Apple people
Just because Mac OS X lets you put spaces and funky capitalization into your hostname does not mean that this is a good thing to do. Your qmaster machine does not need to be called “j0ez fuNky Xserve”. Feel free to do whatever you want with the computer name as it applies to “bonjour” network sharing, but keep the core system hostname something reasonable. Grid Engine and other Unix-ish bits under the hood of your OS X system will thank you for it. Actually, now that I’m on this topic, use the same conservative naming approach for XRaid storage arrays and local disk partitions.

A good test after unpacking the Grid Engine distribution files (but before beginning the installation process) is to run some of the utility binaries to see what Grid Engine will “see” concerning your local environment. They can be found in the “utilbin/” directory. In particular you want to run the utilities called “gethostname” and “gethostbyaddr” and make sure that they are reporting good information.
Here is an example run on a test machine. The Linux “hostname” command is run first, then the SGE utilities gethostname and gethostbyaddr are run to confirm that everything is consistent all the way through:

    [root@dcore-amd sge-6s2u1]# hostname
    dcore-amd.sonsorol.net
    [root@dcore-amd sge-6s2u1]# /opt/sge-6s2u1/utilbin/lx26-amd64/gethostname
    Hostname: dcore-amd.sonsorol.net
    Aliases: dcore-amd
    Host Address(es): 66.92.70.152
    [root@dcore-amd sge-6s2u1]# /opt/sge-6s2u1/utilbin/lx26-amd64/gethostbyaddr 66.92.70.152
    Hostname: dcore-amd.sonsorol.net
    Aliases: dcore-amd
    Host Address(es): 66.92.70.152
    [root@dcore-amd sge-6s2u1]#

It is not required, but things are certainly easier, if your qmaster machine or cluster “portal” has a valid DNS entry. Your IT organization will know how to do this. Make sure they give you a static IP address as well!
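If the Grid Engine distribution is not unpacked yet, the same forward/reverse sanity check can be roughed out with standard tools. This is only a sketch, not part of Grid Engine: it assumes a Linux-style `getent` command, and the function names `normalize_host` and `check_fwd_rev` are made up for illustration.

```shell
#!/bin/sh
# Sketch: check that forward and reverse lookups agree for a hostname.
# Assumes getent(1) is available (glibc/Linux); function names are hypothetical.

# Lowercase a hostname and strip one trailing dot so comparisons are fair.
normalize_host() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/\.$//'
}

# Resolve name -> IP, then IP -> name, and compare with the original name.
check_fwd_rev() {
    name=$(normalize_host "$1")
    ip=$(getent hosts "$name" | awk '{print $1; exit}')
    if [ -z "$ip" ]; then
        echo "FAIL: no forward lookup result for $name"
        return 1
    fi
    rev=$(getent hosts "$ip" | awk '{print $2; exit}')
    rev=$(normalize_host "$rev")
    if [ "$rev" = "$name" ]; then
        echo "OK: $name -> $ip -> $rev"
    else
        echo "MISMATCH: $name -> $ip -> ${rev:-<nothing>}"
        return 1
    fi
}
```

Call it with the fully qualified name, for example `check_fwd_rev dcore-amd.sonsorol.net`. Because `getent hosts` consults /etc/hosts as well as DNS, a typo in the hosts file should show up here as a mismatch too. The SGE utilities remain the better test once they are available, since they show exactly what Grid Engine itself sees.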
Consistent username, UID and GID values
Regardless of how you plan to do user authentication (LDAP, NIS, NetInfo or local files) the key requirement is consistency. Make sure that all your users exist on all the nodes and that each user has unique and consistent UID/GID values.
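A quick way to spot inconsistent accounts is to collect the numeric IDs from every node and compare them. The sketch below assumes passwordless SSH, a hypothetical `nodes.txt` file with one hostname per line, and a hypothetical user called `alice`; the function name is invented as well.

```shell
# Sketch: verify that one user's UID/GID is identical on every node.
# Input on stdin: one "host uid gid" line per node. Function name is hypothetical.
check_ids_consistent() {
    awk '
        NR == 1 { uid = $2; gid = $3 }
        $2 != uid || $3 != gid {
            printf "MISMATCH on %s: uid=%s gid=%s\n", $1, $2, $3
            bad = 1
        }
        END {
            if (NR == 0 || bad) exit 1
            print "OK: uid=" uid " gid=" gid " on " NR " host(s)"
        }
    '
}

# One way to collect the input (nodes.txt and alice are assumptions):
#   while read -r h; do
#       echo "$h $(ssh -n "$h" id -u alice) $(ssh -n "$h" id -g alice)"
#   done < nodes.txt | check_ids_consistent
```

The first host listed is treated as the reference, so put the qmaster (or whichever machine you trust most) at the top of the list.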
Shared filesystem options
If you plan to install into a shared NFS filesystem, make sure the server is not exporting the filesystem with options that block the root user or remap the root UID value to a non-privileged value. Grid Engine can run as a non-root user but it needs to be started by root. There are also setuid binaries in the distribution that will break if root-squashing is enabled. Most people run shared NFS cluster filesystems over a private network subnet or VLAN, making issues of NFS security less of a concern.
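As a rough illustration of the export options in question, here is what an entry in a Linux NFS server's /etc/exports might look like. The path is borrowed from the example session earlier in this post and the subnet is a made-up private cluster network; other NFS servers (Solaris, Mac OS X) express the same idea with different syntax.

```text
# /etc/exports on a Linux NFS server (path and subnet are assumptions)
# no_root_squash keeps root on the clients mapped to root on the server
/opt/sge-6s2u1  192.168.1.0/24(rw,sync,no_root_squash)
```

Restricting the export to the private subnet, as shown, is what keeps the relaxed root handling from being a wider security problem.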
Classic Spooling vs. Berkeley-DB Spooling
This may deserve a post/article all by itself. The hard thing about this decision is that the spooling method is one of the few Grid Engine things that you CAN’T change without doing a complete reinstallation. The official documentation makes it clear that Berkeley DB based spooling gives better “performance” but it does not explain the downsides in enough detail. It also does not make it clear that many people (especially those with small clusters) generally will not notice a difference between the two spooling methods.
The argument for using Berkeley spooling is pretty clear: it’s what the developers are concentrating on for future development and it is faster than classic mode. The downside to Berkeley spooling is somewhat understated: when you choose it you are also giving up a key fault tolerance feature of Grid Engine. Currently, Berkeley spooling limits you to storing your spooldir either on a local non-NFS filesystem or on a remote spooling server. Sadly, only one remote RPC spooling server is allowed, so the RPC host becomes a potential single point of failure. The RPC argument is a bit of a stretch though, as eventually this will be fixed, and one of the reasons for choosing Berkeley databases in the first place is so the Grid Engine developers could leverage the Berkeley community and codebase for things like database failover and remote replication. Not reinventing the wheel is a good thing. The real hassle with Berkeley spooling in my mind is losing the wonderful plaintext ASCII configuration and state files that can be so easily read, backed up, understood and even (in emergencies!) directly edited by hand.
The simple truth for me is that the benefit of having “faster” spooling is not worth having all my critical state and configuration data stored in binary form. Your requirements could be completely different. For instance, if you foresee having to run “qsub” to submit 150 jobs per second to the grid, then you probably want Berkeley spooling, as this interesting mailing list thread points out.
My recommendation is this: if you are just starting out with Grid Engine, use classic spooling. If your cluster is smaller than 20 nodes, use classic spooling. Once you have had the system up and running for a while you’ll easily be able to tell if your standard sorts of workloads and workflows are being affected by spool performance. By that time, you’ll be comfortable enough with Grid Engine that you’ll have no trouble backing up your configuration and reinstalling with Berkeley spooling enabled.
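If you inherit a cluster and are not sure which method was picked at install time, the choice is recorded in the cell's bootstrap file. The helper below is a small sketch (the function name is invented) that assumes the file contains a line of the form `spooling_method  classic` or `spooling_method  berkeleydb`.

```shell
# Sketch: report the spooling method recorded in a cell's bootstrap file.
# Assumes a "spooling_method <value>" line; function name is hypothetical.
spooling_method() {
    awk '$1 == "spooling_method" { print $2; found = 1; exit }
         END { if (!found) exit 1 }' "$1"
}

# Typical call, assuming the default cell name:
#   spooling_method "$SGE_ROOT/${SGE_CELL:-default}/common/bootstrap"
```

The function returns a non-zero exit status if the file has no such line, which makes it easy to use in a backup or pre-reinstall script.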
Things you don’t really need to stress out about
Grid Engine ‘Cells’
Don’t worry about cells at install time. Don’t worry about cells ever. All cells allow one to do is “run multiple grid engine instances off of the same set of binaries”. Wow. This could be a holdover from the “CODINE” product days, when disk was expensive and fileserver space was a rare commodity. Or it could still matter today when one only has access to a single high performance shared storage system. I’m not knocking the cell concept as much as I’m trying to make the point that most people will never make use of more than one cell. The people who need cells know who they are, and the rest of us can continue on using “default”.
Think of the cell simply as “the directory inside my $SGE_ROOT where the system stores all my site specific and unique stuff”. The default installation suggestion of using “default” as the cell name works perfectly fine and it just means that the path to your site-specific startup files etc. is going to be $SGE_ROOT/default/…
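To make the "cell is just a directory" idea concrete, here is a tiny sketch that builds the path to a cell's common directory from the standard SGE_ROOT and SGE_CELL environment variables, falling back to "default" when no cell is set. The function name is hypothetical.

```shell
# Sketch: print the directory holding a cell's site-specific files.
# SGE_ROOT and SGE_CELL are the usual Grid Engine environment variables;
# the function name is made up for this example.
cell_common_dir() {
    printf '%s/%s/common\n' "${SGE_ROOT:?SGE_ROOT is not set}" "${SGE_CELL:-default}"
}
```

With SGE_ROOT set to /opt/sge-6s2u1 and no SGE_CELL, this prints /opt/sge-6s2u1/default/common, which is where files such as act_qmaster, host_aliases and sge_aliases live.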
Shadow hosts, scheduler tuning profile
These things are easily configured after installation. No need to stress out about them or make any set decisions. The same goes for just about every question you are asked during installation. The only thing you really can’t change after install is the spooling method. If you don’t know (or don’t care) about a particular install-time issue, just accept what the installation script offers as a default. It can’t hurt and you can easily change it later.
Things I wish someone had told me about
The automatic install scripts are not worth dealing with on small clusters
The problem with the automatic install scripts is that the template file must be 100% correct (I never get it correct in the first half-dozen attempts) and that when things go wrong, they go wrong almost silently. There is no good debug output except for the messages that may or may not get logged to /tmp on the compute node. Your best bet for dealing with automated install script issues is to edit the inst_sge script to change the first line from “#!/bin/sh” to “#!/bin/sh -x” so that it runs with verbose debug output. The next best thing is to ask someone who has a working template file to share. You can then edit this template to match your specific needs and it may actually work the first or third time you try to run it.
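The first-line edit can be done by hand in any editor, but if you want it scripted (and easily reversible) something like the following sketch works. The function name is invented, and the `.orig` backup suffix is just a convention chosen for this example.

```shell
# Sketch: turn on shell tracing in a script by rewriting its first line.
# Keeps a backup copy as <file>.orig. Function name is hypothetical.
enable_trace() {
    cp -p "$1" "$1.orig" || return 1
    # Only line 1 is touched, and only if it is exactly "#!/bin/sh".
    sed '1s|^#!/bin/sh$|#!/bin/sh -x|' "$1.orig" > "$1"
}
```

Run it as `enable_trace "$SGE_ROOT/inst_sge"`, and copy the `.orig` file back when you are done debugging. Expect a lot of output; redirecting it to a file on the compute node is usually worth it.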
For clusters smaller than 30 nodes in size (where I already have passwordless SSH access set up) it is actually quicker for me to manually log into each node and invoke the “./install_execd” script by hand. For larger systems, or systems where I want to have an automated cluster setup/teardown process I have a cache of “known good” template files that I can modify to suit the local setup.
The host_aliases file
When Grid Engine first starts up, the qmaster node writes what it believes to be its hostname to a file located in $SGE_ROOT/(cell)/common/act_qmaster. In many cases, the hostname that gets written to the act_qmaster file is the fully qualified public hostname of your cluster master node. In many of the same cases, this public hostname MAY NOT be the same as what your compute nodes use to speak with the same machine.
Imagine the following scenario: a cluster “portal” node with two network cards. One card is connected to the internet or an institutional network so that people can actually connect to the system. A DNS lookup on the machine’s hostname will return this “public” IP address. The other network card has a different IP address and is connected to a private cluster network where all the compute nodes can be found. The problem occurs whenever an sge_execd daemon starts up on a compute node. The first thing sge_execd will do is read the act_qmaster file in order to learn which host it needs to connect to and register with. However, “act_qmaster” contains the PUBLIC hostname of the qmaster machine, so the sge_execd daemon is going to do a DNS lookup on the public hostname and then try to connect to the IP address associated with it, rather than to the qmaster’s address on the private cluster network.
In simple terms, the SGE host_aliases file allows you to remap the hostname or IP that your compute nodes are going to try to connect to when trying to join the cluster or speak to the qmaster process. This is particularly useful on machines with more than one IP address and hostname.
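For the two-network scenario above, a host_aliases entry might look like the sketch below. Both hostnames are invented. As I understand the host_aliases man page, each line describes one host: the first name is the one Grid Engine should use and the remaining names on the line are treated as aliases for it. The file sits next to act_qmaster in $SGE_ROOT/(cell)/common/. I am not certain the file accepts comment lines, so the example has none.

```text
portal.cluster.private  portal.example.org
```

Read this as "whenever you see portal.example.org (the public name), use portal.cluster.private (the name on the private cluster network) instead". Check the man page that ships with your version before relying on this, and re-run gethostname from utilbin afterwards to confirm what Grid Engine now sees.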
The sge_aliases file
I use this far less frequently than host_aliases but it still comes in handy. Simply put, this file allows path aliasing.
One example of where this comes in handy is on Apple Mac OS X clusters that use externally attached RAID storage arrays. From a user’s perspective both within the command-line environment and the GUI, their home directory is something like “/Users/username”, but when Grid Engine checks this path it sees something along the lines of “/Volumes/XRAID/Users/username” and this path mismatch can cause problems when trying to run jobs. The sge_aliases file makes it easy to tell Grid Engine that the path “/Users/*” is functionally the same thing as the path “/Volumes/XRAID/Users/*”.
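A sketch of what that mapping could look like in the cluster-wide sge_aliases file ($SGE_ROOT/(cell)/common/sge_aliases) follows. The four columns are the source path prefix, the submit host, the execution host and the replacement prefix, with "*" matching any host. Note that the file works on path prefixes rather than shell-style globs, so the "/Users/*" shorthand used above becomes a plain "/Users/" here. The volume name is taken from the example in the text; adjust it to your own array.

```text
# src-path               sub-host  exec-host  replacement
/Volumes/XRAID/Users/    *         *          /Users/
```

Individual users can apparently keep their own entries in $HOME/.sge_aliases as well, which is handy for testing a mapping before putting it into the global file.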