Tag: memory

Adding memory requirement awareness to the scheduler

Posted by – December 1, 2009

In the linked mailing list post, Thamizhannal asks:

In our SGE cluster, we have 2 nodes, each with 4 CPUs, and we are using the “fill up host” scheduler configuration for job submission.

In this scheduler configuration, assume one parallel job (Job1) with 2 CPUs is running on nodeA and a user submits another parallel job (Job2) with 2 CPUs; SGE then places Job2 on nodeA as well.

If Job1 is using a lot of memory on nodeA, Job2 fails because not enough memory is available.

Is there a way to avoid this using SGE configuration?

As usual, Reuti comes through with a great answer:

… you will need to request the estimated amount of memory which the job
might need. There are two ways to do it. Make:

a) h_vmem

or b) virtual_free

consumable in the complex definition (qconf -sc) and define a default
consumption there. Then attach a feasible value to each node (qconf
-me <hostname>) for the installed memory. Use the one you defined in
your qsub command by requesting it with the -l option (it’s per slot,
hence multiplied for parallel jobs unless you use special settings in
the complex definition). The difference between the two ways is that
h_vmem will be enforced and kill the job when it needs one byte more,
while b) is more a hint for SGE for the job distribution.
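To make the per-slot accounting in Reuti's answer concrete, here is a minimal Python sketch of how a consumable memory value changes a fill-up placement decision. It is a toy model and not Grid Engine's actual scheduling code: the Host class, the place_fill_up function, and the 8 GB per-node figure are all assumptions made up for illustration, and it only considers placing a whole job on a single host.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Toy stand-in for an execution host with a consumable memory value."""
    name: str
    slots: int
    mem_gb: float            # hypothetical value attached to the host
    used_slots: int = 0
    reserved_gb: float = 0.0

    def fits(self, slots, mem_per_slot_gb):
        # The request is per slot, so a parallel job needs slots * request.
        needed_gb = slots * mem_per_slot_gb
        free_slots = self.slots - self.used_slots
        free_gb = self.mem_gb - self.reserved_gb
        return free_slots >= slots and free_gb >= needed_gb

    def reserve(self, slots, mem_per_slot_gb):
        self.used_slots += slots
        self.reserved_gb += slots * mem_per_slot_gb


def place_fill_up(hosts, slots, mem_per_slot_gb):
    """Return the name of the first host that fits the job, or None."""
    for host in hosts:
        if host.fits(slots, mem_per_slot_gb):
            host.reserve(slots, mem_per_slot_gb)
            return host.name
    return None


# Two 4-slot nodes as in the question; 8 GB each is an assumed figure.
hosts = [Host("nodeA", slots=4, mem_gb=8), Host("nodeB", slots=4, mem_gb=8)]

job1 = place_fill_up(hosts, slots=2, mem_per_slot_gb=3)  # reserves 6 GB on nodeA
job2 = place_fill_up(hosts, slots=2, mem_per_slot_gb=2)  # needs 4 GB, nodeA has 2 GB left

print(job1, job2)
```

In this model Job2 lands on nodeB even though nodeA still has two free slots, because the memory request no longer fits there. On a real cluster the request itself is made with the -l option that Reuti mentions, for example -l h_vmem=2G on the qsub command line.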

More background on Grid Engine and consumable resources is available in the linked Wiki doc. That page concentrates on GUI-based methods but also covers the command-line methods that Reuti describes.

Reducing scheduler memory usage with libhoard

Posted by – March 6, 2008

It’s pretty interesting subscribing to the SGE Issues mailing list. This comment on Issue 2464 came across the wire today:

… I installed libhoard.so (http://www.hoard.org/) and started sge_schedd with it (changing the sge_schedd starting line in sgemaster to "LD_PRELOAD=/opt/hoard-3.7.1/lib64/libhoard.so sge_schedd").

There seem to be some problems with malloc and threads not freeing memory (or something similar, Andreas could explain this the right way) which could be affecting sge_schedd.

Since restarting sge_schedd using hoard I didn’t have any memory problems anymore, but this just happened one day ago.

If anyone else tries this method I’d appreciate feedback and comments.
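For anyone who would rather test the preload from a wrapper script than edit sgemaster, the following Python sketch shows the same LD_PRELOAD idea. The helper names preload_env and start_with_preload are hypothetical, and the library path is simply the one quoted in the comment above; treat it as a starting point under those assumptions, not as a supported way to start the scheduler.

```python
import os
import subprocess


def preload_env(library_path, base_env=None):
    """Return a copy of the environment with library_path first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # LD_PRELOAD entries may be separated by colons; keep any existing ones.
    env["LD_PRELOAD"] = f"{library_path}:{existing}" if existing else library_path
    return env


def start_with_preload(command, library_path):
    """Start command with library_path preloaded, failing early if it is missing."""
    # glibc's loader only warns and carries on when a preload file is missing,
    # so check explicitly to avoid silently running without the allocator.
    if not os.path.isfile(library_path):
        raise FileNotFoundError(f"preload library not found: {library_path}")
    return subprocess.Popen(command, env=preload_env(library_path))


# Path taken from the quoted comment; adjust for the local install.
HOARD = "/opt/hoard-3.7.1/lib64/libhoard.so"

env = preload_env(HOARD, base_env={"PATH": "/usr/bin"})
print(env["LD_PRELOAD"])
```

Calling start_with_preload(["sge_schedd"], HOARD) would then launch the daemon with the allocator preloaded, assuming sge_schedd is on the PATH and the usual SGE environment variables are already set.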