In the linked mailing list post, Thamizhannal asks:
In our SGE cluster, we have 2 nodes, each with 4 CPUs, and we are using the “fill up host” scheduler configuration for job submission.
In this scheduler configuration, assume one parallel job (Job1) with 2 CPUs is running on nodeA and a user submits another parallel job (Job2) with 2 CPUs; SGE then places Job2 on nodeA as well.
If Job1 is using a lot of memory on nodeA, Job2 fails because not enough memory is available.
Is there a way to avoid this using SGE configuration?
As usual, Reuti comes through with a great answer:
… you will need to request the estimated amount of memory which the job
might need. There are two ways to do it. Make
a) h_vmem
or b) virtual_free
consumable in the complex definition (qconf -sc) and define a default
consumption there. Then attach a feasible value to each node (qconf -me)
for the installed memory. Use the one you defined in
your qsub command by requesting it with the -l option (it’s per slot,
hence multiplied for parallel jobs unless you use special settings in
the complex definition). The difference between the two ways is, that
h_vmem will be enforced and kill the job when it needs one byte more,
while b) is more a hint for SGE for the job distribution.
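As a rough illustration of the steps Reuti describes, here is a dry-run sketch that only prints the commands so they can be reviewed before anything is changed. The node memory (8G), the per-slot request (2G), the parallel environment name "mpi" and the script name job.sh are all assumptions made for the example, not values from the post. It also uses qconf -mattr as a non-interactive stand-in for the qconf -me editor session mentioned above.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# Assumed values (not from the post): 8G per node, 2G per slot,
# a parallel environment named "mpi", and a job script job.sh.
host=nodeA
host_mem_gb=8
slots=2
per_slot_gb=2
pe=mpi
script=job.sh

# Step 1 (interactive, in the editor opened by "qconf -mc"): mark h_vmem
# as consumable and give it a default, e.g. change its line to
#   h_vmem  h_vmem  MEMORY  <=  YES  YES  1G  0
# (columns: name shortcut type relop requestable consumable default urgency)

# Step 2: tell SGE how much memory this host can hand out.
set_host_cmd="qconf -mattr exechost complex_values h_vmem=${host_mem_gb}G ${host}"

# Step 3: request memory per slot at submission time.
submit_cmd="qsub -pe ${pe} ${slots} -l h_vmem=${per_slot_gb}G ${script}"

# The request is per slot, so a parallel job is charged slots * per-slot.
job_total_gb=$((slots * per_slot_gb))
remaining_gb=$((host_mem_gb - job_total_gb))

echo "$set_host_cmd"
echo "$submit_cmd"
echo "job charged ${job_total_gb}G, ${remaining_gb}G left on ${host}"
```

With these made-up numbers, a second 2-slot job asking for 2G per slot would still fit on nodeA, while one asking for 3G per slot would have to wait or be placed on the other node instead of failing at run time.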
More background on Grid Engine and consumable resources is available in the linked wiki documentation. That page concentrates on GUI-based methods but also discusses the command-line methods that Reuti shows.