Resource reservation prevents parallel job starvation

Posted by – May 31, 2006

In a recent mailing list post, Rui Ramos describes a commonly encountered resource allocation problem:

… I’m making some tests and if i have queue that’s full and have this list of jobs waiting

jobA 4 slots

jobB 1 slot

jobB.1 1 slot

jobB.2 1 slot


Let’s say that the jobs of type B are very quick and a user submits 2000 of them. On the other hand, we have a job that requires 4 slots. But each time we have a free slot it starts a job of type B. following this the jobA only executes when all jobB are finished. Unless the GridEngine can make some kind of slot reservation for jobs with higher priority ? Is this native in the N1GE scheduler, do we need to set it up ?

For people whose clusters run a mix of serial and parallel jobs, this is a common problem. The serial jobs zip in and out of the execution slots so quickly that, at any given scheduling interval, there are never enough free slots to satisfy pending parallel jobs that need several slots at once.

The end result is that the larger parallel jobs languish or “starve” in the pending list for very long periods of time.

The mailing list thread contains some useful replies:

Reuti provides a solution:

what you need is “resource reservation”. Just turn on the reservation in the scheduler “qconf -msconf” by setting “max_reservation 20” or an appropriate value and submit the parallel job with “-R y”.

… and Andreas provides a link to the resource reservation specification document that provides more information about Rui’s problem under the heading of “large parallel job starvation problem”:

   ... Resource reservation can be used to guarantee resources are dedicated 
   to jobs in jobs priority order. A good example which helps to comprehend 
   the problem solved with resource reservation/backfilling is the so-called 
   "large parallel job starvation problem". In this scenario there is one 
   high priority pending job (possibly parallel) A that requires a larger quota 
   of a particular resource and a stream of smaller and lower priority jobs B(i) 
   requiring a smaller quota of the same resource.
 
   Without resource reservation an assignment for A can not be guaranteed
   assumed the stream of B(i) jobs does not stop - even if job A actually
   has higher priority than the B(i) jobs:

        A      
        |                     
    +---+----+--------+--------+--------+--------+--------+   +----------+
    |  B(0)  | B(2)   | B(4)   | B(6)   | B(8)   | B(10)  |   |          |
    +---+----+---+----+---+----+---+----+---+----+---+----+---+    A     |
        | B(1)   | B(3)   | B(5)   | B(7)   | B(9)   | B(11)  |          |
        +--------+--------+--------+--------+--------+--------+----------+-->
        
    
   With resource reservation job A gets a reservation that blocks lower 
   priority B(i) jobs and thus guarantees resources will be available for
   A as soon as possible:

        A
        |                     
    +---+----+----------+--------+
    |  B(0)  |          |  B(2)  |   ...
    +---+----+    A     +--------+--------+
        |    |          |  B(1)  |  B(3)  |  ...
        +----+----------+--------+--------+------------------------------->
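
To make the two diagrams concrete, here is a toy simulation in Python. It is only a sketch, not how the Grid Engine scheduler is implemented: time moves in whole steps, every B job uses one slot, the cluster starts full of B jobs with staggered finish times, and a B job may backfill only if it would finish no later than A's reserved start. The names start_time_of_a and earliest_fit are made up for this illustration.

```python
def earliest_fit(running, free_now, needed, now):
    """Earliest time at which `needed` slots are free if nothing new starts."""
    if free_now >= needed:
        return now
    free = free_now
    for finish, slots in sorted(running):
        free += slots
        if free >= needed:
            return finish
    return None  # more slots requested than the cluster has


def start_time_of_a(total_slots, a_slots, b_count, b_runtime, reservation):
    """Return the time step at which the high-priority job A starts."""
    if a_slots > total_slots:
        return None  # A can never run on this cluster
    # The cluster starts full of 1-slot B jobs whose finish times are
    # staggered, so slots tend to free up a few at a time, not all at once.
    prefill = min(total_slots, b_count)
    running = [(1 + i % b_runtime, 1) for i in range(prefill)]
    b_pending = b_count - prefill
    now = 0
    while True:
        now += 1
        # Drop jobs that have finished by this scheduling interval.
        running = [(f, s) for f, s in running if f > now]
        free = total_slots - sum(s for _, s in running)
        if free >= a_slots:
            return now  # A has priority and finally fits
        # With reservation, A claims the earliest time it could start.
        reserved = earliest_fit(running, free, a_slots, now) if reservation else None
        while free > 0 and b_pending > 0:
            finish = now + b_runtime
            if reservation and finish > reserved:
                break  # this B job would delay A, so it must wait
            running.append((finish, 1))
            b_pending -= 1
            free -= 1


if __name__ == "__main__":
    for flag in (False, True):
        print("reservation =", flag, "-> A starts at step",
              start_time_of_a(4, 4, 2000, 4, reservation=flag))
```

With 4 slots, a 4-slot job A and 2000 B jobs of 4 time steps each, this model starts A at step 2000 without reservation (only after the B stream dries up) and at step 4 with reservation (as soon as the B jobs that were already running have finished).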

Creating parallel-only grid engine queues

Posted by – November 4, 2005

Learning something new every day …

Someone on the users mailing list recently asked how he could create a queue configuration that would only accept parallel jobs. The answer itself is pretty simple but the implementation methods depend on the major version of Grid Engine one is using.

The Grid Engine 5.x way

The types of jobs that a queue will accept in 5.x depend entirely on the value of the qtype parameter. Simply modify the queue configuration and adjust the value of qtype. The default is usually “BATCH INTERACTIVE PARALLEL”, so to make a parallel-only queue simply set qtype to “PARALLEL”.

The Grid Engine 6.x way

Grid Engine 6.x introduced a new parallel-environment parameter in the queue configuration: “pe_list” explicitly lists every configured parallel environment (PE) that the queue will accept jobs for. To allow parallel jobs requesting a certain PE to run in the queue, just add the PE name to the pe_list parameter. To block the queue from running any other type of job, edit the qtype parameter to remove the default “BATCH INTERACTIVE” settings and replace them with a value of “NONE”.
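
The 6.x rule can be pictured with a small Python toy model. This is a sketch of the behaviour described in this post, not Grid Engine code: the Queue class, the accepts function, the queue names and the PE name “mpi” are all hypothetical, an empty qtype set stands in for “NONE”, and corner cases such as interactive jobs that also request a PE are ignored.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    qtype: set    # e.g. {"BATCH", "INTERACTIVE"}; empty set models NONE
    pe_list: set  # names of the parallel environments the queue serves


def accepts(queue, kind="BATCH", pe=None):
    """Toy admission check: would this queue take the job?"""
    if pe is not None:
        # A parallel job is admitted only through a PE listed in pe_list.
        return pe in queue.pe_list
    # A job without a PE request needs its kind listed in qtype.
    return kind in queue.qtype


# Hypothetical queues: one parallel-only, one with the default qtype.
parallel_only = Queue("parallel.q", qtype=set(), pe_list={"mpi"})
default_queue = Queue("all.q", qtype={"BATCH", "INTERACTIVE"}, pe_list=set())

if __name__ == "__main__":
    print(accepts(parallel_only, pe="mpi"))      # parallel job requesting mpi
    print(accepts(parallel_only, kind="BATCH"))  # plain serial batch job
    print(accepts(default_queue, kind="BATCH"))  # serial job, default queue
```

In this model the parallel-only queue takes the job that requests “mpi” and turns away the serial batch job, while the default queue does the opposite.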