Easy gridengine XML handling via Perl XML::Smart

Posted by – November 11, 2005

Joe Landman from Scalable Informatics posted about his success with the Perl XML::Smart (CPAN, readme, FAQ, tutorial) module.

Unlike many of the XML handling methods within the Perl universe, this module stands on its own without a huge and complicated chain of external dependencies.

XML::Smart can quickly and cleanly parse XML documents into Perl data structures that can be efficiently traversed and sorted. This makes it a great choice for simple Perl scripts designed to grab bits of data that do not get displayed in the human-readable qstat output.

Joe’s comments:

Our 6.0u6 Perl-based parser fits into a single line, after we grab the data.

	$qstat=`/opt/gridengine/bin/lx24-amd64/qstat -xml`;
	$xml    = XML::Smart->new($qstat);

(no schema/DTD needed)

Then, for example, iterating over all the jobs:

	foreach ($xml->{job_info}->{queue_info}->{job_list}('@') )
	  {
	   ...
	  }

Using some example code (included at the end of this article by permission) kindly provided by Joe, I was able to whip up a little “just playing” script that checks all pending jobs for hard resource requests. When a hard request is found, the script simply prints out a line that lists the Job ID, Job Name and the value of the hard resource request. The script looks like this:

#!/usr/bin/perl -w
use strict;
use XML::Smart;
my ($xml,$qstat);

$qstat=`/opt/sge6s2u1/bin/lx24-amd64/qstat -xml -r -f`;
$xml    = XML::Smart->new($qstat);

# Pending jobs are listed under job_info/job_info/job_list
foreach ($xml->{job_info}->{job_info}->{job_list}('@') )
{
    if($_->{hard_request}) {
      print "Job ID $_->{JB_job_number} ($_->{JB_name}) has a hard_request: ";
      print "$_->{hard_request}{name}=$_->{hard_request}\n";
    }
}


Output looks like this:

[dag@dcore-amd ~]$ ./test.pl
Job ID 47 (impossibleJob) has a hard_request: arch=darwin 
[dag@dcore-amd ~]$ 
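The script above appears to read the name and value of the first hard_request element only. A job submitted with several -l options may carry several of them. The sketch below collects all of them, assuming that calling ('@') on the hard_request key returns every matching node, just as it does for job_list. The helper hard_requests is hypothetical, and the sample XML is made up for illustration: it mirrors job 47 from the output above and adds an invented mem_free request.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Smart;   # CPAN module discussed in this article

# Return one "ID (name): resource=value" line for every hard request of
# every pending job (job_info/job_info/job_list, as in the script above).
sub hard_requests {
    my ($xml_text) = @_;
    my $xml = XML::Smart->new($xml_text);
    my @lines;
    foreach my $job ($xml->{job_info}{job_info}{job_list}('@')) {
        next unless $job->{hard_request};
        # ('@') yields every <hard_request> node, not only the first one
        foreach my $req ($job->{hard_request}('@')) {
            push @lines, "$job->{JB_job_number} ($job->{JB_name}): "
                       . "$req->{name}=$req";
        }
    }
    return @lines;
}

# Hypothetical sample modeled on job 47 above; the mem_free request is invented.
my $sample = <<'XML';
<?xml version='1.0'?>
<job_info>
  <queue_info></queue_info>
  <job_info>
    <job_list state="pending">
      <JB_job_number>47</JB_job_number>
      <JB_name>impossibleJob</JB_name>
      <hard_request name="arch">darwin</hard_request>
      <hard_request name="mem_free">4G</hard_request>
    </job_list>
  </job_info>
</job_info>
XML

print "$_\n" foreach hard_requests($sample);

# Against a live cluster, pass the qstat output instead (path as in the script above):
# print "$_\n" foreach hard_requests(`/opt/sge6s2u1/bin/lx24-amd64/qstat -xml -r -f`);
```

Returning a list of lines rather than printing inside the loop keeps the parsing separate from the output format, so the same helper can feed a report or a mail message.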


Additional pointers and examples from Scalable Informatics are included below …


Scalable Informatics provided the following example code and explanations.



The included code is copyright (c) 2004-2005 Scalable Informatics and licensed under GPL 2

What we use today looks just like this:

use XML::Smart;
my ($xml,$qstat);

$qstat=`/opt/gridengine/bin/lx24-amd64/qstat -xml`;
$xml	= XML::Smart->new($qstat);

foreach ($xml->{job_info}->{queue_info}->{job_list}('@') )
  {
     # stuff with each job.  All the per job attributes are now available as
     # $_->{attribute_name}.
     #
  }


Now, if you want to get fancy, you can sort by *any* attribute, up or down (using JB_owner in this case; refer to the XML for the attribute you want to sort on):

use XML::Smart;
my ($xml,$qstat,@jobs);

$qstat=`/opt/gridengine/bin/lx24-amd64/qstat -xml`;
$xml	= XML::Smart->new($qstat);
@jobs   = $xml->{job_info}->{queue_info}->{job_list}('@');

foreach ( sort { $a->{JB_owner} cmp $b->{JB_owner} } @jobs )
  {
     # stuff with each job.  All the per job attributes are now available as
     # $_->{attribute_name}.
     #
  }
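The sort above runs in ascending order with a string comparison. To go "down", or to sort numeric fields such as job numbers or priorities, the choice can be wrapped in a small helper. In this minimal sketch, sort_jobs is a hypothetical helper and the records are plain hashes standing in for job_list nodes; the field names JB_job_number, JB_owner and JAT_prio are assumed from qstat -xml output, and with real XML::Smart nodes you may need to stringify values before comparing them.

```perl
use strict;
use warnings;

# Sort job records (hash references) by one attribute.
#   $key     - attribute name, e.g. 'JB_owner' or 'JAT_prio'
#   $numeric - true to compare with <=>, false to compare with cmp
#   $desc    - true for descending ("down") order
sub sort_jobs {
    my ($key, $numeric, $desc, @jobs) = @_;
    my @sorted = $numeric
        ? sort { $a->{$key} <=> $b->{$key} } @jobs
        : sort { $a->{$key} cmp $b->{$key} } @jobs;
    return $desc ? reverse(@sorted) : @sorted;
}

# Hypothetical job records standing in for parsed job_list entries
my @jobs = (
    { JB_job_number => 47, JB_owner => 'dag',   JAT_prio => 0.50 },
    { JB_job_number => 12, JB_owner => 'alice', JAT_prio => 0.75 },
    { JB_job_number => 30, JB_owner => 'bob',   JAT_prio => 0.25 },
);

# Highest priority first ("down"), compared numerically
foreach my $job (sort_jobs('JAT_prio', 1, 1, @jobs)) {
    print "$job->{JB_job_number} $job->{JB_owner} $job->{JAT_prio}\n";
}
```

Choosing <=> for numeric fields matters: with cmp, job 100 would sort before job 47 because the comparison is character by character.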


Extracting execution times requires a bit more work: you need to parse two dates, subtract one from the other, then return the value in a sensible format. Code to do that looks like this:

use Date::Manip;
my ($d,$t,$olddate,$delta,$dt,$date);
my $today = ParseDate("now");    # current date and time

# ... some place later in the code, inside the per-job loop ...
($d,$t) = split(/\s+/, $_->{JAT_start_time});
if ($d =~ /(\d+)\/(\d+)\/(\d+)/) { $date  = sprintf "%.4i%.2i%.2i", $3,$1,$2; }
if ($t =~ /(\d+):(\d+):(\d+)/)   { $date .= sprintf "%.2i%.2i%.2i", $1,$2,$3; }
$olddate = ParseDate($date);
$delta   = DateCalc($olddate, $today);
$dt      = Delta_Format($delta, 0, qw(%st));
printf "%.1f second(s)\n", $dt;


Part of the issue is that SGE does not provide an elapsed-runtime field for a job; you need to calculate it yourself. Hopefully this will change.
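If pulling in Date::Manip is not an option, the same calculation can be done with the core Time::Local module. This is a minimal sketch, assuming JAT_start_time has the MM/DD/YYYY HH:MM:SS layout that the Date::Manip snippet parses and that it is expressed in the local time zone of the machine running the script; start_time_to_epoch and elapsed_seconds are hypothetical helper names.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Convert a start time in "MM/DD/YYYY HH:MM:SS" form (the layout the
# Date::Manip snippet parses) to epoch seconds, read as local time.
# Returns undef if the string has another layout or an impossible date.
sub start_time_to_epoch {
    my ($stamp) = @_;
    my ($mon, $day, $year, $h, $m, $s) =
        $stamp =~ m{^\s*(\d+)/(\d+)/(\d+)\s+(\d+):(\d+):(\d+)\s*$}
        or return;
    # timelocal() wants a 0-based month and dies on out-of-range fields
    my $epoch = eval { timelocal($s, $m, $h, $day, $mon - 1, $year) };
    return $epoch;
}

# Elapsed seconds from the start time until $now (default: the current time).
sub elapsed_seconds {
    my ($stamp, $now) = @_;
    my $start = start_time_to_epoch($stamp);
    return unless defined $start;
    $now = time() unless defined $now;
    return $now - $start;
}

my $secs = elapsed_seconds("11/11/2005 14:23:05");
printf "%d second(s)\n", $secs if defined $secs;
```

Working in epoch seconds keeps the subtraction exact across month boundaries and daylight-saving changes, since both timestamps are converted to the same absolute scale before they are compared.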

You can easily combine this into a program that grabs all the relevant data and outputs what you need. If you are using XSLT or similar, you could use this as a parser call-back.

The XML::Smart module is the recommended way to go with Perl. It is extremely fast and very flexible while also being very easy to use. Just don’t peek too much at its internal data structures; they can be … interesting. Note also that they can get huge, so if your XML is more than a few gigabytes in size, you might need to do a little extra work.
