No IO usage measurements on Linux

Posted by chris Wed, 28 Nov 2007 17:41:36 GMT

Ever wonder why IO usage for Grid Engine jobs running on Linux systems is not captured in either the SGE accounting or reporting logs?

This message posted to the Users mailing list kicked off an interesting thread and even generated a new Enhancement Issue and a submitted patch.

It turns out that IO usage is always reported as "0.00000" under Linux because the built-in PDC code within Grid Engine has no easy way on Linux to learn about IO consumption on a per-task or per-process basis.

Some additional digging by the original poster revealed some interesting Linux kernel options:

The Linux kernel can be compiled with the CONFIG_TASKSTATS and CONFIG_TASK_IO_ACCOUNTING options, which enable simple per-process I/O usage to be counted through /proc/(PID)/io as well as through the taskstats interface. The execd's PDC module is not aware of these interfaces, and therefore makes no attempt to count this usage under Linux.

In Issue 2429, a patch has been submitted that makes the SGE PDC code aware of the I/O values that can be found in /proc/(PID)/io.
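To make the /proc/(PID)/io interface concrete, here is a minimal Python sketch of reading it. It is not the code from the Issue 2429 patch (which targets the execd's C PDC module); the helper names are hypothetical, and it assumes a kernel built with CONFIG_TASK_IO_ACCOUNTING, since otherwise the io file is not present. Each line of the file is a "name: value" counter such as rchar, wchar, read_bytes and write_bytes. Which counters a collector should add up is a design choice; this sketch uses read_bytes + write_bytes (traffic that reached the storage layer) rather than rchar/wchar (everything passed through read/write calls, including cache hits).

```python
import os
from typing import Dict, Iterable


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the 'name: value' lines of a /proc/<pid>/io file into integers."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value.strip())
    return counters


def read_proc_io(pid: int) -> Dict[str, int]:
    """Read /proc/<pid>/io; raises OSError if it is missing or unreadable."""
    with open(f"/proc/{pid}/io") as fh:
        return parse_proc_io(fh.read())


def total_io_bytes(pids: Iterable[int]) -> int:
    """Hypothetical helper: sum read_bytes + write_bytes over a set of PIDs,
    silently skipping processes that have exited or cannot be read."""
    total = 0
    for pid in pids:
        try:
            counters = read_proc_io(pid)
        except OSError:
            continue
        total += counters.get("read_bytes", 0) + counters.get("write_bytes", 0)
    return total


if __name__ == "__main__":
    try:
        print(read_proc_io(os.getpid()))
    except OSError as exc:
        print("per-process I/O accounting not available:", exc)
```

A real collector would apply something like total_io_bytes to all processes belonging to a job, sampling repeatedly, because counters vanish once a process exits.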

How you can help:

  • For your particular flavor of Linux, determine whether the kernel options "CONFIG_TASKSTATS" and "CONFIG_TASK_IO_ACCOUNTING" are enabled in the default vendor-supplied kernel, and add this data as a comment on Issue 2429.
  • Test out the patch yourself.
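For the first bullet, one way to check the running kernel is to read its build configuration. The Python sketch below assumes the config is available either as /boot/config-<release> or as /proc/config.gz (the latter exists only when the kernel was built with CONFIG_IKCONFIG_PROC); distributions differ on which, if either, they ship. The function names are hypothetical.

```python
import gzip
import os
import platform
from typing import Dict, Iterable, Optional

WANTED = ("CONFIG_TASKSTATS", "CONFIG_TASK_IO_ACCOUNTING")


def parse_kconfig(text: str, wanted: Iterable[str] = WANTED) -> Dict[str, Optional[str]]:
    """Map each wanted option to its value ('y' or 'm'), or None when unset."""
    result: Dict[str, Optional[str]] = dict.fromkeys(wanted)
    for raw in text.splitlines():
        line = raw.strip()
        # Disabled options appear only as comments: "# CONFIG_FOO is not set"
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name in result:
            result[name] = value
    return result


def read_running_kconfig() -> Optional[str]:
    """Try the two usual locations for the running kernel's config."""
    boot_cfg = f"/boot/config-{platform.release()}"  # release == `uname -r`
    if os.path.exists(boot_cfg):
        with open(boot_cfg) as fh:
            return fh.read()
    if os.path.exists("/proc/config.gz"):  # only with CONFIG_IKCONFIG_PROC
        with gzip.open("/proc/config.gz", "rt") as fh:
            return fh.read()
    return None


def main() -> None:
    try:
        text = read_running_kconfig()
    except OSError as exc:
        print("could not read kernel config:", exc)
        return
    if text is None:
        print("no kernel config found in /boot or /proc/config.gz")
        return
    for name, value in parse_kconfig(text).items():
        print(f"{name}: {value or 'not set'}")


if __name__ == "__main__":
    main()
```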

Estimating space requirements for the ARCo database

Posted by chris Fri, 02 Nov 2007 15:55:06 GMT

Another posting prompted by an old message flagged in my inbox ...

With ARCo and the dbwriter code migrating from N1 Grid Engine into the open source codebase, the Grid Engine accounting and reporting console is likely to get more attention and eyeballs from the community. Relating to this, Roland had pointed out the existence of the following page:

http://gridengine.sunsource.net/howto/arco/arco_db_size.html

... the page includes a link to a downloadable spreadsheet (Open Office format) that can be used to guide sizing decisions. Also interesting is a table listing the default retention times for various data elements stored within the database.
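The spreadsheet's own formulas are not reproduced here, but the basic arithmetic behind any retention-based size estimate is simple: once data older than the retention time is being purged, a table holds roughly rows-per-day times retention-days rows. The Python sketch below encodes only that back-of-the-envelope model; the TableLoad fields, the overhead multiplier and the example numbers are hypothetical placeholders, not facts about the ARCo schema.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TableLoad:
    """Hypothetical per-table inputs; every number here is one you must supply."""
    name: str
    rows_per_day: float
    bytes_per_row: float
    retention_days: float


def estimate_bytes(tables: Iterable[TableLoad], overhead: float = 1.0) -> float:
    """Steady-state size once purging is active:
    rows kept = rows_per_day * retention_days for each table.
    'overhead' is an assumed multiplier for indexes and free space."""
    return overhead * sum(
        t.rows_per_day * t.retention_days * t.bytes_per_row for t in tables
    )


if __name__ == "__main__":
    # Placeholder inputs for illustration only, not measured ARCo figures.
    example = [TableLoad("jobs", rows_per_day=5_000, bytes_per_row=400, retention_days=365)]
    print(f"{estimate_bytes(example, overhead=1.5) / 2**30:.2f} GiB")
```

The spreadsheet remains the better guide, since it accounts for the actual tables and their default retention times.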

New howto: Sizing ARCo databases

Posted by chris Fri, 31 Aug 2007 13:32:28 GMT

Roland writes:

I've added a new howto with a spreadsheet document to calculate the estimated database space usage. The link is:

http://gridengine.sunsource.net/howto/arco/arco_db_size.html

I appreciate your Feedback, especially about discrepancies with calculated and real world values, to improve the document.

The HowTo document contains a nice spreadsheet where one can plug in values and see what the estimated size requirements may be.

For those who don't have OpenOffice installed or handy, a version converted to MS Excel 97 can be found here: http://gridengine.info/files/arco_db_size_v1.1.xls

New user contributed accounting script

Posted by chris Wed, 16 May 2007 02:05:35 GMT

A new "pull statistics from the SGE accounting log file" script has been posted to the SGE community. Olivier Blondel took Joe Landman's "usage.pl" script and modified it to suit his own needs. The script can be found embedded inline with Olivier's post to the users mailing list.

It's official: Project Hedeby and ARCo join the SGE codebase

Posted by chris Wed, 13 Dec 2006 15:30:35 GMT

Sun has formally announced the additions promised at SC'06; the full announcement is available online here:

Of the two, ARCo is the more established layered product. This is the SQL driven accounting and reporting tool that was previously only available in the commercial version of N1 Grid Engine from Sun. ARCo uses Java to parse the SGE accounting logs for inclusion into an SQL back-end database. In addition to the metrics found in the accounting logs, ARCo has hooks for calculating useful "derived" metrics that are not explicitly stored in the accounting files.

When I first used ARCo (early on, in its very first release), one of the main weaknesses was the web-based front-end reporting console: for anything but the most basic reports, a user was expected to paste raw SQL queries into a web form. Sun's act of putting ARCo into the open source codebase should hopefully kickstart an idea that has been floating around for a while -- some sort of community wiki page or repository of user-generated ARCo queries and report templates. ARCo users are encouraged to send these sorts of tips and tricks to the users mailing list.

"Project Hedeby", aka the "Grid Engine Service Domain Management module", also mentioned at SC'06, is at an earlier stage in its development. The nontechnical description is as follows:

Project Hedeby provides access to a new technology which allows to dynamically manage resources across so called Service Domains. Service Domains can be envisioned as autonomous Grids controlled by a resource manager including but not limited to Grid Engine. Hedeby will adjust the allocation of resources to individual service domains in order to meet Service Level Objectives. Reallocating a host resource to another service domain may include re-provisioning of the underlying virtual or actual operating system stack.

In his interview with GRIDtoday, Fritz provides the following description:

"... provides policy and demand-based re-allocation of arbitrary resources across service domains. Service domains are totally autonomous Grids which are controlled by a workload management facility, such as Grid Engine, but also by arbitrary other service infrastructures like application servers or web servers..."

Thanks to Andy for pointing out that the project codename, "Hedeby" refers to a Viking trade town from the 8th-11th century.

Installing ARCo on x64 Linux with Blackdown Java JVM

Posted by Rayson Fri, 08 Dec 2006 02:46:00 GMT

Todd was trying to install the Accounting and Reporting Console (ARCo) for an Opteron cluster, and got the error message:
Java setup
----------
We need at least java 1.4.1

Please enter the path to your java installation [] >> /opt/j2re1.4.2

ERROR: This java version does not support 64-bit native libraries,

       The use of libdrmaa.so from the lx24-x86 binaries would be 
       possible, but the packages are not installed.

       Please install a 64-Bit java version or the N1GE 32-bit
       binary packages for the architecture lx24-x86!

The fix is to hack the “inst_dbwriter” script to remove the “-d64” flag, which is not supported by Blackdown Java.
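The exact line in inst_dbwriter that passes “-d64” is not quoted in the report. The following Python sketch therefore assumes only that the flag appears as a standalone, whitespace-preceded token (for example `$java -d64 ...`). It saves the original as inst_dbwriter.orig before rewriting the file. The script and function names are hypothetical, and the result should be reviewed before rerunning the installer.

```python
import re
import shutil
import sys

# A whitespace-preceded "-d64" not followed by another word character.
D64_FLAG = re.compile(r"\s-d64\b")


def strip_d64(text: str) -> str:
    """Return text with every standalone -d64 token (and its leading space) removed."""
    return D64_FLAG.sub("", text)


def patch_file(path: str) -> int:
    """Back up path to path + '.orig', rewrite it without -d64,
    and return the number of removals (the file is untouched if zero)."""
    with open(path) as fh:
        original = fh.read()
    patched, count = D64_FLAG.subn("", original)
    if count:
        shutil.copy2(path, path + ".orig")  # copy2 also keeps the mode bits
        with open(path, "w") as fh:  # overwriting keeps the file's permissions
            fh.write(patched)
    return count


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: strip_d64.py PATH/TO/inst_dbwriter")
    else:
        print(f"removed {patch_file(sys.argv[1])} occurrence(s) of -d64")
```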

News from SC06 - Sun frees ARCO and Windows modules for Grid Engine

Posted by chris Tue, 14 Nov 2006 18:44:27 GMT

All the cool kids are at the Supercomputing 2006 meeting this week. Among the flurry of vendor announcements and release news is the following notice from Sun and the Grid Engine project:

In a nutshell, two modules that were previously found only in the commercial Sun N1 Grid Engine suite are being open sourced: ARCO (the reporting/analysis subsystem) and the code that allows MS Windows systems to act as submit hosts and execution hosts.

In addition, there is mention of a "Grid Engine Service Domain Management module", but other than a planned demo at SC06 there is not much more information available on it.

The full announcement is here:
http://gridengine.sunsource.net/news/SuperComputing2006.html

Simple perl reporting tool for SGE accounting data

Posted by chris Wed, 11 Oct 2006 12:55:19 GMT

Joe at Scalable Informatics is offering up a "quick -n- simple" reporting script for Grid Engine accounting and usage data.

Usage examples:

[landman@minicc ~]$ ./usage.pl
Total usage: (in units of second(s))
        wallclock  :       46733.000 second(s)
        user time  :        1600.000 second(s) [3.42%]
        system time:          17.000 second(s) [0.04%]
        cpu time   :       70379.000 second(s) [150.60%]

user            wallclock       user time       system time     cpu time        memory          percent of total time
landman         46733.000       1600.000        17.000          70379.000       0.000           100.000
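For readers who want similar per-user totals without Perl, here is a Python sketch that reads the accounting file directly. It is not usage.pl. The 0-based column positions (owner 3, ru_wallclock 13, ru_utime 14, ru_stime 15, cpu 36, mem 37, io 38) are taken from the accounting(5) man page for SGE 6.x and should be checked against your release. The default path assumes the usual $SGE_ROOT/$SGE_CELL/common/accounting layout. The io column is summed too, though, as the IO post earlier on this page explains, it is always 0.00000 on unpatched Linux execution hosts.

```python
import os
import sys
from collections import defaultdict
from typing import Dict, Iterable

# 0-based column positions per accounting(5) in SGE 6.x; verify for your release.
OWNER = 3
METRICS = {"wallclock": 13, "user": 14, "system": 15, "cpu": 36, "mem": 37, "io": 38}


def summarize(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Sum selected accounting columns per job owner."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: dict.fromkeys(METRICS, 0.0))
    last = max(METRICS.values())
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # header comments and blank lines
        fields = line.rstrip("\n").split(":")
        if len(fields) <= last:
            continue  # truncated or unexpected record
        for name, idx in METRICS.items():
            totals[fields[OWNER]][name] += float(fields[idx])
    return dict(totals)


def main() -> None:
    cell = os.environ.get("SGE_CELL", "default")
    default = os.path.join(os.environ.get("SGE_ROOT", ""), cell, "common", "accounting")
    path = sys.argv[1] if len(sys.argv) > 1 else default
    try:
        with open(path) as fh:
            totals = summarize(fh)
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
        return
    print(f"{'user':<12}" + "".join(f"{m:>14}" for m in METRICS))
    for owner, t in sorted(totals.items()):
        print(f"{owner:<12}" + "".join(f"{t[m]:>14.3f}" for m in METRICS))


if __name__ == "__main__":
    main()
```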

New SGE accounting log analysis script committed

Posted by chris Fri, 26 May 2006 12:08:00 GMT

Andreas has checked in a Ruby script that does Grid Engine accounting file analysis. His email announcement has the details and a basic usage summary.

The script can be obtained from CVS or via a direct download: http://gridengine.sunsource.net/files/documents/7/82/analyze.rb.gz

public SVN and a new website for xml-qstat

Posted by chris Sun, 14 May 2006 21:24:00 GMT

A side project of mine, http://xml-qstat.org, has a new website and (finally!) an accessible SVN code repository for downloading the package. There are still things (such as support for IE browsers) that I’d like to add before a real 1.0 release, though. Truth be told, the real reason for this post was to have an initial article tagged with the phrase ‘xml-qstat’. The beautiful Typo-powered publishing engine running this website can dynamically construct RSS and ATOM syndication feeds based on any article category or tag. Creating the xmlqstat tag and posting news under it results in a quick and dirty way to always have an updated xml-qstat news RSS feed without having to code such features into the xml-qstat.org website.

* *

xml-qstat is an attempt to do something useful with the XML status information that Grid Engine is now able to produce. At its heart, xml-qstat consists of a collection of stylesheets written in XSL. The stylesheets can be used with an XSLT transformation engine to change raw Grid Engine XML data into convenient formats such as XHTML and RSS. Once the grid data has been manipulated into XHTML, we can then apply other web technologies such as CSS, DHTML and JavaScript to create fairly sophisticated web-based tools for Grid Engine status reporting and monitoring. The Apache Cocoon framework supplies the XML transformation and web publishing engine.
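xml-qstat itself does this with XSL stylesheets running inside Cocoon. Purely to illustrate the kind of transformation involved, here is a plain Python stand-in (ElementTree instead of XSLT) that flattens `qstat -xml` output into a small XHTML table. The element names (job_list with its state attribute, JB_job_number, JB_owner, JB_name) are assumed from SGE 6.x qstat XML output and may differ in other releases. The script name is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter
from html import escape
from typing import Dict, List

# Child element names assumed from SGE 6.x "qstat -xml" output; check your version.
FIELDS = {"id": "JB_job_number", "owner": "JB_owner", "name": "JB_name"}


def parse_jobs(xml_text: str) -> List[Dict[str, str]]:
    """Flatten every <job_list> element (running and pending) into a dict."""
    root = ET.fromstring(xml_text)
    jobs = []
    for jl in root.iter("job_list"):
        job = {key: jl.findtext(tag, default="") for key, tag in FIELDS.items()}
        job["state"] = jl.get("state", "")  # "running" or "pending"
        jobs.append(job)
    return jobs


def state_counts(jobs: List[Dict[str, str]]) -> Counter:
    return Counter(j["state"] for j in jobs)


def to_xhtml_table(jobs: List[Dict[str, str]]) -> str:
    """Render jobs as a bare XHTML table, escaping all text content."""
    cols = ("id", "owner", "name", "state")
    head = "<tr>" + "".join(f"<th>{c}</th>" for c in cols) + "</tr>"
    rows = "".join(
        "<tr>" + "".join(f"<td>{escape(j[c])}</td>" for c in cols) + "</tr>"
        for j in jobs
    )
    return f"<table>{head}{rows}</table>"


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: qstat_summary.py FILE   (FILE produced by: qstat -xml > FILE)")
    else:
        with open(sys.argv[1]) as fh:
            jobs = parse_jobs(fh.read())
        print(dict(state_counts(jobs)))
        print(to_xhtml_table(jobs))
```

The stylesheet approach keeps presentation logic out of code and lets Cocoon serve XHTML and RSS from the same data, which is the main reason to prefer it over a script like this.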

MacOS X Desktop Widget for Grid Engine

Posted by chris Tue, 11 Apr 2006 18:18:00 GMT

Bill Van Etten has put together a sweet Apple Dashboard Widget capable of monitoring Grid Engine status.

More info is available at http://bioteam.net/sgeqstat

A related SGE monitoring utility can be found at:
http://xml-qstat.bioteam.net

Idle time on MacOSX/Darwin

Posted by Rayson Thu, 22 Sep 2005 05:46:00 GMT

Beth showed a way to collect idle time information on MacOSX:
We use ioreg to ask the kernel for information about the IOHIDSystem (Input Output Human Interface Device system). Then grab the HIDIdleTime line and divide it by 1000000000 to get it into seconds.

Here is the sed version (all on one line):
echo $((`ioreg -c IOHIDSystem | sed -e '/HIDIdleTime/!{ d' -e 't' -e '}' -e 's/.* = //g' -e 'q'` / 1000000000))

Here is the perl version:
ioreg -c IOHIDSystem | perl -ane 'if(/Idle/) {$idle=(pop @F)/1000000000; print $idle, "\n"; last;}'

Here is the AWK version:
ioreg -c IOHIDSystem | awk '/HIDIdleTime/ {print $NF/1000000000; exit}'
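The same idea as the one-liners above, written as a Python sketch that could be dropped into a larger monitoring script (a load sensor, for example). Like the sed and awk versions, it assumes ioreg prints a line of the form `"HIDIdleTime" = <nanoseconds>`. The function names are hypothetical.

```python
import re
import subprocess
import sys
from typing import Optional

# Matches e.g.:  | |   "HIDIdleTime" = 2500000000
_IDLE = re.compile(r'"HIDIdleTime"\s*=\s*(\d+)')


def parse_idle_seconds(ioreg_output: str) -> Optional[float]:
    """Return the first HIDIdleTime value converted from nanoseconds to seconds."""
    match = _IDLE.search(ioreg_output)
    return int(match.group(1)) / 1_000_000_000 if match else None


def idle_seconds() -> Optional[float]:
    """Run ioreg (macOS only) and parse its IOHIDSystem output."""
    out = subprocess.run(
        ["ioreg", "-c", "IOHIDSystem"], capture_output=True, text=True, check=True
    ).stdout
    return parse_idle_seconds(out)


if __name__ == "__main__":
    if sys.platform == "darwin":
        print(idle_seconds())
    else:
        print("ioreg is only available on macOS")
```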

Link to Beth’s mail.