No IO usage measurements on Linux
Ever wonder why IO usage for Grid Engine jobs running on Linux systems is not captured in either the SGE accounting or reporting logs?
This message posted to the Users mailing list kicked off an interesting thread and even generated a new Enhancement Issue and submitted patch.
It turns out that IO usage is always reported as "0.00000" under Linux because the built-in PDC code within Grid Engine has no easy way (under Linux) to learn about IO consumption on a per-task or per-process basis.
Some additional digging by the original poster revealed some interesting Linux kernel options:
The Linux kernel can be compiled with the CONFIG_TASKSTATS and CONFIG_TASK_IO_ACCOUNTING options, which enable simple per-process I/O usage to be counted through /proc/(PID)/io as well as the taskstats interface. The execd's PDC module is not aware of these interfaces, and therefore makes no attempt to count this usage under Linux.
In Issue 2429 a patch is submitted that lets the SGE PDC code be aware of io reporting values that can be found in /proc/(PID)/io.
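To make the /proc/(PID)/io interface concrete, here is a minimal Python sketch that reads the per-process counters. It is not the patch from Issue 2429 (the PDC is not written in Python), and the function names and sample values are made up for illustration. The counter names (rchar, wchar, read_bytes, write_bytes and so on) are the ones the kernel writes to that file when I/O accounting is enabled.

```python
from pathlib import Path


def parse_proc_io(text):
    """Parse the 'key: value' lines of a /proc/<pid>/io file into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip anything that is not a 'name: number' pair.
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def read_process_io(pid):
    """Return the I/O counters for one PID, or None if the file cannot be read."""
    try:
        return parse_proc_io(Path(f"/proc/{pid}/io").read_text())
    except OSError:
        # Kernel built without I/O accounting, process gone, or no permission.
        return None


# Illustrative sample in the same layout as /proc/<pid>/io (values are made up).
SAMPLE = """rchar: 323934931
wchar: 323929600
syscr: 632687
syscw: 632675
read_bytes: 0
write_bytes: 323932160
cancelled_write_bytes: 0
"""

if __name__ == "__main__":
    counters = parse_proc_io(SAMPLE)
    print("bytes read from storage:", counters["read_bytes"])
    print("bytes written to storage:", counters["write_bytes"])
```

On a kernel without CONFIG_TASK_IO_ACCOUNTING the file does not exist, so read_process_io() returns None rather than raising.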
How you can help:
- For your particular flavor of Linux, determine if the kernel options "CONFIG_TASKSTATS" and "CONFIG_TASK_IO_ACCOUNTING" are enabled in the default vendor supplied kernel. Add this data as a comment on Issue 2429.
- Test out the patch yourself
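For the first item, one way to check a kernel is to look at its build configuration. The Python sketch below assumes the config is available as /boot/config-(release) or /proc/config.gz. Those are common locations but not guaranteed on every distribution, and the function names are invented for this example.

```python
import gzip
import os
from pathlib import Path

OPTIONS = ("CONFIG_TASKSTATS", "CONFIG_TASK_IO_ACCOUNTING")


def enabled_options(config_text, options=OPTIONS):
    """Map each option name to True if the config text contains 'OPTION=y'."""
    set_lines = {line.strip() for line in config_text.splitlines()}
    return {opt: f"{opt}=y" in set_lines for opt in options}


def load_kernel_config():
    """Try the usual places for the running kernel's config; None if not found."""
    boot_config = Path(f"/boot/config-{os.uname().release}")
    if boot_config.exists():
        return boot_config.read_text()
    proc_config = Path("/proc/config.gz")
    if proc_config.exists():
        with gzip.open(str(proc_config), "rt") as handle:
            return handle.read()
    return None


if __name__ == "__main__":
    text = load_kernel_config()
    if text is None:
        print("kernel config not found in the assumed locations")
    else:
        for option, is_set in enabled_options(text).items():
            print(option, "enabled" if is_set else "not enabled")
```

The output of a run like this, together with the distribution name and kernel version, is the kind of data worth adding to the issue.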
Estimating space requirements for the ARCo database
Another posting prompted by an old message flagged in my inbox ...
With ARCo and the dbwriter code migrating from N1 Grid Engine into the open source codebase the Grid Engine accounting and reporting console is likely going to get more attention and eyeballs from the community. Relating to this, Roland had pointed out the existence of the following page:
http://gridengine.sunsource.net/howto/arco/arco_db_size.html
... the page includes a link to a downloadable spreadsheet (Open Office format) that can be used to guide sizing decisions. Also interesting is a table listing the default retention times for various data elements stored within the database.
New howto: Sizing ARCo databases
Roland writes:
I've added a new howto with a spreadsheet document to calculate the estimated database space usage. The link is:
http://gridengine.sunsource.net/howto/arco/arco_db_size.html
I appreciate your Feedback, especially about discrepancies with calculated and real world values, to improve the document.
The HowTo document contains a nice spreadsheet where one can plug in values and see what the estimated size requirements may be.
For those that don't have OpenOffice installed or handy, a version converted to MS Excel 97 can be found here: http://gridengine.info/files/arco_db_size_v1.1.xls
New user contributed accounting script
A new "pull statistics from the SGE accounting log file" script has been posted to the SGE community. Olivier Blondel took Joe Landman's "usage.pl" script and modified it to suit his own needs. The script can be found embedded inline with Olivier's post to the users mailing list.
It's official: Project Hedeby and ARCo join the SGE codebase
Sun has formally announced the additions promised at SC'06; the full announcement is available online here:
Of the two, ARCo is the more established layered product. This is the SQL driven accounting and reporting tool that was previously only available in the commercial version of N1 Grid Engine from Sun. ARCo uses Java to parse the SGE accounting logs for inclusion into an SQL back-end database. In addition to the metrics found in the accounting logs, ARCo has hooks for calculating useful "derived" metrics that are not explicitly stored in the accounting files.
When I first used ARCo (early on in its very first release version) one of the main weaknesses was the front end web based reporting console - for anything but the most basic reports, a user was expected to paste raw SQL queries into a web form. Sun's act of putting ARCo into the open source codebase should hopefully kickstart an idea that has been floating around for a while -- some sort of community wiki page or repository of user-generated ARCo queries and report templates. ARCo users are encouraged to send these sorts of tips and tricks to the users mailing list.
"Project Hedeby", aka the "Grid Engine Service Domain Management module" also mentioned at SC'06, is at an earlier stage in its development. The nontechnical description is as follows:
Project Hedeby provides access to a new technology which allows to dynamically manage resources across so called Service Domains. Service Domains can be envisioned as autonomous Grids controlled by a resource manager including but not limited to Grid Engine. Hedeby will adjust the allocation of resources to individual service domains in order to meet Service Level Objectives. Reallocating a host resource to another service domain may include re-provisioning of the underlying virtual or actual operating system stack.
In his interview with GRIDtoday, Fritz provides the following description:
"... provides policy and demand-based re-allocation of arbitrary resources across service domains. Service domains are totally autonomous Grids which are controlled by a workload management facility, such as Grid Engine, but also by arbitrary other service infrastructures like application servers or web servers..."
Thanks to Andy for pointing out that the project codename, "Hedeby" refers to a Viking trade town from the 8th-11th century.
Installing ARCo on x64 Linux with Blackdown Java JVM
Java setup
----------
We need at least java 1.4.1
Please enter the path to your java installation [] >> /opt/j2re1.4.2
ERROR: This java version does not support 64-bit native libraries,
The use of libdrmaa.so from the lx24-x86 binaries would be
possible, but the packages are not installed.
Please install a 64-Bit java version or the N1GE 32-bit
binary packages for the architecture lx24-x86!
The fix is to hack the “inst_dbwriter” script to remove the “-d64” flag which is not supported by Blackdown Java.
News from SC06 - Sun frees ARCO and Windows modules for Grid Engine
All the cool kids are at the Supercomputing 2006 meeting this week. Among the flurry of vendor announcements and release news is the following notice from Sun and the Grid Engine project:
In a nutshell, 2 modules that were previously only found in the commercial Sun N1 Grid Engine suite -- ARCO (reporting/analysis subsystem) and the code that allows for MS Windows systems to act as submit hosts and execution hosts, are being open sourced.
In addition there is mention of "Grid Engine Service Domain Management module" but other than a planned demo at SC06 there is not much more info available on it.
The full announcement is here:
http://gridengine.sunsource.net/news/SuperComputing2006.html
Simple perl reporting tool for SGE accounting data
Joe at Scalable Informatics is offering up a "quick -n- simple" reporting script for Grid Engine accounting and usage data.
Usage examples:
[landman@minicc ~]$ ./usage.pl
Total usage: (in units of second(s))
wallclock : 46733.000 second(s)
user time : 1600.000 second(s) [3.42%]
system time: 17.000 second(s) [0.04%]
cpu time : 70379.000 second(s) [150.60%]
user      wallclock   user time  system time   cpu time  memory  percent of total time
landman   46733.000    1600.000       17.000  70379.000   0.000                100.000
The script is available here http://downloads.scalableinformatics.com/downloads/gridengine/usage.pl
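For readers curious how a report like this is put together, here is a rough Python sketch of the same idea: sum wallclock, user and system time per user from the colon-separated accounting file. It is not a port of usage.pl. The field positions are an assumption based on the accounting(5) layout for SGE 6.x (owner in field 4, ru_wallclock, ru_utime and ru_stime in fields 14 to 16, counting from 1) and should be checked against the man page for your version.

```python
from collections import defaultdict

# Assumed 0-based field positions in a colon-separated accounting record.
OWNER, WALLCLOCK, UTIME, STIME = 3, 13, 14, 15


def summarize(lines):
    """Return {owner: [wallclock, user time, system time]} summed over all records."""
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split(":")
        if len(fields) <= STIME:
            continue  # record too short to hold the fields we need
        try:
            values = [float(fields[i]) for i in (WALLCLOCK, UTIME, STIME)]
        except ValueError:
            continue  # non-numeric usage field, ignore the record
        for slot, value in enumerate(values):
            totals[fields[OWNER]][slot] += value
    return dict(totals)


def percent(part, whole):
    """Percentage of 'whole' taken by 'part', or 0.0 when 'whole' is zero."""
    return 100.0 * part / whole if whole else 0.0


if __name__ == "__main__":
    import sys

    # Usage (path is site specific): python summary.py /path/to/accounting
    with open(sys.argv[1]) as handle:
        for owner, (wall, user, system) in sorted(summarize(handle).items()):
            print(f"{owner:12s} {wall:12.3f} {user:12.3f} [{percent(user, wall):.2f}%] {system:12.3f}")
```

The bracketed percentages in the sample output above are consistent with each figure being taken relative to total wallclock time, which is what percent() is used for here.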
New SGE accounting log analysis script committed
Andreas has checked in a Ruby script that does grid engine accounting file analysis. His email announcement has the details and a basic usage summary.
The script can be obtained from CVS or via a direct download: http://gridengine.sunsource.net/files/documents/7/82/analyze.rb.gz
public SVN and a new website for xml-qstat
A side project of mine, http://xml-qstat.org has a new website and (finally!) an accessible SVN code repository for downloading the package. There are still things (such as support for IE browsers) that I’d like to add before a real 1.0 release though. Truth be told the real reason for this post was to have an initial article tagged with the phrase ’xml-qstat’. The beautiful Typo-powered publishing engine running this website can dynamically construct RSS and ATOM syndication feeds based on any article category or tag. Creating the xmlqstat tag and posting news under it results in a quick and dirty way to always have an updated xml-qstat news RSS feed without having to code such features into the xml-qstat.org website.
xml-qstat is an attempt to do something useful with the XML status information that Grid Engine is now able to produce. At its heart, xml-qstat consists of a collection of stylesheets written in XSL. The stylesheets can be used with an XSLT transformation engine to change raw Grid Engine XML data into convenient formats such as XHTML and RSS. Once the grid data has been manipulated into XHTML we can then apply other web technologies such as CSS, DHTML and JavaScript to create fairly sophisticated web based tools for Grid Engine status reporting and monitoring. The Apache Cocoon framework supplies the XML transformation and web publishing engine.
MacOS X Desktop Widget for Grid Engine
More info is available at http://bioteam.net/sgeqstat
http://xml-qstat.bioteam.net
Idle time on MacOSX/Darwin
We use ioreg to ask the kernel for information about the IOHIDSystem (Input Output Human Interface Device system), then grab the HIDIdleTime line and divide its value by 1000000000 to convert it from nanoseconds into seconds. Link to Beth’s mail.
Here is the SED version (all one line):
echo $((`ioreg -c IOHIDSystem | sed -e '/HIDIdleTime/!{ d' -e 't' -e '}' -e 's/.* = //g' -e 'q'` / 1000000000))
Here is the perl version:
ioreg -c IOHIDSystem | perl -ane 'if(/Idle/) {$idle=(pop @F)/1000000000; print $idle, "\n"; last;}'
Here is the AWK version:
ioreg -c IOHIDSystem | awk '/HIDIdleTime/ {print $NF/1000000000; exit}'
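The same extraction can be sketched in Python for anyone who would rather call it from a script, for example a load sensor. The function names are invented for this example, and the regular expression assumes that ioreg prints the value as "HIDIdleTime" = (number), which is the layout the one-liners above rely on.

```python
import re
import subprocess

NANOSECONDS_PER_SECOND = 1_000_000_000


def idle_seconds(ioreg_output):
    """Return the first HIDIdleTime value converted to seconds, or None if absent."""
    match = re.search(r'"HIDIdleTime"\s*=\s*(\d+)', ioreg_output)
    if match is None:
        return None
    return int(match.group(1)) / NANOSECONDS_PER_SECOND


def current_idle_seconds():
    """Run ioreg (Mac OS X only) and return the idle time in seconds."""
    result = subprocess.run(
        ["ioreg", "-c", "IOHIDSystem"],
        capture_output=True,
        text=True,
        check=True,
    )
    return idle_seconds(result.stdout)


if __name__ == "__main__":
    print(current_idle_seconds())
```

Keeping the parsing in idle_seconds() separate from the ioreg call makes it easy to check against captured output on a machine that is not a Mac.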

