SGE XML output getting some needed attention

Posted by chris Mon, 12 May 2008 16:14:21 GMT

For people like myself who are interested in (or, say, dependent on) the XML output features of Grid Engine, it's been a lonely time. This area of Grid Engine was not really getting much love, attention or bug fixes until recently.

Happy to report that this seems to have changed. If you are at all interested in using SGE data in XML form then you may want to:

Kudos to Michael Pospisil from the Sun Microsystems SGE developer team in Prague for soliciting and listening to community input. It looks like the change may be bigger than simple bug fixes and output normalization: there is some talk about making the XML output more usable for end users, instead of the current design, where XML output is largely a straight representation of internal SGE CULL lists and data structures.

Screenshots of enhanced Olesen FLEXlm tools in action

Posted by chris Thu, 06 Mar 2008 14:21:00 GMT

In a follow-up to Mark's recent announcement, we've gotten our hands on some screenshots from Mark showing his tools in use. The screenshots show the results of using XSLT transformations to turn Grid Engine XML data into XHTML suitable for web pages. The benefit is web-based visibility into current resource (and software license!) usage. This is exactly the approach that I tried out with the xml-qstat project. Mark is pretty familiar with that effort and will be merging his improvements and enhancements into xml-qstat's SVN repository.

Speaking personally as a "scratch an itch" programmer with no real software engineering skill or talent, I'm pretty excited to have a real coder take a look at xml-qstat. Related to that, I already owe a debt to Petr Jung from Sun, who contributed the Java-based CommandGenerator code that finally allows xml-qstat to be a 100% Java/Cocoon web application that does not require external Perl daemons to cache XML state data.

Before the screen captures, I'd like to ask a favor of people who read this blog. I filed Issue #2335 back in July 2007 and it has not received much love (or even a targeted milestone date for a fix). The bug is a simple one: "qstat -f -xml" no longer reports load average data, which (a) makes xml-qstat a whole lot less useful and (b) breaks the SGE developer philosophy of ensuring that a command returns the same information regardless of output format. Until that bug is fixed, it does not make sense for xml-qstat to have its long-overdue "1.0" release. If you have a user account over on http://gridengine.sunsource.net, I'd appreciate it if you could cast one of your "votes" for Issue 2335. Thanks!

And now the screenshots (edited to mask out personal/company information).

qhost overview


qstat full view (a)

qstat full view (b)

qstat queue summary

qstat resource summary

qstat view

public SVN and a new website for xml-qstat

Posted by chris Sun, 14 May 2006 21:24:00 GMT

A side project of mine, http://xml-qstat.org, has a new website and (finally!) an accessible SVN code repository for downloading the package. There are still things (such as support for IE browsers) that I’d like to add before a real 1.0 release, though. Truth be told, the real reason for this post was to have an initial article tagged with the phrase 'xml-qstat'. The beautiful Typo-powered publishing engine running this website can dynamically construct RSS and Atom syndication feeds based on any article category or tag. Creating the xmlqstat tag and posting news under it gives me a quick and dirty way to always have an updated xml-qstat news RSS feed without having to code such features into the xml-qstat.org website.


xml-qstat is an attempt to do something useful with the XML status information that Grid Engine is now able to produce. At its heart, xml-qstat consists of a collection of stylesheets written in XSL. The stylesheets can be used with an XSLT transformation engine to change raw Grid Engine XML data into convenient formats such as XHTML and RSS. Once the grid data has been manipulated into XHTML, we can then apply other web technologies such as CSS, DHTML and JavaScript to create fairly sophisticated web-based tools for Grid Engine status reporting and monitoring. The Apache Cocoon framework supplies the XML transformation and web publishing engine.
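To make the XSLT step concrete, here is a minimal sketch that runs a stylesheet over qstat-style XML from plain Java, using the JDK's built-in javax.xml.transform API instead of Cocoon. The class name, the stylesheet and the sample input are hypothetical and heavily simplified; they are not taken from xml-qstat, and the sample only mimics the general shape of qstat -xml output (job_list elements with JB_job_number and JB_name children and a state attribute).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class; not part of xml-qstat.
class QstatToXhtml {
    // Simplified XSLT 1.0 stylesheet: one XHTML-style table row per job_list element.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<table><xsl:for-each select='//job_list'>"
      + "<tr><td><xsl:value-of select='JB_job_number'/></td>"
      + "<td><xsl:value-of select='JB_name'/></td>"
      + "<td><xsl:value-of select='@state'/></td></tr>"
      + "</xsl:for-each></table>"
      + "</xsl:template></xsl:stylesheet>";

    // Applies the stylesheet to a qstat-style XML document and returns the result as a string.
    static String transform(String qstatXml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(qstatXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Hand-written sample shaped like qstat -xml output (not real output).
        String sample = "<job_info><queue_info>"
            + "<job_list state='running'><JB_job_number>42</JB_job_number><JB_name>sleeper</JB_name></job_list>"
            + "</queue_info><job_info>"
            + "<job_list state='pending'><JB_job_number>47</JB_job_number><JB_name>impossibleJob</JB_name></job_list>"
            + "</job_info></job_info>";
        System.out.println(transform(sample));
    }
}
```

In a Cocoon-based setup like xml-qstat, the publishing pipeline performs this transformation step, and the CSS/JavaScript layer then works on the resulting XHTML.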

Passau Java qstat API version 0.2 is available

Posted by chris Tue, 09 May 2006 17:28:51 GMT

DanT writes:

"...I just published the 0.2 distro of Project Passau on Java.net. Passau is a Java API for accessing the information provided by qstat. It uses the qstat -xml command to produce XML output and then parses that output using a JAXB binding."

qstat XML schema documentation

Posted by chris Tue, 28 Feb 2006 23:00:24 GMT

Grid Engine 6.x distributions include a "util/resources/schemas/qstat/" directory that currently contains the following files:

  • qstat.xsd
  • message.xsd
  • detailed_job_info.xsd

These are about the best resources one can currently obtain when delving deep into SGE's XML output behavior. They are, however, a bit cryptic to read. Passing the .xsd files through an XML Schema documentation generator has produced more human-readable output. The translated files can be found here:

Building XML bindings for qstat

Posted by DanT Thu, 17 Nov 2005 19:58:00 GMT

This topic is one which has been under discussion for some time now. The basic idea is that using the JAXB RI from the JWSDP, we could build a set of classes which would parse qstat output, making it trivial for a developer to write an app which keeps tabs on Grid Engine. I believe the final decision was that we would not officially include such classes with Grid Engine for supportability reasons. Instead, I’m going to explain to you how to build the classes yourself. If you’re too lazy to follow these instructions, here’s a tarball of the classes I generated while writing this post, along with the source, etc.

The first thing you need to do is make sure you have the Java™ platform, the latest JWSDP, Ant, and at least Grid Engine 6.0u7 (or an equivalent maintrunk source build) installed.

I am going to refer to several directories in this tutorial. They are:

   $JWSDP_HOME    Root of JWSDP install
   $JAXB_HOME     $JWSDP_HOME/jaxb
   $SGE_ROOT      Root of Grid Engine install
   $SCHEMA_HOME   $SGE_ROOT/util/resources/schemas/qstat
   $BIND_HOME     Directory where you’ll generate the classes

Note that you don’t need to have the above set as environment variables in your shell, but if you do, you can just copy and paste commands from this tutorial.

The first step is to create an external bindings file. This file will provide the JAXB class generator with some additional information. Specifically, the external bindings file will 1) assign a package to the generated files and 2) fix a naming conflict in the qstat.xsd schema. Create a file called $SCHEMA_HOME/qstat.xjb with the following contents:

<?xml version="1.0" encoding="UTF-8"?>

<jxb:bindings version="1.0"
  xmlns:jxb="http://java.sun.com/xml/ns/jaxb"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <jxb:bindings schemaLocation="qstat.xsd" node="/xs:schema">
      <jxb:bindings node="//xs:complexType[@name='job_list_t']">
         <jxb:bindings node="//xs:attribute[@name='state']">
            <jxb:property name="stateAttribute"/>
         </jxb:bindings>
      </jxb:bindings>
      <jxb:schemaBindings>
         <jxb:package name="com.sun.grid.xml.qstat"/>
      </jxb:schemaBindings>
   </jxb:bindings>
</jxb:bindings>

This file sets the package to com.sun.grid.xml.qstat, but you can set the package to whatever suits you. Keep in mind, though, that you may want to generate a set of bindings for each of the three schemas, so give each one its own package to keep the generated classes from overlapping.

Next, you’ll generate the binding classes. You do that by running:

% $JAXB_HOME/bin/xjc.sh -d $BIND_HOME -b $SCHEMA_HOME/qstat.xjb $SCHEMA_HOME/qstat.xsd

If all went well, you’ll see a list of the classes that are generated. Congratulations! You now have a qstat XML binding!

What do you do with it? Well, let me get you started. First, for convenience, let’s create an Ant build script to compile the binding and run a sample app. Create a file called $BIND_HOME/build.xml with the following contents:

<?xml version="1.0" standalone="yes"?>
<project basedir="." default="compile">
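  <path id="classpath">
    <!-- assumed JWSDP 1.x layout; adjust the includes to match your install -->
    <pathelement path="." />
    <fileset dir="${jwsdp.home}" includes="jaxb/lib/*.jar" />
    <fileset dir="${jwsdp.home}" includes="jwsdp-shared/lib/*.jar" />
    <fileset dir="${jwsdp.home}" includes="jaxp/lib/**/*.jar" />
  </path>
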
  <!--compile Java source files-->
  <target name="compile" description="Compile all Java source files">
    <echo message="Compiling the java source files..." />
    <javac destdir="." debug="on">
      <src path="." />
      <classpath refid="classpath" />
    </javac>
  </target>

  <target name="run" depends="compile" description="Run the sample app">
    <echo message="Running the sample application..." />
    <java classname="Main" fork="true">
      <classpath refid="classpath" />
    </java>
  </target>
</project>

This build script is very primitive. With even modest Ant skills, you should be able to write something that better suits your needs. My goal here is only to cover the bare necessities.

Now, let’s create the sample app: a simple program, $BIND_HOME/Main.java, that lists the job number and name of all jobs currently in the system. It might look something like this:

import java.util.*;
import com.sun.grid.xml.qstat.*;
import javax.xml.bind.*;

public class Main {
    
   public static void main (String[] args) throws Exception {
      // Create a JAXB context
      JAXBContext jc = JAXBContext.newInstance ("com.sun.grid.xml.qstat");
            
      // Use the context to create an Unmarshaller
      Unmarshaller u = jc.createUnmarshaller();
            
      // Fork a qstat -xml
      Process p = Runtime.getRuntime ().exec ("qstat -xml");

      // Let the binding do its magic
      JobInfo ji = (JobInfo)u.unmarshal (p.getInputStream ());
      List list = ((JobInfoT)ji.getJobInfo ().get (0)).getJobList ();
      Iterator i = list.iterator ();

      while (i.hasNext ()) {
         JobListT jlt = (JobListT)i.next ();

         System.out.println (jlt.getJBJobNumber () + ": " + jlt.getJBName ());
      }
   }
}

In the call to JAXBContext.newInstance(), we specify the package we used to generate the binding. What we get back from the call to Unmarshaller.unmarshal() is an object tree derived from the qstat output. We then walk the object tree until we get to the job list, and then we go through the jobs, printing the number and name for each one.

Clearly, the only way you will know what the object tree looks like is to 1) read the schema, 2) read the generated source files, or 3) generate JavaDocs from the generated source files. 3 is the best option, but 2 is what I actually did.

To build and run this sample app, do the following:

% ant -Djwsdp.home=$JWSDP_HOME compile
% ant -Djwsdp.home=$JWSDP_HOME run

If you want to parse the output from qstat -j, you will need to repeat this process with the $SCHEMA_HOME/detailed_job_info.xsd schema. When processing this schema you will need to change the external bindings file a little. You’ll want a file called $SCHEMA_HOME/detailed_job_info.xjb that contains:

<?xml version="1.0" encoding="UTF-8"?>

<jxb:bindings version="1.0"
  xmlns:jxb="http://java.sun.com/xml/ns/jaxb"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <jxb:bindings schemaLocation="detailed_job_info.xsd" node="/xs:schema">
      <jxb:schemaBindings>
         <jxb:package name="com.sun.grid.xml.job"/>
      </jxb:schemaBindings>
   </jxb:bindings>
</jxb:bindings>

Again, feel free to change the package. You’d use this binding the same way you use the other one. There is a third schema in the $SCHEMA_HOME directory, message.xsd, that you likely won’t need, but if you do, you can generate a binding for it just like you did for detailed_job_info.xsd.

Clearly, what I have provided here is only a starting point. For more information about using JAXB, see the docs included with the JWSDP download. Particularly useful are the examples. An obvious next step would be to extend the build script to generate JavaDocs and to split the class files out of the source tree. Another good next step would be to customize the binding classes so that, for example, the status code gets returned as a list of strings instead of a bitwise OR’ed int. If you have problems, let me know. This tutorial is still a work in progress, so feedback is welcome.
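For comparison, when generating and compiling a binding is more than a quick tool needs, the JDK's built-in DOM parser can pull job numbers and names directly out of qstat -xml output. This is a different technique from the JAXB approach above (no schema, no generated classes); the sketch below assumes each job_list element has JB_job_number and JB_name children, and its class name and sample data are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical binding-free alternative to the JAXB Main.java above.
class QstatDomLister {
    // Returns "number: name" for every job_list element in the document.
    static List<String> listJobs(String qstatXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(qstatXml)));
        NodeList jobs = doc.getElementsByTagName("job_list");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < jobs.getLength(); i++) {
            Element job = (Element) jobs.item(i);
            out.add(childText(job, "JB_job_number") + ": " + childText(job, "JB_name"));
        }
        return out;
    }

    // Trimmed text of the first element with the given tag under parent, or "" if there is none.
    static String childText(Element parent, String tag) {
        NodeList nl = parent.getElementsByTagName(tag);
        return nl.getLength() > 0 ? nl.item(0).getTextContent().trim() : "";
    }

    public static void main(String[] args) throws Exception {
        // Hand-written sample shaped like qstat -xml output (not real output).
        String sample = "<job_info><queue_info>"
            + "<job_list state='running'><JB_job_number>42</JB_job_number><JB_name>sleeper</JB_name></job_list>"
            + "</queue_info><job_info>"
            + "<job_list state='pending'><JB_job_number>47</JB_job_number><JB_name>impossibleJob</JB_name></job_list>"
            + "</job_info></job_info>";
        for (String line : listJobs(sample)) {
            System.out.println(line);
        }
    }
}
```

To run it against a live cluster, read the InputStream of a Runtime.getRuntime().exec("qstat -xml") process (as Main.java does) into a String and pass that to listJobs(). Note that getElementsByTagName() visits every job_list in the document, whether it sits under queue_info or under the inner job_info element.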

Easy gridengine XML handling via Perl XML::Smart

Posted by chris Fri, 11 Nov 2005 19:45:23 GMT

Joe Landman from Scalable Informatics posted about his success with the Perl XML::Smart module (CPAN, readme, FAQ, tutorial).

Unlike many of the XML handling methods within the Perl universe, this module stands on its own without a huge and complicated chain of external dependencies.

XML::Smart can quickly and cleanly parse XML documents into Perl data structures that can be efficiently traversed and sorted. This makes it a great fit for simple Perl scripts designed to grab bits of data or information that do not get displayed in the human-readable qstat output.

Joe's comments:
Our 6.0u6 perl based parser fits into a single line, after we grab the data.
	$qstat=`/opt/gridengine/bin/lx24-amd64/qstat -xml`;
	$xml    = XML::Smart->new($qstat);

(no schema/DTD needed)
then for example, iterating over all the jobs ...
	foreach ($xml->{job_info}->{queue_info}->{job_list}('@') )
	  {
	   ...
	  }

Using some example code (included at the end of this article by permission) kindly provided by Joe, I was able to whip up a little "just playing" script that checks all pending jobs for hard resource requests. When a hard request is found, the script simply prints out a line that lists the Job ID, Job Name and the value of the hard resource request. The script looks like this:

#!/usr/bin/perl -w
use XML::Smart;
my ($xml,$qstat);

$qstat=`/opt/sge6s2u1/bin/lx24-amd64/qstat -xml -r -f`;
$xml    = XML::Smart->new($qstat);

foreach ($xml->{job_info}->{job_info}->{job_list}('@') )
{
    if($_->{hard_request}) {
      print "Job ID $_->{JB_job_number} ($_->{JB_name}) has a hard_request: ";
      print "$_->{hard_request}{name}=$_->{hard_request} \n";
    }
}

Output looks like this:

[dag@dcore-amd ~]$ ./test.pl
Job ID 47 (impossibleJob) has a hard_request: arch=darwin 
[dag@dcore-amd ~]$ 
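The same check can be sketched in Java with the JDK's javax.xml.xpath API. Like the Perl script, it assumes that pending jobs sit under the inner job_info element and that each hard_request element carries the resource name in a name attribute and the requested value as its text, which is consistent with the arch=darwin output above. The class name and the sample input are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical Java counterpart of the XML::Smart hard_request script.
class HardRequestFinder {
    static List<String> find(String qstatXml) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        // Pending jobs: job_list elements under the inner job_info element (assumed layout).
        NodeList reqs = (NodeList) xp.evaluate(
            "/job_info/job_info/job_list/hard_request",
            new InputSource(new StringReader(qstatXml)),
            XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < reqs.getLength(); i++) {
            Element req = (Element) reqs.item(i);
            Element job = (Element) req.getParentNode();   // the enclosing job_list
            out.add("Job ID " + xp.evaluate("JB_job_number", job)
                + " (" + xp.evaluate("JB_name", job) + ") has a hard_request: "
                + req.getAttribute("name") + "=" + req.getTextContent().trim());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Hand-written sample shaped like qstat -xml -r output (not real output).
        String sample = "<job_info><queue_info/><job_info>"
            + "<job_list state='pending'><JB_job_number>47</JB_job_number>"
            + "<JB_name>impossibleJob</JB_name><hard_request name='arch'>darwin</hard_request></job_list>"
            + "</job_info></job_info>";
        find(sample).forEach(System.out::println);
    }
}
```

This version emits one line per hard_request element, so a job with several hard requests produces several lines.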

Additional pointers and examples from Scalable Informatics are included below ...


Scalable Informatics provided the following example code and explanations.


The included code is copyright (c) 2004-2005 Scalable Informatics and licensed under GPL 2

What we use today looks just like this:

use XML::Smart;
my ($xml,$qstat);

$qstat=`/opt/gridengine/bin/lx24-amd64/qstat -xml`;
$xml	= XML::Smart->new($qstat);

foreach ($xml->{job_info}->{queue_info}->{job_list}('@') )
  {
     # stuff with each job.  All the per job attributes are now available as
     # $_->{attribute_name}.
     #
  }

Now, if you want to get fancy, you can sort by *any* attribute, ascending or descending (swap $a and $b to reverse the order). This example sorts by JB_owner; refer to the XML for the element you want to sort on:

use XML::Smart;
my ($xml,$qstat,@jobs);

$qstat=`/opt/gridengine/bin/lx24-amd64/qstat -xml`;
$xml	= XML::Smart->new($qstat);
@jobs   = $xml->{job_info}->{queue_info}->{job_list}('@');

foreach ( sort { $a->{JB_owner} cmp $b->{JB_owner} } @jobs )
  {
     # stuff with each job.  All the per job attributes are now available as
     # $_->{attribute_name}.
     #
  }
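A Java version of the same sort, as a minimal sketch: collect the job_list entries into small records, then sort them with a Comparator; descending order is just the reversed comparator. The Job record, the method names and the sample data are illustrative assumptions, and only the job number and owner are extracted.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: sort qstat job_list entries by owner.
class JobSorter {
    // Minimal job record holding only the fields this example uses.
    static final class Job {
        final String number;
        final String owner;
        Job(String number, String owner) { this.number = number; this.owner = owner; }
        @Override public String toString() { return number + "/" + owner; }
    }

    // Collects every job_list element (running and pending) into Job records.
    static List<Job> parse(String qstatXml) throws Exception {
        NodeList nl = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(qstatXml)))
            .getElementsByTagName("job_list");
        List<Job> jobs = new ArrayList<>();
        for (int i = 0; i < nl.getLength(); i++) {
            Element e = (Element) nl.item(i);
            jobs.add(new Job(text(e, "JB_job_number"), text(e, "JB_owner")));
        }
        return jobs;
    }

    // Trimmed text of the first element with the given tag under e, or "".
    static String text(Element e, String tag) {
        NodeList nl = e.getElementsByTagName(tag);
        return nl.getLength() > 0 ? nl.item(0).getTextContent().trim() : "";
    }

    // Returns a sorted copy; descending order just reverses the comparator.
    static List<Job> sortByOwner(List<Job> jobs, boolean descending) {
        Comparator<Job> byOwner = Comparator.comparing(j -> j.owner);
        List<Job> sorted = new ArrayList<>(jobs);
        sorted.sort(descending ? byOwner.reversed() : byOwner);
        return sorted;
    }

    public static void main(String[] args) throws Exception {
        // Hand-written sample shaped like qstat -xml output (not real output).
        String sample = "<job_info><queue_info>"
            + "<job_list><JB_job_number>1</JB_job_number><JB_owner>carol</JB_owner></job_list>"
            + "<job_list><JB_job_number>2</JB_job_number><JB_owner>alice</JB_owner></job_list>"
            + "</queue_info><job_info/></job_info>";
        System.out.println(sortByOwner(parse(sample), false));
    }
}
```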

Extracting execution times requires a bit more work (you need to parse two dates, subtract one from the other, then return the value in a sensible format). Code to do that looks like this:

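use Date::Manip;
my ($d,$t,$olddate,$delta,$dt,$date);
my $today = ParseDate("today");   # reference time for the elapsed calculation

# ... some place later in the code, inside the per-job loop ...
# Rebuild JAT_start_time (month/day/year hour:min:sec) as YYYYMMDDHHMMSS
($d,$t) = split(/\s+/, $_->{JAT_start_time});
if ($d =~ /(\d+)\/(\d+)\/(\d+)/) { $date  = sprintf "%.4i%.2i%.2i", $3, $1, $2; }
if ($t =~ /(\d+):(\d+):(\d+)/)   { $date .= sprintf "%.2i%.2i%.2i", $1, $2, $3; }
$olddate = ParseDate($date);
$delta   = DateCalc($olddate, $today);
$dt      = Delta_Format($delta, 0, qw(%st));   # total elapsed seconds
printf "%.1f second(s)\n", $dt;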

Part of the issue is that SGE does not provide an elapsed job runtime field anywhere, so you need to calculate it yourself. Hopefully this will change.

You can easily combine this into a program that grabs all the relevant data and outputs what you need. If you are using XSLT or similar, you could use this as a parser call-back.

The XML::Smart module is the recommended way to go with Perl. It is extremely fast and very flexible while also being very easy to use. Just don't peek too much at its internal data structures; they can be ... interesting. Note also that they can get huge, so if your XML is more than a few gigabytes in size, you might need to do a little extra work.
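The elapsed-time calculation from the Date::Manip snippet can also be sketched with Java's java.time API. This assumes JAT_start_time uses the month/day/year hour:minute:second layout that the Perl regexes expect (if your Grid Engine version formats it differently, only the pattern needs to change), and that the start time and "now" are in the same local time zone with no daylight-saving change in between, since LocalDateTime carries no zone. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for the elapsed runtime that SGE does not report directly.
class ElapsedTime {
    // Assumed layout "MM/DD/YYYY HH:MM:SS"; single-letter M, d and H also
    // accept one- or two-digit values when parsing, like the Perl (\d+) groups.
    static final DateTimeFormatter SGE_START =
        DateTimeFormatter.ofPattern("M/d/uuuu H:mm:ss");

    // Elapsed whole seconds between a JAT_start_time string and a reference "now".
    static long elapsedSeconds(String jatStartTime, LocalDateTime now) {
        LocalDateTime start = LocalDateTime.parse(jatStartTime.trim(), SGE_START);
        return Duration.between(start, now).getSeconds();
    }

    public static void main(String[] args) {
        System.out.println(elapsedSeconds("11/03/2005 17:57:48", LocalDateTime.now()) + " second(s)");
    }
}
```

Passing "now" in as a parameter, rather than reading the clock inside the method, keeps the calculation deterministic and easy to check.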

gridengine XML: translating JAT_state values into useful information

Posted by chris Thu, 03 Nov 2005 17:57:48 GMT

This is going to be one of those posts that will be completely boring and uninteresting to most (if not all) people reading it. It may, however, someday and somehow, be of use to some poor soul googling for info on what those digits mean in the JAT_state element when dealing with qstat XML output. It also has scary implications for me since I have no idea how to handle bitmask operations inside XSL stylesheets.

A user parsing XML output from "qstat" posted a query to the dev list asking for information on interpreting the various integers such as "128" and "2112" he was seeing as values for the JAT_state XML element. By way of explanation, "JAT" in this scenario means "Job Array Task".

The answer is short, but needs lots of explanation and accompanying data. It turns out that the decimal values seen in JAT_state are "the SUM of all applicable JAT bitmask status codes".

For a listing of JAT-applicable bitmask status values and the stunning conclusion where the real meaning of JAT_state=2112 is finally revealed please read on...

The bitmasks used for JAT_state are:

   JHELD                   0x00000010
   JQUEUED                 0x00000040
   JWAITING                0x00000800
   JRUNNING                0x00000080
   JSUSPENDED              0x00000100
   JSUSPENDED_ON_THRESHOLD 0x00010000
   JERROR                  0x00008000

Translated into decimal form (which is what XML qstat output contains) the values are:

  JHELD:                   16
  JQUEUED:                 64
  JWAITING:                2048
  JRUNNING:                128
  JSUSPENDED:              256
  JSUSPENDED_ON_THRESHOLD: 65536
  JERROR:                  32768

So, when qstat XML produces JAT_state=128, we know that this means the job is running (state "r" in the human-readable qstat output). We also know that the bitmasks are ADDED to account for multiple applicable states in an efficient manner. This means that the user-reported value of "JAT_state=2112" can be broken down into JQUEUED+JWAITING, because 2048+64=2112.

The states "queued + waiting" translate into the familiar "qw" state that is known to all Grid Engine users who use qstat on the command-line.

Commentary: This frightens me because I am lazy and not a good software engineer. Heh. I understand how useful bitmasks are for software: because each flag is a distinct power of two, the sum of any combination of them is unique, which allows Grid Engine to store and compute on various statuses and states rapidly and efficiently. The problem for me comes down to this: when faced with JAT_state=(some integer), how do I decompose that integer back into useful human-readable information about the relevant state or states? This is easy when a single bitmask is used, but when the value is a SUM of several bitmasks it is harder. I'll probably take the lazy way out and keep a lookup table of common sums (like 2112='qw'). Anyone have any better ideas? How would one handle this in the context of an XSL stylesheet that is supposed to translate qstat XML into XHTML, text or PDF form?
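One alternative to a lookup table of sums is to test each flag separately with a bitwise AND: because every flag in the table above is a distinct power of two, (state & flag) != 0 holds exactly when that flag contributed to the sum. The sketch below does this in Java; the class and method names are hypothetical, and any bits not listed in the table above are simply ignored. For the XSL question, XPath 1.0 has no bitwise operators, but the same test can be written arithmetically as floor($state div $bit) mod 2 = 1, which isSetArithmetic() mirrors with integer division.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical decoder for JAT_state values, using only the flags listed above.
class JatState {
    static final Map<String, Integer> FLAGS = new LinkedHashMap<>();
    static {
        FLAGS.put("JHELD", 0x00000010);
        FLAGS.put("JQUEUED", 0x00000040);
        FLAGS.put("JRUNNING", 0x00000080);
        FLAGS.put("JSUSPENDED", 0x00000100);
        FLAGS.put("JWAITING", 0x00000800);
        FLAGS.put("JERROR", 0x00008000);
        FLAGS.put("JSUSPENDED_ON_THRESHOLD", 0x00010000);
    }

    // Names of every listed flag whose bit is set in the JAT_state value.
    static List<String> decode(int state) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> e : FLAGS.entrySet()) {
            if ((state & e.getValue()) != 0) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    // Same bit test using only division and modulo, the form an XSLT 1.0
    // stylesheet would need: floor($state div $bit) mod 2 = 1
    static boolean isSetArithmetic(int state, int bit) {
        return (state / bit) % 2 == 1;   // integer division floors for non-negative values
    }

    public static void main(String[] args) {
        System.out.println(decode(2112));   // prints [JQUEUED, JWAITING]
    }
}
```

Mapping the decoded flag names to qstat's letter codes (such as "qw") would be a further lookup step on top of this.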