Screenshots of enhanced Olesen FLEXlm tools in action

Posted by chris Thu, 06 Mar 2008 14:21:00 GMT

In a follow-up post to Mark's recent announcement we've gotten our hands on some screenshots from Mark showing his tools in use. The screenshots show the results of using XSLT transformations to turn Grid Engine XML data into XHTML form suitable for web pages. The benefit includes web-based visibility into current resource (and software license!) usage. This is exactly the approach that I tried out with the xml-qstat project. Mark is pretty familiar with that effort and will be merging his improvements and enhancements into xml-qstat's SVN repository. Speaking personally as a "scratch an itch" programmer with no real software engineering skill or talent I'm pretty excited to have a real coder take a look at xml-qstat. Related to that I already owe a debt to Petr Jung from Sun who contributed the Java based CommandGenerator code that finally allows xml-qstat to be a 100% Java/Cocoon web application that does not require external perl daemons to cache XML state data.

Before the screen captures, I'd like to ask a favor of people who read this blog. I filed bug Issue #2335 back in July of 2007 and it has not received much love (or even a targeted milestone date for a fix). The bug is a simple one -- "qstat -f -xml" no longer reports load average data which (a) makes xml-qstat a whole lot less useful and (b) breaks the SGE developer philosophy of ensuring that command output returns the same information regardless of output format. Until that bug is fixed it does not make sense for xml-qstat to have its long overdue "1.0" release. If you have a user account over on http://gridengine.sunsource.net I'd appreciate it if you can cast one of your "votes" for Issue 2335. Thanks!

And now the screenshots (edited to mask out personal/company information). Click on each image for a larger version.

qhost overview


qstat full view (a)

qstat full view (b)

qstat queue summary

qstat resource summary

qstat view

qstat kung fu

Posted by chris Mon, 04 Feb 2008 22:45:36 GMT

A user posted to the list looking for an efficient way of probing the designated output directory of active (pending or running) jobs.

Once again, Reuti comes up with a nice suggestion, this time employing a shell one-liner that pipes the output of a wildcard "qstat -j '*'" query through awk:

$ qstat -j "*" | awk ' /^job_number:/ { job_number=$2 } /^sge_o_workdir:/  \
{ print job_number, $2 } '

And like the original poster mentioned, I also had no idea that wildcards could be used with the "-j" option to qstat. Thanks Reuti!
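For anyone who would rather do the same extraction from Java than from awk, here is a minimal sketch. It assumes the "job_number:" and "sge_o_workdir:" labels start their lines exactly as the awk patterns above expect; the class name JobWorkdirs and the sample text are hypothetical. Unlike the awk version, which prints only the second whitespace-separated field, this keeps everything after the label, so a work directory containing spaces is not cut short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JobWorkdirs {

    // Maps job number -> sge_o_workdir from the plain-text output of qstat -j "*".
    static Map<String, String> parse(String qstatOutput) {
        Map<String, String> workdirs = new LinkedHashMap<>();
        String jobNumber = null;
        for (String line : qstatOutput.split("\\R")) {
            if (line.startsWith("job_number:")) {
                // Remember the job we are currently inside
                jobNumber = line.substring("job_number:".length()).trim();
            } else if (line.startsWith("sge_o_workdir:") && jobNumber != null) {
                // Keep the whole remainder of the line, not just the first token
                workdirs.put(jobNumber, line.substring("sge_o_workdir:".length()).trim());
            }
        }
        return workdirs;
    }

    public static void main(String[] args) {
        // Hypothetical sample; real output carries many more fields per job.
        String sample = String.join("\n",
            "job_number:                 561",
            "sge_o_workdir:              /home/www/run 1",
            "job_number:                 653",
            "sge_o_workdir:              /home/www/run2");
        parse(sample).forEach((job, dir) -> System.out.println(job + " " + dir));
    }
}
```

In practice the input string would come from the standard output of a forked qstat process rather than from a literal.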

qstat XML schema documentation

Posted by chris Tue, 28 Feb 2006 23:00:24 GMT

Grid Engine 6.x distributions include a "util/resources/schemas/qstat/" directory that currently contains the following files:

  • qstat.xsd
  • message.xsd
  • detailed_job_info.xsd

These are about the best resources one can currently obtain when delving deep into SGE's XML output behavior. They are, however, a bit cryptic to read. Passing the .xsd files through an XML Schema Documentation Generator has resulted in some more human readable output. The translated files can be found here:

sorting qstat output

Posted by chris Sun, 12 Feb 2006 21:07:42 GMT

A user recently asked the mailing list for suggestions on sorting the full output of qstat by job start time.

Reuti replied back with a link to his most excellent script, a bash script called "status" that makes heavy use of awk under the hood. The script works with both SGE 5.3 and 6.x versions of qstat.

The script is hosted on the download section of the SGE project website:
http://gridengine.sunsource.net/servlets/ProjectDocumentView?documentID=8&showInfo=true

After downloading the script, usage is trivial. To sort output by job start time one would do:

 ./status -s time -a
Running jobs:
job-ID  # name                      owner      start time          running in
-----------------------------------------------------------------------------
   561  1 Job7458                   www        01/08/2006 18:59:05 all.q      (stalled)
   653  1 A11510113941883           www        02/08/2006 09:13:58 all.q      
   657  1 A11541113941889           www        02/08/2006 09:14:54 all.q      

Waiting jobs:
job-ID  # name                      owner      submit time        
------------------------------------------------------------------
   562  1 Job7458.cleanup           www        01/08/2006 17:38:14 (hold)
   654  1 btpymol                   www        02/08/2006 09:13:59 (Error)
   654  1 btpymol                   www        02/08/2006 09:13:59 (Error)
   655  1 merge                     www        02/08/2006 09:13:59 (hold)
   656  1 cleanup                   www        02/08/2006 09:13:59 (hold)
   658  1 btrasmol                  www        02/08/2006 09:14:55 (Error)
   658  1 btrasmol                  www        02/08/2006 09:14:55 (Error)
   658  1 btrasmol                  www        02/08/2006 09:14:55 (Error)
   659  1 merge                     www        02/08/2006 09:14:55 (hold)
   660  1 cleanup                   www        02/08/2006 09:14:55 (hold)
   407  1 impossibleJob             www        11/28/2005 09:58:42 
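
For readers who want the same ordering inside their own tooling, here is a rough Java sketch of sorting jobs by timestamp. It is not how Reuti's script works (that is bash and awk); it only assumes the MM/dd/yyyy HH:mm:ss layout visible in the listing above. The class SortByStartTime and its Job holder are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortByStartTime {

    // Timestamp layout as it appears in the listing above (an assumption for other setups).
    static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss");

    // Hypothetical holder for one job: its ID and its start (or submit) timestamp.
    static final class Job {
        final int id;
        final LocalDateTime time;

        Job(int id, String stamp) {
            this.id = id;
            this.time = LocalDateTime.parse(stamp, STAMP);
        }
    }

    // Returns job IDs ordered from earliest to latest timestamp.
    static List<Integer> sortedIds(List<Job> jobs) {
        List<Job> copy = new ArrayList<>(jobs);
        copy.sort(Comparator.comparing((Job j) -> j.time));
        List<Integer> ids = new ArrayList<>();
        for (Job j : copy) {
            ids.add(j.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>();
        jobs.add(new Job(653, "02/08/2006 09:13:58"));
        jobs.add(new Job(407, "11/28/2005 09:58:42"));
        jobs.add(new Job(561, "01/08/2006 18:59:05"));
        System.out.println(sortedIds(jobs));
    }
}
```

Parsing the timestamps instead of comparing them as strings keeps the order chronological across a year boundary, which a plain text sort on month-first dates would not.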

Building XML bindings for qstat

Posted by DanT Thu, 17 Nov 2005 19:58:00 GMT

This topic is one which has been under discussion for some time now. The basic idea is that using the JAXB RI from the JWSDP, we could build a set of classes which would parse qstat output, making it trivial for a developer to write an app which keeps tabs on Grid Engine. I believe the final decision was that we would not officially include such classes with Grid Engine for supportability reasons. Instead, I’m going to explain to you how to build the classes yourself. If you’re too lazy to follow these instructions, here’s a tarball of the classes I generated while writing this post, along with the source, etc.

The first thing you need to do is make sure you have the Java™ platform, the latest JWSDP, Ant, and at least Grid Engine 6.0u7 (or an equivalent maintrunk source build) installed.

I am going to refer to several directories in this tutorial. They are:

$JWSDP_HOME    Root of JWSDP install
$JAXB_HOME     $JWSDP_HOME/jaxb
$SGE_ROOT      Root of Grid Engine install
$SCHEMA_HOME   $SGE_ROOT/util/resources/schemas/qstat
$BIND_HOME     Directory where you’ll generate the classes

Note that you don’t need to have the above set as environment variables in your shell, but if you do, you can just copy and paste commands from this tutorial.

The first step is to create an external bindings file. This file will provide the JAXB class generator with some additional information. Specifically, the external bindings file will 1) assign a package to the generated files and 2) fix a naming conflict in the qstat.xsd schema. Create a file called $SCHEMA_HOME/qstat.xjb with the following contents:

<?xml version="1.0" encoding="UTF-8"?>

<jxb:bindings version="1.0"
  xmlns:jxb="http://java.sun.com/xml/ns/jaxb"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <jxb:bindings schemaLocation="qstat.xsd" node="/xs:schema">
      <jxb:bindings node="//xs:complexType[@name='job_list_t']">
         <jxb:bindings node="//xs:attribute[@name='state']">
            <jxb:property name="stateAttribute"/>
         </jxb:bindings>
      </jxb:bindings>
      <jxb:schemaBindings>
         <jxb:package name="com.sun.grid.xml.qstat"/>
      </jxb:schemaBindings>
   </jxb:bindings>
</jxb:bindings>

This file sets the package to com.sun.grid.xml.qstat, but you can set the package to whatever suits you. Keep in mind, though, that you may want to generate a set of bindings for each of the three schemas, and you don’t want them to overlap.

Next, you’ll generate the binding classes. You do that by running:

% $JAXB_HOME/bin/xjc.sh -d $BIND_HOME -b $SCHEMA_HOME/qstat.xjb $SCHEMA_HOME/qstat.xsd

If all went well, you’ll see a list of the classes that are generated. Congratulations! You now have a qstat XML binding!

What do you do with it? Well, let me get you started. First, for convenience, let’s create an Ant build script to compile the binding and run a sample app. Create a file called $BIND_HOME/build.xml with the following contents:

<?xml version="1.0" standalone="yes"?>
<project basedir="." default="compile">
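  <!-- Assumption: the jar locations below follow the usual JWSDP layout;
       adjust them to match your install. The jwsdp.home property is
       supplied on the ant command line. -->
  <path id="classpath">
    <pathelement path="." />
    <fileset dir="${jwsdp.home}" includes="jaxb/lib/*.jar" />
    <fileset dir="${jwsdp.home}" includes="jwsdp-shared/lib/*.jar" />
    <fileset dir="${jwsdp.home}" includes="jaxp/lib/**/*.jar" />
  </path>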

  <!--compile Java source files-->
  <target name="compile" description="Compile all Java source files">
    <echo message="Compiling the java source files..." />
    <javac destdir="." debug="on">
      <src path="." />
      <classpath refid="classpath" />
    </javac>
  </target>

  <target name="run" depends="compile" description="Run the sample app">
    <echo message="Running the sample application..." />
    <java classname="Main" fork="true">
      <classpath refid="classpath" />
    </java>
  </target>
</project>

This build script is very primitive. With any amount of Ant skills, you should be able to write something that better suits your needs. My goal here is only to cover the bare necessities.

Now, let’s create the sample app. Let’s write a simple app, called $BIND_HOME/Main.java, that lists the job number and name of all jobs currently in the system. It might look something like this:

import java.util.*;
import com.sun.grid.xml.qstat.*;
import javax.xml.bind.*;

public class Main {
    
   public static void main (String[] args) throws Exception {
      // Create a JAXB context
      JAXBContext jc = JAXBContext.newInstance ("com.sun.grid.xml.qstat");
            
      // Use the context to create an Unmarshaller
      Unmarshaller u = jc.createUnmarshaller();
            
      // Fork a qstat -xml
      Process p = Runtime.getRuntime ().exec ("qstat -xml");

      // Let the binding do its magic
      JobInfo ji = (JobInfo)u.unmarshal (p.getInputStream ());
      List list = ((JobInfoT)ji.getJobInfo ().get (0)).getJobList ();
      Iterator i = list.iterator ();

      while (i.hasNext ()) {
         JobListT jlt = (JobListT)i.next ();

         System.out.println (jlt.getJBJobNumber () + ": " + jlt.getJBName ());
      }
   }
}

In the call to JAXBContext.newInstance(), we specify the package we used to generate the binding. What we get back from the call to Unmarshaller.unmarshal() is an object tree derived from the qstat output. We then walk the object tree until we get to the job list, and then we go through the jobs, printing the number and name for each one.

Clearly, the only way you will know what the object tree looks like is to 1) read the schema, 2) read the generated source files, or 3) generate JavaDocs from the generated source files. 3 is the best option, but 2 is what I actually did.
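
If you just want to poke at the structure before generating any bindings, a plain DOM walk with the parser that ships with the JDK is a quick alternative. The sketch below is not part of the binding approach. The element names job_list, JB_job_number, and JB_name are assumptions inferred from the generated getter names, and the class QstatDom and the sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class QstatDom {

    // Returns "number: name" for every job_list element found in the document.
    static List<String> listJobs(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList jobs = doc.getElementsByTagName("job_list");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < jobs.getLength(); i++) {
            Element job = (Element) jobs.item(i);
            result.add(text(job, "JB_job_number") + ": " + text(job, "JB_name"));
        }
        return result;
    }

    // Text of the first descendant element with the given tag, or "" if absent.
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, heavily trimmed stand-in for real qstat -xml output.
        String sample =
              "<job_info><queue_info>"
            + "<job_list state=\"running\"><JB_job_number>653</JB_job_number>"
            + "<JB_name>A11510113941883</JB_name><state>r</state></job_list>"
            + "</queue_info></job_info>";
        listJobs(sample).forEach(System.out::println);
    }
}
```

The trade-off is the usual one: the DOM version needs no generated classes, but every element name is a string you have to get right yourself, whereas the binding gives you typed getters.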

To build and run this sample app, do the following:

% ant -Djwsdp.home=$JWSDP_HOME compile
% ant -Djwsdp.home=$JWSDP_HOME run

If you want to parse the output from qstat -j, you will need to repeat this process with the $SCHEMA_HOME/detailed_job_info.xsd schema. When processing this schema you will need to change the external bindings file a little. You’ll want a file called $SCHEMA_HOME/detailed_job_info.xjb that contains:

<?xml version="1.0" encoding="UTF-8"?>

<jxb:bindings version="1.0"
  xmlns:jxb="http://java.sun.com/xml/ns/jaxb"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <jxb:bindings schemaLocation="detailed_job_info.xsd" node="/xs:schema">
      <jxb:schemaBindings>
         <jxb:package name="com.sun.grid.xml.job"/>
      </jxb:schemaBindings>
   </jxb:bindings>
</jxb:bindings>

Again, feel free to change the package. You’d use this binding the same way you use the other one. There is a third schema in the $SCHEMA_HOME directory, message.xsd, that you won’t likely need, but if you do, you can generate a binding for it just like you did for detailed_job_info.xsd.

Clearly, what I have provided here is only a starting point. For more information about using JAXB, see the docs included with the JWSDP download. Particularly useful are the examples. An obvious next step would be to extend the build script to generate JavaDocs and to split the class files out of the source tree. Another good next step would be to customize the binding classes so that, for example, the status code gets returned as a list of strings instead of a binary or’ed int. If you have problems, let me know. This tutorial is still a work in progress, so feedback is welcome.

gridengine XML: translating JAT_state values into useful information

Posted by chris Thu, 03 Nov 2005 17:57:48 GMT

This is going to be one of those posts that will be completely boring and uninteresting to most (if not all) people reading it. It may, however, someday and somehow, be of use to some poor soul googling for info on what those digits mean in the JAT_state element when dealing with qstat XML output. It also has scary implications for me since I have no idea how to handle bitmask operations inside XSL stylesheets.

A user parsing XML output from "qstat" posted a query to the dev list asking for information on interpreting the various integers such as "128" and "2112" he was seeing as values for the JAT_state XML element. By way of explanation, "JAT" in this scenario means "Job Array Task".

The answer is short, but needs lots of explanation and accompanying data. It turns out that the decimal values seen in JAT_state are "the SUM of all applicable JAT bitmask status codes".

For a listing of JAT-applicable bitmask status values and the stunning conclusion where the real meaning of JAT_state=2112 is finally revealed please read on...

The bitmasks used for JAT_state are:

   JHELD                   0x00000010
   JQUEUED                 0x00000040
   JWAITING                0x00000800
   JRUNNING                0x00000080
   JSUSPENDED              0x00000100
   JSUSPENDED_ON_THRESHOLD 0x00010000
   JERROR                  0x00008000

Translated into decimal form (which is what XML qstat output contains) the values are:

  JHELD:                   16
  JQUEUED:                 64
  JWAITING:                2048
  JRUNNING:                128
  JSUSPENDED:              256
  JSUSPENDED_ON_THRESHOLD: 65536
  JERROR:                  32768

So, when qstat XML produces JAT_state=128 we know that this means the job is running (state "r" in the human readable qstat output). We also know that the bitmasks are ADDED to account for multiple applicable states in an efficient manner. This means that the user-reported value of "JAT_state=2112" can be broken down into JQUEUED+JWAITING because 64+2048=2112.

The states "queued + waiting" translate into the familiar "qw" state that is known to all Grid Engine users who use qstat on the command-line.

Commentary: This frightens me because I am lazy and not a good software engineer. heh. I understand how useful bitmasks are for software: because each flag occupies its own bit, the sum of any combination of flags is unique, which allows Grid Engine to rapidly and efficiently store and compute upon various statuses and states. The problem for me comes down to this: When faced with JAT_state=(some integer) how do I decompose that integer back into useful human-readable information about the relevant state or states? This is easy when a single bitmask is used but when the value is a SUM of a bunch of bitmasks it will be harder. I'll probably take the lazy way out and keep a lookup table of common sums (like 2112='qw'). Anyone have any better ideas? How would one handle this in the context of an XSL stylesheet that is supposed to translate qstat XML into XHTML, text or PDF form?
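
One possible answer, sketched in Java rather than XSL: test each flag against the value with a bitwise AND and collect the names that match. The flag values are copied from the table above; the class name JatState and the decode method are hypothetical, and this is only an illustration, not anything taken from the Grid Engine source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JatState {

    // Flag values copied from the bitmask table above, in ascending order.
    static final Map<String, Integer> FLAGS = new LinkedHashMap<>();
    static {
        FLAGS.put("JHELD", 0x00000010);
        FLAGS.put("JQUEUED", 0x00000040);
        FLAGS.put("JRUNNING", 0x00000080);
        FLAGS.put("JSUSPENDED", 0x00000100);
        FLAGS.put("JWAITING", 0x00000800);
        FLAGS.put("JERROR", 0x00008000);
        FLAGS.put("JSUSPENDED_ON_THRESHOLD", 0x00010000);
    }

    // Returns the names of all flags whose bit is set in the given JAT_state value.
    static List<String> decode(int state) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> flag : FLAGS.entrySet()) {
            if ((state & flag.getValue()) != 0) {
                names.add(flag.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(128));   // prints [JRUNNING]
        System.out.println(decode(2112));  // prints [JQUEUED, JWAITING]
    }
}
```

XSLT 1.0 has no bitwise operators, but since every mask is a power of two the same test can be written with plain arithmetic: a flag is set when floor(state div mask) mod 2 equals 1. For 2112 and JQUEUED that is floor(2112 div 64) = 33, and 33 mod 2 = 1, so the flag is set; for JRUNNING it is floor(2112 div 128) = 16, and 16 mod 2 = 0, so it is not.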