Screenshots of enhanced Olesen FLEXlm tools in action
In a follow-up post to Mark's recent announcement we've gotten our hands on some screenshots from Mark showing his tools in use. The screenshots show the results of using XSLT transformations to turn Grid Engine XML data into XHTML form suitable for web pages. The benefit includes web-based visibility into current resource (and software license!) usage. This is exactly the approach that I tried out with the xml-qstat project. Mark is pretty familiar with that effort and will be merging his improvements and enhancements into xml-qstat's SVN repository. Speaking personally as a "scratch an itch" programmer with no real software engineering skill or talent I'm pretty excited to have a real coder take a look at xml-qstat. Related to that I already owe a debt to Petr Jung from Sun who contributed the Java based CommandGenerator code that finally allows xml-qstat to be a 100% Java/Cocoon web application that does not require external perl daemons to cache XML state data.
Before the screen captures, I'd like to ask a favor of people who read this blog. I filed Issue #2335 back in July of 2007 and it has not received much love (or even a targeted milestone date for a fix). The bug is a simple one -- "qstat -f -xml" no longer reports load average data which (a) makes xml-qstat a whole lot less useful and (b) breaks the SGE developer philosophy of ensuring that command output returns the same information regardless of output format. Until that bug is fixed it does not make sense for xml-qstat to have its long overdue "1.0" release. If you have a user account over on http://gridengine.sunsource.net I'd appreciate it if you can cast one of your "votes" for Issue 2335. Thanks!
And now the screenshots (edited to mask out personal/company information). Click on each image for a larger version.
qstat kung fu
A user posted to the list looking for an efficient way of probing the designated output directory of active (pending or running) jobs.
Once again, Reuti comes up with a nice suggestion, this time employing a shell one-liner that pipes the output of a wildcard "qstat -j '*'" query through awk:
$ qstat -j "*" | awk ' /^job_number:/ { job_number=$2 } /^sge_o_workdir:/ \
{ print job_number, $2 } '
And like the original poster mentioned, I also had no idea that wildcards could be used with the "-j" option to qstat. Thanks Reuti!
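For anyone who would rather do the same filtering from Java than from awk, here is a rough sketch of the same logic. It is only an illustration: the class name WorkdirLister and the sample text are made up, and the sample merely imitates the "job_number:" and "sge_o_workdir:" lines that the awk patterns above match.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WorkdirLister {
    // Mirrors the awk program: remember the most recent job_number and pair
    // it with the second field of the next sge_o_workdir line.
    static Map<String, String> parse(String qstatOutput) {
        Map<String, String> workdirs = new LinkedHashMap<>();
        String jobNumber = null;
        for (String line : qstatOutput.split("\\R")) {
            String[] fields = line.split("\\s+");
            if (fields.length < 2) {
                continue; // separator lines and empty lines
            }
            if (line.startsWith("job_number:")) {
                jobNumber = fields[1];
            } else if (line.startsWith("sge_o_workdir:") && jobNumber != null) {
                workdirs.put(jobNumber, fields[1]);
            }
        }
        return workdirs;
    }

    public static void main(String[] args) {
        // Invented sample shaped like the output of: qstat -j "*"
        String sample = String.join("\n",
            "==============================================================",
            "job_number:                 561",
            "owner:                      www",
            "sge_o_workdir:              /home/www/run1",
            "==============================================================",
            "job_number:                 653",
            "sge_o_workdir:              /home/www/run2");
        // Prints one "job_number workdir" pair per line
        parse(sample).forEach((job, dir) -> System.out.println(job + " " + dir));
    }
}
```

Like the awk version, this takes only the second whitespace-separated field, so a working directory containing spaces would be cut short.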
qstat XML schema documentation
Grid Engine 6.x distributions include a "util/resources/schemas/qstat/" directory that currently contains the following files:
- qstat.xsd
- message.xsd
- detailed_job_info.xsd
These are about the best resources one can currently obtain when delving deep into SGE's XML output behavior. They are, however, a bit cryptic to read. Passing the .xsd files through an XML Schema Documentation Generator has resulted in some more human readable output. The translated files can be found here:
sorting qstat output
A user recently asked the mailing list for suggestions on sorting the full output of qstat by job start time.
Reuti replied back with a link to his most excellent script, a bash script called "status" that makes heavy use of awk under the hood. The script works with both SGE 5.3 and 6.x versions of qstat.
The script is hosted on the download section of the SGE project website:
http://gridengine.sunsource.net/servlets/ProjectDocumentView?documentID=8&showInfo=true
After downloading the script, usage is trivial. To sort output by job start time one would do:
./status -s time -a
Running jobs:
job-ID  #  name             owner  start time           running in
-----------------------------------------------------------------------------
561     1  Job7458          www    01/08/2006 18:59:05  all.q (stalled)
653     1  A11510113941883  www    02/08/2006 09:13:58  all.q
657     1  A11541113941889  www    02/08/2006 09:14:54  all.q
Waiting jobs:
job-ID  #  name             owner  submit time
------------------------------------------------------------------
562     1  Job7458.cleanup  www    01/08/2006 17:38:14  (hold)
654     1  btpymol          www    02/08/2006 09:13:59  (Error)
654     1  btpymol          www    02/08/2006 09:13:59  (Error)
655     1  merge            www    02/08/2006 09:13:59  (hold)
656     1  cleanup          www    02/08/2006 09:13:59  (hold)
658     1  btrasmol         www    02/08/2006 09:14:55  (Error)
658     1  btrasmol         www    02/08/2006 09:14:55  (Error)
658     1  btrasmol         www    02/08/2006 09:14:55  (Error)
659     1  merge            www    02/08/2006 09:14:55  (hold)
660     1  cleanup          www    02/08/2006 09:14:55  (hold)
407     1  impossibleJob    www    11/28/2005 09:58:42
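The idea of sorting by start time can also be sketched on its own, independent of how the status script does it internally (which isn't shown here). The Java below is only an illustration. The class name StartTimeSort is made up, the column layout is taken from the running-jobs rows above, and the month/day/year reading of the dates is an assumption: the sample output doesn't say which order is meant.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class StartTimeSort {
    // Assumption: dates are month/day/year; swap to dd/MM if your site differs.
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss");

    // Assumed row layout: job-ID, #, name, owner, date, time, queue ...
    static LocalDateTime startTime(String row) {
        String[] f = row.trim().split("\\s+");
        return LocalDateTime.parse(f[4] + " " + f[5], FMT);
    }

    // Returns a new list ordered by start time, oldest first.
    static List<String> sortByStartTime(List<String> rows) {
        List<String> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing(StartTimeSort::startTime));
        return sorted;
    }

    public static void main(String[] args) {
        // Rows copied from the sample output, deliberately out of order
        List<String> rows = Arrays.asList(
            "657 1 A11541113941889 www 02/08/2006 09:14:54 all.q",
            "561 1 Job7458 www 01/08/2006 18:59:05 all.q (stalled)",
            "653 1 A11510113941883 www 02/08/2006 09:13:58 all.q");
        for (String row : sortByStartTime(rows)) {
            System.out.println(row); // jobs 561, 653, 657 in that order
        }
    }
}
```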
Building XML bindings for qstat
This topic is one which has been under discussion for some time now. The basic idea is that using the JAXB RI from the JWSDP, we could build a set of classes which would parse qstat output, making it trivial for a developer to write an app which keeps tabs on Grid Engine. I believe the final decision was that we would not officially include such classes with Grid Engine for supportability reasons. Instead, I’m going to explain to you how to build the classes yourself. If you’re too lazy to follow these instructions, here’s a tarball of the classes I generated while writing this post, along with the source, etc.
The first thing you need to do is make sure you have the Java™ platform, the latest JWSDP, Ant, and at least Grid Engine 6.0u7 (or an equivalent maintrunk source build) installed.
I am going to refer to several directories in this tutorial. They are:
| $JWSDP_HOME | Root of JWSDP install |
| $JAXB_HOME | $JWSDP_HOME/jaxb |
| $SGE_ROOT | Root of Grid Engine install |
| $SCHEMA_HOME | $SGE_ROOT/util/resources/schemas/qstat |
| $BIND_HOME | Directory where you’ll generate the classes |
Note that you don’t need to have the above set as environment variables in your shell, but if you do, you can just copy and paste commands from this tutorial.
The first step is to create an external bindings file. This file will provide the JAXB class generator with some additional information. Specifically, the external bindings file will 1) assign a package to the generated files and 2) fix a naming conflict in the qstat.xsd schema. Create a file called $SCHEMA_HOME/qstat.xjb with the following contents:
<?xml version="1.0" encoding="UTF-8"?>
<jxb:bindings version="1.0"
              xmlns:jxb="http://java.sun.com/xml/ns/jaxb"
              xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <jxb:bindings schemaLocation="qstat.xsd" node="/xs:schema">
    <jxb:bindings node="//xs:complexType[@name='job_list_t']">
      <jxb:bindings node="//xs:attribute[@name='state']">
        <jxb:property name="stateAttribute"/>
      </jxb:bindings>
    </jxb:bindings>
    <jxb:schemaBindings>
      <jxb:package name="com.sun.grid.xml.qstat"/>
    </jxb:schemaBindings>
  </jxb:bindings>
</jxb:bindings>
This file sets the package to com.sun.grid.xml.qstat, but you can set the package to whatever suits you. Keep in mind, though, that you may want to generate a set of bindings for each of the three schemas, and you don’t want them to overlap.
Next, you’ll generate the binding classes. You do that by running:
% $JAXB_HOME/bin/xjc.sh -d $BIND_HOME -b $SCHEMA_HOME/qstat.xjb $SCHEMA_HOME/qstat.xsd
If all went well, you’ll see a list of the classes that are generated. Congratulations! You now have a qstat XML binding!
What do you do with it? Well, let me get you started. First, for convenience, let’s create an Ant build script to compile the binding and run a sample app. Create a file called $BIND_HOME/build.xml with the following contents:
<?xml version="1.0" standalone="yes"?>
<project basedir="." default="compile">
  <path id="classpath">
This build script is very primitive. With any amount of Ant skills, you should be able to write something that better suits your needs. My goal here is only to cover the bare necessities.
Now, let’s create the sample app. Let’s write a simple app that lists the job number and name of all jobs currently in the system, called $BIND_HOME/Main.java. It might look something like this:
import java.util.*;
import javax.xml.bind.*;
// Java imports are not hierarchical, so name the generated package itself
import com.sun.grid.xml.qstat.*;

public class Main {
    public static void main (String[] args) throws Exception {
        // Create a JAXB context for the package the binding was generated into
        JAXBContext jc = JAXBContext.newInstance ("com.sun.grid.xml.qstat");
        // Use the context to create an Unmarshaller
        Unmarshaller u = jc.createUnmarshaller ();
        // Fork a qstat -xml
        Process p = Runtime.getRuntime ().exec ("qstat -xml");
        // Let the binding do its magic
        JobInfo ji = (JobInfo)u.unmarshal (p.getInputStream ());
        // Walk down to the job list and print each job's number and name
        List list = ((JobInfoT)ji.getJobInfo ().get (0)).getJobList ();
        Iterator i = list.iterator ();
        while (i.hasNext ()) {
            JobListT jlt = (JobListT)i.next ();
            System.out.println (jlt.getJBJobNumber () + ": " + jlt.getJBName ());
        }
    }
}
In the call to JAXBContext.newInstance(), we specify the package we used to generate the binding. What we get back from the call to Unmarshaller.unmarshal() is an object tree derived from the qstat output. We then walk the object tree until we get to the job list, and then we go through the jobs, printing the number and name for each one.
Clearly, the only way you will know what the object tree looks like is to 1) read the schema, 2) read the generated source files, or 3) generate JavaDocs from the generated source files. 3 is the best option, but 2 is what I actually did.
To build and run this sample app, do the following:
% ant -Djwsdp.home=$JWSDP_HOME build
% ant -Djwsdp.home=$JWSDP_HOME run
If you want to parse the output from qstat -j, you will need to repeat this process with the $SCHEMA_HOME/detailed_job_info.xsd schema. When processing this schema you will need to change the external bindings file a little. You’ll want a file called $SCHEMA_HOME/detailed_job_info.xjb that contains:
<?xml version="1.0" encoding="UTF-8"?>
<jxb:bindings version="1.0"
              xmlns:jxb="http://java.sun.com/xml/ns/jaxb"
              xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <jxb:bindings schemaLocation="detailed_job_info.xsd" node="/xs:schema">
    <jxb:schemaBindings>
      <jxb:package name="com.sun.grid.xml.job"/>
    </jxb:schemaBindings>
  </jxb:bindings>
</jxb:bindings>
Again, feel free to change the package. You’d use this binding the same way you use the other one. There is a third schema in the $SCHEMA_HOME directory, message.xsd, that you won’t likely need, but if you do, you can generate a binding for it just like you did for detailed_job_info.xsd.
Clearly, what I have provided here is only a starting point. For more information about using JAXB, see the docs included with the JWSDP download. Particularly useful are the examples. An obvious next step would be to extend the build script to generate JavaDocs and to split the class files out of the source tree. Another good next step would be to customize the binding classes so that, for example, the status code gets returned as a list of strings instead of a binary or’ed int. If you have problems, let me know. This tutorial is still a work in progress, so feedback is welcome.
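If generating a binding is more machinery than you need, the same job listing can be sketched with the DOM parser that ships with the JDK. Treat this as a rough sketch rather than anything official: the class name QstatDomSketch is made up, the inline XML is an invented sample, and the element names job_list, JB_job_number and JB_name are assumptions inferred from the generated names used above (JobListT, getJBJobNumber, getJBName), so check them against qstat.xsd before relying on them.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class QstatDomSketch {
    // Returns one "number: name" string per job_list element in the document.
    static List<String> listJobs(InputStream xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse qstat XML", e);
        }
        List<String> result = new ArrayList<>();
        NodeList jobs = doc.getElementsByTagName("job_list"); // assumed element name
        for (int i = 0; i < jobs.getLength(); i++) {
            Element job = (Element) jobs.item(i);
            result.add(text(job, "JB_job_number") + ": " + text(job, "JB_name"));
        }
        return result;
    }

    // Text of the first descendant with the given tag, or "" if there is none.
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Invented sample; real input would come from "qstat -xml"
        String sample = "<job_info>"
            + "<job_list state=\"running\"><JB_job_number>561</JB_job_number>"
            + "<JB_name>Job7458</JB_name></job_list>"
            + "<job_list state=\"pending\"><JB_job_number>562</JB_job_number>"
            + "<JB_name>Job7458.cleanup</JB_name></job_list>"
            + "</job_info>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        for (String line : listJobs(in)) {
            System.out.println(line); // 561: Job7458, then 562: Job7458.cleanup
        }
    }
}
```

To try it against a live cluster, pass listJobs() the input stream of a forked qstat -xml process, the same way Main.java feeds the Unmarshaller.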
gridengine XML: translating JAT_state values into useful information
This is going to be one of those posts that will be completely boring and uninteresting to most (if not all) people reading it. It may, however, someday and somehow, be of use to some poor soul googling for info on what those digits mean in the JAT_state element when dealing with qstat XML output. It also has scary implications for me since I have no idea how to handle bitmask operations inside XSL stylesheets.
A user parsing XML output from "qstat" posted a query to the dev list asking for information on interpreting the various integers such as "128" and "2112" he was seeing as values for the JAT_state XML element. By way of explanation, "JAT" in this scenario means "Job Array Task".
The answer is short, but needs lots of explanation and accompanying data. It turns out that the decimal values seen in JAT_state are "the SUM of all applicable JAT bitmask status codes".
For a listing of JAT-applicable bitmask status values and the stunning conclusion where the real meaning of JAT_state=2112 is finally revealed please read on...
The bitmasks used for JAT_state are:
JHELD                    0x00000010
JQUEUED                  0x00000040
JWAITING                 0x00000800
JRUNNING                 0x00000080
JSUSPENDED               0x00000100
JSUSPENDED_ON_THRESHOLD  0x00010000
JERROR                   0x00008000
Translated into decimal form (which is what XML qstat output contains) the values are:
JHELD: 16
JQUEUED: 64
JWAITING: 2048
JRUNNING: 128
JSUSPENDED: 256
JSUSPENDED_ON_THRESHOLD: 65536
JERROR: 32768
So, when qstat XML produces JAT_state=128 we know that this means the job is running (state "r" in the human readable qstat output). We also know that the bitmasks are ADDED to account for multiple applicable states in an efficient manner. This means that the user reported value of "JAT_state=2112" can be broken down into JQUEUED+JWAITING because 64+2048=2112.
The states "queued + waiting" translate into the familiar "qw" state that is known to all Grid Engine users who use qstat on the command-line.
Commentary: This frightens me because I am lazy and not a good software engineer. heh. I understand how useful bitmasks are for software: each flag occupies its own bit, so the sum of any combination of distinct flags is unique, which allows Grid Engine to rapidly and efficiently store and compute upon various statuses and states. The problem for me comes down to this: When faced with JAT_state=(some integer) how do I decompose that integer back into useful human-readable information about the relevant state or states? This is easy when a single bitmask is used but when the value is a SUM of a bunch of bitmasks it will be harder. I'll probably take the lazy way out and keep a lookup table of common sums (like 2112='qw'). Anyone have any better ideas? How would one handle this in the context of an XSL stylesheet that is supposed to translate qstat XML into XHTML, text or PDF form?
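One possible answer to the decomposition question, sketched in Java: test each mask against the value and collect the names of the ones that are set. This is a sketch under assumptions, not Grid Engine code. The class name JatStateDecoder and its methods are made up, and the table holds only the seven masks listed above, so any other bits in a real JAT_state value are silently ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JatStateDecoder {
    // Only the masks listed in this post, in ascending order
    static final Map<String, Integer> MASKS = new LinkedHashMap<>();
    static {
        MASKS.put("JHELD", 0x00000010);
        MASKS.put("JQUEUED", 0x00000040);
        MASKS.put("JRUNNING", 0x00000080);
        MASKS.put("JSUSPENDED", 0x00000100);
        MASKS.put("JWAITING", 0x00000800);
        MASKS.put("JERROR", 0x00008000);
        MASKS.put("JSUSPENDED_ON_THRESHOLD", 0x00010000);
    }

    // Names of every listed flag whose bit is set in the given state value.
    static List<String> decode(int state) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> e : MASKS.entrySet()) {
            if ((state & e.getValue()) != 0) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    // Same test using only integer division and modulo (non-negative values,
    // single-bit masks), for environments without bitwise operators.
    static boolean isSetArithmetic(int state, int mask) {
        return (state / mask) % 2 == 1;
    }

    public static void main(String[] args) {
        System.out.println(decode(128));               // [JRUNNING]
        System.out.println(decode(2112));              // [JQUEUED, JWAITING]
        System.out.println(isSetArithmetic(2112, 64)); // true
    }
}
```

The arithmetic variant is there because XPath 1.0 has no bitwise operators but does have div, mod and floor(), so the same check should carry over to a stylesheet as something like floor(JAT_state div 64) mod 2 = 1. I haven't tried that inside xml-qstat, so treat it as a starting point.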






