gridengine XML: translating JAT_state values into useful information
This is going to be one of those posts that will be completely boring and uninteresting to most (if not all) people reading it. It may, however, someday and somehow, be of use to some poor soul googling for info on what those digits mean in the JAT_state element when dealing with qstat XML output. It also has scary implications for me since I have no idea how to handle bitmask operations inside XSL stylesheets.
A user parsing XML output from "qstat" posted a query to the dev list asking for information on interpreting the various integers such as "128" and "2112" he was seeing as values for the JAT_state XML element. By way of explanation, "JAT" in this scenario means "Job Array Task".
The answer is short, but needs lots of explanation and accompanying data. It turns out that the decimal values seen in JAT_state are "the SUM of all applicable JAT bitmask status codes".
For a listing of JAT-applicable bitmask status values and the stunning conclusion where the real meaning of JAT_state=2112 is finally revealed please read on...
The bitmasks used for JAT_state are:
JHELD                    0x00000010
JQUEUED                  0x00000040
JWAITING                 0x00000800
JRUNNING                 0x00000080
JSUSPENDED               0x00000100
JSUSPENDED_ON_THRESHOLD  0x00010000
JERROR                   0x00008000
Translated into decimal form (which is what XML qstat output contains) the values are:
JHELD:                    16
JQUEUED:                  64
JWAITING:                 2048
JRUNNING:                 128
JSUSPENDED:               256
JSUSPENDED_ON_THRESHOLD:  65536
JERROR:                   32768
So, when qstat XML produces JAT_state=128 we know that this means the job is running (state "r" in the human readable qstat output). We also know that the bitmasks are ADDED so that a single integer can carry several applicable states at once. This means that the user-reported value of "JAT_state=2112" can be broken down into JQUEUED+JWAITING because 64+2048=2112.
The states "queued + waiting" translate into the familiar "qw" state that is known to all Grid Engine users who use qstat on the command-line.
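To make the arithmetic concrete, here is a minimal Python sketch using the mask values listed above. The variable names are just illustrative. Because each mask occupies its own bit, adding the masks gives the same result as OR-ing them together.

```python
# Mask values copied from the list in this post.
JQUEUED = 0x00000040   # 64
JWAITING = 0x00000800  # 2048
JRUNNING = 0x00000080  # 128

# "qw" is the sum of the queued and waiting masks.
qw_state = JQUEUED + JWAITING
print(qw_state)                          # 2112

# The masks share no bits, so the sum equals the bitwise OR.
print(qw_state == (JQUEUED | JWAITING))  # True
```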
Commentary: This frightens me because I am lazy and not a good software engineer. heh. I understand how useful bitmasks are for software: each flag occupies its own bit, so every combination of flags adds up to a unique sum, which allows Grid Engine to store and test multiple states rapidly and efficiently. The problem for me comes down to this: when faced with JAT_state=(some integer), how do I decompose that integer back into useful human-readable information about the relevant state or states? This is easy when a single bitmask is set, but harder when the value is a SUM of several bitmasks. I'll probably take the lazy way out and keep a lookup table of common sums (like 2112='qw'). Anyone have any better ideas? How would one handle this in the context of an XSL stylesheet that is supposed to translate qstat XML into XHTML, text or PDF form?
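One possible way to decompose the integer without a lookup table, sketched in Python rather than XSL: test each mask in turn and collect the names of the ones that are set. The helper names (has_flag, decode_jat_state) are hypothetical, not part of Grid Engine. The test deliberately uses only integer division and modulo, on the assumption that the target is XSLT 1.0, which has no bitwise operators but does offer div, mod and floor() in XPath; the same check there would look like floor($state div $mask) mod 2 = 1.

```python
# Mask values copied from the list earlier in this post.
JAT_FLAGS = {
    "JHELD": 0x00000010,
    "JQUEUED": 0x00000040,
    "JRUNNING": 0x00000080,
    "JSUSPENDED": 0x00000100,
    "JWAITING": 0x00000800,
    "JERROR": 0x00008000,
    "JSUSPENDED_ON_THRESHOLD": 0x00010000,
}


def has_flag(value, mask):
    """Test one flag using only integer division and modulo.

    Mirrors the XPath 1.0 expression: floor($value div $mask) mod 2 = 1
    """
    return (value // mask) % 2 == 1


def decode_jat_state(value):
    """Return the names of every flag set in a JAT_state integer."""
    return [name for name, mask in JAT_FLAGS.items() if has_flag(value, mask)]


print(decode_jat_state(2112))  # ['JQUEUED', 'JWAITING']
print(decode_jat_state(128))   # ['JRUNNING']

# Sanity check: the arithmetic test agrees with the usual bitwise AND.
for state in (2112, 128, 0):
    for mask in JAT_FLAGS.values():
        assert has_flag(state, mask) == bool(state & mask)
```

Mapping the flag names onto the one-letter codes qstat prints (like "qw" or "r") would still need a small table, but only one entry per flag instead of one per possible sum.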
