New user contributed accounting script
A new "pull statistics from the SGE accounting log file" script has been posted to the SGE community. Olivier Blondel took Joe Landman's "usage.pl" script and modified it to suit his own needs. The script can be found embedded inline with Olivier's post to the users mailing list.
Simple perl reporting tool for SGE accounting data
Joe at Scalable Informatics is offering up a "quick -n- simple" reporting script for Grid Engine accounting and usage data.
Usage examples:
[landman@minicc ~]$ ./usage.pl
Total usage: (in units of second(s))
wallclock : 46733.000 second(s)
user time : 1600.000 second(s) [3.42%]
system time: 17.000 second(s) [0.04%]
cpu time : 70379.000 second(s) [150.60%]
user     wallclock  user time  system time  cpu time   memory  percent of total time
landman  46733.000  1600.000   17.000       70379.000  0.000   100.000
The script is available here: http://downloads.scalableinformatics.com/downloads/gridengine/usage.pl
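For readers who just want the flavor of what such a report does, here is a minimal sketch in shell and awk. It is not Joe's script. It assumes the colon-separated accounting file layout in which field 4 is the job owner and fields 14, 15 and 16 are ru_wallclock, ru_utime and ru_stime; check accounting(5) for your SGE version before trusting the numbers. The function name and the two sample records are made up for illustration.

```shell
#!/bin/sh
# Sketch only: per-user totals from SGE accounting records read on stdin.
# ASSUMPTION: colon-separated fields with 4=owner, 14=ru_wallclock,
# 15=ru_utime, 16=ru_stime. Verify against accounting(5) on your system.
summarize_accounting() {
    awk -F: '
        /^#/ { next }                      # skip comment lines
        NF >= 16 {
            wall[$4] += $14
            user[$4] += $15
            sys[$4]  += $16
        }
        END {
            for (u in wall)
                printf "%s %.3f %.3f %.3f\n", u, wall[u], user[u], sys[u]
        }
    ' | sort
}

# Two made-up records (only the fields used above carry meaningful values).
printf '%s\n' \
    'all.q:node1:users:landman:job1:1:sge:0:0:0:0:0:0:100:40:2' \
    'all.q:node1:users:landman:job2:2:sge:0:0:0:0:0:0:50:10:1' |
    summarize_accounting
```

On a live cluster the function would typically be fed the accounting file itself, which usually lives under $SGE_ROOT/<cell>/common/accounting.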
"job dropped because of user limitations"
Consider this snippet more search engine fodder for people web searching on particular error messages.
Recently a user asked on the mailing list about encountering job submission rejection messages that say:
job dropped because of user limitation
That particular rejection message is tied to 2 different configuration parameters that can be hard coded into grid engine:
max_u_jobs
The number of active (not finished) jobs which each Grid Engine user can have in the system simultaneously is controlled by this parameter. A value greater than 0 defines the limit. The default value 0 means "unlimited". If the max_u_jobs limit is exceeded by a job submission then the submission command exits with exit status 25 and an appropriate error message.
max_jobs
The number of active (not finished) jobs simultaneously allowed in Grid Engine is controlled by this parameter. A value greater than 0 defines the limit. The default value 0 means "unlimited". If the max_jobs limit is exceeded by a job submission then the submission command exits with exit status 25 and an appropriate error message.
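Since both limits surface as exit status 25 from the submission command, a wrapper script can tell this rejection apart from other failures. The sketch below is only an illustration: the function names are hypothetical, and a stand-in function replaces qsub so the snippet runs on a machine without Grid Engine.

```shell
#!/bin/sh
# Sketch only: map a submission exit status to a human-readable message.
# Exit status 25 is the value quoted above for max_u_jobs / max_jobs.
explain_submit_status() {
    case "$1" in
        0)  echo "submitted" ;;
        25) echo "rejected: job limit reached (max_u_jobs or max_jobs)" ;;
        *)  echo "failed with exit status $1" ;;
    esac
}

# Hypothetical stand-in for qsub that always reports the limit rejection.
fake_qsub() { return 25; }

fake_qsub
explain_submit_status $?
```

On a real cluster the last two lines would become something like 'qsub job.sh' followed by the same call to explain_submit_status.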
Commentary: There are certainly use cases for which these parameters are the best solution, but before using either of them, consider whether one of the SGE resource allocation policy mechanisms can accomplish the same goals. Hard coding global constraints on jobs can negatively affect flexibility and overall system utilization.
Reuti on Gaussian G03-D.01 Integration
This is another one of those short blog posts that serve mostly as index fodder for search engines. Hopefully this will be a shortcut for someone searching on Gaussian SGE integration. The SGE mailing lists are generally not indexed well by the various net crawlers.
Reuti has posted some comments and code snippets on Gaussian G03-D.01 Integration. The full message can be read here:
http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=14600
grouping jobs to nodes via wildcard PE's
Grid Engine 6 introduced a better resource request syntax, including use of the wildcard "*" character. Some people on the SGE mailing list have reported using wildcard selectors on Parallel Environments to enforce some really interesting grouping behavior within the grid engine job scheduler. In effect, this method lets one control the hostgroups to which parallel jobs of different sizes will be dispatched.
Take this mailing list question as an example...
...We have a cluster composed of several "subclusters". Each subcluster has
8 nodes and is connected over a first switch to the master switch.
         subcluster 1                        subcluster 2 ...
n11 n12 n13 n14 n15 n16 n17 n18     n21 n22 n23 n24 n25 n26 n27 n28
 |   |   |   |   |   |   |   |       |   |   |   |   |   |   |   |
 |   |   |   |   |   |   |   |       |   |   |   |   |   |   |   |
-------------------------------     -------------------------------
           switch 1                            switch 2
-------------------------------     -------------------------------
               |                                   |
               |                                   |
             ----------------------------------------
                          master switch
             ----------------------------------------
                                 |
                                 |
                           -------------
                            master node
                           -------------
One of the applications running on the cluster needs 8 nodes. We want to
configure the queue (queues?) to allocate only a full subcluster to a
job and not to spawn over to another subcluster.
Reuti provides a really slick solution ...
- Create a hostgroup for each subcluster
- Create a PE for each subcluster ('mpi_a' and 'mpi_b')
- Create 2 queues, each associated with a subcluster hostgroup and one of the newly created PEs
- Submit jobs via: 'qsub -pe "mpi*" 8'
The end result is that parallel jobs will only land within one particular subcluster, keeping all network communication within a single switch (presumably the reason for the subcluster grouping in the first place).
Reuti goes on to explain how this can be used for grouping non-parallel jobs -- some reconfiguration of the queue sorting mechanism and sequence numbers will allow one subcluster to be "filled" with serial jobs before job slots are used from the other subcluster (a wise move since this keeps the 2nd subcluster free for larger parallel jobs).
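The first step of the recipe, one hostgroup per subcluster, can be sketched as a small script that writes hostgroup definition files. Treat it as a sketch under assumptions: the group names (@subcluster_a, @subcluster_b) and file names are hypothetical, the host names follow the n11..n18 / n21..n28 scheme from the question, and the two-line 'group_name' / 'hostlist' layout should be compared against what 'qconf -shgrp' prints on your own cluster.

```shell
#!/bin/sh
# Sketch only: write one hostgroup definition file per 8-node subcluster.
# ASSUMPTION: hostgroup files consist of a group_name line and a hostlist line.
write_hostgroup() {
    # $1 = group name (with leading @), $2 = subcluster digit, $3 = output file
    hosts=""
    for i in 1 2 3 4 5 6 7 8; do
        hosts="$hosts n$2$i"
    done
    printf 'group_name %s\nhostlist%s\n' "$1" "$hosts" > "$3"
}

outdir=$(mktemp -d)
write_hostgroup @subcluster_a 1 "$outdir/subcluster_a.hgrp"
write_hostgroup @subcluster_b 2 "$outdir/subcluster_b.hgrp"
cat "$outdir/subcluster_a.hgrp" "$outdir/subcluster_b.hgrp"

# On a real cluster the files could then be loaded with, for example:
#   qconf -Ahgrp "$outdir/subcluster_a.hgrp"
#   qconf -Ahgrp "$outdir/subcluster_b.hgrp"
```

The PEs and queues from the remaining steps are left out here; they have many more fields and are best created interactively with 'qconf -ap' and 'qconf -aq'.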
sorting qstat output
A user recently asked the mailing list for suggestions on sorting the full output of qstat by job start time.
Reuti replied with a link to his most excellent script, a bash script called "status" that makes heavy use of awk under the hood. The script works with both SGE 5.3 and 6.x versions of qstat.
The script is hosted on the download section of the SGE project website:
http://gridengine.sunsource.net/servlets/ProjectDocumentView?documentID=8&showInfo=true
After downloading the script, usage is trivial. To sort output by job start time one would do:
./status -s time -a
Running jobs:
job-ID   #  name             owner  start time           running in
-----------------------------------------------------------------------------
561      1  Job7458          www    01/08/2006 18:59:05  all.q (stalled)
653      1  A11510113941883  www    02/08/2006 09:13:58  all.q
657      1  A11541113941889  www    02/08/2006 09:14:54  all.q
Waiting jobs:
job-ID   #  name             owner  submit time
------------------------------------------------------------------
562      1  Job7458.cleanup  www    01/08/2006 17:38:14  (hold)
654      1  btpymol          www    02/08/2006 09:13:59  (Error)
654      1  btpymol          www    02/08/2006 09:13:59  (Error)
655      1  merge            www    02/08/2006 09:13:59  (hold)
656      1  cleanup          www    02/08/2006 09:13:59  (hold)
658      1  btrasmol         www    02/08/2006 09:14:55  (Error)
658      1  btrasmol         www    02/08/2006 09:14:55  (Error)
658      1  btrasmol         www    02/08/2006 09:14:55  (Error)
659      1  merge            www    02/08/2006 09:14:55  (hold)
660      1  cleanup          www    02/08/2006 09:14:55  (hold)
407      1  impossibleJob    www    11/28/2005 09:58:42
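The core trick behind this kind of sort is small enough to sketch. This is not Reuti's script, just an illustration of the idea: because the dates are printed as MM/DD/YYYY, a plain sort gets the years wrong, so each line is prefixed with a YYYYMMDD plus time key, sorted, and the key is stripped again. The function name is hypothetical, and the column number of the date is passed in because it differs between qstat layouts.

```shell
#!/bin/sh
# Sketch only: sort job lines by an MM/DD/YYYY date column.
# ASSUMPTION: the HH:MM:SS time sits in the column right after the date.
sort_by_start() {
    # $1 = 1-based column number of the date field
    awk -v c="$1" '{
        split($c, d, "/")                       # d[1]=MM d[2]=DD d[3]=YYYY
        print d[3] d[1] d[2] $(c + 1) "\t" $0   # sortable key, tab, original line
    }' | LC_ALL=C sort | cut -f2-
}

# Three job lines taken from the listing above, deliberately out of order.
printf '%s\n' \
    '653 1 A11510113941883 www 02/08/2006 09:13:58 all.q' \
    '407 1 impossibleJob www 11/28/2005 09:58:42' \
    '561 1 Job7458 www 01/08/2006 18:59:05 all.q' |
    sort_by_start 5
```

With the sample input the 2005 job (407) comes out first, followed by 561 and 653.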
Easy setup of equal user fairshare policy
Reuti posted this link again on the users list and it really caught my eye. It really does represent the fastest/easiest way for an SGE admin to set up a basic resource allocation policy that shares resources equally (and automatically) among all users.
The link:
http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=8319
It boils down to 2 simple configuration actions:
- Make 2 changes in the main SGE configuration ('qconf -mconf'):
  - enforce_user auto
  - auto_user_fshare 100
- Make 1 change in the SGE scheduler configuration ('qconf -msconf'):
  - weight_tickets_functional 10000
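After editing, it can be handy to confirm that the values actually stuck. The helper below is a hypothetical sketch: it assumes the configuration is printed as one "name value" pair per line, which is how 'qconf -sconf' and 'qconf -ssconf' output looks in my experience, and it is fed sample text here so it runs without a cluster.

```shell
#!/bin/sh
# Sketch only: succeed if configuration text on stdin contains "name value".
has_setting() {
    # $1 = parameter name, $2 = expected value
    awk -v k="$1" -v v="$2" '
        $1 == k && $2 == v { found = 1 }
        END { exit !found }
    '
}

# Sample text standing in for real qconf output.
sample_conf='enforce_user                 auto
auto_user_fshare             100'

printf '%s\n' "$sample_conf" | has_setting enforce_user auto &&
    echo "enforce_user ok"
printf '%s\n' "$sample_conf" | has_setting auto_user_fshare 100 &&
    echo "auto_user_fshare ok"
```

On a cluster the equivalent checks would look like 'qconf -sconf | has_setting enforce_user auto' and 'qconf -ssconf | has_setting weight_tickets_functional 10000'.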
Setting queue level nice values
A user recently asked:
... I know it is possible to submit the jobs with nice in each submit sge script but is it possible to fix a nice value by queue ?
Reuti replied with the quick & simple procedure:
- Use "qconf -mq <queue_name>" to set the cluster queue "priority" parameter to the nice value you wish to use
- Use "qconf -msconf" to make sure "reprioritize_interval=0:0:0"
- Use "qconf -mconf" to make sure "reprioritize=0"
A quick test with priority=15 verified the procedure. Sure enough, the test.sh script was running with an altered nice level:
USER PID TIME UID PPID CPU NI COMMAND
dag 6843 7:12PM 501 6679 0 31 - sge_shepherd-5 -bg
dag 6844 7:12PM 501 6843 0 16 -sh /opt/sge/default/spool/chrisdag/job_scripts/5
Removing empty job output/error files automatically
In a thread dealing with some DRMAA issues, Reuti posted a quick little shell script that can be used as an epilog. Grid Engine supports "prolog" and "epilog" actions at the cluster queue level. These hooks are used to run scripts or perform an action before ('prolog') or after ('epilog') a job is run.
The shell script checks the Grid Engine standard output (STDOUT) and standard error (STDERR) output files and deletes any that are empty (zero bytes in size). This reduces clutter in job output directories while also preserving any STDOUT/STDERR files that actually contain information.
#!/bin/sh
## Delete the STDOUT and STDERR files (.o and .e) if they are empty
## ( we do not want to delete non-empty files, they may contain useful
## troubleshooting or debug information ... )
##
[ -r "$SGE_STDOUT_PATH" -a -f "$SGE_STDOUT_PATH" ] && [ ! -s "$SGE_STDOUT_PATH" ] && rm -f "$SGE_STDOUT_PATH"
[ -r "$SGE_STDERR_PATH" -a -f "$SGE_STDERR_PATH" ] && [ ! -s "$SGE_STDERR_PATH" ] && rm -f "$SGE_STDERR_PATH"
In action ...
After saving this script and adding it to the epilog parameter of a cluster queue configuration, the $SGE_ROOT/examples/jobs/simple.sh script was run (all it does is print a datestamp to STDOUT before and after sleeping for 20 seconds). The following was observed:
While the job was running:
bioadmin@b7:~/test> ls -l
total 8
-rwxr-xr-x 1 bioadmin bioadmin 1529 2005-10-19 17:37 simple.sh
-rw-r--r-- 1 bioadmin bioadmin 0 2005-10-19 17:37 simple.sh.e2
-rw-r--r-- 1 bioadmin bioadmin 29 2005-10-19 17:37 simple.sh.o2
bioadmin@b7:~/test> qstat -f
queuename qtype used/tot. load_avg arch states
----------------------------------------------------------------------------
all.q@b7.training.bioteam.net BIP 1/4 0.01 lx24-x86
2 0.55500 simple.sh bioadmin r 10/19/2005 17:37:29 1
And after the job completes:
bioadmin@b7:~/test> ls -l
total 8
-rwxr-xr-x 1 bioadmin bioadmin 1529 2005-10-19 17:37 simple.sh
-rw-r--r-- 1 bioadmin bioadmin 58 2005-10-19 17:37 simple.sh.o2
No muss, no fuss. The empty .e STDERR file was blown away automatically after the job completed. Any wiki-fiddlers reading this post may want to add this code to the Snippets section of the wiki.
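To try the same empty-file check without touching a production queue, the logic can be exercised against throwaway files. This is a sketch, not a drop-in epilog: the function name is hypothetical, and SGE_STDOUT_PATH and SGE_STDERR_PATH are set by hand here, whereas Grid Engine sets them itself when it runs an epilog.

```shell
#!/bin/sh
# Sketch only: dry run of the "delete if empty" check on temporary files.
remove_if_empty() {
    # Delete "$1" only when it is a readable regular file of zero size.
    if [ -r "$1" ] && [ -f "$1" ] && [ ! -s "$1" ]; then
        rm -f "$1"
    fi
}

workdir=$(mktemp -d)
SGE_STDOUT_PATH="$workdir/simple.sh.o2"   # set by hand for this dry run
SGE_STDERR_PATH="$workdir/simple.sh.e2"

echo "some job output" > "$SGE_STDOUT_PATH"   # non-empty: should survive
: > "$SGE_STDERR_PATH"                        # empty: should be removed

remove_if_empty "$SGE_STDOUT_PATH"
remove_if_empty "$SGE_STDERR_PATH"
ls "$workdir"
```

Only the .o2 file should remain in the listing, matching what was seen after the real job completed.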
