<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>gridengine.info : Tag snippets, everything about snippets</title>
    <link>http://gridengine.info/tag/snippets.rss</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>tracking Grid Engine news, bugs, howtos and best practices</description>
    <item>
      <title>New user contributed accounting script</title>
      <description>&lt;p&gt;
A new "&lt;i&gt;pull statistics from the SGE accounting log file&lt;/i&gt;" script has been posted to the SGE community.  Olivier Blondel took &lt;a href="http://gridengine.info/articles/2006/10/11/simple-perl-reporting-tool-for-sge-accounting-data"&gt;Joe Landman's "usage.pl" script&lt;/a&gt; and modified it to suit his own needs.  The script can be found embedded inline with &lt;a href="http://gridengine.sunsource.net/servlets/ReadMsg?list=users&amp;amp;msgNo=19769"&gt;Olivier's post&lt;/a&gt; to the users mailing list. 
&lt;/p&gt;



</description>
      <pubDate>Tue, 15 May 2007 22:05:35 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:8fdd2132-7c37-4424-b3c0-b4066d6f50f1</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/2007/05/15/new-user-contributed-accounting-script#comments</comments>
      <category>MailList Bits</category>
      <category>Monitoring &amp; Reporting</category>
      <category>Snippets</category>
      <category>Accounting</category>
      <category>usage</category>
      <link>http://gridengine.info/2007/05/15/new-user-contributed-accounting-script</link>
    </item>
    <item>
      <title>Simple perl reporting tool for SGE accounting data</title>
      <description>
&lt;div&gt;
&lt;p&gt;
Joe at &lt;a href="http://scalableinformatics.com/metadot/index.pl"&gt;Scalable Informatics&lt;/a&gt; is &lt;a href="http://gridengine.sunsource.net/servlets/ReadMsg?list=users&amp;amp;msgNo=17663"&gt;offering&lt;/a&gt; up a "quick -n- simple" reporting script for Grid Engine accounting and usage data.  
&lt;/p&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;p&gt;
Usage examples:
&lt;/p&gt;
&lt;p&gt;&lt;pre&gt;
[landman@minicc ~]$ &lt;b&gt;./usage.pl&lt;/b&gt;
Total usage: (in units of second(s))
        wallclock  :       46733.000 second(s)
        user time  :        1600.000 second(s) [3.42%]
        system time:          17.000 second(s) [0.04%]
        cpu time   :       70379.000 second(s) [150.60%]

user            wallclock       user time       system time     cpu time        memory          percent of total time
landman         46733.000       1600.000        17.000          70379.000       0.000           100.000

&lt;/pre&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;p&gt;
The script is available here &lt;a href="http://downloads.scalableinformatics.com/downloads/gridengine/usage.pl"&gt;http://downloads.scalableinformatics.com/downloads/gridengine/usage.pl&lt;/a&gt;
&lt;/p&gt;
&lt;/div&gt;

</description>
      <pubDate>Wed, 11 Oct 2006 08:55:19 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:df8b2e1d-beaa-44da-a78d-63a0600b7f37</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/2006/10/11/simple-perl-reporting-tool-for-sge-accounting-data#comments</comments>
      <category>Monitoring &amp; Reporting</category>
      <category>MailList Bits</category>
      <category>Accounting</category>
      <category>usage</category>
      <category>Snippets</category>
      <link>http://gridengine.info/2006/10/11/simple-perl-reporting-tool-for-sge-accounting-data</link>
    </item>
    <item>
      <title>"job dropped because of user limitations"</title>
      <description>
&lt;p&gt;&lt;i&gt;Consider this snippet more search engine fodder for people web searching on particular error messages.&lt;/i&gt;&lt;/p&gt;

&lt;p&gt;Recently a user asked on the mailing list about encountering job submission rejection messages that say:&lt;/p&gt;

&lt;blockquote&gt;job dropped because of user limitations
&lt;/blockquote&gt;

&lt;p&gt;That particular rejection message is tied to two different configuration parameters that can be hard coded into Grid Engine:&lt;/p&gt;

&lt;b&gt;max_u_jobs&lt;/b&gt;
&lt;blockquote&gt;
The number of active (not finished) jobs which &lt;i&gt;&lt;b&gt;each Grid Engine user&lt;/b&gt;&lt;/i&gt;
can have in the system simultaneously is controlled by this parameter.
A value greater than 0 defines the limit. The default value 0 means
"unlimited". If the max_u_jobs limit is exceeded by a job submission
then the submission command exits with exit status 25 and an appropriate
error message.
&lt;/blockquote&gt;

&lt;b&gt;max_jobs&lt;/b&gt;
&lt;blockquote&gt;
The number of active (not finished) jobs simultaneously allowed in Grid
Engine is controlled by this parameter. A value greater than 0 defines
the limit. The default value 0 means "unlimited". If the max_jobs
limit is exceeded by a job submission then the submission command exits
with exit status 25 and an appropriate error message.
&lt;/blockquote&gt;

&lt;p&gt;&lt;i&gt;Commentary&lt;/i&gt;: There are certainly use cases for which these parameters are the best solution, but before using either of them, consider whether one of the SGE resource allocation policy mechanisms can accomplish the same goals. Hard coding global constraints on jobs can negatively affect flexibility and overall system utilization.&lt;/p&gt;
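&lt;p&gt;Because both limits make the submission command exit with status 25, a wrapper script can check for that status. The following is a minimal Python sketch, not part of the original mailing list thread. The helper name and the stand-in commands are hypothetical; in real use the command would be an actual qsub invocation, and other rejection reasons may share the same status.&lt;/p&gt;
&lt;pre&gt;
```python
import subprocess
import sys

# Exit status quoted above for max_u_jobs / max_jobs rejections.
JOB_LIMIT_EXIT_STATUS = 25


def rejected_by_job_limit(command):
    """Run a submission command; return True if it exited with status 25."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == JOB_LIMIT_EXIT_STATUS


# Stand-ins for a real submission such as ["qsub", "job.sh"] (hypothetical),
# so the sketch also runs on a machine without Grid Engine installed.
fake_rejected_submit = [sys.executable, "-c", "import sys; sys.exit(25)"]
fake_accepted_submit = [sys.executable, "-c", "import sys; sys.exit(0)"]

if rejected_by_job_limit(fake_rejected_submit):
    print("submission rejected with exit status 25; retry later")
```
&lt;/pre&gt;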




</description>
      <pubDate>Thu, 09 Mar 2006 14:03:58 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:032e14f1-cc0b-45ab-b37c-4052af3f1cca</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/2006/03/09/job-dropped-because-of-user-limitations#comments</comments>
      <category>Administration</category>
      <category>Resource Allocation</category>
      <category>Snippets</category>
      <link>http://gridengine.info/2006/03/09/job-dropped-because-of-user-limitations</link>
    </item>
    <item>
      <title>Reuti on Gaussian G03-D.01 Integration</title>
      <description>
&lt;p&gt;This is another one of those short blog posts that serve mostly as index fodder for search engines. Hopefully this will be a shortcut for someone searching on Gaussian SGE integration. The SGE mailing lists are generally not indexed well by the various net crawlers. &lt;/p&gt;

&lt;p&gt;Reuti has posted some comments and code snippets on Gaussian G03-D.01 Integration. The full message can be read here:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&amp;amp;msgNo=14600"&gt;http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&amp;msgNo=14600&lt;/a&gt;&lt;/p&gt;


</description>
      <pubDate>Thu, 16 Feb 2006 17:00:41 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:4127e7f5-1c2b-46b9-839a-5b5e75423a40</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/2006/02/16/reuti-on-gaussian-g03-d-01-integration#comments</comments>
      <category>Application Integration</category>
      <category>Snippets</category>
      <link>http://gridengine.info/2006/02/16/reuti-on-gaussian-g03-d-01-integration</link>
    </item>
    <item>
      <title>grouping jobs to nodes via wildcard PE's</title>
      <description>&lt;p&gt;
Grid Engine 6 introduced a better resource request syntax, including use of the wildcard "*" character. Some people on the SGE mailing list have reported using wildcard selectors on Parallel Environments to enforce some really interesting grouping behavior within the Grid Engine job scheduler. In effect, one of the things this method allows is control over the hostgroups to which parallel jobs of different sizes will be dispatched.
&lt;/p&gt;&lt;p&gt;
Take this &lt;a href="http://gridengine.sunsource.net/servlets/ReadMsg?list=users&amp;msgNo=14772"&gt;mailing list question&lt;/a&gt; as an example...
&lt;/p&gt;&lt;div class="codePost"&gt;&lt;pre&gt;...We have a cluster composed of several "subclusters". Each subcluster has
8 nodes and is connected over a first switch to the master switch.


        subcluster 1                         subcluster 2         ...
n11 n12 n13 n14 n15 n16 n17 n18      n21 n22 n23 n24 n25 n26 n27 n28
 |   |   |   |   |   |   |   |        |   |   |   |   |   |   |   |
 |   |   |   |   |   |   |   |        |   |   |   |   |   |   |   |
-------------------------------      -------------------------------
        switch 1                             switch 2
-------------------------------      -------------------------------
           |                                    |
           |                                    |
          ----------------------------------------
                        master switch
          ----------------------------------------
                              |
                              |
                       -------------
                        master node
                       -------------

One of the applications running on the cluster needs 8 nodes. We want to
configure the queue (queues?) to allocate only a full subcluster to a
job and not to spawn over to another subcluster.&lt;/pre&gt;&lt;/div&gt;&lt;br/&gt;&lt;p&gt;
Reuti provides a really slick solution ...
&lt;/p&gt;&lt;p&gt;&lt;ol&gt;&lt;li&gt;Create a hostgroup for each subcluster&lt;/li&gt;&lt;li&gt;Create a PE for each subcluster ('mpi_a' and 'mpi_b')&lt;/li&gt;&lt;li&gt;Create 2 queues, each associated with a subcluster hostgroup and one of the newly created PEs&lt;/li&gt;&lt;li&gt;Submit jobs via: 'qsub -pe "mpi*" 8'&lt;/li&gt;&lt;/ol&gt;&lt;/p&gt;&lt;p&gt;
The end result is that parallel jobs will only land within one particular subcluster, keeping all network communication within a single switch (presumably the reason for the subcluster grouping in the first place). 
&lt;/p&gt;&lt;p&gt;
Reuti goes on to explain how this can be used for grouping non-parallel jobs: some reconfiguration of the queue sorting mechanism and sequence numbers will allow one subcluster to be "filled" with serial jobs before job slots are used from the &lt;i&gt;other&lt;/i&gt; subcluster (a wise move, since this keeps the 2nd subcluster free for larger parallel jobs).
&lt;/p&gt;
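&lt;p&gt;To illustrate why the wildcard request keeps a job inside one subcluster, here is a small Python sketch. It uses Python's shell-style fnmatch as a stand-in for Grid Engine's own pattern matching, which is an assumption, and the "smp" PE is a hypothetical extra entry. The request matches both subcluster PEs, and since a job runs within a single PE, its slots come only from the queue attached to that PE.&lt;/p&gt;
&lt;pre&gt;
```python
from fnmatch import fnmatchcase

# PE names from the steps above; "smp" is a hypothetical unrelated PE.
configured_pes = ["mpi_a", "mpi_b", "smp"]


def matching_pes(request_pattern, pe_names):
    """Return the PE names selected by a wildcard PE request."""
    return [name for name in pe_names if fnmatchcase(name, request_pattern)]


# Candidates for a request such as: qsub -pe "mpi*" 8
candidates = matching_pes("mpi*", configured_pes)
print(candidates)
```
&lt;/pre&gt;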

</description>
      <pubDate>Tue, 14 Feb 2006 21:06:08 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:8a6568ca-f298-4fb3-a568-259a85494e44</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/2006/02/14/grouping-jobs-to-nodes-via-wildcard-pes#comments</comments>
      <category>Administration</category>
      <category>MailList Bits</category>
      <category>Snippets</category>
      <link>http://gridengine.info/2006/02/14/grouping-jobs-to-nodes-via-wildcard-pes</link>
    </item>
    <item>
      <title>sorting qstat output</title>
      <description>&lt;p&gt;A user recently asked the mailing list for suggestions on sorting the full output of qstat by job start time.&lt;/p&gt;&lt;p&gt;Reuti &lt;a href="http://gridengine.sunsource.net/servlets/ReadMsg?list=users&amp;msgNo=14723"&gt;replied&lt;/a&gt; with a link to his &lt;a href="http://gridengine.sunsource.net/servlets/ProjectDocumentView?documentID=8&amp;showInfo=true"&gt;most excellent script&lt;/a&gt;, a bash script called "status" that makes heavy use of awk under the hood. The script works with both SGE 5.3 and 6.x versions of qstat.&lt;/p&gt;&lt;p&gt;The script is hosted in the download section of the SGE project website:&lt;br/&gt;&lt;a href="http://gridengine.sunsource.net/servlets/ProjectDocumentView?documentID=8&amp;showInfo=true"&gt;http://gridengine.sunsource.net/servlets/ProjectDocumentView?documentID=8&amp;showInfo=true&lt;/a&gt;&lt;/p&gt;&lt;p&gt;After downloading the script, usage is trivial. To sort output by job start time, run:&lt;br/&gt;

&lt;div class="codePost"&gt;&lt;pre&gt; ./status -s time -a
Running jobs:
job-ID  # name                      owner      start time          running in
-----------------------------------------------------------------------------
   561  1 Job7458                   www        01/08/2006 18:59:05 all.q      (stalled)
   653  1 A11510113941883           www        02/08/2006 09:13:58 all.q      
   657  1 A11541113941889           www        02/08/2006 09:14:54 all.q      

Waiting jobs:
job-ID  # name                      owner      submit time        
------------------------------------------------------------------
   562  1 Job7458.cleanup           www        01/08/2006 17:38:14 (hold)
   654  1 btpymol                   www        02/08/2006 09:13:59 (Error)
   654  1 btpymol                   www        02/08/2006 09:13:59 (Error)
   655  1 merge                     www        02/08/2006 09:13:59 (hold)
   656  1 cleanup                   www        02/08/2006 09:13:59 (hold)
   658  1 btrasmol                  www        02/08/2006 09:14:55 (Error)
   658  1 btrasmol                  www        02/08/2006 09:14:55 (Error)
   658  1 btrasmol                  www        02/08/2006 09:14:55 (Error)
   659  1 merge                     www        02/08/2006 09:14:55 (hold)
   660  1 cleanup                   www        02/08/2006 09:14:55 (hold)
   407  1 impossibleJob             www        11/28/2005 09:58:42 
&lt;/pre&gt;&lt;/div&gt;&lt;/p&gt;</description>
      <pubDate>Sun, 12 Feb 2006 16:07:42 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:b3f73950-67b3-40a4-b89c-1eff8545a962</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/2006/02/12/sorting-qstat-output#comments</comments>
      <category>External Tools &amp; Apps</category>
      <category>MailList Bits</category>
      <category>qstat</category>
      <category>Snippets</category>
      <link>http://gridengine.info/2006/02/12/sorting-qstat-output</link>
    </item>
    <item>
      <title>Easy setup of equal user fairshare policy</title>
      <description>&lt;p&gt;
Reuti posted this link again on the users list and it caught my eye. It is probably the fastest and easiest way for an SGE admin to set up a basic resource allocation policy that shares resources equally (and automatically) among all users.
&lt;/p&gt;
&lt;p&gt;
The link:&lt;br/&gt;
&lt;a href="http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&amp;amp;msgNo=8319"&gt;http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&amp;msgNo=8319&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
It boils down to 2 simple configuration actions:
&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Make 2 changes in the main SGE configuration ('qconf -mconf'):
&lt;ul&gt;&lt;li&gt;enforce_user auto&lt;/li&gt;&lt;li&gt;auto_user_fshare 100&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Make 1 change in the SGE scheduler configuration ('qconf -msconf'):
&lt;ul&gt;&lt;li&gt;weight_tickets_functional 10000&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;
&lt;/ol&gt;
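&lt;p&gt;As a rough illustration of why these settings give equal sharing, the Python sketch below splits a pool of functional tickets in proportion to each user's functional share. This is a deliberately simplified model and an assumption on my part: the real scheduler also weighs projects, departments, jobs and other policies. The user names are hypothetical.&lt;/p&gt;
&lt;pre&gt;
```python
def functional_tickets(user_shares, total_tickets=10000):
    """Split the ticket pool in proportion to each user's functional share."""
    total_shares = sum(user_shares.values())
    return {
        user: total_tickets * share / total_shares
        for user, share in user_shares.items()
    }


# Every auto-created user gets the same share (auto_user_fshare 100).
users = {"alice": 100, "bob": 100, "carol": 100, "dave": 100}
tickets = functional_tickets(users)
for user, amount in tickets.items():
    print(user, amount)
```
&lt;/pre&gt;
&lt;p&gt;Because every user carries the same share, each one ends up with the same slice of the pool no matter how many users exist.&lt;/p&gt;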



</description>
      <pubDate>Tue, 17 Jan 2006 10:20:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:eb2ca253-7692-4909-94cc-e5293504fbea</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/2006/01/17/easy-setup-of-equal-user-fairshare-policy#comments</comments>
      <category>Administration</category>
      <category>Resource Allocation</category>
      <category>fairshare</category>
      <category>fair</category>
      <category>share</category>
      <category>Snippets</category>
      <link>http://gridengine.info/2006/01/17/easy-setup-of-equal-user-fairshare-policy</link>
    </item>
    <item>
      <title>Setting queue level nice values</title>
      <description>&lt;p&gt;A user recently asked:&lt;/p&gt;&lt;blockquote&gt;&lt;i&gt; ... I know it is possible to submit the jobs with nice in each submit
sge script but is it possible to fix a nice value by queue ?
&lt;/i&gt;&lt;/blockquote&gt;&lt;p&gt;Reuti &lt;a href="http://gridengine.sunsource.net/servlets/ReadMsg?list=users&amp;msgNo=13445"&gt;replied&lt;/a&gt; with a quick &amp; simple procedure:&lt;/p&gt;&lt;blockquote&gt;&lt;ol&gt;&lt;li&gt;Use "&lt;span class="code"&gt;qconf -mq &amp;lt;queueName&amp;gt;&lt;/span&gt;" to set the cluster queue "&lt;span class="code"&gt;priority&lt;/span&gt;" parameter to the nice value you wish to use&lt;/li&gt;
&lt;li&gt;Use "&lt;span class="code"&gt;qconf -msconf&lt;/span&gt;" to make sure "&lt;span class="code"&gt;reprioritize_interval=0:0:0&lt;/span&gt;"&lt;/li&gt;
&lt;li&gt;Use "&lt;span class="code"&gt;qconf -mconf&lt;/span&gt;" to make sure "&lt;span class="code"&gt;reprioritize=0&lt;/span&gt;"&lt;/li&gt;
&lt;/ol&gt;&lt;/blockquote&gt;&lt;p&gt;A quick test confirmed the procedure: after setting &lt;span class="code"&gt;priority=15&lt;/span&gt;, the test.sh script was running with an altered nice level:
&lt;/p&gt;&lt;blockquote&gt;&lt;pre&gt;
USER  PID   TIME    UID  PPID CPU   NI  COMMAND
dag   6843  7:12PM  501  6679  0    31  - sge_shepherd-5 -bg
dag   6844  7:12PM  501  6843  0    &lt;b&gt;16&lt;/b&gt;  -sh /opt/sge/default/spool/chrisdag/job_scripts/5
&lt;/pre&gt;&lt;/blockquote&gt;

</description>
      <pubDate>Mon, 31 Oct 2005 19:45:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:4b3c4640-5961-41f1-b5a2-070b36ab9f2a</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/2005/10/31/setting-cluster-queue-level-nice-values#comments</comments>
      <category>MailList Bits</category>
      <category>Snippets</category>
      <link>http://gridengine.info/2005/10/31/setting-cluster-queue-level-nice-values</link>
    </item>
    <item>
      <title>Removing empty job output/error files automatically</title>
      <description>&lt;p&gt;In a thread dealing with some DRMAA issues, &lt;a href="http://gridengine.sunsource.net/servlets/ReadMsg?list=users&amp;msgNo=13380"&gt;Reuti posted&lt;/a&gt; a quick little shell script that can be used as an epilog. Grid Engine supports "prolog" and "epilog" actions at the cluster queue level. These hooks are used to run scripts or perform an action before ('prolog') or after ('epilog') a job is run.&lt;/p&gt;&lt;p&gt;The shell script checks the Grid Engine standard output (STDOUT) and standard error (STDERR) output files and deletes any that are empty. This reduces clutter in job output directories while also preserving any STDOUT/STDERR files that actually contain information.&lt;/p&gt;&lt;pre&gt;&lt;span class="code"&gt;
#!/bin/sh

## Delete the STDOUT and STDERR files (.o and .e) if they are empty
##  ( we do not want to delete non-empty files, they may contain useful
##    troubleshooting or debug information ... )
##

[ -r "$SGE_STDOUT_PATH" -a -f "$SGE_STDOUT_PATH" ] &amp;&amp; [ ! -s "$SGE_STDOUT_PATH" ] &amp;&amp; rm -f "$SGE_STDOUT_PATH"
[ -r "$SGE_STDERR_PATH" -a -f "$SGE_STDERR_PATH" ] &amp;&amp; [ ! -s "$SGE_STDERR_PATH" ] &amp;&amp; rm -f "$SGE_STDERR_PATH"&lt;/span&gt;&lt;/pre&gt;&lt;br/&gt;
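&lt;p&gt;For sites that prefer Python for their hooks, the same check could be sketched roughly as follows. This is my own illustrative translation, not something Reuti posted; the function names are hypothetical, and it assumes the epilog environment provides SGE_STDOUT_PATH and SGE_STDERR_PATH just as the shell version does.&lt;/p&gt;
&lt;pre&gt;
```python
import os


def remove_if_empty(path):
    """Delete path if it is a readable, regular, zero-length file."""
    if not path:
        return False
    if (
        os.path.isfile(path)
        and os.access(path, os.R_OK)
        and os.path.getsize(path) == 0
    ):
        os.remove(path)
        return True
    return False


def clean_job_output(environ=os.environ):
    """Check both job output files; return the variables whose file was removed."""
    return [
        name
        for name in ("SGE_STDOUT_PATH", "SGE_STDERR_PATH")
        if remove_if_empty(environ.get(name))
    ]


if __name__ == "__main__":
    clean_job_output()
```
&lt;/pre&gt;
&lt;p&gt;Non-empty files are left alone, so any troubleshooting output survives exactly as with the shell version.&lt;/p&gt;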

&lt;p&gt;In action ...

&lt;/p&gt;&lt;p&gt;After saving this script and adding it to the &lt;span class="code"&gt;epilog&lt;/span&gt; parameter of a cluster queue configuration, I ran the &lt;span class="code"&gt;$SGE_ROOT/examples/jobs/simple.sh&lt;/span&gt; script (all it does is print a datestamp to STDOUT before and after sleeping for 20 seconds) and observed the following:&lt;/p&gt;&lt;p&gt;While the job was running:&lt;/p&gt;&lt;pre&gt;&lt;span class="code"&gt;bioadmin@b7:~/test&gt; ls -l
total 8
-rwxr-xr-x  1 bioadmin bioadmin 1529 2005-10-19 17:37 simple.sh
-rw-r--r--  1 bioadmin bioadmin    0 2005-10-19 17:37 simple.sh.e2
-rw-r--r--  1 bioadmin bioadmin   29 2005-10-19 17:37 simple.sh.o2
bioadmin@b7:~/test&gt; qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@b7.training.bioteam.net  BIP   1/4       0.01     lx24-x86      
      2 0.55500 simple.sh  bioadmin     r     10/19/2005 17:37:29     1&lt;/span&gt;&lt;/pre&gt;
And after the job completes:

&lt;pre&gt;&lt;span class="code"&gt;bioadmin@b7:~/test&gt; ls -l
total 8
-rwxr-xr-x  1 bioadmin bioadmin 1529 2005-10-19 17:37 simple.sh
-rw-r--r--  1 bioadmin bioadmin   58 2005-10-19 17:37 simple.sh.o2&lt;/span&gt;&lt;/pre&gt;&lt;p&gt;No muss, no fuss. The empty .e STDERR file was blown away automatically after the job completed. Any wiki-fiddlers reading this post may want to add this code to the &lt;a href="http://gridengine.info/wiki/index.php/Snippets"&gt;Snippets section&lt;/a&gt; of the wiki.
&lt;/p&gt;</description>
      <pubDate>Wed, 19 Oct 2005 18:03:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:885eee91-c928-4a47-9d03-5421f8489219</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/2005/10/19/removing-empty-job-output-error-files-automatically#comments</comments>
      <category>MailList Bits</category>
      <category>epilog</category>
      <category>Snippets</category>
      <link>http://gridengine.info/2005/10/19/removing-empty-job-output-error-files-automatically</link>
    </item>
  </channel>
</rss>
