<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>gridengine.info : </title>
    <link>http://gridengine.info/articles.rss</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>tracking Grid Engine news, bugs, howtos and best practices</description>
    <item>
      <title>Keeping single slot jobs off of certain nodes</title>
      <description>&lt;p&gt;
In &lt;a href="http://gridengine.sunsource.net/servlets/BrowseList?list=users&amp;by=thread&amp;from=40145"&gt;this thread&lt;/a&gt;, Paul asks:&lt;br/&gt;
&lt;blockquote&gt;
&lt;em&gt;"I'm looking at finding a way to either limit single-slot jobs, or requiring all jobs in a given queue to be running in a pe.  Specifically, I have some SMP nodes, that I'd rather not waste on single thread, and also keep the single thread jobs off of the infiniband connected nodes.  I have gigE small cpu count nodes for this task."&lt;/em&gt;
&lt;/blockquote&gt;
&lt;/p&gt;
&lt;p&gt;
Dan &lt;a href="http://gridengine.sunsource.net/servlets/ReadMsg?list=users&amp;msgNo=24404"&gt;replied&lt;/a&gt; with another example of clever use of the new SGE Resource Quota syntax within SGE 6.1 and later:&lt;br/&gt;
&lt;blockquote&gt;
&lt;em&gt;You can use resource quota sets to restrict non-PE jobs to certain queues hosts.&lt;/em&gt;
&lt;pre&gt;
limit pes !* hosts @smp to slots=0
&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;/p&gt;
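&lt;p&gt;
For context, a rule like Dan's lives inside a resource quota set. What follows is only a sketch of what a complete set might look like: the set name and description are made up for illustration, and &lt;code&gt;@smp&lt;/code&gt; is assumed to be an existing host group that contains the SMP nodes. It should be possible to add it with &lt;code&gt;qconf -arqs&lt;/code&gt; (or &lt;code&gt;qconf -Arqs&lt;/code&gt; from a file) and to review it with &lt;code&gt;qconf -srqs&lt;/code&gt;, but check sge_resource_quota(5) for your release before relying on it.
&lt;/p&gt;
&lt;pre&gt;
{
   name         example_no_serial_on_smp
   description  "example only: keep non-PE jobs off the SMP hosts"
   enabled      TRUE
   limit        pes !* hosts @smp to slots=0
}
&lt;/pre&gt;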
&lt;p&gt;Slick!&lt;/p&gt;



</description>
      <pubDate>Thu, 08 May 2008 16:01:07 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:b3436600-fc60-4ace-8bc4-c55b10b04389</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/articles/2008/05/08/keeping-single-slot-jobs-off-of-certain-nodes#comments</comments>
      <category>MailList Bits</category>
      <category>RQS</category>
      <link>http://gridengine.info/articles/2008/05/08/keeping-single-slot-jobs-off-of-certain-nodes</link>
    </item>
    <item>
      <title>Think I'm going to like the new Sun wiki</title>
      <description>&lt;p&gt;One of the more interesting things (to me at least!) in the recent news about the SGE 6.2 beta was the word that all documentation and manuals would be moving to a new home at &lt;a href="http://wikis.sun.com"&gt;http://wikis.sun.com&lt;/a&gt;. 
&lt;/p&gt;
&lt;p&gt;
I registered a user account a few days ago and hit the site today to see if any SGE stuff had made it over. The screenshot below is what I found. It's nice to see smart tech people have a sense of humor. The downtime must have been short, because when I refreshed the browser the site was back to normal.
&lt;/p&gt;
&lt;p&gt;
&lt;img src="http://gridengine.info/files/sun-wiki-1.png"/&gt;
&lt;/p&gt;


</description>
      <pubDate>Thu, 08 May 2008 15:34:56 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:0571c7b8-26c9-4c15-ad20-1781ffb3cccc</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/articles/2008/05/08/think-im-going-to-like-the-new-sun-wiki#comments</comments>
      <link>http://gridengine.info/articles/2008/05/08/think-im-going-to-like-the-new-sun-wiki</link>
    </item>
    <item>
      <title>SGE 6.2 goes beta next week (your help needed)</title>
      <description>&lt;p&gt;SGE 6.2 is being released in Beta form next week and the developers are asking for people to make some time if possible to fully test out the beta snapshot of the latest major SGE point release.
&lt;/p&gt;
&lt;p&gt;
Andy's full note can be found here (well worth reading in full ...):&lt;br/&gt;
&lt;a href="http://gridengine.sunsource.net/servlets/ReadMsg?list=users&amp;msgNo=24426"&gt;http://gridengine.sunsource.net/servlets/ReadMsg?list=users&amp;msgNo=24426
&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
In my mind, I'm most excited about the following:&lt;br/&gt;
&lt;ul&gt;
&lt;li&gt;Advance Reservations &amp; array job inter-dependencies&lt;/li&gt;
&lt;li&gt;The scheduler is now a thread within the qmaster!&lt;/li&gt;
&lt;li&gt;The JVM running within the qmaster&lt;/li&gt;
&lt;li&gt;SGE moving all docs into wiki form!&lt;/li&gt;
&lt;/ul&gt;
&lt;/p&gt;


</description>
      <pubDate>Mon, 05 May 2008 10:00:43 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:9302cac0-b713-4bd9-be86-2e22ec0b8234</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/articles/2008/05/05/sge-6-2-goes-beta-next-week-your-help-needed#comments</comments>
      <category>News</category>
      <category>MailList Bits</category>
      <category>6_2</category>
      <link>http://gridengine.info/articles/2008/05/05/sge-6-2-goes-beta-next-week-your-help-needed</link>
    </item>
    <item>
      <title>SGE testbeds: Simulate mass numbers of exec hosts</title>
      <description>&lt;p&gt;Interesting message on the developers list recently as a comment attached to &lt;a href="http://gridengine.sunsource.net/issues/show_bug.cgi?id=2364"&gt;Issue 2364&lt;/a&gt;. Within, Andreas explains the use of &lt;code&gt;SIMULATE_EXECDS=true&lt;/code&gt; parameter that allows unrestricted execution host creation (via suppressing unknown host errors). &lt;/p&gt;
&lt;p&gt;I can see this as being very useful for testing SGE scheduler and policy configuration settings before implementing them on production systems.&lt;/p&gt;
&lt;p&gt;
From the &lt;a href="http://gridengine.sunsource.net/issues/show_bug.cgi?id=2364"&gt;comment&lt;/a&gt;:&lt;br/&gt;
&lt;pre&gt;This is a short HOWTO for the use of the cluster simulator:

(1) Start with installing a new SGE cluster as used, but
install not more than the qmaster itself

(2) After successful installation use qconf -mconf to set

&amp;#160;&amp;#160;&amp;#160;&amp;#160;SIMULATE_EXECDS=true

in qmaster_params section of sge_conf(5). This causes the
suppression of the 'unknown' queue states.

(3) Make sure the "all.q" and any other queue that you
configure does not use any 'load_thresholds'. Cluster
simulator has no means to anyhow emulate load values. As a
result there will be no load values. For that reason
load_thresholds may not be used as it would cause load
alarm queue states that prevent scheduler from dispatching
jobs into your queues.

(4) Use qconf -ae|-Ae to create arbitrary number of
simulated execution hosts. The hosts needs not exist as
qmaster anyways won't try to send anything to it, but the
hostname must be resolvable.

Optionally:

(5) If you care for scheduler runtimes set

&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;PROFILE=true

in the params section of sched_conf(5) using qconf -msconf.

Now your simulated cluster is ready. You can send in
arbitrary numbers of jobs. Due to (2) and (3) scheduler will
dispatch them and send corresponding orders to qmaster.
Qmaster will behave as if it would start the jobs, but it
raise timers to ensure job state transitions are passed as
used. What won't work is interactive jobs (i.e. qrsh, qsh
etc.) and parallel jobs with control_slaves set to true in
sge_pe(5). Jobs' runtime can be controlled via the first job
argument. That means when

# qsub -b y /bin/sleep 5

is submitted, the job will finish after five seconds.
&lt;/pre&gt;
&lt;/p&gt;
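&lt;p&gt;
Condensed, the settings from the HOWTO come down to three lines in three different configurations. This is a rough sketch rather than a tested recipe. The &lt;code&gt;#&lt;/code&gt; lines are annotations and not part of any configuration, and using &lt;code&gt;NONE&lt;/code&gt; as the value that clears the load thresholds is my assumption; it is not spelled out in the comment above.
&lt;/p&gt;
&lt;pre&gt;
# global cluster configuration (qconf -mconf)
qmaster_params     SIMULATE_EXECDS=true

# scheduler configuration (qconf -msconf), optional profiling
params             PROFILE=true

# each cluster queue, e.g. all.q (qconf -mq all.q)
# NONE is an assumed value meaning "no load thresholds"
load_thresholds    NONE
&lt;/pre&gt;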



</description>
      <pubDate>Fri, 02 May 2008 08:42:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:5061261b-ca71-44f5-97ba-20e2bbc976f7</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/articles/2008/05/02/sge-testbeds-simulate-mass-numbers-of-exec-hosts#comments</comments>
      <category>Administration</category>
      <category>MailList Bits</category>
      <category>simulation</category>
      <category>SIMULATE_EXECDS</category>
      <link>http://gridengine.info/articles/2008/05/02/sge-testbeds-simulate-mass-numbers-of-exec-hosts</link>
    </item>
    <item>
      <title>RHEL5.2/Centos5 kernel update may cause problems</title>
      <description>&lt;p&gt;This is a heads up for RedHat Enterprise Linux (RHEL) users as well as for users (like myself) of the various Centos variants.&lt;/p&gt;
&lt;p&gt;There is a recent patch for RHEL that changes the inode data structure exposed to NFS clients from 32 bits to 64 bits in size. The basic summary of this issue is that many applications may not handle this change gracefully (one report involves the SGE Linux binaries).
&lt;/p&gt;
&lt;p&gt;RHEL and modern Centos users should probably pay attention to this issue (by subscribing as CC: contacts):&lt;br/&gt;
&lt;a href="http://gridengine.sunsource.net/issues/show_bug.cgi?id=2543"&gt;http://gridengine.sunsource.net/issues/show_bug.cgi?id=2543
&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A RedHat bug report discussing the issue in more detail is here:&lt;br/&gt;
"Large inode number patch breaks applications"&lt;br/&gt;
&lt;a href="https://bugzilla.redhat.com/show_bug.cgi?id=241348"&gt;https://bugzilla.redhat.com/show_bug.cgi?id=241348
&lt;/a&gt;
&lt;/p&gt;

</description>
      <pubDate>Mon, 21 Apr 2008 12:20:25 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:9780427a-3745-401a-a17d-ea9909219c6e</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/articles/2008/04/21/rhel5-2-centos5-kernel-update-may-cause-problems#comments</comments>
      <category>News</category>
      <category>MailList Bits</category>
      <category>rhel</category>
      <category>centos</category>
      <link>http://gridengine.info/articles/2008/04/21/rhel5-2-centos5-kernel-update-may-cause-problems</link>
    </item>
    <item>
      <title>mpiblast, SGE and MPICH2 integration</title>
      <description>&lt;p&gt;
Matthias Neder has posted a quick summary of a tight MPICH2 integration that successfully handles his &lt;a href="http://www.mpiblast.org/"&gt;mpiblast&lt;/a&gt; application.&lt;/p&gt;
&lt;p&gt;
The summarized solution can be found here:&lt;br/&gt;
&lt;a href="http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&amp;msgNo=24204"&gt;http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&amp;msgNo=24204
&lt;/a&gt;&lt;/p&gt;



</description>
      <pubDate>Mon, 21 Apr 2008 11:49:28 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:072a9c14-b16e-46a9-9b19-32a0de541541</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/articles/2008/04/21/mpiblast-sge-and-mpich2-integration#comments</comments>
      <category>Application Integration</category>
      <category>mpich2</category>
      <category>mpiblast</category>
      <link>http://gridengine.info/articles/2008/04/21/mpiblast-sge-and-mpich2-integration</link>
    </item>
    <item>
      <title>6.2 Bringing Significant Improvements to Cluster Queue Matching</title>
      <description>&lt;p&gt;
&lt;em&gt;Interesting writeup from Andreas Haas reproduced in full below ...
&lt;/em&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;
I thought this could be of interest for those who care for dispatching times. This maintrunk check-in&lt;/p&gt;
&lt;p&gt;
  &lt;a href="http://gridengine.sunsource.net/servlets/ReadMsg?list=cvs&amp;msgNo=9814"&gt;http://gridengine.sunsource.net/servlets/ReadMsg?list=cvs&amp;msgNo=9814&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
will improve the matching times for set-ups where queue resource limits such as -l h_rt or -l h_vmem are criterion whether a job gets into a queue or not.
&lt;/p&gt;
&lt;p&gt;
Before the above change we had an exponential growth of dispatching times
&lt;/p&gt;
&lt;pre&gt;
04/15/2008 11:48:01|schedu|es-ergb01-01|P|PROF: job dispatching took 0.030 s (20 fast, 0 comp, 0 pe, 0 res)
04/15/2008 11:48:23|schedu|es-ergb01-01|P|PROF: job dispatching took 0.130 s (40 fast, 0 comp, 0 pe, 0 res)
04/15/2008 11:48:50|schedu|es-ergb01-01|P|PROF: job dispatching took 0.630 s (80 fast, 0 comp, 0 pe, 0 res)
04/15/2008 11:49:26|schedu|es-ergb01-01|P|PROF: job dispatching took 3.210 s (160 fast, 0 comp, 0 pe, 0 res)
&lt;/pre&gt;
&lt;p&gt;
now growth is linear
&lt;/p&gt;
&lt;pre&gt;
04/15/2008 11:53:54|schedu|es-ergb01-01|P|PROF: job dispatching took 0.000 s (20 fast, 0 comp, 0 pe, 0 res)
04/15/2008 11:54:17|schedu|es-ergb01-01|P|PROF: job dispatching took 0.020 s (40 fast, 0 comp, 0 pe, 0 res)
04/15/2008 11:54:44|schedu|es-ergb01-01|P|PROF: job dispatching took 0.050 s (80 fast, 0 comp, 0 pe, 0 res)
04/15/2008 11:55:16|schedu|es-ergb01-01|P|PROF: job dispatching took 0.070 s (160 fast, 0 comp, 0 pe, 0 res)
&lt;/pre&gt;
&lt;p&gt;
also note this maintrunk check-in
&lt;/p&gt;
&lt;p&gt;
  &lt;a href="http://gridengine.sunsource.net/servlets/ReadMsg?list=cvs&amp;msgNo=9713"&gt;http://gridengine.sunsource.net/servlets/ReadMsg?list=cvs&amp;msgNo=9713&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
when profiling is enabled in sched_conf(5) like this
&lt;/p&gt;
&lt;pre&gt;
   :
  params  PROFILE=true
   :
&lt;/pre&gt;
&lt;p&gt;
the actual cause for exponential/linear dispatching times becomes fairly obvious: Without the above improvement the scheduler did check each single queue instance also in cases when the entire cluster queue was not suited
&lt;/p&gt;



&lt;pre&gt;
04/15/2008 11:48:01|schedu|es-ergb01-01|P|PROF: sequential matching global          rqs     cqstatic      hstatic      qstatic     hdynamic qdyn
04/15/2008 11:48:01|schedu|es-ergb01-01|P|PROF: sequential matching 20            0           30          200          210          210 65
04/15/2008 11:48:23|schedu|es-ergb01-01|P|PROF: sequential matching 40            0           60          800          820          820 230
04/15/2008 11:48:50|schedu|es-ergb01-01|P|PROF: sequential matching 80            0          120         3200         3240         3240 860
04/15/2008 11:49:26|schedu|es-ergb01-01|P|PROF: sequential matching 160            0          240        12800        12880        12880 3320
&lt;/pre&gt;
&lt;p&gt;
now the queue instance check is done only if needed
&lt;/p&gt;
&lt;pre&gt;
04/15/2008 11:53:54|schedu|es-ergb01-01|P|PROF: sequential matching global          rqs     cqstatic      hstatic      qstatic     hdynamic qdyn
04/15/2008 11:53:54|schedu|es-ergb01-01|P|PROF: sequential matching 20            0           30           10           10           10 10
04/15/2008 11:54:17|schedu|es-ergb01-01|P|PROF: sequential matching 40            0           60           20           20           20 20
04/15/2008 11:54:44|schedu|es-ergb01-01|P|PROF: sequential matching 80            0          120           40           40           40 40
04/15/2008 11:55:16|schedu|es-ergb01-01|P|PROF: sequential matching 160            0          240           80           80           80 80
&lt;/pre&gt;
&lt;p&gt;
the setup I used to get these numbers was two cluster queues over 10,20,40,80 simulated hosts (using SIMULATE_EXECDS=true in sge_conf(5) qmaster_params).
&lt;/p&gt;
&lt;p&gt;
One cluster queue was not eligible at all for any of the jobs due to a queue resource limit, but I forced scheduler to evaluate this cluster queue first with sequence numbers as to provoke the worst case. In addition each job requested slightly different resource amount
&lt;/p&gt;
&lt;pre&gt;
  -l h_rt=300
  -l h_rt=301
  -l h_rt=302
     :
  -l h_rt=379
&lt;/pre&gt;
&lt;p&gt;
as to sabotage the reuse of dispatching results for identical jobs within a scheduling interval.
&lt;/p&gt;
&lt;p&gt;
Regards,
Andreas
&lt;/p&gt;</description>
      <pubDate>Wed, 16 Apr 2008 07:21:46 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:24b283f8-930b-4ac9-bac4-cc5746f0e111</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/articles/2008/04/16/6-2-bringing-significant-improvements-to-cluster-queue-matching#comments</comments>
      <link>http://gridengine.info/articles/2008/04/16/6-2-bringing-significant-improvements-to-cluster-queue-matching</link>
    </item>
    <item>
      <title>6.1 leak found;  schedd_job_info is not your friend</title>
      <description>&lt;p&gt;Anyone interested in the memory leak that has been bothering some 6.1 users should check out the comments associated with Issue #2464:&lt;br /&gt;
&lt;a href="http://gridengine.sunsource.net/issues/show_bug.cgi?id=2464"&gt;http://gridengine.sunsource.net/issues/show_bug.cgi?id=2464 &lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Among the interesting things you'll see are:&lt;/p&gt;
&lt;ul&gt;
    &lt;li&gt;A great example of motivated SGE users and developers working together to track down a hard to find problem&lt;/li&gt;
    &lt;li&gt;Interesting comments on the potential &amp;quot;&lt;em&gt;unfixable&lt;/em&gt;&amp;quot; (my words) nature of the schedd_job_info messages&lt;/li&gt;
    &lt;li&gt;A really cool workaround for getting job scheduler messages with schedd_job_info=FALSE&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In a nutshell, there is a problem in the schedd_job_info framework that can cause massive resource utilization on the qmaster machine. This happens in particular on larger systems or places with large numbers of queue instances. This can also pop up on systems with jobs that are pending due to un-fulfillable resource requests. This explains why I saw the memory leak on my small testbed cluster -- I have a number of &amp;quot;pend forever&amp;quot; jobs in the queue for demonstration purposes.&lt;/p&gt;
&lt;p&gt;The fix is to disable schedd_job_info. This is potentially problematic though, as that feature is pretty much my go-to first step for troubleshooting job dispatch problems.&lt;/p&gt;
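&lt;p&gt;
As a sketch, turning it off means changing one line of the scheduler configuration via &lt;code&gt;qconf -msconf&lt;/code&gt;. The value below is what I would expect from sched_conf(5); treat it as an assumption and confirm it against the man page for your release.
&lt;/p&gt;
&lt;pre&gt;
schedd_job_info    false
&lt;/pre&gt;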
&lt;p&gt;However, in a recent update comment to this issue, Andreas added a possible tip for getting scheduling messages about a job in a way that puts far less load on the system AND does not require schedd_job_info=TRUE:&lt;/p&gt;
&lt;blockquote&gt;
&lt;pre&gt;
qalter -w v &amp;lt;jobid&amp;gt;&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;p&gt;Remember though that comments found in a bug report are not &amp;quot;gospel&amp;quot; so don't read this as news that schedd_job_info is forever broken or going away. Expect to see this and other issues discussed as part of the SGE Roadmap. You are attending the &lt;a href="http://www.opensourcegridcluster.org/"&gt;May 2008 SGE Workshop&lt;/a&gt;, right?&lt;/p&gt;

</description>
      <pubDate>Thu, 10 Apr 2008 11:07:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:8b0bf596-9b09-477a-983e-98e6d0af517b</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/articles/2008/04/10/6-1-leak-found-schedd_job_info-is-not-your-friend#comments</comments>
      <category>News</category>
      <category>Administration</category>
      <link>http://gridengine.info/articles/2008/04/10/6-1-leak-found-schedd_job_info-is-not-your-friend</link>
    </item>
    <item>
      <title>Release 6.1u4 is out</title>
      <description>&lt;p&gt;Congratulations to the SGE developer team!&lt;/p&gt;
&lt;p&gt;
Big news today -- 6.1u4 was just announced; hopefully addressing some persistent issues people have been having with the previous releases. The plaintext list of fixed issues can be found here:&lt;br/&gt;
&lt;a href="http://gridengine.sunsource.net/project/gridengine/61patches.txt"&gt;http://gridengine.sunsource.net/project/gridengine/61patches.txt&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
The full announcement is here:&lt;br/&gt;
&lt;a href="http://gridengine.sunsource.net/news/GE61u4-announce.html"&gt;http://gridengine.sunsource.net/news/GE61u4-announce.html&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;I've been unable to keep 6.1u3 running consistently on a small test system, probably due to the same memory leak others have been reporting. There is a chance that a subtle leak still exists or at least has not been fully tracked down in 6.1u4 but multiple people are working diligently on this. Best bet is to monitor the &lt;a href="http://gridengine.sunsource.net/servlets/SummarizeList?listName=users"&gt;users mailing list&lt;/a&gt; to see the feedback.  
&lt;/p&gt;

</description>
      <pubDate>Fri, 04 Apr 2008 10:06:11 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:bf528606-99ec-4e39-a94b-772bde90cee8</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/articles/2008/04/04/release-6-1u4-is-out#comments</comments>
      <category>News</category>
      <link>http://gridengine.info/articles/2008/04/04/release-6-1u4-is-out</link>
    </item>
    <item>
      <title>Summer 2008 SGE Training Workshops</title>
      <description>&lt;p&gt;Hi folks. It's done. I've made a personal and financial commitment to organize a regularly occurring series of Grid Engine Training Workshops starting initially in the Cambridge, Massachusetts area. This is a darwinian test to see if my thoughts about the size and needs of the Grid Engine community are true. With existing SGE training opportunities only scheduled 1-2 times per year it is still an open guess as to the size of the potential audience for these sorts of events.&lt;/p&gt;
&lt;p&gt;Course details and updates will always be here:&lt;br /&gt;
&lt;a href="http://blog.bioteam.net/category/training/"&gt;http://blog.bioteam.net/category/training/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Download the brochure here:&lt;br /&gt;
&lt;a href="http://blog.bioteam.net/wp-content/uploads/2008/03/Training-SGE-Brochure.pdf"&gt;&lt;img width="150" border="0" alt="SGE Training Brochure" src="http://blog.bioteam.net/wp-content/uploads/2008/03/sgetrain-thumnail.png" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Obviously my employer has a bit of a commercial/profit motive here but I've been the one pushing to make this happen. Consider this a bit of market research to see if BioTeam should invest in growing the number of staff capable of providing SGE related training, professional services and support to the community at large.&lt;/p&gt;
&lt;p&gt;I welcome your comments and feedback and would appreciate any assistance in spreading the word about these events.&lt;/p&gt;
&lt;p&gt;Thanks! -- &lt;a href="mailto:chris@bioteam.net"&gt;Chris&lt;/a&gt;.&lt;/p&gt;

</description>
      <pubDate>Sat, 22 Mar 2008 16:27:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:9a410fcb-dcc8-4080-8cfe-ba88a6d452f9</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/articles/2008/03/22/summer-2008-sge-training-workshops#comments</comments>
      <category>Training</category>
      <link>http://gridengine.info/articles/2008/03/22/summer-2008-sge-training-workshops</link>
    </item>
    <item>
      <title>CFP: Open Source Grid &amp; Cluster Conference 2008 </title>
      <description>&lt;p&gt;Reminder: Call for Participation closes Friday, March 21
&lt;/p&gt;
&lt;p&gt;OPEN SOURCE GRID &amp; CLUSTER CONFERENCE 2008
&lt;/p&gt;
&lt;p&gt;Featuring: GlobusWorld, Grid Engine Workshop, Rocks Cluster Workshop
&lt;/p&gt;
&lt;p&gt;May 13 - 15, 2008 in Oakland, California&lt;br/&gt;
&lt;a href="http://www.OpenSourceGridCluster.org"&gt;http://www.OpenSourceGridCluster.org&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;DEADLINE FOR ABSTRACT SUBMISSIONS:  March 21, 2008
&lt;/p&gt;
&lt;p&gt;Whether you are a Grid or Cluster expert with technical advice to
share, or a leader with visions for the future of open source Grid and
Cluster computing in research or industry, the Open Source Grid &amp;
Cluster Conference is the premier event for delivering your message to
the Grid and Cluster community.  In past years, hundreds of Grid and
Cluster professionals from research and industry have attended
individual events such as GlobusWorld, the Grid Engine Workshop, and
Rocks-a-Palooza to discuss Grid and Cluster adoption issues, to
receive training and exchange information related to these widely used
Grid and Cluster software systems. This year the Globus, Grid Engine,
and Rocks communities are joining forces to create the most
comprehensive event on open source Grid and Cluster computing to date.
&lt;/p&gt;
&lt;p&gt;The Open Source Grid &amp; Cluster Conference program will offer a wide
variety of conference sessions, mini-symposiums, panel discussions,
workshops, and tutorials. Speaking opportunities range from highly
technical research, development, and deployment presentations to
targeted panels on commercial and research adoption
considerations. The Open Source Grid &amp; Cluster Conference will run
parallel tracks, some focused on Globus, Grid Engine, and Rocks
community-specific topics, and others focused on cross-cutting and
other open source Grid and Cluster software technologies and uses.
&lt;/p&gt;

&lt;p&gt;KEY DATES AND DEADLINES&lt;br/&gt;
Abstract submission deadline - March 21, 2008&lt;br/&gt;
Acceptance notification - April 15, 2008&lt;br/&gt;
Presentation Slides Due - April 30, 2008&lt;/p&gt;

&lt;p&gt;SPEAKING TOPICS&lt;br/&gt;
Submissions should be centered on the theme of uses and implementation
of Open Source Software for Grid and Cluster Computing.&lt;/p&gt;

&lt;p&gt;All proposals should be submitted online at
  &lt;a href="http://www.OpenSourceGridCluster.org/CFP.html"&gt;http://www.OpenSourceGridCluster.org/CFP.html&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Click on through for the submission guidelines ...&lt;br/&gt;&lt;/p&gt;



&lt;p&gt;Questions should be sent to program@OpenSourceGridCluster.org
&lt;/p&gt;
&lt;p&gt;SUBMISSION GUIDELINES
---------------------
&lt;/p&gt;
&lt;p&gt;ABSTRACT GUIDELINES&lt;br/&gt;
All submissions must include an abstract of no more than 500 words,
and a brief bio for each presenter. Abstracts should be written so as
to be self-contained and to provide the technical substance required
for the program committee to evaluate the session's contribution to
the Open Source Grid and Cluster community. Please indicate whether
the proposed session is specific to just one of Globus, Grid Engine,
or Rocks. If the presentation was given at another conference, then
the name, date, and location of the event must be noted in the
submission. Abstracts should be submitted in plain text format either
as an attachment or in the main body of the e-mail. Abstracts and bios
for accepted submissions will be published on the Open Source Grid &amp;
Cluster Conference website and in other conference material as the
description of the session. Presentation slides may be published on
the Conference website and distributed with conference material.
&lt;/p&gt;
&lt;p&gt;PRESENTATIONS&lt;br/&gt;
Presentation proposals may be submitted for individual time slots of
thirty minutes. Please be sure to allow ten minutes for Q&amp;A within
this allotted time. Individual presentations will be grouped with
similar topic presentations to fill an entire session.
&lt;/p&gt;
&lt;p&gt;BUILD YOUR OWN SESSION&lt;br/&gt;
Participants are invited to organize their own, complete,
ninety-minute session, including but not limited to the following
categories. The submission must include an agenda, and the names and
associations of all participants.
&lt;/p&gt;
&lt;p&gt;Panel Session / Mini-Symposium: These sessions will enable conference
attendees to learn from a group of experts on a particular topic. The
session organizer may deliver an opening talk to set the context for
the remainder of the session. Panelists will then give presentations
designed to stimulate audience participation, on their preferably
diverse opinions, experiences or expertise regarding the theme of the
session. At least ten minutes should be reserved at the end for
questions from the audience.
&lt;/p&gt;
&lt;p&gt;Birds-of-a-Feather (BOF) Sessions: These sessions will allow
conference attendees to discuss focused subject areas. The session may
include presentations and open discussion. Session organizers will be
responsible for moderating these sessions and reporting on their
outcomes.
&lt;/p&gt;
&lt;p&gt;WORKSHOPS AND TUTORIALS&lt;br/&gt;
Ample room is available for half-day and full-day pre-conference
(Monday) and post-conference (Friday) workshops and
tutorials. Workshops may include topical meetings with open
registration or community/group meetings with restricted attendance.
Tutorials may be on any topic related to the Open Source Grid and
Cluster theme of the conference. Submissions must include preferred
and minimum acceptable room size, and preferred and acceptable
times. An extra nominal fee may be required of attendees or the
organizer to cover additional costs such as A/V and food.

&lt;/p&gt;
&lt;p&gt;All proposals should be submitted online at
  &lt;a href="http://www.OpenSourceGridCluster.org/CFP.html"&gt;http://www.OpenSourceGridCluster.org/CFP.html&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Questions should be sent to program@OpenSourceGridCluster.org&lt;/p&gt;</description>
      <pubDate>Wed, 19 Mar 2008 10:15:28 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:7bcc8492-bfae-4aa9-9983-d628b98cd909</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/articles/2008/03/19/cfp-open-source-grid-cluster-conference-2008#comments</comments>
      <category>News</category>
      <link>http://gridengine.info/articles/2008/03/19/cfp-open-source-grid-cluster-conference-2008</link>
    </item>
    <item>
      <title>Clever job prioritization tip</title>
      <description>&lt;p&gt;Grid Engine has a built-in priority mechanism that is useful for allowing end users to sort and prioritize their own personal pending tasks -- this gives the users the ability to submit many jobs but still dictate which of those jobs need to be run more urgently than the rest.&lt;/p&gt;
&lt;p&gt;In practice, though, this is actually fairly clunky to implement. By default the following conditions exist:
&lt;ul&gt;
&lt;li&gt;SGE will accept a priority range of &lt;code&gt;-1023 to 1024&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;By default all jobs get assigned a value of &lt;code&gt;0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Only SGE managers can assign priority values higher than &lt;code&gt;0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Normal users can only assign negative priority values&lt;/li&gt;
&lt;/ul&gt;
See where we are going here? By default, a non privileged user can only describe some of her jobs as "&lt;em&gt;less important&lt;/em&gt;" than others. There is no mechanism (besides granting the user SGE manager authority) for her to say "&lt;em&gt;this job of mine is more important than that other pending job of mine...&lt;/em&gt;".
&lt;/p&gt;
&lt;p&gt;This is, ummmm, awkward to say the least and works in a way that is 100% opposite from what a sensible user or SGE Admin would expect. Users can only decrease the relative priority of their job in the default environment.&lt;/p&gt;
&lt;p&gt;A recent mailing list &lt;a href="http://gridengine.sunsource.net/servlets/ReadMsg?list=users&amp;msgNo=23838"&gt;post&lt;/a&gt; from Jeff highlights a nice little workaround. &lt;a href="http://gridengine.sunsource.net/servlets/ReadMsg?list=users&amp;msgNo=23838"&gt;Jeff describes&lt;/a&gt; creating an entry in the &lt;code&gt;sge_request&lt;/code&gt; file that automatically assigns a value of &lt;code&gt;-p -100&lt;/code&gt; to all submitted jobs that don't override the default with their own use of the &lt;code&gt;-p&lt;/code&gt; switch.
&lt;/p&gt;
&lt;p&gt;This is a nice approach because by default it harms nobody (as all jobs have &lt;code&gt;-p -100&lt;/code&gt;). Yet it gives headroom for a non privileged user to use the priority range &lt;code&gt; -99 to 0&lt;/code&gt; to designate some of her jobs as more personally important than others.
&lt;/p&gt;
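&lt;p&gt;
A minimal sketch of the idea, assuming the cluster-wide defaults file lives at &lt;code&gt;$SGE_ROOT/$SGE_CELL/common/sge_request&lt;/code&gt;; Jeff's post may differ in the details.
&lt;/p&gt;
&lt;pre&gt;
# cluster-wide default submit options
# every job starts at priority -100 unless the user passes -p
-p -100
&lt;/pre&gt;
&lt;p&gt;
With that in place, a user could submit an urgent job with something like &lt;code&gt;qsub -p -10 urgent.sh&lt;/code&gt; (the script name is hypothetical) and it would sort ahead of her other pending jobs that kept the default.
&lt;/p&gt;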
&lt;p&gt;Background reference: manpage for &lt;a href="http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/htmlman/htmlman5/sge_request.html"&gt;sge_request&lt;/a&gt;.&lt;/p&gt;

 

</description>
      <pubDate>Thu, 13 Mar 2008 13:28:48 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:97c3632f-e1d9-445c-a429-ba5c3901232e</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/articles/2008/03/13/clever-job-prioritization-tip#comments</comments>
      <category>Resource Allocation</category>
      <category>MailList Bits</category>
      <category>priority</category>
      <link>http://gridengine.info/articles/2008/03/13/clever-job-prioritization-tip</link>
    </item>
    <item>
      <title>Clever sharetree usage for project-based job priority grouping</title>
      <description>&lt;p&gt;Recently on the mailing list there has been a &lt;a href="http://gridengine.sunsource.net/servlets/BrowseList?list=users&amp;by=thread&amp;from=28842"&gt;discussion&lt;/a&gt; centering on how to order jobs within a project. For some reason, Daire's &lt;a href="http://gridengine.sunsource.net/servlets/ReadMsg?list=users&amp;msgNo=23820"&gt;reply&lt;/a&gt; did not make it into the previous thread list. It is an interesting approach. Daire's method allows groups of jobs within a project to have different levels of priority entitlements in a way that does not interfere with other shares or projects. It also allows users or project leaders to more easily reallocate priorities on the fly, simply by changing the project associated with a task.&lt;/p&gt;

&lt;p&gt;Daire writes:
&lt;pre&gt;
&lt;em&gt;...we decided to abstract gridengine's projects into priority groups
so that you can order jobs within a project by changing the job's project
(qalter -P). So for example if you have 2 projects A and B your 
sharetree might look something like:&lt;/em&gt;

ROOT
|-- A (75)
|   |-- A_1 (P) (10)
|   |-- A_2 (P) (1000)
|   |-- A_3 (P) (100000)
|   |-- A_4 (P) (10000000)
|   `-- A_5 (P) (1000000000)
`-- B (25)
   |-- B_1 (P) (10)
   |-- B_2 (P) (1000)
   |-- B_3 (P) (100000)
   |-- B_4 (P) (10000000)
   `-- B_5 (P) (1000000000)

&lt;em&gt;where (P) signifies a gridengine "project" and the numbers in ()'s
are the assigned share values. Now you can move a job within project 
A between the 5 priority levels without effecting project B's share. 
Maybe use the Functional policy to ensure equal shares between users 
in the same priority band? Not sure how much the halflife factor will 
mess with the priority bands but the I'm assuming the large share 
differences between them will override the effect...
&lt;/em&gt;
&lt;/pre&gt;
&lt;/p&gt;
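&lt;p&gt;To get a feel for how strongly those share values separate the bands, here is a small back-of-the-envelope sketch. It assumes a parent's entitlement is split among its active children in proportion to their shares, and it ignores past usage and the halflife decay entirely, so treat the numbers as long-term targets rather than what the scheduler will report. The &lt;code&gt;band_share&lt;/code&gt; helper is hypothetical and not part of Grid Engine:&lt;/p&gt;
&lt;pre&gt;
# band_share TARGET SHARE...  (hypothetical helper)
# Prints TARGET divided by the sum of all listed sibling shares.
band_share() {
    target="$1"
    shift
    printf '%s\n' "$@" | awk -v t="$target" '{ sum += $1 } END { printf "%.4f\n", t / sum }'
}

# Fraction of project A's entitlement when all five bands have work queued
band_share 1000000000 10 1000 100000 10000000 1000000000    # A_5: prints 0.9900
band_share 10000000 10 1000 100000 10000000 1000000000      # A_4: prints 0.0099
&lt;/pre&gt;
&lt;p&gt;Under those assumptions each band gets roughly 100 times the entitlement of the band below it, so moving a pending job up a level with &lt;code&gt;qalter -P A_4 job_id&lt;/code&gt; (where &lt;code&gt;job_id&lt;/code&gt; is a placeholder) should be a noticeable bump.&lt;/p&gt;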

&lt;p&gt;Similar discussions occur on &lt;a href="http://gridengine.sunsource.net/servlets/BrowseList?list=users&amp;by=thread&amp;from=28107"&gt;this mailing list thread&lt;/a&gt;. 
&lt;/p&gt;


</description>
      <pubDate>Wed, 12 Mar 2008 13:23:41 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:a3a65015-96f6-4ca7-8e5d-0fa65cf399df</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/articles/2008/03/12/clever-sharetree-usage-for-project-based-job-priority-grouping#comments</comments>
      <category>sharetree</category>
      <category>project</category>
      <link>http://gridengine.info/articles/2008/03/12/clever-sharetree-usage-for-project-based-job-priority-grouping</link>
    </item>
    <item>
      <title>Reducing scheduler memory usage with libhoard</title>
      <description>&lt;p&gt;It&amp;#8217;s pretty interesting subscribing to the SGE Issues mailing list. This comment on &lt;a href="http://gridengine.sunsource.net/issues/show_bug.cgi?id=2464"&gt;Issue 2464&lt;/a&gt; came across the wire today:&lt;/p&gt;
&lt;p style="margin-left: 40px;"&gt;&lt;i&gt;&amp;#8230; I installed libhoard.so (&lt;a href="http://www.hoard.org"&gt;http://www.hoard.org/&lt;/a&gt;) and started sge_schedd with it (changing the sge_schedd starting line in sgemaster to &amp;quot;LD_PRELOAD=/opt/hoard-3.7.1/lib64/libhoard.so sge_schedd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
There seems to be some problems with malloc and threads not freeing memory (or something similar, Andreas could explain this the right way) which could be affecting sge_schedd.&lt;br /&gt;
&lt;br /&gt;
Since restarting sge_schedd using hoard I didn&amp;#8217;t have any memory problems anymore, but this just happened one day ago.&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;If anyone else tries this method I&amp;#8217;d appreciate feedback and comments.&lt;/p&gt;
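&lt;p&gt;If you do try it, one way to confirm that the preload actually took effect and to keep an eye on memory afterwards might be the following. This is a sketch that assumes a Linux qmaster host with a single running &lt;code&gt;sge_schedd&lt;/code&gt; process:&lt;/p&gt;
&lt;pre&gt;
# Assumes Linux (/proc) and exactly one sge_schedd process
pid=$(pgrep -x sge_schedd)

# A non-zero count means libhoard is mapped into the scheduler process
grep -c libhoard "/proc/$pid/maps"

# Resident and virtual size in KiB; run periodically to watch for growth
ps -o rss= -o vsz= -p "$pid"
&lt;/pre&gt;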

</description>
      <pubDate>Thu, 06 Mar 2008 09:45:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:3750f222-a6d8-41c0-b2d5-a0b544f63898</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/articles/2008/03/06/reducing-scheduler-memory-usage-with-libhoard#comments</comments>
      <category>MailList Bits</category>
      <category>memory</category>
      <category>memory leak</category>
      <category>libhoard</category>
      <trackback:ping>http://gridengine.info/trackbacks?article_id=reducing-scheduler-memory-usage-with-libhoard&amp;day=06&amp;month=03&amp;year=2008</trackback:ping>
      <link>http://gridengine.info/articles/2008/03/06/reducing-scheduler-memory-usage-with-libhoard</link>
    </item>
    <item>
      <title>Screenshots of enhanced Olesen FLEXlm tools in action</title>
      <description>&lt;p&gt;In a follow-up post to Mark's &lt;a href="http://gridengine.info/articles/2008/03/04/olesen-flexlm-integration-tools-updated"&gt;recent announcement&lt;/a&gt;, we've gotten our hands on some screenshots from Mark showing his tools in use. The screenshots show the results of using XSLT transformations to turn Grid Engine XML data into XHTML suitable for web pages. The benefit is web-based visibility into current resource (and software license!) usage. This is exactly the approach that I tried out with the &lt;a href="http://xml-qstat.org/"&gt;xml-qstat&lt;/a&gt; project. Mark is pretty familiar with that effort and will be merging his improvements and enhancements into xml-qstat's SVN repository. Speaking personally as a &amp;quot;scratch an itch&amp;quot; programmer with no real software engineering skill or talent, I'm pretty excited to have a real coder take a look at xml-qstat. Related to that, I already owe a debt to Petr Jung from Sun, who contributed the Java-based CommandGenerator code that finally allows xml-qstat to be a 100% Java/Cocoon web application that does not require external Perl daemons to cache XML state data.&lt;/p&gt;
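&lt;p&gt;For anyone who has not played with this idea, the basic transformation step can be tried from the command line. The sketch below assumes &lt;code&gt;xsltproc&lt;/code&gt; is installed; the stylesheet and output file names are hypothetical placeholders, not files shipped with Mark's tools or xml-qstat:&lt;/p&gt;
&lt;pre&gt;
# qstat-to-xhtml.xsl and qstat.html are hypothetical names
qstat -f -xml | xsltproc -o qstat.html qstat-to-xhtml.xsl -
&lt;/pre&gt;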
&lt;p&gt;Before the screen captures, I'd like to ask a favor of people who read this blog. I filed &lt;a href="http://gridengine.sunsource.net/issues/show_bug.cgi?id=2335"&gt;Issue #2335&lt;/a&gt; back in July of 2007 and it has not received much love (or even a targeted milestone date for a fix). The bug is a simple one -- &amp;quot;qstat -f -xml&amp;quot; no longer reports load average data, which (a) makes xml-qstat a whole lot less useful and (b) breaks the SGE developer philosophy of ensuring that command output returns the same information regardless of output format. Until that bug is fixed it does not make sense for xml-qstat to have its long overdue &amp;quot;1.0&amp;quot; release. If you have a user account over on &lt;a href="http://gridengine.sunsource.net"&gt;http://gridengine.sunsource.net&lt;/a&gt; I'd appreciate it if you can cast one of your &amp;quot;votes&amp;quot; for Issue 2335. Thanks!&lt;/p&gt;
&lt;p&gt;And now the screenshots (edited to mask out personal/company information). Click on each image for a larger version.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;qhost overview&lt;/strong&gt;&lt;br /&gt;
&lt;a href="http://gridengine.info/misc/olesen-screencaps/qhost-overview.jpg"&gt;&lt;img src="http://gridengine.info/misc/olesen-screencaps/qhost-overview_sized.jpg" alt="" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;qstat full view (a)&lt;/strong&gt;&lt;br /&gt;
&lt;a href="http://gridengine.info/misc/olesen-screencaps/qstat-fullview-a.jpg"&gt;&lt;img src="http://gridengine.info/misc/olesen-screencaps/qstat-fullview-a_sized.jpg" alt="" /&gt;&lt;/a&gt; &lt;br /&gt;
&lt;strong&gt;qstat full view (b)&lt;/strong&gt;&lt;br /&gt;
&lt;a href="http://gridengine.info/misc/olesen-screencaps/qstat-fullview-b.jpg"&gt;&lt;img src="http://gridengine.info/misc/olesen-screencaps/qstat-fullview-b_sized.jpg" alt="" /&gt;&lt;/a&gt; &lt;br /&gt;
&lt;strong&gt;qstat queue summary&lt;/strong&gt;&lt;br /&gt;
&lt;a href="http://gridengine.info/misc/olesen-screencaps/qstat-queue-summary.jpg"&gt;&lt;img src="http://gridengine.info/misc/olesen-screencaps/qstat-queue-summary_sized.jpg" alt="" /&gt;&lt;/a&gt; &lt;br /&gt;
&lt;strong&gt;qstat resource summary&lt;/strong&gt;&lt;br /&gt;
&lt;a href="http://gridengine.info/misc/olesen-screencaps/qstat-resource-summary.jpg"&gt;&lt;img src="http://gridengine.info/misc/olesen-screencaps/qstat-resource-summary_sized.jpg" alt="" /&gt;&lt;/a&gt; &lt;br /&gt;
&lt;strong&gt;qstat view&lt;/strong&gt;&lt;br /&gt;
&lt;a href="http://gridengine.info/misc/olesen-screencaps/qstat-view.jpg"&gt;&lt;img src="http://gridengine.info/misc/olesen-screencaps/qstat-view_sized.jpg" alt="" /&gt;&lt;/a&gt;&lt;/p&gt;</description>
      <pubDate>Thu, 06 Mar 2008 09:21:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:1545ac1f-0c30-4e5d-b0b5-84191b710d5e</guid>
      <author>dag@sonsorol.org (chris)</author>
      <comments>http://gridengine.info/articles/2008/03/06/screenshots-of-enhanced-olesen-flexlm-tools-in-action#comments</comments>
      <category>Application Integration</category>
      <category>External Tools &amp; Apps</category>
      <category>olesen</category>
      <category>olsesen flexlm</category>
      <category>xmlqstat</category>
      <category>xml</category>
      <category>qstat</category>
      <link>http://gridengine.info/articles/2008/03/06/screenshots-of-enhanced-olesen-flexlm-tools-in-action</link>
    </item>
  </channel>
</rss>
