Creating Hadoop PE under Grid Engine
Dan has found a great Sun blog post by Ravi Chandra Nallan on integrating Hadoop into SGE via a parallel environment.

Image source: http://hadoop.apache.org/core/
June 2008 SGE Workshops
Consider this post a plug for the upcoming June 2008 SGE User and SGE Admin workshops that are being held in the Boston, MA USA area.
More details here:
http://blog.bioteam.net/2008/03/22/sge-training/
OSGC Presenter Slides
The full list of talks along with links to PDF, PPT and video files can be found here:
http://opensourcegridcluster.org/programming.html
At the time I'm writing this, my PDF slides have not made it up on the official site yet. For anyone interested in those, here are some temporary download links courtesy of the Bioteam Blog. For people who saw me at the German SGE Workshop, the only new topic is "SGE & Amazon EC2".
Here is a short culled list of talk names and slides that may be of interest to this community:
- Sun/Community State of the Union - Speaker: Fritz Ferstl
- New features in Sun Grid Engine 6.2 - Speaker: Lubomír Petrík
- Using OGF Standards for Grid and HPC - Speaker: Chris Smith
- Grid Engine at the Texas Advanced Computing Center - Speaker: Roland Dittel
- Grid Heating: Dynamic Thermal Allocation via Grid Engine Tools (ppt) - Speaker: Paul Brenner
- Service Domain Manager – Basics and Concepts - Speakers: Richard Hierlmeier & Ryszard Macidlowski
- Accounting and Reporting Console Multi-Cluster Support - Speaker: Jana Olivova
- Making Grid Engine Highly Available with Open High Availability Cluster and OpenSolaris - Speaker: Ashutosh Tripathi
- HPC Visualization on the Grid - Speakers: Linda Fellingham & Dean Stanton
- PluS: An Advance Reservation plug in for Sun Grid Engine - Speaker: Hidemoto Nakada
- Berkeley Laboratory Checkpoint Restart - Speaker: Eric Roman
Screencast: live install of SGE6.2 beta
Truthfully speaking, after taking an overnight flight back to Boston from California I really was in poor shape to actually get any real work done today.
I've recorded my experience installing the fresh release of SGE 6.2beta on my laptop. The video screencast itself is hosted over at a BioTeam site -- it's only fair because BioTeam is paying for the hosting costs as well as the screencast recording software!
The video screencast is linked off of this blog post:
http://blog.bioteam.net/2008/05/16/sge-62beta-unboxing-screencast/
I am still on the fence about whether this screencast stuff is actually useful. Maybe it's all just web-2.0 style-over-substance crap. Comments are appreciated and will help me figure out how much effort to put into video content vs. straight up blog or technical writing.
Open Source Grid & Cluster Conference Photostream
I'll post links to talk slides shortly, meanwhile a photo stream from the event can be found here:
http://flickr.com/groups/opensourcegridcluster/
SGE 6.2 beta binaries are available for testing
I'm not going to waste time copying the release announcement into a blog post. The full announcement can be read here:
http://gridengine.sunsource.net/servlets/ReadMsg?list=announce&msgNo=94
Lots of significant changes in the product itself. I also love the migration of manuals and docs to the new http://wikis.sun.com/display/GridEngine site.
Please remember that the reason for this beta release is to allow you to test 6.2 before it officially goes out the door in final form. The more people we have working on and stress-testing 6.2 the less chance there will be an inconvenient or unexpected upgrade issue, bug or glitch. The developers have good testbed environments and testsuites but they can't simulate all the different ways and methods that we use (and abuse!) SGE to get work done. Help make the 6.2 release a big success by testing now and providing feedback.
Testing flickr screencast hosting
I'm doing a short "Intro to SGE" tutorial today as part of the Univa ClusterExpress tutorial that is happening this week at the http://www.opensourcegridcluster.org/ conference.
As part of general paranoia I recorded some screencasts of trivial SGE command line usage to play at my talk in case my demo system becomes unavailable. Just for the heck of it I uploaded some of these screencasts to my Flickr photostream and then added them to the conference group photo pool.
Please let me know what you think by leaving a comment here or dropping me an email. I don't think the quality is all that great, as the text is often hard to read. I may go back to producing screencasts with Camtasia Studio and hosting them over at http://www.screencast.com (if you see the videos in the lower sidebar, those are done in Camtasia and hosted at screencast.com).
SGE XML output getting some needed attention
For people like myself who are interested in (or, say, dependent on) the XML output features of Grid Engine, it's been a lonely time. This area of Grid Engine was not really getting much love, attention or bug fixes until recently.
Happy to report that this seems to have changed. If you are at all interested in using SGE data in XML form then you may want to:
- Pay attention to this mailing list thread
- Watch this SGE Wiki page
Kudos to Michael Pospisil from the Sun Microsystems SGE developer team in Prague for soliciting and listening to community input -- looks like the change may be bigger than simple bug fixes and output normalization. There is some talk about making XML output more usable to the end-users instead of the current design where XML output is largely a straight representation of internal SGE Cull lists and data structures.
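For anyone who has not yet consumed SGE data in XML form, here is a minimal Python sketch of the idea. The sample document is hand-written and only modeled on the general shape of qstat -xml output; the element names, the nesting and the summarize_jobs helper are assumptions for illustration, so check them against what your own SGE version actually emits.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, loosely modeled on qstat -xml output.
# Element names and nesting are assumptions; real output varies by version.
SAMPLE = """<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>101</JB_job_number>
      <JB_name>blast_run</JB_name>
      <JB_owner>alice</JB_owner>
    </job_list>
  </queue_info>
  <job_info>
    <job_list state="pending">
      <JB_job_number>102</JB_job_number>
      <JB_name>align</JB_name>
      <JB_owner>bob</JB_owner>
    </job_list>
  </job_info>
</job_info>"""


def summarize_jobs(xml_text):
    """Return one small dict per job_list element, in document order."""
    root = ET.fromstring(xml_text)
    jobs = []
    for job in root.iter("job_list"):
        jobs.append({
            "id": int(job.findtext("JB_job_number")),
            "name": job.findtext("JB_name"),
            "owner": job.findtext("JB_owner"),
            "state": job.get("state"),
        })
    return jobs


if __name__ == "__main__":
    for job in summarize_jobs(SAMPLE):
        print(job)
```

Notice how the field names (JB_job_number and friends) look like internal data structure names rather than something designed for end users, which is exactly the complaint being discussed.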
Roland: things that affect job deletion time
In this interesting users-list thread, Roland provides some nice comments on the various things that can affect the time it takes to delete a Grid Engine job.
Specifically mentioned is a new hash implementation slated for the upcoming 6.2 release that dramatically improves things.
From Roland's post:
...for GE 6.2 I've analyzed the hotspots deleting jobs and what I've found is:
1) the time deleting a job increases with the amount of pending jobs in the cluster and the amount of queue instances. The reason for this is the messages list for schedd_job_info. Every message in the qstat -j output is one list element and below this element are the job id references stored inheriting this message. At job deletion time qmaster has to loop over the whole list of messages and loop over all references to removes right one. As a matter of fact this does not scale, and for 6.2 I've added a hash access to the reference id that decreased the job deletion time in large clusters heavily. Sadly I don't remember the exact numbers.
To verify this you can disable schedd_job_info in the scheduler config and then delete your jobs.
2) The job script and the job itself needs to be removed from the database. This time depends if you use berkeleydb or classic spooling and if you spool on local storage or on a NFS share. As faster your access to the storage is as faster you can delete the jobs.
If disabling schedd_job_info doesn't help in your case you might be hit by this point.
3) With 6.1u3 we've introduced the parameters gdi_timeout and gdi_retries to tune this behaviour. But that's anyway more a workaround than a real solution.
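To make point 1 a bit more concrete, here is a toy Python model of the difference between scanning every message's reference list and looking a job up through a hash. This is not the qmaster code; the data layout and the function names (delete_by_scan, build_index, delete_by_hash) are made up purely to illustrate why the hashed lookup scales better.

```python
from collections import defaultdict


def delete_by_scan(messages, job_id):
    """Slow path: walk every message and search each reference list."""
    touched = 0
    for refs in messages:
        if job_id in refs:          # linear search through a list
            refs.remove(job_id)
            touched += 1
    return touched


def build_index(messages):
    """Map each job id to the positions of the messages that reference it."""
    index = defaultdict(set)
    for pos, refs in enumerate(messages):
        for job_id in refs:
            index[job_id].add(pos)
    return index


def delete_by_hash(messages, index, job_id):
    """Fast path: visit only the messages known to reference the job."""
    touched = 0
    for pos in index.pop(job_id, set()):
        messages[pos].discard(job_id)   # references are stored as sets here
        touched += 1
    return touched


if __name__ == "__main__":
    list_messages = [[1, 2, 3], [2, 3], [4]]
    set_messages = [set(refs) for refs in list_messages]
    index = build_index(set_messages)
    print(delete_by_scan(list_messages, 2))
    print(delete_by_hash(set_messages, index, 2))
```

In the scan version the work grows with the total number of messages and references, which matches Roland's observation that deletion time rises with pending jobs and queue instances. In the hashed version the work depends only on how many messages mention the job being deleted.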
Keeping single slot jobs off of certain nodes
In this thread, Paul asks:
"I'm looking at finding a way to either limit single-slot jobs, or requiring all jobs in a given queue to be running in a pe. Specifically, I have some SMP nodes, that I'd rather not waste on single thread, and also keep the single thread jobs off of the infiniband connected nodes. I have gigE small cpu count nodes for this task."
Dan replied with another example of clever use of the new SGE Resource Quota syntax within SGE 6.1 and later:
You can use resource quota sets to restrict non-PE jobs to certain queues/hosts.
    limit pes !* hosts @smp to slots=0
Slick!
Think I'm going to like the new Sun wiki
One of the more interesting things (to me at least!) in the recent news about the SGE 6.2 beta was the word that all documentation and manuals would be moving to a new home at http://wikis.sun.com.
I registered a user account a few days ago and hit the site today to see if any SGE stuff had made it over. The screenshot below is what I found. It's nice to see smart tech people have a sense of humor. The downtime must have been short, as when I refreshed the browser the site was back to normal.
SGE 6.2 goes beta next week (your help needed)
SGE 6.2 is being released in Beta form next week and the developers are asking for people to make some time if possible to fully test out the beta snapshot of the latest major SGE point release.
Andy's full note can be found here (well worth reading in full ...):
http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=24426
In my mind, I'm most excited about the following:
- Advance Reservations & array job inter-dependencies
- The scheduler is now a thread within the qmaster!
- The JVM running within the qmaster
- SGE moving all docs into wiki form!
SGE testbeds: Simulate mass numbers of exec hosts
There was an interesting message on the developers list recently, posted as a comment attached to Issue 2364. In it, Andreas explains the use of the SIMULATE_EXECDS=true parameter, which allows unrestricted execution host creation by suppressing unknown host errors.
I can see this as being very useful for testing SGE scheduler and policy configuration settings before implementing them on production systems.
From the comment:
This is a short HOWTO for the use of the cluster simulator:
(1) Start with installing a new SGE cluster as used, but install not more than the qmaster itself
(2) After successful installation use qconf -mconf to set SIMULATE_EXECDS=true in qmaster_params section of sge_conf(5). This causes the suppression of the 'unknown' queue states.
(3) Make sure the "all.q" and any other queue that you configure does not use any 'load_threasholds'. Cluster simulator has no means to anyhow emulate load values. As a result there will be no load values. For that reason load_threasholds may not be used as it would cause load alarm queue states that prevent scheduler from dispatching jobs into your queues.
(4) Use qconf -ae|-Ae to create arbitrary number of simulated execution hosts. The hosts needs not exist as qmaster anyways won't try to send anything to it, but the hostname must be resolvable.
Optionally:
(5) If you care for scheduler runtimes set PROFILE=true in the params section of sched_conf(5) using qconf -msconf.
Now your simulated cluster is ready. You can send in arbitrary numbers of jobs. Due to (2) and (3) scheduler will dispatch them and send corresponding orders to qmaster. Qmaster will behave as if it would start the jobs, but it raise timers to ensure job state transitions are passed as used. What won't work is interactive jobs (i.e. qrsh, qsh etc.) and parallel jobs with control_slaves set to true in sge_pe(5). Jobs' runtime can be controled via the first job argument. That means when
    # qsub -b y /bin/sleep 5
is submitted, the job will finish after five seconds.
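Step (4) gets tedious by hand once you want hundreds of fake hosts, so here is a rough Python sketch that writes one host file per simulated host and returns the matching qconf -Ae commands without running them. The "simhost" naming scheme, the helper names and the field list in the template are my assumptions (the fields follow my reading of host_conf(5)); compare against qconf -se output from a real host before using it, and remember that the hostnames still have to be resolvable.

```python
import os

# Field list is an assumption based on host_conf(5); verify against
# "qconf -se <host>" on your own SGE version.
HOST_TEMPLATE = """hostname              {hostname}
load_scaling          NONE
complex_values        NONE
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         NONE
report_variables      NONE
"""


def simulated_hostnames(count, prefix="simhost"):
    """Return names like simhost0001 ... (a hypothetical naming scheme)."""
    return ["%s%04d" % (prefix, n) for n in range(1, count + 1)]


def render_host_config(hostname):
    """Fill the template for a single simulated execution host."""
    return HOST_TEMPLATE.format(hostname=hostname)


def write_host_configs(count, directory):
    """Write one file per host and return the qconf commands to run later."""
    commands = []
    for hostname in simulated_hostnames(count):
        path = os.path.join(directory, hostname + ".conf")
        with open(path, "w") as handle:
            handle.write(render_host_config(hostname))
        commands.append(["qconf", "-Ae", path])
    return commands


if __name__ == "__main__":
    # Only prints the commands; nothing is sent to the qmaster.
    for command in write_host_configs(3, "."):
        print(" ".join(command))
```

Keeping the script to "write files, print commands" means you can eyeball the result before anything touches the qmaster.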
RHEL5.2/Centos5 kernel update may cause problems
This is a heads up for RedHat Enterprise Linux (RHEL) users as well as for users (like myself) of the various Centos variants.
There is a recent patch for RHEL that changes the inode data structure exposed to NFS clients from 32 bits to 64 bits in size. The basic summary of this issue is that many applications may not handle this change gracefully (there is already one such report involving the SGE Linux binaries).
RHEL and modern Centos users should probably pay attention to this issue (by subscribing as CC: contacts):
http://gridengine.sunsource.net/issues/show_bug.cgi?id=2543
A RedHat bug report discussing the issue in more detail is here:
"Large inode number patch breaks applications"
https://bugzilla.redhat.com/show_bug.cgi?id=241348
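If you want a quick feel for whether a given filesystem is already handing out inode numbers that no longer fit in 32 bits, something like the following Python sketch can help. It is only a diagnostic idea of mine, not anything from the bug reports: the helper names are hypothetical, and which directory you point it at (for example an NFS-mounted SGE root) is up to you.

```python
import os

MAX_32BIT = 0xFFFFFFFF  # largest value an unsigned 32-bit inode field can hold


def needs_64bit_inode(inode_number):
    """True when the inode number cannot be stored in 32 bits."""
    return inode_number > MAX_32BIT


def find_large_inodes(root):
    """Walk a directory tree and collect (path, inode) pairs above 32 bits."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                inode = os.lstat(path).st_ino
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if needs_64bit_inode(inode):
                hits.append((path, inode))
    return hits


if __name__ == "__main__":
    for path, inode in find_large_inodes("."):
        print(inode, path)
```

An empty result does not prove you are safe, since inode numbers depend on the server and filesystem, but any hits tell you that older binaries reading those paths are worth testing.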
mpiblast, SGE and MPICH2 integration
Matthias Neder has posted a quick summary of a tight MPICH2 integration that successfully handles his mpiblast application.
The summarized solution can be found here:
http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=24204



