Listing idle execution hosts
In response to a query on the SGE users mailing list, Dave Love posted a short shell script that parses the output of "qhost -j" to list hosts that are active in Grid Engine but not running any jobs.
The post (with script added as an attachment) can be found here:
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=94053
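The script itself lives in the attachment linked above. What follows is not Dave Love's script, just a minimal sketch of the same idea. It assumes the 6.x "qhost -j" layout: host rows start in column one, job rows are indented beneath their host and begin with a numeric job ID, and LOAD is the fourth column (shown as "-" when a host is not reporting). The function name list_idle_hosts is made up for illustration.

```shell
#!/bin/sh
# list_idle_hosts (hypothetical name): read `qhost -j` output on stdin and
# print every reporting host that has no job rows listed beneath it.
list_idle_hosts() {
    awk '
        # Assumption: host rows start in column one, job rows are indented.
        /^[^ \t]/ {
            # A new unindented row closes the previous host block.
            if (host != "" && jobs == 0) print host
            host = ""; jobs = 0
            # Skip the header, the separator, the "global" pseudo-host, and
            # hosts whose LOAD column is "-" (treated here as not active).
            if ($1 == "HOSTNAME" || $1 ~ /^-+$/ || $1 == "global" || $4 == "-") next
            host = $1
            next
        }
        # Assumption: an indented row starting with a number is a job row.
        host != "" && $1 ~ /^[0-9]+$/ { jobs++ }
        END { if (host != "" && jobs == 0) print host }
    '
}
```

Typical use would be "qhost -j | list_idle_hosts". If your Grid Engine version prints extra columns before LOAD, the "$4" test needs adjusting.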
Screencast showing 6.1 to 6.2 inplace upgrade
Lubomir Petrik published the above screencast. Check it out.
Screenshots from the new (beta) install GUI
18 screenshots from the new GUI-based installer that is present in the 6.2 beta release.
The Flickr photo set is here:
http://flickr.com/photos/chrisdag/sets/72157611344682697/
SGE 6.2 beta candidate also out
Wow, check out the new feature list in the just released SGE 6.2 beta announcement:
- GUI based installer helping new users to more easily install the software. It complements the existing CLI based installation routine
- New support for 32-bit and 64-bit editions of Microsoft Windows Vista (Enterprise and Ultimate Edition), Windows Server 2003R2 and Windows Server 2008.
- Client and server side Job Submission Verifier (JSV) allows an administrator to control, enforce and adjust jobs requests, including job rejection. JSV scripts can be written in any scripting language, e.g. Unix shells, Perl or TCL.
- Consumable resource attributes can now be requested per job. This makes resource requests for parallel jobs much easier to define, especially when using slot ranges.
- On Linux, the use of the 'jemalloc' malloc library improves performance and reduces memory requirements
- The use of the poll(2) system call instead of select(2) on Linux systems improves scalability of qmaster in extremely huge clusters
Note that this beta release is not for production use and is aimed at an experienced SGE audience. Please test it out and give the developers your feedback!
The announcement has all the details...
SGE 6.2u1 released today
Grid Engine 6.2 update 1 has been released. The official announcement page does not seem to be up yet, but you can find the older "available" notice here.
The list of fixed issues is significant and can be viewed here:
http://gridengine.sunsource.net/project/gridengine/62patches.txt
Updated MPICH2 Integration HowTo
Thanks to Reuti for taking the time to update the MPICH2 integration HOWTO on the gridengine.sunsource.net site. Send feedback and comments to the SGE users mailing list.
Link:
http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html
Platform Gets Cheeky
Platform Computing is handing out these T-shirts at the Supercomputing 2008 conference in Austin, TX this week.
Here is a summary of reaction from various people I've talked to about it...
- Platform Computing staff love the shirts and are happy that more aggressive messaging is being used. The message apparently originated within the marketing group at the company.
- Marketing types are split -- some think the slogan crosses a line while others think it is brilliant and on-target
- Not a single soul I've spoken to believes that the 'qstat' reference relates to PBS and the various PBS software variants. Opinion seems universal that this is either a dig at Grid Engine or possibly a pointed reaction to Univa UD and some of the aggressive things that Univa has been doing lately.
- Univa UD staff believe that this message is aimed at them.
- Sun Microsystems people see this as a Grid Engine dig and a sign that the landscape has become more competitive.
Fedora 10 will ship with SGE 6.2
I'm late in catching up with Grid Engine mailing list traffic but this one from Orion Poplawski caught my eye:
F-10 will ship with 6.2-3. I'll be pushing a 6.2-4 (or later) 0-day update as well:
* Tue Nov 11 2008 - Orion Poplawski
- 6.2-4
- Add note to README about localhost line in /etc/hosts
- Cleanup setting.sh some, no more MAN stuff
- Add conditional build support for EL
- Use system db_* utils in bdb_checkpoint script
I've got the src.rpm here:
http://www.cora.nwra.com/~orion/fedora/gridengine-6.2-4.fc11.src.rpm
This should build on EL-5, F-9, and F-8 with sun java 1.6.0 installed.
These rpms are geared for minimal NFS type installs. install_* scripts
should work, though install_execd should not be needed for standard
"default" installs. Bugs to https://bugzilla.redhat.com/.
Reuti: Tight integration with Intel MPI 3.1 or MPICH2
Via this thread ...
Reuti has updated his methods and information for achieving tight integration in MPICH2 environments. An updated set of files for mpd integration for MPICH(2) is now at http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-60.tgz
The thread discusses Intel MPI 3.1, with the suggestion that the above methods for MPICH2 may also work with the Intel product. The basic issue is that the standard "mpdboot" startup method has always made tight integration with Grid Engine difficult to achieve.
LSF to SGE Migration Workshop at SC08
For people who will be attending the SuperComputing 2008 conference next week in Austin, TX, there will be an interesting full-day workshop on Monday, November 17th entitled "How to migrate from LSF to Unicluster with SGE".
Sure, this workshop talks about UniCluster, but the foundation of that product is Sun Grid Engine. Much of what will be discussed will be applicable to both Univa UD customers and the community at large.
Some of the technical information including an LSF to SGE quick reference guide is coming via the Open HPC Management Interoperability (OHMI) project.
Click below to download the invitation:
LSF-SGE-Migration-Invite.pdf
My flight lands in Austin at noon on the 18th so I'll be present for the 2nd half of the workshop.
Fixing a berkeley db spool database
Per this thread on the users list, a recipe for rebuilding and re-verifying a Berkeley DB based binary SGE spool:
    service sgemaster stop    # on failover server
    service sgemaster stop    # on master server
    cd $SGE_ROOT/default/spool
    cp -a spooldb spooldb.bak
    cd spooldb
    $SGE_ROOT/utilbin/lx24-amd64/db_verify sge
    $SGE_ROOT/utilbin/lx24-amd64/db_recover
    $SGE_ROOT/utilbin/lx24-amd64/db_dump -f sge.out sge
    mv sge sge.old
    $SGE_ROOT/utilbin/lx24-amd64/db_load -f sge.out sge
    $SGE_ROOT/utilbin/lx24-amd64/db_verify sge
    service sgemaster start   # on master server
    service sgemaster start   # on failover server
Open-architectures for Reading, Writing & Computing with Genomes
George Church, a giant in my field, is giving a talk in the Boston area on Thursday, November 19th entitled "Open-architectures for Reading, Writing & Computing with Genomes".
I won't be able to make it due to the SuperComputing 2008 conference in Austin happening that week. If you are at all interested in the topic of computers and biology this should be an excellent talk.
I'm justifying this post here because many of the compute farms that process genomic data, and an ever-growing number of the actual lab instruments doing "next generation DNA sequencing", are using Grid Engine under the hood. The well-known example here is the Solexa/Illumina instruments. More are coming that I can't speak about yet. Go Grid Engine!
Talk abstract:
Relative to a reference human genome, your personal genome has about 10,000 DNA variations which affect final protein function and 3 million which do not. While “association studies” of common DNA variations with diseases yield, so far, weak predictive power and few causative mutations, researchers expect that this will be soon remedied by genome-wide sequencing. Second-generation sequencing (multiplex cycles of fluidics and imaging) has brought costs down since 2004 by 10,000-fold $300M to $30K -- and less than $1000 via targeted sequencing including coding variants (~1% of the genome), regulation, microbes and immune response (quantitated by sequencing). Polonator.org is the only of the second-generation that has open architecture for hardware, software, and wetware – and 4-fold less expensive. Similarly PersonalGenomes.org is a uniquely open effort to integrate the above genomic data with comprehensive sets of medical and non-medical traits. We are collecting over 20 terabytes of raw genomic data for each of 100,000 research subject volunteers -- which boils down to less than a gigabyte each of differences from the reference genomes and quantitative genomic and trait data. Like DNA sequencing, raw DNA synthesis has come down in cost by 7-logs since 1980, but the next challenge currently being met is applying this to “programming” organism-level functions – by developing computer-aided design, new homologous recombination instruments, lab-scale accelerated evolution and personalized stem cells.
Grid Engine, workflows & virtualization
Another discussion happening recently on the SGE user list concerns how best to handle virtualization. That thread can be browsed here.
In a followup, Andreas is soliciting feedback from the wider community on how you want to see this area handled in future revisions of Grid Engine. Time to speak up if you have an opinion!
Read Andreas's request for feedback here.
Grid Engine & power saving
I'd guess that most people don't follow the SGE developer list all that closely. Sometimes the developer discussions cross over into areas that all users may be interested in.
There has been an interesting discussion on ways to give SGE the ability to directly trigger, or otherwise interact with, systems that switch nodes into lower power states or even power them down and up entirely as needed (Project Hedeby / SDM, etc.).
Automatic methods for powering up and down portions of clusters based on workload have been used for years now but the topic seems to be getting more interest and more backing. A few years ago I saw a neat solution that some people at Cornell Medical College had done -- they used PBS/Torque and had various IPMI scripts that powered nodes on or off depending on the size of the pending job list.
The developer thread (via MarkMail) is here. The CollabNet "Forum View" is here.
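The workload-driven idea described above (power nodes on or off depending on the size of the pending job list) can be sketched roughly as follows. This is an illustration only, not the Cornell scripts and not anything proposed in the developer thread. The function names, the threshold, and the on/off/hold policy are all assumptions. The sketch only assumes that "qstat -s p -u '*'" prints one row per pending job, each starting with a numeric job ID.

```shell
#!/bin/sh
# count_pending (hypothetical name): read `qstat -s p -u '*'` output on stdin
# and print the number of rows that begin with a numeric job ID.
# Note: a pending array job appears as a single row, so this undercounts tasks.
count_pending() {
    awk '$1 ~ /^[0-9]+$/ { n++ } END { print n + 0 }'
}

# power_action PENDING THRESHOLD (hypothetical policy):
#   "on"   when the backlog reaches the threshold,
#   "off"  when nothing is waiting,
#   "hold" otherwise.
power_action() {
    pending=$1
    threshold=$2
    if [ "$pending" -ge "$threshold" ]; then
        echo on
    elif [ "$pending" -eq 0 ]; then
        echo off
    else
        echo hold
    fi
}
```

A cron job could run something like power_action "$(qstat -s p -u '*' | count_pending)" 10 and, for "on" or "off", pass the result to an IPMI tool for a chosen set of spare nodes. A real setup would also need to confirm that a node is empty (and ideally that its queues are disabled) before powering it off.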
Collabnet update issues

A recent upgrade to the infrastructure that back-ends the http://gridengine.sunsource.net site has been causing some complaints within the community.
Some of the complaints include:
- Emails sent to the various SGE lists have been stripped of common email headers that some people had used to sort and index messages into various folders
- Anyone registered with a collabnet account now had their username substituted into the FROM field on emails. This means that I became "craffi" instead of "Chris Dagdigian" and frequent poster Ron Chen started showing up as "ron". Annoying.
- The mailing list archives now appear to be a web forum. Searching the list appears harder, and many valid emails are "hidden" because the thread view obscures the fact that many SGE users incorrectly reply to any old SGE message to ask a new question. No new thread is created, so entire conversations end up hidden under misleading subject lines.
Many of the complaints are being addressed, so speak up if you see something. I think the SGE FROM: issue has already been fixed.
This may also be a good time to re-mention the excellent MarkMail interface to the SGE lists: http://gridengine.markmail.org/ -- this may be the best all around method for searching SGE email archives for solutions.


