Choosing: SGE vs LSF vs Torque

Posted by chris Sun, 12 Feb 2006 18:22:56 GMT

In this thread, Mark Olesen explains in some detail why his group chose to deploy Grid Engine. His comments about how much "turnkey" vendors can really provide are spot on, and anyone researching or considering a distributed resource management software layer should keep them in mind:

... Unfortunately, nobody could offer us a complete turn-key solution. They could install the system, set up queues in accordance with our specifications, and include any job submissions scripts that we would provide them. We were most certainly left with the impression that we would essentially need to specify how 90% of everything should be implemented, and they would implement it for us.

We thus took exactly the opposite approach and decided to try and learn the remaining 10% ourselves and GridEngine appeared to be the best option. In case it didn't pan out with GridEngine, we figured that we could always invest in a commercial solution or get commercial support from Sun. In either case, we'd have gained a good idea of job submission scripts and how queuing should or should not work.

As you may guess, we haven't found a reason to move away from GridEngine. With the version 6, any doubts that may have remained have been removed.

This is valuable advice. A quick trawl through the SGE users mailing list will turn up a vast array of usage patterns, configurations and deployment requirements. Even in my day job, where I've spent a lot of time deploying SGE within particular industries, I still see SGE used in many different ways.

As a general rule, people looking to get the most out of Grid Engine (or any similar product) should plan on developing and maintaining at least a small amount of in-house expertise. How else can you ensure that your "turnkey" vendor did a suitable job?

Meanwhile...

Over on the bioclusters mailing list, Bonnie started a similar thread about choosing distributed resource management software. Tim Cutts mentioned a post I had made on "SGE and LSF and which is Best"; the post he refers to is here:

http://bioinformatics.org/pipermail/bioclusters/2005-August/002671.html

I still think that summary of "SGE vs LSF" is correct. In 2006 every product has the core functions down, so the main differences come down to cost, support and the layered features and add-ons each one offers. The one addendum I should make is that in all of 2005 I don't think I ever found a need to swap out SGE on a project in favor of Platform LSF.

This will change in 2006: I'm working with at least one very large client who will likely be best served by going with Platform LSF. I'm actually looking forward to this; it will be a nice change and a good way to polish up my LSF knowledge.

I'm also looking forward to finding the time to re-evaluate PBS Pro; it's been a long time since I've been hands-on with that offering.