Main Page



About this wiki

The core Grid Engine websites are:

This wiki server (and the related http://gridengine.info blog) has been set up by members of the Grid Engine user community as a way of collecting and distributing usage, configuration, tips and HOWTO information in a fast and democratic way. We don't want to reinvent the wheel or waste time so wherever there are better online resources available, this Wiki will simply link out to them.

The best place to get help with Grid Engine related issues is the sge-users mailing list.

Grid Computing & Grid Engine Overview

This is not the place for an academic dissertation or even a management-friendly summary of "grid" or "cluster" computing. If you are just starting out learning about clusters, you may want to check out sites like http://www.clustermonkey.net. Some good intro-level information on Grid Engine and its capabilities can be found here:

Grid Engine is software that facilitates "distributed resource management" (DRM) - other similar software packages include Portable Batch System, Torque and Platform LSF. Far more than just simple load-balancing tools or batch scheduling mechanisms, DRM software typically provides the following key features across large sets of distributed resources:

  • Policy based allocation of distributed resources (CPU time, software licenses, etc.)
  • Batch queuing & scheduling
  • Support for diverse server hardware, operating systems and architectures
  • Load balancing & remote job execution
  • Detailed job accounting statistics
  • Fine-grained user specifiable resources
  • Suspend/resume/migrate jobs
  • Tools for reporting Job/Host/Cluster status
  • Job Arrays
  • Integration & control of parallel jobs
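
As a rough illustration of the "Job Arrays" feature listed above, the sketch below shows how one task of an array job might pick its own share of the work. It assumes the job was submitted as an array job (for example with `qsub -t 1-3`) and that Grid Engine exports the task number in the `SGE_TASK_ID` environment variable, which is set to the string "undefined" for non-array jobs. The function name `select_input` and the file names are hypothetical and chosen only for this example.

```python
import os


def select_input(inputs, environ=os.environ):
    """Return the item from `inputs` that belongs to this array task.

    Assumes task IDs start at 1 (as with `qsub -t 1-N`), so task 1
    gets inputs[0], task 2 gets inputs[1], and so on.
    """
    raw = environ.get("SGE_TASK_ID", "undefined")
    if not raw.isdigit():
        # Not running as an array task: fall back to the first input.
        return inputs[0]
    task_id = int(raw)
    if not 1 <= task_id <= len(inputs):
        raise IndexError(f"task {task_id} has no matching input")
    return inputs[task_id - 1]
```

With a hypothetical input list such as `["a.dat", "b.dat", "c.dat"]`, the task running with `SGE_TASK_ID=2` would process `b.dat`. Passing the environment as a parameter keeps the function easy to try outside a cluster.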

Is Grid Engine commercial or open source software?

Both. Sun Microsystems sponsors development of the freely available, open source Grid Engine product hosted at http://gridengine.sunsource.net. The software found on that site is the free "open source" version; the commercial version, "N1 Grid Engine 6 (N1GE)", can be found at http://www.sun.com/gridware/.

Sun takes the open source codebase in-house and offers it for sale as a formally supported enterprise distributed computing product. There is no functional difference between the base open source and commercial versions. The commercial version differs from the free version in the following key ways:

  • Additional QC/QA testing done internally by Sun
  • Localization to non-English languages
  • Global support & professional services
  • Tight integration with other "N1" management, service and provisioning technologies

Grid Engine 5.x "Standard" vs. "Enterprise" Editions

Prior to the release of Grid Engine 6 in 2004, there were two different "flavors" of Grid Engine offered by Sun Microsystems. The "standard" edition was a Sun software product made available for free to any user or institution. The "Enterprise Edition" was available from Sun at additional cost. The only difference between the two offerings was the number of scheduling policy types supported. The standard edition generally supported only a "first-in, first-out" (FIFO) scheduling policy along with a simple user-sort scheduling mechanism. The Enterprise Edition supported FIFO scheduling in addition to several other, more flexible policy-based scheduling mechanisms. Interestingly enough, the codebase and binaries for both flavors were 100% the same. The difference between "standard" and "enterprise edition" was triggered by a simple flag passed via an installation script. During the time that Sun Microsystems was offering two different types of "Sun Grid Engine", the open source site was offering both flavors of "Grid Engine" for free.

"Sun N1 Grid Engine 6" vs. "Grid Engine 6"

When Grid Engine 6 was first released, Chris Dagdigian wrote a simple whitepaper entitled "Understanding the differences between Grid Engine 5.3, 6.0 and Sun N1 Grid Engine 6 (N1GE)". While somewhat dated, it provides a good overview of the differences.

Major project news and milestones

  • 2005
    • Further complicating the interesting relationship between the Sun branded and open source versions of Grid Engine, Sun made a surprise announcement in December 2005 that (among many other software products) the full Sun N1 software stack, including N1 Grid Engine, would now be available for "free". More information on this announcement can be found online: http://gridengine.info/articles/2005/12/01/sun-n1-grid-engine-is-now-free
  • 2006
    • Another major change for Grid Engine occurred when Sun announced at the Supercomputing 2006 conference that all of their commercial product add-ons for Grid Engine (ARCo and Windows client support) would be integrated with the open source Grid Engine codebase. The full announcement can be read here: http://gridengine.sunsource.net/news/SuperComputing2006.html
  • 2007
    • Grid Engine 6.1 was released, the first major revision release since Grid Engine 6.0 was announced in 2004. Included among numerous improvements and enhancements is the new highly-capable Resource Quota subsystem.
  • 2008
    • Grid Engine 6.2 will be released in Q2 2008. SGE 6.2 will be running on the TACC Ranger supercomputer, which will have 62,976 processor cores in 3,936 execution hosts. Features of SGE 6.2 include Advance Reservation and Array Job Task Interdependencies.

The Video

If you'd rather have the Grid Engine product team tell you about Grid Engine, there's a video on YouTube from the Grid Engine team that introduces grid computing and talks about what Grid Engine is and how it's commonly used.

Documentation

See http://gridengine.sunsource.net/documentation.html

HOWTOs

See this link: http://gridengine.sunsource.net/project/gridengine/howto/howto.html

Sun BluePrints

Sun maintains an interesting technical library of "BluePrint Documents". Interesting publications include:

Application Integration

LAM-MPI: Tight integration of Grid Engine and LAM-MPI
FLEXlm License Manager: New (Olesen) method with some configuration notes
FLEXlm License Manager: Old "load sensor" method
FLEXlm License Manager: FLEXlm license load sensor written in Python
LicenseJuggler: Sharing software licenses across multiple Grid Engine sites
Matlab: The Grid Engine community is looking for Matlab integration methods and tips.
Ansys: Running Ansys applications as Grid Engine jobs.
Clearcase: If you are looking to use SGE to improve Clearcase build times, SGE is not your solution yet. Commercial solutions exist, such as Electric-Cloud and IBM's buildforge.
PEST: PEST is a general-purpose parameter estimation and optimization program that can be used with any simulation code. Notes and a skeleton script to integrate parallel PEST and Grid Engine can be found here.
Distributed-Compilation: Grid Engine has a tool called "qmake" which can help distribute large source code building tasks across a cluster of machines. Information on distributed builds has been moved to the Distributed-Compilation wiki page.
Job Arrays
Dytran: Running Dytran applications as Grid Engine jobs.

Platform Integration

Macintosh OS X: See GridEngine_launchd for notes on getting Grid Engine to function under the new launchd framework for system services in OS X 10.5 (Leopard).
Windows: You can start jobs on a UNIX/Linux machine from Windows using SSH and SAMBA.
Linux & Windows: Install and configure Grid Engine in a heterogeneous environment on Linux and Windows with MPICH2.

Grid Integration

Datacenter Related Topics

Utilities

Various short utilities for doing stuff with Grid Engine can be found on the Utilities page.

ARCo Queries

On the ARCo Queries page, ARCo users can contribute and share their custom ARCo queries.

Grid Engine wiki and related projects

Documenting Grid Engine XML output

Chris Dag has started a page GridEngine_XML to document the internals of Grid Engine XML output status data.
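
To give a feel for what working with this XML status data can look like, here is a minimal sketch that reads job entries with Python's standard `xml.etree.ElementTree` module. The `SAMPLE` document is hand-written and heavily abbreviated; its element names (`job_list`, `JB_job_number`, `JB_name`, `JB_owner`, `state`, `queue_name`, `slots`) are modeled on typical `qstat -xml` output and should be treated as assumptions to check against a real cluster and the GridEngine_XML page. The function name `parse_jobs` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample modeled on `qstat -xml` output (assumption).
SAMPLE = """
<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>101</JB_job_number>
      <JB_name>align.sh</JB_name>
      <JB_owner>alice</JB_owner>
      <state>r</state>
      <queue_name>all.q@node01</queue_name>
      <slots>1</slots>
    </job_list>
  </queue_info>
  <job_info>
    <job_list state="pending">
      <JB_job_number>102</JB_job_number>
      <JB_name>blast.sh</JB_name>
      <JB_owner>bob</JB_owner>
      <state>qw</state>
      <slots>4</slots>
    </job_list>
  </job_info>
</job_info>
"""


def parse_jobs(xml_text):
    """Return one dict per <job_list> element, in document order."""
    root = ET.fromstring(xml_text)
    jobs = []
    for job in root.iter("job_list"):
        jobs.append({
            "id": int(job.findtext("JB_job_number")),
            "name": job.findtext("JB_name"),
            "owner": job.findtext("JB_owner"),
            "state": job.findtext("state"),
            # Pending jobs have no queue assigned yet, so this may be None.
            "queue": job.findtext("queue_name"),
            "slots": int(job.findtext("slots", default="1")),
        })
    return jobs
```

Calling `parse_jobs(SAMPLE)` yields one dictionary per job; on a live system the same function could be fed the captured output of `qstat -xml` instead of the sample string, provided the element names match.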

Documenting and understanding Grid Engine 'qping' output

Chris Dag has created a placeholder page GridEngine_qping to document troubleshooting methods involving the 'qping' utility.

Grid Engine XML parser project

  • Dan has started a Grid Engine XML parser project under Passau

Development Specifications

Development specifications for 6.2 are kept on GridWiki.

Development specifications for 6.1 are kept on GridWiki, but there are also others which are not yet implemented.

Grid Engine Packaging Efforts

Packaging documentation and discussion has been moved to its own GE-Packaging wiki page. This is of interest to people working on alternative binary or source installation/distribution methods for Grid Engine.

Success Story

There is a list of DRMAA success stories kept under http://www.drmaa.org/wiki/index.php/DrmaaUsers

Frequently Encountered Problems

Stephan's Blog Posts

This is an archive of Stephan Grell's blog posts.

Google Summer of Code 2007 Ideas Page

You can find it here.

