Clever job prioritization tip

Posted by chris Thu, 13 Mar 2008 17:28:48 GMT

Grid Engine has a built-in priority mechanism that is useful for allowing end users to sort and prioritize their own personal pending tasks -- this gives the users the ability to submit many jobs but still dictate which of those jobs need to be run more urgently than the rest.

In practice, though, this is actually fairly clunky to implement. By default the following conditions exist:

  • SGE will accept a priority range of -1023 to 1024
  • By default all jobs get assigned a value of 0
  • Only SGE managers can assign priority values higher than 0
  • Normal users can only assign negative priority values
See where we are going here? By default, a non-privileged user can only describe some of her jobs as "less important" than others. There is no mechanism (besides granting the user SGE manager authority) for her to say "this job of mine is more important than that other pending job of mine...".

This is, ummmm, awkward to say the least and works in a way that is 100% opposite from what a sensible user or SGE Admin would expect. Users can only decrease the relative priority of their job in the default environment.

A recent mailing list post from Jeff highlights a nice little workaround. Jeff describes creating an entry in the sge_request file that automatically assigns a value of -p -100 to all submitted jobs that don't override the default with their own use of the -p switch.

This is a nice approach because by default it harms nobody (as all jobs have -p -100). Yet it gives headroom for a non-privileged user to use the priority range -99 to 0 to designate some of her jobs as more personally important than others.
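To make the arithmetic concrete, here is a toy Python model of the rules above. It is not SGE code: the function names are made up for illustration, and the only values taken from this post are the -1023 to 1024 range, the manager-only rule for values above 0 and Jeff's -100 default.

```python
# Toy model of the priority rules described in this post; not SGE code.
MIN_PRIORITY = -1023
MAX_PRIORITY = 1024
SITE_DEFAULT = -100  # the "-p -100" entry placed in sge_request


def effective_priority(requested=None, is_manager=False, site_default=SITE_DEFAULT):
    """Return the priority a job ends up with in this simplified model."""
    if requested is None:
        # No -p given by the user, so the site-wide default applies.
        return site_default
    if not MIN_PRIORITY <= requested <= MAX_PRIORITY:
        raise ValueError(f"priority {requested} is outside {MIN_PRIORITY}..{MAX_PRIORITY}")
    if requested > 0 and not is_manager:
        raise PermissionError("only managers may request a priority above 0")
    return requested


def pending_order(jobs):
    """Sort (name, requested_priority) pairs, highest effective priority first.

    sorted() is stable, so jobs with equal priority keep their submission order.
    """
    return sorted(jobs, key=lambda job: effective_priority(job[1]), reverse=True)
```

In this model a user who submits three jobs and gives one of them -p -50 sees that job sort ahead of her other two, which both sit at the -100 default.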

Background reference: manpage for sge_request.

Open Grid Forum (OGF22) Meeting Discount

Posted by chris Thu, 14 Feb 2008 23:27:14 GMT

The 22nd Open Grid Forum -- OGF22
Hyatt Regency Cambridge
Cambridge, MA USA
February 25-28, 2008
Website: http://www.ogf.org/OGF22/

Coworker Chris Dwan and I will be attending this event and one of us will likely end up speaking. Both of us are known to be somewhat cynical of the "big G" Grid Computing world so we'll be bringing our industry-centric views and bias towards practical solutions into the forum.

(I swear, when you speak to some of these "big G grid" supercomputing or academic folks you get the sense that they think that everyone has 100 million in government funding and a petabyte-scale single namespace storage solution to apply to the problems at hand....)

Among the other attendees that I know about, Chris Smith from Platform Computing will also be there -- he handles Platform's involvement with standards bodies and is another person on my "smart people that I learn a lot from" list. Should be an interesting event.

And finally, some discount registration offers for readers of this blog:

  • "Buy 1 pass and get a 2nd for free"
  • "$150 discount off the purchase of the full day pass"
Use the code: pharma when registering to get the special prices.

Clever urgency policy usage

Posted by chris Thu, 14 Feb 2008 22:49:34 GMT

It's mailing list posts like this that generate "aha!" moments for me where I realize that I've learned how to tweak SGE behavior in a new way.

Mark answered the original poster with a good suggestion for solving the particular issue at hand -- using qalter to change priority values so that a pending parallel job can rise to the top of the waitlist.

Then Mark offhandedly dropped this little comment:

... If you always want parallel jobs to go first, you can try increasing the urgency of the 'slots' complex.

I'm familiar with the Urgency Policy mechanism in Grid Engine. I've used it many times to address specific problems from a resource allocation perspective. Typically this involves something like using the urgency policy to prioritize the dispatch of pending jobs that consume expensive flexlm software license entitlements. I'm also aware from creating and modifying requestable and/or consumable resources that all of the resource attributes listed in the SGE complex have an urgency parameter associated with them that defaults to 0.

I just hadn't really put it all together until Mark's offhand aside. It's not complicated at all, just ... elegant. Associating urgency entitlements with the "slots" complex means that jobs that need more slots will gain additional entitlements and thus rise up through the pending list. Since parallel jobs naturally consume more slots than serial tasks, the end result is that parallel jobs become "more important" in the scheduler mechanism than non-parallel jobs.

I'm guessing not many people have a global "parallel jobs are always more important than serial jobs" use case requirement but for those that do this could be a neat trick.
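A rough way to picture the effect, as a Python sketch. The big assumption here is that a job's urgency contribution from 'slots' is simply the complex's urgency value multiplied by the number of slots the job requests; the real scheduler folds this into a larger weighted priority calculation, so treat the numbers as illustrative only. The function names are hypothetical.

```python
def slots_urgency(slots_requested, slots_urgency_value):
    """Assumed contribution: urgency of the 'slots' complex times slots requested."""
    return slots_requested * slots_urgency_value


def dispatch_order(pending, slots_urgency_value):
    """Order (name, slots) pairs by their assumed urgency contribution, highest first.

    sorted() is stable, so jobs with equal contribution keep submission order.
    """
    return sorted(
        pending,
        key=lambda job: slots_urgency(job[1], slots_urgency_value),
        reverse=True,
    )


pending = [("serial-1", 1), ("mpi-16", 16), ("serial-2", 1)]
print(dispatch_order(pending, slots_urgency_value=1000))
```

With any positive urgency value the 16-slot job sorts ahead of the serial jobs; with a value of 0 every contribution is equal and the submission order is left alone.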

Extending job dependency scheduling to array job sub-tasks

Posted by chris Wed, 01 Aug 2007 12:52:14 GMT

More Rising Sun news ...

Rising Sun Pictures, an Australian visual effects house (previous mention), has released a specification document entitled "Grid Engine Array Task Dependency Specification".

The spec is well written and backwards compatibility is assured. The use cases come from digital film and frame rendering. The main goal is to extend the ability of the SGE scheduler to handle array job tasks that themselves may be dependent on the successful completion of other array jobs or even sub-tasks of other jobs.

The full specification is here and well worth a read:
http://open.rsp.com.au/?page_id=11
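To show why task-level dependencies matter for frame rendering, here is a small Python sketch. It does not reproduce the syntax or semantics of the Rising Sun spec; it assumes, purely for illustration, a one-to-one mapping where task N of a downstream array job waits only for task N of the upstream array job. Both function names are hypothetical.

```python
def runnable_with_job_dependency(task_ids, upstream_ids, finished_upstream):
    """Whole-job dependency: nothing downstream runs until every upstream task is done."""
    if set(upstream_ids) <= set(finished_upstream):
        return list(task_ids)
    return []


def runnable_with_task_dependency(task_ids, finished_upstream, started=()):
    """Assumed task-level dependency: task N may run once upstream task N is done."""
    done = set(finished_upstream)
    already_started = set(started)
    return [t for t in task_ids if t in done and t not in already_started]


frames = range(1, 6)          # a 5-frame render job feeding a 5-frame composite job
finished_renders = {1, 3}     # only frames 1 and 3 have been rendered so far

print(runnable_with_job_dependency(frames, frames, finished_renders))
print(runnable_with_task_dependency(frames, finished_renders))
```

Under the whole-job rule the composite job sits idle until all five renders finish; under the assumed per-task rule frames 1 and 3 can be composited right away.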

Project Hedeby documentation draft now available

Posted by chris Sat, 14 Apr 2007 15:23:40 GMT


How Hedeby is being introduced:

In large enterprises, hosts are often divided among different services (e.g. N1GE), and the services themselves are seen as assigned pools of resources (e.g. hosts). When a service is overwhelmed with work one solution may be to remove resources from a service which is not overburdened or less important and assign those resources to the overloaded service. The Hedeby project was established to provide this functionality automatically... (http://hedeby.sunsource.net/)

As reported in this mailing list thread, a first draft version of a Hedeby documentation book has been committed to the project's CVS repository. The book has been transformed and made available as a PDF by an interested member of the SGE community.

Fred Youhanaie found the book and was able to successfully transform the Docbook XML into PDF form. The transformed PDF is available at http://www.anydata.co.uk/gridengine/HedebyBook.pdf

The Hedeby developers may not be incredibly pleased to see a first-draft, first-commit documentation effort grabbed from CVS and instantly made available as PDF, so some standard warnings and caveats should apply. The only people who should check this PDF out are people interested in what Hedeby is, how it is being architected and what some of the initial use cases are envisioned to be. All other non or semi-interested parties should just relax, sit back and let Hedeby development continue until something is actually officially released.

Dan's video intro to Grid Engine Service Domain Management

Posted by chris Tue, 27 Feb 2007 14:25:51 GMT

Rayson pointed out the following Blog post this morning:
http://blogs.sun.com/HPC/entry/video_sun_grid_engine_demo

Which contains the following great YouTube video of DanT:

If the embedded link does not work, try this:
http://www.youtube.com/watch?v=8QB96lALa5I

Detailed docs on Service Domains and Grid Engine are hard to find. The topic is mentioned a bit in this prior blog post: http://gridengine.info/articles/2006/12/13/its-official-project-hedeby-and-arco-join-the-sge-codebase

Parallel Environment Queue Sort API

Posted by chris Tue, 20 Feb 2007 19:55:19 GMT

Is anyone using this?

While trying to prune down an overflowing email inbox, I stumbled upon a mailing list post from back in May 2006 that I had tagged as something to follow up upon. The post to the developers mailing list asked about a scheduling API for Grid Engine. One of the replies mentioned that the "Parallel Environment Queue Sort (PQS) API" had been checked into the CVS maintrunk but was not on by default.

This API exists and is apparently only documented in the following SGE source file:

source/libs/sched/sge_pqs_api.h

The API seems to provide the hooks necessary for someone to compile his or her own loadable module that can be installed in the $SGE_ROOT/lib/<arch>/ directory. Once loaded, the custom code can make the final decision (based on a list of supplied candidates) as to the hosts and queue instances used for a particular parallel job.

People interested in this should read the sge_pqs_api.h file carefully as there are many caveats and warnings. I'd be interested in hearing from anyone using this API as well.

Help shape Advanced Reservation functionality for SGE-6.2

Posted by chris Fri, 12 Jan 2007 22:18:57 GMT

If you are at all interested in the topic of Advance Reservation scheduling within Grid Engine, then please take the time to look at (and comment upon) the following draft functional specification document:

Functional Specification Document for 6.2 Advance Reservation

Comments and feedback should be sent to the Developer mailing list. A thread has already been started.

Two new qmon enhancements coming in 6.0u10

Posted by chris Sat, 23 Dec 2006 19:23:05 GMT

With patches supplied by Hin-Tak Leung (more on Hin-Tak in a later article), the following useful enhancements to the X11 'qmon' binary have been added to the CVS repository for inclusion in the next 6.0u10 release:

Custom Widths

Default: Customized:

The first screenshot shows the default layout for the Qmon Job Control pane. The second screenshot shows the new column sizing and layout customized by altering options listed in a personal ~/Qmon preference file. In the customized layout, the job name field has been greatly expanded and the Job ID column width has been slightly decreased. A new sliding bar allows access to the columns that cannot be displayed within the pane.

Adding Host details to the Cluster Queue pane

Default: Enhanced:

The first screenshot shows the default layout for the Qmon Cluster Queue pane. Note that there are only two tabs available within the pane: "Cluster Queues" and "Queue Instances". The second screenshot shows the activation of a third tab named "Hosts".



Both changes are disabled by default; the details on how to activate them follow.

Details: How to enable these changes

Both enhancements are controlled by per-user ~/Qmon preference files. To customize the column widths in the Job Control pane, use and adjust the following settings:

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! job configuration settings
!! Qmon*job_form*columnWidths: nr of characters per column for
!!                                the first 6 cols
!! Qmon*job_form*visibleColumns: nr of visible columns (without scrollbar)
!!                               if the column sizes shall be bigger this can
!!                               be lowered to show only the first n cols and
!!                               the rest can be reached with the horizontal
!!                               scrollbar
!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Qmon*job_form*columnWidths:      12,8,10,10,7,16
Qmon*job_form*visibleColumns:    6


To enable the additional Host tab within the Cluster Queue pane, add the following details to your ~/Qmon preference file, changing the values from FALSE to TRUE:

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! show the Host tab in Queue Configuration
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Qmon*showHostTab:  FALSE
Qmon*automaticUpdateHostTab: FALSE

Enhanced dynamic limits in the new resource quota system

Posted by chris Sat, 23 Dec 2006 18:31:23 GMT

A bit of interesting news arrived recently via the GE issues mailing list concerning the newly announced "Resource Quota" feature that will be part of the upcoming Grid Engine 6.1 release. The specification document for the new Resource Quota facility makes specific mention of "dynamical limits". The example it gives of a "dynamical limit" is the following:

 limit hosts {@linux_hosts} to slots=$num_proc*5


... that limit would change from machine to machine depending on the number of CPUs resident in each machine. Useful.

Roland filed (and then fixed!) a new issue asking for this functionality to be extended to allow the following types of usage:

'slots=$num_proc*2-1' or 'slots=$num_proc*2+2'

The new enhancements extend the operators that can be used for defining these new limits. Because of a shared codebase, the enhancement also applies to load_formula syntax. The new syntax definition looks like this:

{w1|$complex1[*w1]}[{+|-}{w2|$complex2[*w2]}[{+|-}...]]
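For anyone who reads grammars more easily as code, here is a small Python sketch that evaluates formulas of that shape: terms that are either a plain weight or a $complex with an optional *weight, joined by + or -. It is an illustration only, not the parser Grid Engine uses; the function name evaluate_limit and the dict of per-host complex values are assumptions.

```python
import re

# One term: "$complex", "$complex*weight", or a bare numeric weight.
_TERM = re.compile(
    r"\s*(?:\$(?P<name>\w+)(?:\s*\*\s*(?P<weight>\d+(?:\.\d+)?))?"
    r"|(?P<const>\d+(?:\.\d+)?))\s*"
)


def evaluate_limit(formula, complexes):
    """Evaluate e.g. '$num_proc*2-1' against a dict of complex values for one host."""
    pos, total, sign = 0, 0.0, 1
    while True:
        match = _TERM.match(formula, pos)
        if not match:
            raise ValueError(f"expected a term at position {pos} in {formula!r}")
        if match.group("name") is not None:
            value = complexes[match.group("name")]
            if match.group("weight") is not None:
                value *= float(match.group("weight"))
        else:
            value = float(match.group("const"))
        total += sign * value
        pos = match.end()
        if pos == len(formula):
            return total
        if formula[pos] not in "+-":
            raise ValueError(f"expected '+' or '-' at position {pos} in {formula!r}")
        sign = 1 if formula[pos] == "+" else -1
        pos += 1


# The same formula gives a different limit on each host, depending on its CPU count.
for host, num_proc in {"small_node": 2, "big_node": 8}.items():
    print(host, evaluate_limit("$num_proc*2-1", {"num_proc": num_proc}))
```

Feeding each host's own num_proc value into the same formula is what makes the limit "dynamical": one rule, a different slot ceiling per machine.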

It's official: 6.1 snapshot is out; major new enhancements

Posted by chris Wed, 13 Dec 2006 16:42:06 GMT

Highlights:

  • Preview release only, test carefully before even remotely considering production use
  • A tentative beta release of SGE 6.1 is scheduled for February 2007
  • No official date for the full 6.1 release; official release may have additional features or components
  • A HUGE milestone with major new functionality
Read the official announcement here:

The most exciting new feature is a MAJOR step forward for the project and the product - a flexible system for implementing Resource Quotas. This feature is being developed to address at a minimum some of the biggest and most vexing configuration limitations encountered by the user community:

  • Issue #: 74: -- Supporting maxujobs on a per host level
  • Issue #: 1532: -- Allowing "max jobs per user" limits on a per queue basis
  • Issue #: 1644: -- Allowing per-user slot limits to be set within parallel environments (PEs).
Longtime participants on the SGE mailing lists will recognize the above issues as some of the most commonly reported feature and enhancement requests rising out of the user community. The developers and project leads deserve sincere congratulations for pushing this enhancement through. The specification document looks well thought out and will likely be the foundation for future clever resource quota methods used by SGE Administrators and cluster operators.


Other additions to the 6.1 snapshot include:

  • Official support for Mac OS X on Intel and Linux on Itanium
  • ARCo joins the codebase (as reported previously)
  • The PDC patches supplied by the user community were accepted and now allow for better usage data collection on Apple Mac OS X, IBM AIX and HP-UX
  • Helpful scripts and documentation for Solaris 10 users wishing to use the amazing DTrace tool for bottleneck identification and tuning

Advanced Reservation plugin for Grid Engine

Posted by chris Wed, 25 Oct 2006 21:54:22 GMT

Yoshio Tanaka posts the following:

... We are pleased to announce that advance-reservation plugin module
called PluS version 1.0.0 RC 1 is now available for download at the
PluS home page at:
  http://www.g-lambda.net/plus/ .

PluS (Plug-in Advance Reservation Manager for Torque and Grid Engine)
adds an advance-reservation function to Torque and Grid Engine.
For SGE, one of the following operations will be performed based on
the startup option.

(1) SGE queue base version
  - The SGE schedule is not replaced, and the reservation function is
    realized simply by managing the reservation queues.

(2) SGE self scheduling version
  - The original SGE scheduler is replaced by the PluS SGE scheduler
    which realizes the reservation management function and the job
    scheduling function.

...

The package is released under the Apache 2 License. It appears that the system has mainly been developed and tested on the following configuration: Linux 2.6.x, Intel x86, glibc 2.3.3, SGE 6.0u8

The HTML version of the PluS Manual is online here:
http://www.g-lambda.net/plus/wp-content/uploads/2006/10/manual.html.

The http://www.g-lambda.net/plus/ site contains a link to a PDF of an IEEE conference paper covering the system in more technical detail.

Resource Reservation vs Backfilling

Posted by chris Mon, 24 Jul 2006 20:53:00 GMT

A list message posted by Andreas back in June has a link to an overlooked yet quite interesting Grid Engine Design document. It includes the following definition of terms:

   Resource Reservation 
      A job-specific reservation created by the scheduler for pending 
      jobs. During the reservation the resources are blocked for lower 
      priority jobs.

   Backfilling
      The process of starting jobs of the job priority list despite of 
      higher priority pending jobs that might own a future reservation 
      with the same resource. Thus backfilling has a meaning only in the 
      context of Resource Reservation or Advance Reservation.

   Advance Reservation
      A reservation (possibly independent of a particular job) that can 
      be requested by a user or administrator and gets created by the 
      scheduler. The reservation causes the associated resources be blocked
      for other jobs.

   Preemption
      The process of interrupting job executions in order to free resources
      for particular jobs.

… good terms to know, especially when reading through the SGE docs and mailing list messages. The entire document makes for interesting reading.
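The relationship between a reservation and backfilling is easier to see with a tiny example. The Python sketch below is a simplified model, not scheduler code: it assumes job runtimes are known in advance and, as a worst case, that the reserved job will need every slot that is free right now. The function name can_backfill is hypothetical.

```python
def can_backfill(now, runtime, slots_needed, free_slots, reservation_start):
    """Decide whether a lower-priority job may start ahead of a reserved job.

    The job must fit into the currently free slots and must be finished
    by the time the higher-priority job's reservation begins.
    """
    fits_now = slots_needed <= free_slots
    done_in_time = now + runtime <= reservation_start
    return fits_now and done_in_time


# A big job holds a reservation starting at t=100; 4 slots are idle until then.
print(can_backfill(now=0, runtime=60, slots_needed=2, free_slots=4, reservation_start=100))
print(can_backfill(now=0, runtime=150, slots_needed=2, free_slots=4, reservation_start=100))
```

The short job slips into the gap without delaying the reserved job, while the long one has to wait. Without a reservation there would be no gap to fill, which is why backfilling only has meaning alongside Resource Reservation or Advance Reservation.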
