6.2 Bringing Significant Improvements to Cluster Queue Matching

Posted by chris Wed, 16 Apr 2008 11:21:46 GMT

Interesting writeup from Andreas Haas, reproduced in full below ...


I thought this could be of interest to those who care about dispatching times. This maintrunk check-in

http://gridengine.sunsource.net/servlets/ReadMsg?list=cvs&msgNo=9814

will improve the matching times for set-ups where queue resource limits such as -l h_rt or -l h_vmem decide whether a job gets into a queue or not.

Before the above change we had an exponential growth of dispatching times

04/15/2008 11:48:01|schedu|es-ergb01-01|P|PROF: job dispatching took 0.030 s (20 fast, 0 comp, 0 pe, 0 res)
04/15/2008 11:48:23|schedu|es-ergb01-01|P|PROF: job dispatching took 0.130 s (40 fast, 0 comp, 0 pe, 0 res)
04/15/2008 11:48:50|schedu|es-ergb01-01|P|PROF: job dispatching took 0.630 s (80 fast, 0 comp, 0 pe, 0 res)
04/15/2008 11:49:26|schedu|es-ergb01-01|P|PROF: job dispatching took 3.210 s (160 fast, 0 comp, 0 pe, 0 res)

now growth is linear

04/15/2008 11:53:54|schedu|es-ergb01-01|P|PROF: job dispatching took 0.000 s (20 fast, 0 comp, 0 pe, 0 res)
04/15/2008 11:54:17|schedu|es-ergb01-01|P|PROF: job dispatching took 0.020 s (40 fast, 0 comp, 0 pe, 0 res)
04/15/2008 11:54:44|schedu|es-ergb01-01|P|PROF: job dispatching took 0.050 s (80 fast, 0 comp, 0 pe, 0 res)
04/15/2008 11:55:16|schedu|es-ergb01-01|P|PROF: job dispatching took 0.070 s (160 fast, 0 comp, 0 pe, 0 res)
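To see how the two series scale, each dispatching time can be compared with the previous one. The sketch below is only an illustration: the numbers are copied from the PROF lines above and keyed by the "fast" count in each line, and the helper name growth_ratios is made up for this example.

```python
# Dispatching times in seconds, copied from the PROF lines above and
# keyed by the "fast" count reported in each line.
before = {20: 0.030, 40: 0.130, 80: 0.630, 160: 3.210}
after = {20: 0.000, 40: 0.020, 80: 0.050, 160: 0.070}


def growth_ratios(times):
    """Return (count, ratio to the previous time) for each step of the series.

    The ratio is None when the previous time was logged as 0.000 s.
    growth_ratios is a hypothetical helper, not part of Grid Engine.
    """
    sizes = sorted(times)
    ratios = []
    for prev, cur in zip(sizes, sizes[1:]):
        ratio = times[cur] / times[prev] if times[prev] > 0 else None
        ratios.append((cur, ratio))
    return ratios


if __name__ == "__main__":
    for label, series in (("before", before), ("after", after)):
        for count, ratio in growth_ratios(series):
            shown = "n/a" if ratio is None else f"{ratio:.1f}x"
            print(f"{label}: {count} -> {shown}")
```

Before the change every doubling of the count multiplies the time by more than four; after it the factor stays at or below 2.5.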

also note this maintrunk check-in

http://gridengine.sunsource.net/servlets/ReadMsg?list=cvs&msgNo=9713

when profiling is enabled in sched_conf(5) like this

   :
  params  PROFILE=true
   :

the actual cause for exponential/linear dispatching times becomes fairly obvious: without the above improvement the scheduler checked every single queue instance, even in cases where the entire cluster queue was not suited

04/15/2008 11:48:01|schedu|es-ergb01-01|P|PROF: sequential matching       global          rqs     cqstatic      hstatic      qstatic     hdynamic         qdyn
04/15/2008 11:48:01|schedu|es-ergb01-01|P|PROF: sequential matching           20            0           30          200          210          210           65
04/15/2008 11:48:23|schedu|es-ergb01-01|P|PROF: sequential matching           40            0           60          800          820          820          230
04/15/2008 11:48:50|schedu|es-ergb01-01|P|PROF: sequential matching           80            0          120         3200         3240         3240          860
04/15/2008 11:49:26|schedu|es-ergb01-01|P|PROF: sequential matching          160            0          240        12800        12880        12880         3320

now the queue instance checks are done only if needed

04/15/2008 11:53:54|schedu|es-ergb01-01|P|PROF: sequential matching       global          rqs     cqstatic      hstatic      qstatic     hdynamic         qdyn
04/15/2008 11:53:54|schedu|es-ergb01-01|P|PROF: sequential matching           20            0           30           10           10           10           10
04/15/2008 11:54:17|schedu|es-ergb01-01|P|PROF: sequential matching           40            0           60           20           20           20           20
04/15/2008 11:54:44|schedu|es-ergb01-01|P|PROF: sequential matching           80            0          120           40           40           40           40
04/15/2008 11:55:16|schedu|es-ergb01-01|P|PROF: sequential matching          160            0          240           80           80           80           80
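The effect of skipping an unsuited cluster queue can be imitated with a toy model. This is not Grid Engine source code: the data layout, the function count_instance_checks and the prune flag are invented for illustration. The sketch assumes that the h_rt limit is set once per cluster queue, so a single test at that level can rule out all of its instances, and that there is one job per simulated host.

```python
def count_instance_checks(cluster_queues, jobs, prune):
    """Count per-instance checks made while placing jobs (toy model only).

    cluster_queues: list of (h_rt_limit, instance_count) in evaluation order.
    jobs: list of requested h_rt values.
    prune: if True, an unsuited cluster queue is skipped as a whole.
    """
    free = [instances for _, instances in cluster_queues]
    checks = 0
    for h_rt in jobs:
        for idx, (limit, instances) in enumerate(cluster_queues):
            if h_rt > limit:
                if not prune:
                    # Every instance is visited although none can match.
                    checks += instances
                continue
            if free[idx] > 0:
                # One check on a free instance places the job.
                checks += 1
                free[idx] -= 1
                break
    return checks


if __name__ == "__main__":
    for hosts in (10, 20, 40, 80):
        # The first queue's limit (100 s, an assumed value) rejects every job.
        queues = [(100, hosts), (1000, hosts)]
        jobs = [300 + i for i in range(hosts)]
        unpruned = count_instance_checks(queues, jobs, prune=False)
        pruned = count_instance_checks(queues, jobs, prune=True)
        print(f"{hosts} hosts: {unpruned} checks without pruning, {pruned} with")
```

In this model the unpruned count grows with the square of the host count while the pruned count grows in step with it. The absolute numbers are not meant to reproduce the profiling counters.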

The setup I used to get these numbers was two cluster queues over 10, 20, 40 and 80 simulated hosts (using SIMULATE_EXECDS=true in sge_conf(5) qmaster_params).

One cluster queue was not eligible at all for any of the jobs due to a queue resource limit, but I forced the scheduler to evaluate this cluster queue first by means of sequence numbers, so as to provoke the worst case. In addition, each job requested a slightly different resource amount

  -l h_rt=300
  -l h_rt=301
  -l h_rt=302
     :
  -l h_rt=379

so as to sabotage the reuse of dispatching results for identical jobs within a scheduling interval.
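A series of requests like the one above could be generated with a few lines of script. This is a sketch only: the job script name sleeper.sh is a placeholder, the helper qsub_commands is hypothetical, and the commands are printed rather than executed, so it says nothing about how the original test actually submitted its jobs.

```python
def qsub_commands(first=300, last=379, script="sleeper.sh"):
    """Build one qsub command line per h_rt value from first to last inclusive.

    The script name is a placeholder chosen for this example.
    """
    return [f"qsub -l h_rt={seconds} {script}" for seconds in range(first, last + 1)]


if __name__ == "__main__":
    # Print the commands for inspection instead of running them.
    for line in qsub_commands():
        print(line)
```

Because every request carries a different h_rt value, no two jobs are identical from the scheduler's point of view.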

Regards, Andreas