Adding memory requirement awareness to the scheduler
In our SGE cluster, we have two nodes with 4 CPUs each, and we use the "fill up host" scheduler configuration for job submission.
With this configuration, assume a parallel job (Job1) using 2 CPUs is running on nodeA and a user submits another 2-CPU parallel job (Job2); SGE then places Job2 on nodeA as well.
If Job1 is using a lot of memory on nodeA, Job2 fails because not enough memory is available.
Is there a way to avoid this through SGE configuration?
As usual, Reuti comes through with a great answer:
... you will need to request the estimated amount of memory which the job might need. There are two ways to do it. Make:
a) h_vmem
or b) virtual_free
consumable in the complex definition (qconf -sc) and define a default consumption there. Then attach a feasible value to each node (qconf -me <hostname>) for the installed memory. Use the one you defined in your qsub command by requesting it with the -l option (it's per slot, hence multiplied for parallel jobs unless you use special settings in the complex definition). The difference between the two ways is that h_vmem will be enforced and will kill the job when it needs one byte more, while virtual_free is more of a hint to SGE for job distribution.
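As a rough illustration of the steps Reuti describes, here is a minimal dry-run sketch. It only prints the commands rather than running them, so it is safe to try without a cluster. The host name nodeA matches the question, but the 8G capacity, the parallel environment name "mpi", the 2G per-slot request and the script name job2.sh are all assumptions made up for the example. It also assumes h_vmem has already been marked consumable in the complex definition (typically by editing it with qconf -mc), and it uses qconf -mattr as a non-interactive alternative to editing the host with qconf -me.

```shell
#!/bin/sh
# Dry run: these helpers only print Grid Engine commands; nothing is submitted.
# nodeA's 8G capacity, the "mpi" PE name and job2.sh are hypothetical values.

# Total memory reserved for a parallel job when the consumable is counted
# per slot (the default behaviour described in the answer).
total_request_gb() {
    slots=$1
    per_slot_gb=$2
    echo $((slots * per_slot_gb))
}

# Command that attaches a memory capacity to one execution host.
host_capacity_cmd() {
    echo "qconf -mattr exechost complex_values h_vmem=$2 $1"
}

# Command that submits a parallel job with a per-slot memory request.
submit_cmd() {
    echo "qsub -pe $1 $2 -l h_vmem=${3}G $4"
}

host_capacity_cmd nodeA 8G
submit_cmd mpi 2 2 job2.sh
echo "Job2 would reserve $(total_request_gb 2 2)G on its host"
```

With numbers like these, a 2-slot job asking for 2G per slot reserves 4G of the host's 8G, so a second job of the same size still fits, while a job asking for more than the remaining amount should be held back or sent to the other node instead of failing at run time.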
More background on Grid Engine and consumable resources is available at this Wiki doc link. That page concentrates on GUI-based methods but also discusses the command-line methods that Reuti shows.
