Wildcard PEs for threaded app optimization on multicore systems

Posted by chris Fri, 02 Nov 2007 15:49:29 GMT

This is from an old mailing list thread I had kept flagged in my inbox ...

In this interesting mailing list thread from back in August, John Coldrick is looking for advice on how to maximize the power of his render farm.

John has a threaded (non-parallel) application that must run within a single execution host, some hosts having up to 8 cores available for jobs. The application's thread usage can be dialed up or down depending on how many CPU cores are available. What John is basically trying to do is:

  1. Sort available hosts to find the one with the most CPU cores available
  2. Reserve or otherwise tell the SGE scheduler that those CPU cores are all going to be used by a single application
  3. Tell the application itself how many cores it has been granted so that it can dial its own thread usage up or down appropriately

The solution suggested by Dan combines some old admin magic from the SGE 5.x days (using PEs as a hack to lock out the multiple job slots used by a threaded non-parallel application) with a newer SGE 6.x feature (wildcard '*' selectors in a parallel environment request).

After creating a PE on each of his execution hosts, John can submit his render job requesting a range of CPU slots ([1-8] in his case) while also using a wildcard selector to ask for any parallel environment. The end result is that:

  • The SGE scheduler will find the system with the most available slots/cores automatically
  • Within the parallel environment, SGE understands that the job will consume more than 1 job slot
  • John's application script can just query the environment variable $NSLOTS to learn how many CPUs it was granted and then adjust its thread usage accordingly
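
As a rough illustration of the two halves of this approach, here is a minimal Python sketch. It assumes the usual `qsub -pe <pe_name> <slot_range>` form for the request, with '*' as the wildcard PE name and 1-8 as the range from John's case. The script name `render_job.sh` and both function names are hypothetical and only here for the example; the thread does not show John's actual commands.

```python
import os
import shlex


def build_qsub_command(script, min_slots=1, max_slots=8, pe_pattern="*"):
    """Build a qsub argument list asking for a slot range from any matching PE.

    Assumption: the request takes the form `qsub -pe <pe_name> <min>-<max>`.
    `script` is a hypothetical job script path.
    """
    if not 1 <= min_slots <= max_slots:
        raise ValueError("slot range must satisfy 1 <= min_slots <= max_slots")
    return ["qsub", "-pe", pe_pattern, f"{min_slots}-{max_slots}", script]


def granted_threads(environ=None, default=1):
    """Return the slot count from NSLOTS, or `default` if it is unset or invalid."""
    env = os.environ if environ is None else environ
    try:
        slots = int(env.get("NSLOTS", default))
    except ValueError:
        # NSLOTS was set to something that is not an integer.
        return default
    return slots if slots >= 1 else default


if __name__ == "__main__":
    # Submission side: print the command rather than running it.
    print(shlex.join(build_qsub_command("render_job.sh")))
    # Job side: how many threads the application should start.
    print(granted_threads())
```

Passing the command as an argument list (or quoting it with `shlex.join`) keeps the shell from expanding the '*' into file names before qsub sees it. Outside of a grid job, where $NSLOTS is not set, `granted_threads()` falls back to a single thread.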

Related post:
"Grouping jobs to nodes via wildcard PE's"