DRMAA memory leak found & fixed

Posted by chris Tue, 19 Feb 2008 18:31:49 GMT

Most casual SGE users and admins probably find little cause to monitor the Grid Engine developer mailing list. A nice little success story has played out on the list recently with a user assisting the SGE dev team in quickly discovering, isolating and fixing a memory leak that has been in the codebase since the DRMAA 1.0 API release.

A user posted this message to the developer list, showing what appears to be a memory leak in drmaa_run_job(). Andreas then replied, asking whether the user could recreate the issue while running under the valgrind instrumentation framework.

In this follow-up thread, the user-provided valgrind data allowed Andreas to pinpoint the problem, file Issue #2497 in the bug tracking database, and then post a preliminary patch that fixes the problem.

The patch still needs to undergo code review before it officially makes it into the Grid Engine codebase. Overall, this shows how a user who goes the extra mile (here, by running the failing program under valgrind) can give the developers exactly what they need to quickly identify and fix a problem.

Kudos to James & Andreas.