Adapting to Memory Pressure from within Scientific Applications on Multiprogrammed Clusters of Workstations
Richard Tran Mills, College of William and Mary
Dismal performance often results when the memory requirements of a process exceed the physical memory available to it. Furthermore, significant throughput reduction occurs when the process is part of a synchronous parallel job on a non-dedicated computational cluster. A possible solution is to develop programs that can dynamically adapt their memory usage according to the current availability of physical memory. We explore this idea in the context of scientific computations that perform repetitive data accesses. Part of the program’s data set is cached in resident memory, while the remainder that cannot fit is accessed in an out-of-core fashion from disk. The replacement policy is user-defined and application-specific, allowing performance to degrade gracefully as memory becomes scarce. To dynamically adjust its memory usage, the program must reliably determine whether there is a memory shortage or surplus in the system. Because operating systems typically export limited memory information, we develop a parameter-free algorithm that uses no system information beyond the resident set size (RSS) of the program. The resulting library can be called by scientific codes with little change to their structure, or possibly no change at all if computations are already “blocked” for reasons of memory locality.
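The abstract does not specify the library's interface, so the following is only a minimal C sketch of the block-caching scheme it describes: a fixed partition of the data set into blocks, an in-core budget, and a user-supplied replacement policy. The names NBLOCKS, get_block, and choose_victim are illustrative, not the paper's, and all sizes are placeholders.

    #include <stdio.h>
    #include <stdlib.h>

    #define NBLOCKS   64          /* number of data blocks (illustrative) */
    #define BLOCK_LEN (1 << 18)   /* doubles per block (illustrative) */

    static double *cache[NBLOCKS];  /* cache[i] == NULL => block i lives on disk only */
    static int     ncached = 0;     /* blocks currently resident */
    static int     budget  = 16;    /* in-core budget, adjusted as memory pressure changes */

    /* Stand-in for the user-defined, application-specific replacement policy.
     * For a solver that sweeps its blocks cyclically, an MRU-like policy avoids
     * the pathological misses that LRU suffers under cyclic access. */
    static int choose_victim(void)
    {
        for (int v = NBLOCKS - 1; v >= 0; v--)
            if (cache[v] != NULL)
                return v;
        return -1;
    }

    /* Return a pointer to block i, fetching it from disk if it is not resident
     * and evicting another block first if the in-core budget is exhausted. */
    static double *get_block(FILE *f, int i)
    {
        if (cache[i] == NULL) {
            if (ncached >= budget) {
                int v = choose_victim();
                if (v >= 0) {
                    free(cache[v]);
                    cache[v] = NULL;
                    ncached--;
                }
            }
            cache[i] = malloc(BLOCK_LEN * sizeof(double));
            fseek(f, (long)i * BLOCK_LEN * sizeof(double), SEEK_SET);
            if (fread(cache[i], sizeof(double), BLOCK_LEN, f) != BLOCK_LEN) {
                /* error handling omitted in this sketch */
            }
            ncached++;
        }
        return cache[i];
    }

A computation that is already blocked for locality would simply call get_block at the top of each block sweep, which is why the abstract notes that such codes may need no structural change at all.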
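Likewise, a much-simplified sketch of RSS-based pressure detection, under stated assumptions: the paper's actual parameter-free algorithm is not given here, get_rss_bytes and adapt_budget are hypothetical names, and /proc/self/statm is a Linux-specific source of the RSS (getrusage() or a platform equivalent would serve elsewhere).

    #include <stdio.h>
    #include <unistd.h>

    /* Read this process's resident set size in bytes from /proc/self/statm
     * (Linux-specific; an assumption of this sketch). */
    static long get_rss_bytes(void)
    {
        long pages;
        FILE *f = fopen("/proc/self/statm", "r");
        if (f == NULL)
            return -1;
        /* statm fields: total program size, then resident set size, in pages */
        if (fscanf(f, "%*ld %ld", &pages) != 1)
            pages = -1;
        fclose(f);
        return (pages < 0) ? -1 : pages * sysconf(_SC_PAGESIZE);
    }

    /* One adaptation step, run between passes over the data.  If the RSS has
     * fallen below the number of bytes the program believes it has cached,
     * the OS has evicted some of its pages: memory is scarce, so shrink the
     * cache to the observed RSS.  Otherwise, cautiously probe for surplus
     * memory by growing the budget one block at a time. */
    static size_t adapt_budget(size_t cached_bytes, size_t block_bytes)
    {
        long rss = get_rss_bytes();
        if (rss < 0)
            return cached_bytes;            /* no information: hold steady */
        if ((size_t)rss < cached_bytes)
            return (size_t)rss;             /* shortage detected: shed evicted pages */
        return cached_bytes + block_bytes;  /* possible surplus: grow by one block */
    }

The appeal of driving adaptation off the RSS alone is that it requires no privileged or system-wide memory statistics, which, as the abstract notes, operating systems typically do not export.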
Experimental results with both sequential and parallel versions of a memory-adaptive conjugate-gradient linear system solver show substantial performance gains over conventional in-core codes that rely on the virtual memory system. Furthermore, multiple instances of the adaptive code can coexist on the same node with little mutual interference.