Improving the Parallel Efficiency of Tau3P, a Parallel Electromagnetic Field Solver
Michael Wolf, University of Illinois
At the Stanford Linear Accelerator Center, I developed a parallel time-domain electromagnetic solver (Tau3P) that has been successful in modeling large accelerator structures. However, the parallel efficiency I obtained for Tau3P was very poor on large numbers of processors. One difficulty is that Tau3P uses a discrete surface integral (DSI) method on unstructured meshes with both orthogonal and nonorthogonal elements. In this formulation, partitions consisting of nonorthogonal elements produce many more matrix nonzeros than partitions with the same number of strictly orthogonal elements, which complicates load balancing. Another difficulty is the extreme sparsity of the matrices. Because of this sparsity, each processor must hold many rows to offset the cost of communication. However, assigning too many rows to each processor limits potential parallelism by reducing the number of processors that can be used in a simulation. If communication for this problem could be reduced, high parallel efficiency on large numbers of processors would be easier to obtain.
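The load-balancing difficulty can be illustrated with a minimal sketch. It is not Tau3P code: the function name `load_imbalance` and the per-row nonzero counts (4 for rows from orthogonal elements, 12 for rows from nonorthogonal elements) are assumptions chosen only for illustration. The sketch measures work per partition in nonzeros and reports the ratio of the maximum to the mean.

```python
def load_imbalance(row_nnz, part_of_row, num_parts):
    """Return (work per partition, max/mean ratio), with work counted in nonzeros."""
    work = [0] * num_parts
    for nnz, part in zip(row_nnz, part_of_row):
        work[part] += nnz
    mean = sum(work) / num_parts
    return work, max(work) / mean


# Hypothetical mesh: 4 rows from orthogonal elements (4 nonzeros each, assumed)
# and 4 rows from nonorthogonal elements (12 nonzeros each, assumed).
row_nnz = [4, 4, 4, 4, 12, 12, 12, 12]

# Equal row counts, but every nonorthogonal row lands on partition 1.
by_rows = [0, 0, 0, 0, 1, 1, 1, 1]
# Equal row counts, with nonorthogonal rows spread over both partitions.
by_nnz = [0, 0, 1, 1, 0, 0, 1, 1]

work_rows, imb_rows = load_imbalance(row_nnz, by_rows, 2)
work_nnz, imb_nnz = load_imbalance(row_nnz, by_nnz, 2)
print(work_rows, imb_rows)
print(work_nnz, imb_nnz)
```

Both assignments give each partition four rows, yet only the second balances the nonzero count, which is the quantity that determines the cost of the matrix-vector product.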
I have researched mesh partitioning techniques in an attempt to reduce this communication overhead and improve parallel efficiency in Tau3P. Using these techniques, I greatly improved the parallel efficiency of Tau3P and learned which partitioning techniques work well for these types of problems. More recently, I have focused on the communication patterns in Tau3P in order to reduce communication overhead and increase parallel performance. In particular, I have focused on improving the parallel algorithm for scaled matrix-vector multiplication with vector addition, which is the primary numerical kernel in Tau3P. I have examined many communication schemes and algorithmic orderings to find an implementation that minimizes communication and yields the greatest parallel efficiency.
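A serial sketch of the kernel described above, y <- y + alpha * A * x, is given here for a matrix in compressed sparse row (CSR) form. It is not the Tau3P implementation. The names `scaled_matvec_add` and `ghost_entries` are hypothetical, and the sketch assumes a square matrix in which the owner of row i also owns x[i]. The second function lists the entries of x that a partition would have to receive from other partitions, a simple proxy for communication volume.

```python
def scaled_matvec_add(indptr, indices, data, x, y, alpha):
    """In place: y <- y + alpha * (A @ x), with A stored in CSR form."""
    for i in range(len(y)):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] += alpha * acc
    return y


def ghost_entries(indptr, indices, owner, part):
    """Indices of x that `part` needs but does not own (assumed ownership model)."""
    needed = set()
    for i in range(len(indptr) - 1):
        if owner[i] != part:
            continue
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if owner[j] != part:
                needed.add(j)
    return sorted(needed)


# Hypothetical 4x4 tridiagonal matrix with 2 on the diagonal and -1 beside it.
indptr = [0, 2, 5, 8, 10]
indices = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
data = [2, -1, -1, 2, -1, -1, 2, -1, -1, 2]

x = [1.0, 2.0, 3.0, 4.0]
y = scaled_matvec_add(indptr, indices, data, x, [1.0, 1.0, 1.0, 1.0], alpha=0.5)

# Rows 0-1 on partition 0, rows 2-3 on partition 1.
owner = [0, 0, 1, 1]
recv0 = ghost_entries(indptr, indices, owner, 0)
recv1 = ghost_entries(indptr, indices, owner, 1)
print(y, recv0, recv1)
```

In a distributed version, each partition would compute only its own rows after receiving its ghost entries, so a partitioning that shortens these lists directly reduces the data exchanged per multiplication.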
Abstract Author(s): Michael Wolf