Exploiting Data Structure Transformations for Performance and Scalability in High-Dimensional Structured Grid Computations
Benjamin Sepanski, University of Texas at Austin
Conventional multidimensional array layouts can incur substantial data movement costs and inefficient MPI communication because of the complexity of accessing multiple low-dimensional regions in a high-dimensional space. We explore the benefits of using a framework (Bricks) to optimize performance and communication separately from the functional representation by relying on data layout, auto-tuning, and code generation instead of traditional loop optimizations. To make Bricks viable in high-dimensional settings, we provide three novel contributions and evaluate them on several kernels taken from GENE, a phase-space fusion tokamak simulation code. First, we extend Bricks to support 6-dimensional arrays, kernels that operate on complex data types, and cuFFT. Second, we demonstrate how to optimize Bricks for data reuse and GPU hardware utilization, achieving up to a 2.67x speedup on a single GPU. Finally, we show that Bricks can improve GENE’s performance by 2x in both strong scaling and weak scaling when running on a system of 128 A100 GPUs.
Abstract Author(s): Benjamin Sepanski, Tuowen Zhao, Hans Johansen, Samuel Williams