Conventional multidimensional array layouts can incur substantial data movement costs and inefficient MPI communication because accessing multiple low-dimensional regions of a high-dimensional space is complex. We explore the benefits of using a framework (Bricks) to optimize performance and communication separately from the functional representation by relying on data layout, auto-tuning, and code generation instead of traditional loop optimizations.
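To make the layout argument concrete, the following is a minimal sketch, not the Bricks implementation or its API. It compares how many contiguous memory runs a boundary slab occupies in a plain row-major layout and in a simple blocked layout where each fixed-size block is stored contiguously. The function names, the 6-dimensional grid of 4 points per dimension, and the block size of 2 per dimension are all assumptions chosen for illustration; the actual Bricks layout may order and connect its blocks differently.

```python
from itertools import product


def row_major_offset(coord, shape):
    """Linear offset of `coord` in a row-major array of the given shape."""
    offset = 0
    for c, n in zip(coord, shape):
        offset = offset * n + c
    return offset


def blocked_offset(coord, shape, block):
    """Linear offset when the array is stored block by block.

    Hypothetical layout for illustration: blocks are ordered row-major,
    and the cells inside each block are also ordered row-major.
    """
    grid = tuple(n // b for n, b in zip(shape, block))
    block_id = row_major_offset(tuple(c // b for c, b in zip(coord, block)), grid)
    within = row_major_offset(tuple(c % b for c, b in zip(coord, block)), block)
    block_volume = 1
    for b in block:
        block_volume *= b
    return block_id * block_volume + within


def count_contiguous_runs(offsets):
    """Number of maximal runs of consecutive offsets."""
    ordered = sorted(offsets)
    runs = 1
    for previous, current in zip(ordered, ordered[1:]):
        if current != previous + 1:
            runs += 1
    return runs


# Assumed toy problem: 6 dimensions, 4 points each, blocks of 2 points each.
shape = (4,) * 6
block = (2,) * 6

# Boundary slab: every cell whose last coordinate lies in the first block layer.
slab = [c for c in product(*(range(n) for n in shape)) if c[-1] < block[-1]]

row_major_runs = count_contiguous_runs(row_major_offset(c, shape) for c in slab)
blocked_runs = count_contiguous_runs(blocked_offset(c, shape, block) for c in slab)

print(row_major_runs, blocked_runs)  # prints "1024 32"
```

In this toy case the same slab is spread over 1024 short runs in the row-major layout but only 32 longer runs in the blocked layout, which illustrates why a block-based layout can reduce the packing and message count needed to exchange boundary regions.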
To make Bricks viable in high-dimensional settings, we provide three novel contributions and evaluate them on several kernels taken from GENE, a phase-space fusion tokamak simulation code. First, we extend Bricks to support 6-dimensional arrays, kernels that operate on complex data types, and cuFFT. Second, we demonstrate how to optimize Bricks for data reuse and GPU hardware utilization, achieving up to a 2.67x speedup on a single GPU. Finally, we show that Bricks can improve GENE’s performance by 2x in both strong scaling and weak scaling when running across a system of 128 A100 GPUs.
Exploiting Data Structure Transformations for Performance and Scalability in High-Dimensional Structured Grid Computations
Presenter:
Benjamin Sepanski
University:
University of Texas at Austin
Program:
CSGF
Year:
2022