Increasing vector widths and many-core architectures make it significantly harder to use compute resources efficiently on next-generation supercomputers. High Performance ParalleX (HPX) is an asynchronous runtime system designed to address the bottlenecks that arise from the massive concurrency of these upcoming systems. We compare a traditional MPI+OpenMP implementation with an HPX implementation of a discontinuous Galerkin kernel that solves the acoustic wave equation. To vectorize the discontinuous Galerkin kernels well, we use Vc, a portable library of SIMD vector classes for C++. We will present scaling results on the Intel Knights Landing chips on Stampede2, along with performance results highlighting the benefits of asynchronous task execution over a static execution model.
Performance Comparison of HPX vs. MPI+OpenMP for the Discontinuous Galerkin Finite Element Method on Knights Landing Chips
Presenter:
Maximilian Bremer
University:
University of Texas
Program:
CSGF
Year:
2017