Communication-Optimal QR Factorization: Performance and Scalability on Varying Architectures
Edward Hutter, University of Illinois at Urbana-Champaign
Parallel numerical linear algebra libraries are a key component of the software stack on HPC systems. We are pursuing development of novel communication-avoiding algorithms and libraries that encapsulate them. This poster will focus on advances in parallel QR factorization, one of the most widely used matrix factorizations and a critical component of linear least-squares and eigenvalue problems. Our new communication-avoiding CholeskyQR2 is the first practical algorithm that is communication optimal given any amount of memory and matrix dimensions. The algorithm requires slightly more computation but achieves a much higher arithmetic intensity, making it especially promising on emerging HPC architectures. We demonstrate its benefit via strong and weak scaling studies on Blue Waters and Stampede2, achieving speed-ups of up to 3.3 times when using 65,000 processors. We also study its extensibility as a key component of a communication-optimal eigensolver.
Abstract Author(s): Edward Hutter, Edgar Solomonik