Scaling the Nearest Neighbor Gaussian Process
Ashlynn Crisp, Portland State University
Gaussian processes are ubiquitous as the primary tool for modeling spatial data. However, the Gaussian process is limited by its O(n³) computational cost, which makes direct parameter-fitting algorithms infeasible at the scale of modern data collection initiatives. The Nearest Neighbor Gaussian Process (NNGP) was proposed as a scalable approximation to dense Gaussian processes and has been applied successfully to datasets of roughly 10⁶ observations. We introduce the clustered Nearest Neighbor Gaussian Process (cNNGP), which further reduces the computational and storage cost of a stationary NNGP. Our simulations demonstrate significant reductions in the computational time for model fitting while preserving model fit quality. Furthermore, relative to the full NNGP, the reduction in computational requirements grows with the data size. The proposed methods were used to obtain an uncertainty-equipped, wall-to-wall map of predicted biomass from biomass estimates generated by the Global Ecosystem Dynamics Investigation (GEDI).
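For readers unfamiliar with the baseline, the following is a minimal sketch of a standard NNGP log-likelihood next to the dense O(n³) one. It is not the cNNGP introduced in this talk, whose details the abstract does not give. The exponential covariance, the nugget term, the parameter values, and all function names (`exp_cov`, `nngp_loglik`, `full_gp_loglik`) are assumptions made for illustration.

```python
import numpy as np


def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance sigma2 * exp(-phi * distance); an assumed kernel."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)


def full_gp_loglik(y, coords, sigma2=1.0, phi=3.0, tau2=0.1):
    """Dense zero-mean GP log-likelihood; the Cholesky step costs O(n^3)."""
    n = len(y)
    C = exp_cov(coords, coords, sigma2, phi) + tau2 * np.eye(n)
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L, y)  # alpha @ alpha equals y^T C^{-1} y
    return (-0.5 * n * np.log(2 * np.pi)
            - np.sum(np.log(np.diag(L)))
            - 0.5 * alpha @ alpha)


def nngp_loglik(y, coords, m=10, sigma2=1.0, phi=3.0, tau2=0.1):
    """NNGP log-likelihood: each point is conditioned only on its (at most) m
    nearest neighbors among the points that precede it in the ordering.
    The linear solves cost O(n * m^3) in total; the brute-force neighbor
    search below is for clarity only and is itself O(n^2)."""
    ll = 0.0
    for i in range(len(y)):
        mean, var = 0.0, sigma2 + tau2  # marginal moments of y[i]
        if i > 0:
            d = np.linalg.norm(coords[:i] - coords[i], axis=1)
            nb = np.argsort(d)[:m]  # indices of the neighbor set
            C_nn = exp_cov(coords[nb], coords[nb], sigma2, phi) + tau2 * np.eye(len(nb))
            c_in = exp_cov(coords[i:i + 1], coords[nb], sigma2, phi)[0]
            b = np.linalg.solve(C_nn, c_in)  # kriging weights
            mean = b @ y[nb]
            var = sigma2 + tau2 - c_in @ b  # conditional variance
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return ll
```

When m is at least n - 1, every point conditions on all of its predecessors, so the NNGP factorization reproduces the dense likelihood exactly; a smaller m trades accuracy for speed.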