Efficient Algorithms for Watershed Analysis
Richard Barnes, University of California, Berkeley
The natural watersheds of the United States, comprised of lakes, ponds and rivers, are interwoven with artificial watersheds of tiles and ditches that drain fields of excess water. This water carries fertilizers, pesticides and other chemicals into natural watersheds with downstream repercussions. Wetlands and vegetative buffers provide ecosystem services by filtering and absorbing such chemicals. With limited budgets available for remediation, such buffers must be sited for maximal efficacy. This and other policy recommendation objectives require high-resolution modeling of hydrology at the state level, typically using publicly funded LiDAR-derived digital elevation models (DEMs) as a starting point.
Large data sets, which often cannot fit within a computer's RAM, challenge algorithms that extract hydrologic features and properties from DEMs. I present new parallel algorithms which scale linearly by subdividing DEMs into tiles. Using a single-producer, multi-consumer design, the new algorithms work equally well on one core, multiple cores or multiple machines and can take advantage of large memories or cope with small ones. Unlike previous algorithms, the new algorithms guarantee a fixed number of memory access and communication events per subdivision of the DEM. For moderately sized tiles, the algorithms exhibit about 60 percent strong and weak scaling efficiencies up to 48 cores. The largest data set on which I run the algorithms has 2 trillion (10^12) cells. With 48 cores, processing required 4.8 hours of wall-time (9.3 compute-days). These tests are three orders of magnitude larger than any previously performed in the literature. Complete, well-commented source code and correctness tests are available for download from a repository.
Abstract Author(s): R. Barnes