Ripple: Asynchronous Programming for Spatial Dataflow Architectures at the Extreme Edge
Souradip Ghosh, Carnegie Mellon University
Emerging sensing applications create an unprecedented need for extreme energy efficiency in general-purpose processors at the extreme edge (i.e., beyond traditional communication and power infrastructure). Multi-year deployments on a small battery are possible only if applications avoid off-device communication and process most data locally. Recent work has proposed energy-minimal dataflow architectures (EDMAs), which combine an ISA with a resource-constrained spatial dataflow machine, for the extreme edge. EDMA-based systems optimize, map, and run applications written in common embedded languages (e.g., C) entirely on-device, which eliminates the Amdahl-like efficiency bottlenecks of prior systems while retaining application flexibility.
While EDMAs achieve extreme efficiency, they pay a cost to support the general-purpose computation that emerging domains and irregular applications require. We observe a critical mismatch: EDMAs accept programs written as monolithic, sequential code, leaving the system to conservatively recover parallelism for an execution model that is already parallel dataflow. To preserve the sequential semantics of monolithic code during execution, the system adds excessive control-flow and ordering operations. This mismatch results in large code sizes, underutilized resources, frequent pipeline stalls, and low performance. Simplicity and conservatism in the language and execution model produce these bottlenecks.
We propose Ripple, an asynchronous parallel programming language that targets EDMA-like architectures. Instead of relying on sequential C and (often conservative) automatic parallelization, Ripple provides language-level primitives for asynchronous execution, work queues, and synchronization of shared memory accesses. Ripple's primitives are specifically designed to abstract the existing processing and buffer resources of spatial dataflow architectures while letting the user express coarse-grained parallelism effectively. The system also provides a complete optimizing compiler and microarchitectural extensions to support Ripple on a new class of EDMAs.
Abstract Author(s): Souradip Ghosh, Nathan Serafin, Yufei Shi, Nathan Beckmann, Brandon Lucia