Deep Learning With Differential Privacy and the Saddle Point
Juan Gomez, Harvard University
Machine learning (ML) techniques based on neural networks achieve remarkable accuracy in a wide variety of domains. In applications that rely on human-generated data, the datasets used to train these models often contain private or sensitive information. The careful design of ML models that do not expose private information is of vital importance in these domains. In this work, privacy is defined as the capacity of an adversary to infer information about a specific person or group from the outputs of a model. Differential privacy is a mathematical framework that uses this definition, along with information-theoretic techniques, to quantify the greatest possible information gained by an adversary. It further provides strict bounds under which individual privacy is provably guaranteed.
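The abstract does not spell out the formal guarantee, but for context, the standard (epsilon, delta)-differential-privacy definition of Dwork et al. (not notation taken from this abstract) can be written as:

```latex
% Standard (\epsilon,\delta)-differential privacy (Dwork et al.):
% M is a randomized mechanism, D and D' are datasets differing in
% one individual's record, and S is any set of possible outputs.
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{M}(D') \in S] \;+\; \delta
```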
Guaranteeing privacy when training ML models presents a unique challenge. ML algorithms often query the training dataset thousands of times, and each successive query necessarily leads to additional privacy loss. In this work, we propose using methods from mathematical physics, namely the method of steepest descent, to calculate this accumulated privacy loss. We call this approach the Saddle Point Accountant (SPA). We demonstrate, through numerical experiments, the accuracy of the SPA and how it can be used to augment existing methods. Finally, we show how this approach can be used to solve the inverse problem: constructing ML models that optimally balance privacy and accuracy.
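To convey the flavor of the computation, the following minimal sketch (not the authors' SPA implementation) applies a textbook saddle-point tail approximation to the cumulant generating function of the privacy loss accumulated over many composed Gaussian mechanisms; the parameter names (sigma, T, eps) and the Gaussian test case are illustrative assumptions.

```python
# Minimal sketch, assuming T-fold composition of the Gaussian mechanism
# (sensitivity 1, noise multiplier sigma). Per composition the privacy
# loss is N(m, 2m) with m = 1/(2 sigma^2), so the total loss L has
# cumulant generating function K(t) = T * m * t * (1 + t).
import numpy as np
from scipy.stats import norm

def saddle_point_tail(eps, sigma, T):
    """Saddle-point approximation to Pr[L >= eps] for the total
    privacy loss L of T composed Gaussian mechanisms."""
    m = T / (2.0 * sigma**2)          # mean of the composed loss
    # Solve the saddle-point equation K'(t) = eps in closed form:
    # K'(t) = m*(2t + 1)  =>  t* = (eps/m - 1)/2  (requires eps > m).
    t_star = (eps / m - 1.0) / 2.0
    K = m * t_star * (1.0 + t_star)   # K(t*)
    K2 = 2.0 * m                      # K''(t*), constant here
    # Classical saddle-point (Bahadur-Rao style) tail estimate.
    return np.exp(K - t_star * eps) / (t_star * np.sqrt(2.0 * np.pi * K2))

sigma, T, eps = 1.0, 100, 90.0
m = T / (2 * sigma**2)
# The Gaussian case admits an exact tail, giving a ground truth to compare.
exact = norm.sf(eps, loc=m, scale=np.sqrt(2 * m))
print(f"saddle point: {saddle_point_tail(eps, sigma, T):.3e}")
print(f"exact tail:   {exact:.3e}")
```

In this toy Gaussian setting the approximation reduces to the familiar Mills-ratio asymptotic and lands within a few percent of the exact tail; the appeal of the saddle-point idea is that the same recipe applies to compositions whose privacy-loss distribution has no closed form.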
Abstract Author(s): Wael Alghamdi, Juan Felipe Gomez, Shahab Asoodeh, Flavio P. Calmon, Oliver Kosut, Lalitha Sankar