Award Abstract # 1422278
CIF: Small: Statistical Data Privacy: Fundamental Limits and Efficient Algorithms

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: UNIVERSITY OF ILLINOIS
Initial Amendment Date: July 24, 2014
Latest Amendment Date: July 24, 2014
Award Number: 1422278
Award Instrument: Standard Grant
Program Manager: Phillip Regalia
pregalia@nsf.gov
 (703)292-2981
CCF
 Division of Computing and Communication Foundations
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: August 1, 2014
End Date: July 31, 2018 (Estimated)
Total Intended Award Amount: $500,000.00
Total Awarded Amount to Date: $500,000.00
Funds Obligated to Date: FY 2014 = $500,000.00
History of Investigator:
  • Pramod Viswanath (Principal Investigator)
    pramodv@princeton.edu
Recipient Sponsored Research Office: University of Illinois at Urbana-Champaign
506 S WRIGHT ST
URBANA
IL  US  61801-3620
(217)333-2187
Sponsor Congressional District: 13
Primary Place of Performance: University of Illinois at Urbana-Champaign
IL  US  61820-7473
Primary Place of Performance Congressional District: 13
Unique Entity Identifier (UEI): Y8CWNJRCNN91
Parent UEI: V2PHZ2CSCH63
NSF Program(s): Comm & Information Foundations, Secure & Trustworthy Cyberspace
Primary Program Source: 01001415DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7935, 7923, 7434
Program Element Code(s): 779700, 806000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Privacy is a fundamental individual right. In the era of big data, large amounts of data about individuals are collected both voluntarily (e.g., frequent flier/shopper incentives) and involuntarily (e.g., the US Census or medical records). With the ready ability to search for information and correlate it across distinct sources (using data analytics and/or recommender systems), privacy violations take on an ominous character in this information age.

Anonymization of user information is a classical technique, but it is susceptible to correlation attacks: by correlating the anonymized database with another (perhaps publicly available) identified database, a user's privacy can still be divulged. A way around the limitations of anonymization is to release a randomized database, which offers plausible deniability of any user identity inferred from the data release. Differential privacy is the technical field that provides systematic guarantees for the deniability of a user's presence or absence, with strong privacy guarantees against adversaries holding arbitrary side information. It is of fundamental interest to characterize privacy mechanisms that randomize "just enough" to keep the released database as true to the intended one as possible, providing maximal utility.
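As a minimal illustration of this "randomize just enough" idea (a sketch for intuition only, not a mechanism developed under this award), the Python snippet below adds Laplace noise to a counting query, calibrated to a hypothetical privacy parameter epsilon and a query sensitivity of 1; smaller epsilon gives stronger deniability at the cost of a noisier release.

import numpy as np

def laplace_release(true_count, epsilon, sensitivity=1.0, rng=None):
    # Adding Laplace noise with scale sensitivity/epsilon makes the released
    # value nearly as likely whether or not any single individual is in the
    # database -- the standard epsilon-differential-privacy guarantee.
    rng = rng or np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Example: release the number of records matching a sensitive query.
print(laplace_release(true_count=1234, epsilon=0.5))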

Based on recent work connecting the areas of information theory and statistical data privacy (via a hypothesis testing context) and demonstrating novel privacy mechanisms that exponentially improve (in terms of the variance of added noise, say) upon the state of the art in the medium and low privacy regimes, the objective of the project is threefold: (a) characterize the fundamental limits of the tradeoff between privacy and utility in a variety of canonical settings; (b) discover (near) optimal mechanisms that can be efficiently implemented in practice; and (c) seek natural notions of statistical data privacy (beyond differential privacy) using the operational context of hypothesis testing.
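The hypothesis testing context referenced here can be made concrete: an adversary observing a released output tests the hypothesis that a given user is absent against the hypothesis that the user is present, and an epsilon-differentially private release forces every such test's false-alarm rate P_FA and missed-detection rate P_MD to satisfy P_FA + e^epsilon * P_MD >= 1 and e^epsilon * P_FA + P_MD >= 1. The small helper below (an illustrative sketch, not project software) checks whether a candidate operating point is consistent with a given epsilon.

import math

def feasible_operating_point(p_fa, p_md, epsilon):
    # Differential privacy constrains every hypothesis test (user absent vs.
    # user present) to lie above two lines in the (P_FA, P_MD) unit square.
    return (p_fa + math.exp(epsilon) * p_md >= 1.0 and
            math.exp(epsilon) * p_fa + p_md >= 1.0)

# With epsilon = 0.1, the adversary cannot drive both error rates toward zero:
print(feasible_operating_point(0.05, 0.05, epsilon=0.1))  # False
print(feasible_operating_point(0.48, 0.48, epsilon=0.1))  # True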


Privacy is a central, and multifaceted, social and technological issue of today's information age. This project focuses on the technical aspect of this multifaceted area and seeks to discover fundamental limits of privacy-utility tradeoffs in the context of currently well-established notions of privacy (differential privacy). The expected results are fundamental and immediately applicable to a variety of practical settings. Specifically, two concrete practical settings involving genomic data release and smart meter data release will be studied in detail. Due to privacy concerns, genomic and smart meter data are largely unavailable, precluding widespread data analytics and the practical benefits of such analysis. This project will build and release a software suite of sanitization tools based on the privacy mechanisms discovered as part of this project.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

G. Fanti, P. Kairouz, S. Oh, and P. Viswanath, "Spy vs. Spy: Rumor Source Obfuscation," ACM SIGMETRICS, 2015
G. Fanti, P. Kairouz, S. Oh, K. Ramchandran, and P. Viswanath, "Hiding the Rumor Source," IEEE Transactions on Information Theory, v.63, 2017, p.6679
G. Fanti, P. Kairouz, S. Oh, K. Ramchandran, and P. Viswanath, "Metadata-Conscious Anonymous Messaging," International Conference on Machine Learning (ICML), 2016
G. Fanti, P. Kairouz, S. Oh, K. Ramchandran, and P. Viswanath, "Metadata-Conscious Anonymous Messaging," Journal on Special Topics in Signal Processing, 2016
G. Fanti, P. Kairouz, S. Oh, K. Ramchandran, and P. Viswanath, "Rumor Source Obfuscation on Irregular Trees," ACM SIGMETRICS, 2016
G. Fanti, P. Kairouz, S. Oh, K. Ramchandran, and P. Viswanath, "Rumor Source Obfuscation on Irregular Trees," ACM SIGMETRICS, 2017
P. Kairouz, S. Oh, and P. Viswanath, "The Composition Theorem for Differential Privacy," IEEE Transactions on Information Theory, 2015
P. Kairouz, S. Oh, and P. Viswanath, "Extremal Mechanisms for Local Differential Privacy," Journal of Machine Learning Research, 2016
P. Kairouz, S. Oh, and P. Viswanath, "Secure Multiparty Differential Privacy," NIPS, 2015

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Privacy is a fundamental right. In the era of big data and social networking, the privacy of individuals and their data has rapidly eroded. In this project we take a first-principles view of data privacy and how to formally guarantee it in statistical (i.e., plausible-deniability) contexts. Differential privacy is a leading proposal for a formal characterization of statistical data privacy. In this project we enlarge the scope of differential privacy by deriving an equivalent operational hypothesis-testing interpretation. This allows us to establish exact optimality results both for canonical differential privacy settings and for an enlarged class.

In the first result, we derive the exact noise distribution that achieves optimal utility (the expected value of any monotonically increasing function of the noise): the optimal noise has a staircase distribution with parameters that depend on the function being queried and the privacy level required. In the context of binary data being exposed, potentially via a real-valued description, we show that the optimal privatization mechanism is simply randomized response (where the output is also binary). This optimality is shown in a very general setting, using the hypothesis-testing framework we have established.

In the second result, we give precise characterizations of the privacy leakage due to additional querying (composition of multiple data releases), for every finite number of queries and exact values of the differential privacy parameters. Finally, we show that randomized response is optimal in any multiparty setting where the individual users all have binary data and compute some Boolean function of each other's data under differential privacy constraints.
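For readers unfamiliar with randomized response, the sketch below is an illustrative reconstruction (not the project's released software): each user keeps their true bit with probability e^epsilon / (1 + e^epsilon) and flips it otherwise, which makes each report epsilon-differentially private, and an analyst can still debias aggregate counts.

import math
import numpy as np

def randomized_response(bits, epsilon, rng=None):
    # Keep each bit with probability e^eps / (1 + e^eps), otherwise flip it;
    # each reported bit is epsilon-differentially private on its own.
    rng = rng or np.random.default_rng()
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    bits = np.asarray(bits)
    flip = rng.random(len(bits)) > p_keep
    return np.where(flip, 1 - bits, bits)

def debiased_mean(reported_bits, epsilon):
    # Unbiased estimate of the true fraction of ones from the noisy reports.
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return (np.mean(reported_bits) - (1.0 - p_keep)) / (2.0 * p_keep - 1.0)

# Example: 10,000 users, 30% hold the sensitive attribute, epsilon = 1.
rng = np.random.default_rng(0)
true_bits = (rng.random(10_000) < 0.3).astype(int)
reports = randomized_response(true_bits, epsilon=1.0, rng=rng)
print(debiased_mean(reports, epsilon=1.0))  # close to 0.3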


Last Modified: 08/06/2018
Modified by: Pramod Viswanath
