Award Abstract # 1228669
TWC SBES: Medium: Utility for Private Data Sharing in Social Science

NSF Org: CNS (Division of Computer and Network Systems)
Recipient: THE PENNSYLVANIA STATE UNIVERSITY
Initial Amendment Date: August 20, 2012
Latest Amendment Date: September 10, 2013
Award Number: 1228669
Award Instrument: Standard Grant
Program Manager: Sara Kiesler (skiesler@nsf.gov, (703) 292-8643)
CNS (Division of Computer and Network Systems)
CSE (Directorate for Computer and Information Science and Engineering)
Start Date: September 1, 2012
End Date: August 31, 2018 (Estimated)
Total Intended Award Amount: $1,066,889.00
Total Awarded Amount to Date: $1,066,889.00
Funds Obligated to Date: FY 2012 = $1,066,889.00
History of Investigator:
  • Daniel Kifer (Principal Investigator)
  • Stephen Matthews (Co-Principal Investigator)
  • Tse-Chuan Yang (Co-Principal Investigator)
Recipient Sponsored Research Office: Pennsylvania State Univ University Park
201 OLD MAIN
UNIVERSITY PARK
PA  US  16802-1503
(814)865-1372
Sponsor Congressional District: 15
Primary Place of Performance: Pennsylvania State Univ University Park
360F IST Bldg.
University Park
PA  US  16802-7000
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI): NPM2J7MSCF61
Parent UEI:
NSF Program(s): Secure & Trustworthy Cyberspace
Primary Program Source: 01001213DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 7434, 7924
Program Element Code(s): 806000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

One of the keys to scientific progress is the sharing of research data. When the data contain information about human subjects, the incentives not to share are even stronger. The biggest concern is privacy: specific information about individuals must be protected at all times. Recent advances in mathematical notions of privacy have raised the hope that data can be properly sanitized and distributed to other research groups without revealing information about any individual. For this effort to be worthwhile, the sanitized data must be useful for statistical analysis. This project addresses the research challenges in making sanitized data useful. The first part of the project deals with the design of algorithms that produce useful sanitized data subject to privacy constraints. The second part deals with the development of tools for the statistical analysis of sanitized data. Existing statistical routines are not designed for the complex noise patterns found in sanitized data; using them naively will often result in missed discoveries or false claims of statistical significance. The target application for this project is a social science dataset with geographic characteristics. The intellectual merit of this proposal is the development of a utility theory for data-sanitizing algorithms and of statistical tools for analyzing their output. The broader impact is the improved ability of research groups to share useful, privacy-preserving research data.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Ding, Zeyu and Wang, Yuxin and Wang, Guanhong and Zhang, Danfeng and Kifer, Daniel. "Detecting Violations of Differential Privacy." Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018. DOI: 10.1145/3243734.3243818
Lee, Jaewoo and Kifer, Daniel. "Concentrated Differentially Private Gradient Descent with Adaptive per-Iteration Privacy Budget." Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018. DOI: 10.1145/3219819.3220076
Lin, Bing-Rong. "Information Measures in Statistical Privacy and Data Processing Applications." ACM Transactions on Knowledge Discovery from Data, 2015.
Rogers, Ryan and Kifer, Daniel. "A New Class of Private Chi-Square Hypothesis Tests." AISTATS, 2017.
Wang, Hongjian and Tang, Xianfeng and Kuo, Yu-Hsuan and Kifer, Daniel and Li, Zhenhui. "A Simple Baseline for Travel Time Estimation using Large-scale Trip Data." ACM Transactions on Intelligent Systems and Technology, v.10, 2019. DOI: 10.1145/3293317
Wang, Yue and Kifer, Daniel and Lee, Jaewoo. "Differentially Private Confidence Intervals for Empirical Risk Minimization." Journal of Privacy and Confidentiality, v.9, 2019. DOI: 10.29012/jpc.660
Zhang, Danfeng and Kifer, Daniel. "LightDP: Towards Automating Differential Privacy Proofs." Principles of Programming Languages (POPL), 2017.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Differential privacy is a set of mathematical guidelines for the behavior of software. Software that follows those guidelines (called "differentially private software") carries a provable mathematical guarantee on the privacy of the individuals whose data were input to the software. Differential privacy requires that the software use randomness in a special way that masks the contribution of any individual's data.
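
As an illustration of how such calibrated randomness works, here is a minimal sketch of the textbook Laplace mechanism in Python. The function name and parameters are ours, chosen for exposition; this is the standard mechanism, not the project's own software.

import numpy as np

def laplace_mechanism(true_count, epsilon, sensitivity=1.0, rng=None):
    # Release a count with epsilon-differential privacy. Adding or
    # removing one person changes the count by at most `sensitivity`,
    # so Laplace noise of scale sensitivity/epsilon masks any single
    # individual's contribution.
    rng = rng or np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: privately release the number of respondents in one county.
noisy_count = laplace_mechanism(true_count=1342, epsilon=0.5)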

Because of this randomness, specialized postprocessing algorithms are needed to extract the most utility from the output of differentially private software. The goal of this project was to develop such post-processing algorithms.

The first part of the project was concerned with mathematical definitions of utility (i.e., "information content") for the output of differentially private software, and with the mathematical properties that utility measures need to satisfy (for example, any computation performed on the output should have measured utility no greater than that of the output itself). Utility measures are needed for the design of differentially private software, so that the software can provide the most useful output subject to privacy constraints.
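
In symbols (the notation here is ours, chosen for illustration): if M is a differentially private mechanism, D a dataset, U a utility measure, and f any post-processing function, the property just described is a data-processing inequality,

    U(f(M(D))) <= U(M(D))   for every dataset D and every function f,

so that no computation on the released output can manufacture information that was not already present in it.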

Later, the project considered the design of post-processing algorithms that would unlock the information contained in the output of differentially private software. For instance, given only the noisy output, one would like to approximately answer questions about the original input. The most natural use case is statistical analysis.

In classical statistics, one is interested in performing hypothesis tests on the data (e.g., did the data plausibly come from a specific distribution, or are two attributes independent of each other?), building models, and obtaining confidence intervals for the parameters of those models. We designed new statistical techniques that take the output of differentially private software and, using only this output, perform hypothesis tests that answer questions about the true data. We also designed algorithms for obtaining confidence intervals for a class of statistical models that are trained using a framework known as empirical risk minimization.
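
The following sketch conveys the flavor of such a test; it is an illustration under assumptions of ours (a Laplace release mechanism with publicly known sample size and noise scale), not the project's published algorithm. The key point is that classical chi-square critical values no longer apply once privacy noise is added, so the null distribution of the test statistic is calibrated by Monte Carlo simulation that injects the same noise.

import numpy as np

def private_goodness_of_fit(noisy_counts, null_probs, n, noise_scale,
                            trials=10000, rng=None):
    # Goodness-of-fit test on Laplace-noised histogram counts. Assumes
    # the release mechanism added Laplace(noise_scale) noise to each
    # count and that the total sample size n is public.
    rng = rng or np.random.default_rng()
    expected = n * null_probs

    def statistic(counts):
        # Classical chi-square statistic, but computed on noisy counts,
        # so its null distribution is no longer chi-square.
        return np.sum((counts - expected) ** 2 / expected)

    observed = statistic(noisy_counts)

    # Calibrate by simulation: draw counts under the null hypothesis,
    # add the same privacy noise, and record the resulting statistics.
    null_stats = np.empty(trials)
    for i in range(trials):
        sim = rng.multinomial(n, null_probs).astype(float)
        sim += rng.laplace(scale=noise_scale, size=sim.shape)
        null_stats[i] = statistic(sim)

    # Monte Carlo p-value (with the +1 correction for validity).
    return (np.sum(null_stats >= observed) + 1.0) / (trials + 1.0)

# Example: are four noisy category counts consistent with uniformity?
p_value = private_goodness_of_fit(
    noisy_counts=np.array([262.3, 241.8, 254.1, 246.5]),
    null_probs=np.full(4, 0.25), n=1000, noise_scale=2.0)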

In terms of broader impact, we developed techniques for statistical analysis of privacy-preserving data releases that statisticians and social scientists can use to study their data while providing strong confidentiality guarantees. We also developed software that allows the construction of statistical models that use geographic information while protecting the locations of the individuals whose data were used to build those models.


Last Modified: 03/06/2019
Modified by: Daniel Kifer
