
NSF Org: | CNS Division Of Computer and Network Systems |
Recipient: | The Pennsylvania State University |
Initial Amendment Date: | August 20, 2012 |
Latest Amendment Date: | September 10, 2013 |
Award Number: | 1228669 |
Award Instrument: | Standard Grant |
Program Manager: | Sara Kiesler, skiesler@nsf.gov, (703) 292-8643, CNS Division Of Computer and Network Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2012 |
End Date: | August 31, 2018 (Estimated) |
Total Intended Award Amount: | $1,066,889.00 |
Total Awarded Amount to Date: | $1,066,889.00 |
Recipient Sponsored Research Office: | 201 Old Main, University Park, PA 16802-1503, US, (814) 865-1372 |
Primary Place of Performance: | 360F IST Bldg., University Park, PA 16802-7000, US |
NSF Program(s): | Secure & Trustworthy Cyberspace |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
One of the keys to scientific progress is the sharing of research data. When the data contain information about human subjects, however, there are strong incentives not to share. The biggest concern is privacy: specific information about individuals must be protected at all times. Recent advances in mathematical notions of privacy have raised the hope that data can be properly sanitized and distributed to other research groups without revealing information about any individual. For this effort to be worthwhile, the sanitized data must remain useful for statistical analysis. This project addresses the research challenges in making sanitized data useful. The first part of the project deals with the design of algorithms that produce useful sanitized data subject to privacy constraints. The second part deals with the development of tools for the statistical analysis of sanitized data. Existing statistical routines are not designed for the complex noise patterns found in sanitized data; their naive use will often result in missed discoveries or false claims of statistical significance. The target application for this project is a social science dataset with geographic characteristics. The intellectual merit of this proposal is the development of a utility theory for algorithms that sanitize data, together with statistical tools for analyzing their output. The broader impact is the improved ability of research groups to share useful, but privacy-preserving, research data.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Differential privacy is a set of mathematical guidelines for the behavior of software. Software that follows these guidelines (called "differentially private software") carries a provable mathematical guarantee on the privacy of the individuals who contributed the data given to it as input. Differential privacy requires that the software use randomness in a special way that masks the contribution of any individual's data.
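To make this concrete, here is a minimal sketch of one standard way such randomness is introduced, the Laplace mechanism. The function and its parameters are illustrative assumptions; the report does not say which mechanisms the project's software used.

    import numpy as np

    def laplace_mechanism(true_count, epsilon, sensitivity=1.0, rng=None):
        # Release a count with epsilon-differential privacy. Adding or
        # removing one individual changes a count by at most `sensitivity`,
        # so Laplace noise of scale sensitivity/epsilon masks any single
        # person's contribution.
        rng = np.random.default_rng() if rng is None else rng
        return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    # Example: privatize the number of survey respondents in one region.
    noisy_count = laplace_mechanism(true_count=412, epsilon=0.5)

Smaller epsilon means more noise and a stronger privacy guarantee, which is exactly why the post-processing work described next matters.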
Because of this randomness, specialized post-processing algorithms are needed to extract the most utility from the output of differentially private software. The goal of this project was to develop such post-processing algorithms.
The first part of the project was concerned with mathematical definitions of utility (i.e., "information content") in the output of differentially private software and the mathematical properties that utility measures need to satisfy (for example, any computation performed on the output should have measured utility no greater than that of the output itself). Utility measures are needed for the design of differentially private software, so that the software can provide the most useful output subject to privacy constraints.
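Stated formally (the notation here is an illustrative assumption, not the project's own): if M denotes the differentially private software, U a candidate utility measure, and g any post-processing step, such a property reads

    U(g(M(D))) <= U(M(D))   for every (possibly randomized) function g,

i.e., further computation can reorganize the released output but cannot add information about the underlying data D.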
Later, the project considered the design of post-processing algorithms that would unlock the information contained in the output of differentially private software. For instance, given the noisy output, one would be interested in approximately answering questions about the input. The most natural use case is statistical analysis.
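As a hedged illustration of what such post-processing can look like (a standard technique, not necessarily the algorithm developed in this project): suppose noisy histogram counts are released but the true total is publicly known. Projecting the noisy vector onto the set of nonnegative vectors with the correct sum typically reduces error without touching the private data again.

    import numpy as np

    def project_to_counts(noisy, total):
        # Euclidean projection of `noisy` onto {x : x >= 0, sum(x) = total},
        # via the standard sorting-based simplex projection (scaled by total).
        v = np.asarray(noisy, dtype=float)
        u = np.sort(v)[::-1]
        css = np.cumsum(u) - total
        rho = np.nonzero(u * np.arange(1, len(u) + 1) > css)[0][-1]
        theta = css[rho] / (rho + 1.0)
        return np.maximum(v - theta, 0.0)

    # Example: three noisy counts that should be nonnegative and sum to 100.
    print(project_to_counts([47.3, 58.9, -4.1], total=100))  # [44.2 55.8 0.]

Because differential privacy is immune to post-processing, a step like this costs nothing in privacy while improving accuracy.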
In classical statistics, one is interested in performing hypothesis tests on the data (e.g., did the data plausibly come from a specific distribution? are two attributes independent of each other?), building models, and obtaining confidence intervals for the parameters in those models. We designed new statistical techniques for taking the output of differentially private software and, using only this output, performing hypothesis tests that answer questions about the true data. We also designed algorithms for obtaining confidence intervals for a class of statistical models trained using a framework known as empirical risk minimization.
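The following sketch shows the flavor of such a test for one familiar case, a goodness-of-fit test computed on Laplace-noised counts. The function, its parameters, and the noise model are assumptions made for illustration, not the project's published procedure. The key point is that the classical chi-squared reference distribution is no longer valid once privacy noise is added, so the null distribution is rebuilt by Monte Carlo simulation.

    import numpy as np

    def private_gof_pvalue(noisy_counts, null_probs, n, epsilon,
                           trials=10000, rng=None):
        # Goodness-of-fit p-value computed directly on privatized counts.
        rng = np.random.default_rng() if rng is None else rng
        expected = n * np.asarray(null_probs)
        scale = 2.0 / epsilon  # assumes the histogram has L1 sensitivity 2

        def stat(counts):
            return np.sum((counts - expected) ** 2 / expected)

        observed = stat(np.asarray(noisy_counts, dtype=float))
        k = len(null_probs)
        null_stats = np.empty(trials)
        for t in range(trials):
            # Simulate both the sampling noise and the privacy noise.
            sample = rng.multinomial(n, null_probs)
            null_stats[t] = stat(sample + rng.laplace(0.0, scale, size=k))
        # p-value: fraction of simulated null statistics at least as extreme.
        return np.mean(null_stats >= observed)

Without the simulation step, the extra Laplace noise inflates the test statistic, and naive use of chi-squared tables would produce false claims of significance, exactly the failure mode the abstract warns about.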
In terms of broader impact, we have developed techniques for statistical analysis over privacy-preserving data that statisticians and social scientists can use to study their data while providing strong confidentiality guarantees. We also developed software that allows the construction of statistical models that use geographic information while protecting the locations of the individuals whose data were used to build them.
Last Modified: 03/06/2019
Modified by: Daniel Kifer