Award Abstract # 1817245
SaTC: CORE: Small: RUI: Differentially Private Hypothesis Testing

NSF Org: CNS
Division Of Computer and Network Systems
Recipient: THE REED INSTITUTE
Initial Amendment Date: August 10, 2018
Latest Amendment Date: August 10, 2018
Award Number: 1817245
Award Instrument: Standard Grant
Program Manager: Anna Squicciarini
asquicci@nsf.gov
 (703)292-5177
CNS
 Division Of Computer and Network Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2018
End Date: August 31, 2023 (Estimated)
Total Intended Award Amount: $344,683.00
Total Awarded Amount to Date: $344,683.00
Funds Obligated to Date: FY 2018 = $344,683.00
History of Investigator:
  • Adam Groce (Principal Investigator)
    agroce@reed.edu
  • Anna Ritz (Co-Principal Investigator)
  • Andrew Bray (Co-Principal Investigator)
Recipient Sponsored Research Office: Reed College
3203 SE WOODSTOCK BLVD
PORTLAND
OR  US  97202-8138
(503)771-1112
Sponsor Congressional District: 03
Primary Place of Performance: Reed College
Portland
OR  US  97202-8199
Primary Place of Performance Congressional District: 03
Unique Entity Identifier (UEI): CMNJCKH6LTK6
Parent UEI:
NSF Program(s): Secure & Trustworthy Cyberspace
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 025Z, 7434, 7923
Program Element Code(s): 806000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

In today's world, private companies, hospitals, governments, and other entities frequently maintain large databases that would be hugely valuable to researchers in many fields. However, privacy concerns prevent these databases from being fully utilized. Differential privacy defines conditions under which information about these databases can be released while provably protecting the privacy of the individuals whose data they contain. This project develops differentially private hypothesis tests. Hypothesis tests are common statistical tools that are widely used in data analysis for social sciences, medicine, and public policy. This project aims to give researchers the tools they need to conduct standard statistical analyses on data that was previously inaccessible due to privacy concerns. The project makes extensive use of undergraduate researchers, helping to train a new generation of privacy experts.

Differentially private hypothesis tests allow researchers to answer relevant research questions without having direct access to sensitive data. This project develops private versions of many standard hypothesis tests like analysis of variance or survival analysis. For each hypothesis test, there are three goals: (1) to develop a private test statistic, either by approximating the standard test statistic or by creating an entirely new test statistic, (2) to develop methods for converting this test statistic to a final, meaningful value such as a p-value, and (3) to conduct power analyses evaluating the utility of the hypothesis test. This project also aims to release a software package implementing these tests and explanatory materials to ease their adoption by non-specialists.
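The three goals above can be illustrated with a toy example. The following sketch is not taken from the project's papers; it is a minimal, hypothetical private sign test in which the statistic is a count (sensitivity 1) perturbed with Laplace noise, and the p-value is computed against the null distribution of the *noisy* statistic so the added noise is properly accounted for:

```python
import random

def laplace_noise(scale, rng):
    # Laplace(0, scale) drawn as the difference of two i.i.d. exponentials
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(data, predicate, epsilon, rng):
    # A counting query has sensitivity 1 (one person changes it by at most 1),
    # so Laplace noise with scale 1/epsilon gives epsilon-differential privacy.
    true_count = sum(1 for x in data if predicate(x))
    return true_count + laplace_noise(1.0 / epsilon, rng)

def private_sign_test_pvalue(data, epsilon, n_sim=10000, seed=0):
    """Toy private sign test of H0: each value is positive with probability 1/2."""
    rng = random.Random(seed)
    n = len(data)
    # Goal (1): a private test statistic
    stat = private_count(data, lambda x: x > 0, epsilon, rng)
    # Goal (2): a p-value, computed by simulating the null distribution of the
    # released (noisy) statistic rather than of the true count
    null_stats = [
        sum(rng.random() < 0.5 for _ in range(n))
        + laplace_noise(1.0 / epsilon, rng)
        for _ in range(n_sim)
    ]
    dev = abs(stat - n / 2)  # two-sided deviation from the null mean
    return sum(abs(s - n / 2) >= dev for s in null_stats) / n_sim
```

Goal (3), a power analysis, would repeat this test on many datasets simulated under an alternative and record how often it rejects.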

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Couch, Simon; Kazan, Zeki; Shi, Kaiyan; Bray, Andrew; Groce, Adam. "Differentially Private Nonparametric Hypothesis Testing." Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2019. DOI: 10.1145/3319535.3339821
Groce, Adam; Rindal, Peter; Rosulek, Mike. "Cheaper Private Set Intersection via Differentially Private Leakage." Proceedings on Privacy Enhancing Technologies, v.2019, 2019. DOI: 10.2478/popets-2019-0034
Swanberg, Marika; Globus-Harris, Ira; Griffith, Iris; Ritz, Anna; Groce, Adam; Bray, Andrew. "Improved Differentially Private Analysis of Variance." Proceedings on Privacy Enhancing Technologies, v.2019, 2019. DOI: 10.2478/popets-2019-0049

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

If a statistical analysis satisfies differential privacy, then it can be carried out on a database without any concern that the outputs of the analysis might violate the privacy of the people whose data is contained in that database.  If we could create differentially private versions of standard statistical analyses, we would be able to carry out all sorts of important research (e.g., in medicine or social science) on data that would otherwise be inaccessible and without compromising anyone's privacy.

Many researchers are working on coming up with new differentially private ways to analyze data.  This project focused on hypothesis tests, extremely common statistical tools.  Even though these are some of the first types of analyses you learn about in an introductory statistics class, many had no existing private versions before this work.

We looked first at analysis of variance (ANOVA)-type tests, which are used to check whether a categorical variable is plausibly independent of a continuous variable.  The first private test for this situation was developed as preliminary work for this grant.  Under the grant, we then found additional tests that improved the statistical power (that is, reduced the amount of data needed to detect an effect) by orders of magnitude; the best test now requires an amount of data closer to what the non-private setting requires than to what the initial work required.  The second major result was a general method by which any hypothesis test can be automatically privatized.  Along the way, we also produced other papers and preprints on related results.
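The power comparisons described above can be approximated empirically: simulate many datasets under a chosen alternative, run the private test on each, and record how often it rejects. The sketch below uses a hypothetical noisy-count statistic for illustration; the function names and parameters are assumptions, not the project's actual methods:

```python
import random

def noisy_sign_stat(data, epsilon, rng):
    # Hypothetical private statistic: count of positives plus Laplace(1/epsilon)
    # noise, drawn as a difference of two exponentials with rate epsilon
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return sum(1 for x in data if x > 0) + noise

def empirical_power(n, effect, epsilon, alpha=0.05, trials=500, seed=0):
    """Estimate power as the rejection rate over simulated alternative datasets."""
    rng = random.Random(seed)
    # Critical value: (1 - alpha) quantile of the simulated null distribution
    # of the noisy statistic (null: positive with probability exactly 1/2)
    null = sorted(
        noisy_sign_stat([rng.choice([-1.0, 1.0]) for _ in range(n)], epsilon, rng)
        for _ in range(trials)
    )
    crit = null[int((1 - alpha) * trials)]
    # Alternative: each observation is positive with probability 0.5 + effect
    rejections = sum(
        noisy_sign_stat(
            [1.0 if rng.random() < 0.5 + effect else -1.0 for _ in range(n)],
            epsilon, rng,
        ) > crit
        for _ in range(trials)
    )
    return rejections / trials
```

Plotting such estimates against the sample size n, for several privacy budgets epsilon, is one common way to visualize how much data a private test needs relative to its non-private counterpart.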

These works were all published, and all included freely accessible code that allows any analyst to carry out the methods we describe.  We gave proofs of their privacy and empirical analyses of their statistical power.

This project focused on including undergraduates in the research.  Undergraduates were involved at every step of the process, from problem selection to writing up final papers.  They got excellent research experience, and several are already well on their way through successful graduate studies.

The project also brought together faculty and students from computer science and statistics, sharing expertise and helping each other learn the norms of the other field.  This is necessary to ensure that differential privacy is held to standards of rigor no weaker than those of the standard statistics it aims to replace.


Last Modified: 01/05/2024
Modified by: Adam Groce

