Award Abstract # 1253942
CAREER: Differentially-Private Machine Learning with Applications to Biomedical Informatics

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: UNIVERSITY OF CALIFORNIA, SAN DIEGO
Initial Amendment Date: February 6, 2013
Latest Amendment Date: May 30, 2017
Award Number: 1253942
Award Instrument: Continuing Grant
Program Manager: Rebecca Hwa
IIS Division of Information & Intelligent Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: July 1, 2013
End Date: June 30, 2019 (Estimated)
Total Intended Award Amount: $490,625.00
Total Awarded Amount to Date: $490,625.00
Funds Obligated to Date: FY 2013 = $98,670.00
FY 2014 = $103,184.00
FY 2015 = $107,082.00
FY 2016 = $112,254.00
FY 2017 = $69,435.00
History of Investigator:
  • Kamalika Chaudhuri (Principal Investigator)
    kamalika@cs.ucsd.edu
Recipient Sponsored Research Office: University of California-San Diego
9500 GILMAN DR
LA JOLLA
CA  US  92093-0021
(858)534-4896
Sponsor Congressional District: 50
Primary Place of Performance: University of California-San Diego
9500 Gilman Drive
San Diego
CA  US  92093-0404
Primary Place of Performance Congressional District: 50
Unique Entity Identifier (UEI): UYTTZT6G9DT1
Parent UEI:
NSF Program(s): Robust Intelligence,
Secure & Trustworthy Cyberspace
Primary Program Source: 01001314DB NSF RESEARCH & RELATED ACTIVIT
01001415DB NSF RESEARCH & RELATED ACTIVIT
01001516DB NSF RESEARCH & RELATED ACTIVIT
01001617DB NSF RESEARCH & RELATED ACTIVIT
01001718DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1045, 7434
Program Element Code(s): 749500, 806000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Machine learning on large-scale patient medical records can lead to the discovery of novel population-wide patterns, enabling advances in genetics, disease mechanisms, drug discovery, healthcare policy, and public health. However, concerns over patient privacy prevent biomedical researchers from running their algorithms on large volumes of patient data, creating a barrier to important new discoveries through machine learning.

The goal of this project is to address this barrier by developing privacy-preserving tools to query, cluster, classify, and analyze medical databases. In particular, the project aims to ensure differential privacy, a formal mathematical notion of privacy designed by cryptographers that has gained considerable attention in the systems, algorithms, machine-learning, and data-mining communities in recent years. The primary challenge in applying differentially private machine learning tools to biomedical informatics is their lack of statistical efficiency, that is, the large number of samples they require.
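
For intuition, here is a minimal sketch of the classical Laplace mechanism, the simplest differentially private primitive (a hypothetical illustration, not code from this award): it answers a numeric query by adding noise calibrated to the query's sensitivity, the most one person's record can change the answer.

    import numpy as np

    def laplace_mechanism(data, query, sensitivity, epsilon):
        """Answer query(data) with epsilon-differential privacy by adding
        Laplace noise with scale sensitivity / epsilon."""
        return query(data) + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

    # Example: a private count of patients with some condition. Adding or
    # removing one patient changes a count by at most 1, so sensitivity = 1.
    has_condition = np.array([0, 1, 1, 0, 1])  # toy indicator data
    private_count = laplace_mechanism(has_condition, np.sum, sensitivity=1.0, epsilon=0.5)

A smaller epsilon means a stronger privacy guarantee but noisier answers; the statistical-efficiency challenge described above is precisely about getting accurate answers at small epsilon without needing many more samples.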

The project will overcome this challenge by drawing on the PI's expertise to develop differentially private, highly statistically efficient machine learning tools for classification and clustering. The proposed research will advance the state of the art in privacy-preserving data analysis by combining insights from differential privacy, statistics, machine learning, and database algorithms.

The proposed research is closely tied to the development of the undergraduate and graduate curricula at UCSD, feeding into the PI's new undergraduate machine learning class, a new graduate learning theory class, and updates to an algorithm design and analysis class. The corresponding materials will be publicly disseminated through the PI's website. The PI is strongly committed to increasing the participation of women and minorities, and will engage in outreach activities to attract and retain women in computer science.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Jimmy Foulds, Joseph Geumlek, Max Welling and Kamalika Chaudhuri, "On the Theory and Practice of Privacy-preserving Bayesian Data Analysis," Uncertainty in Artificial Intelligence (UAI), 2016.
Joseph Geumlek, Shuang Song and Kamalika Chaudhuri, "Renyi Differential Privacy Mechanisms for Posterior Sampling," Neural Information Processing Systems (NIPS), 2017.
Mijung Park, James Foulds, Kamalika Chaudhuri and Max Welling, "DP-EM: Differentially Private Expectation Maximization," International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.
Shuang Song, Yizhen Wang and Kamalika Chaudhuri, "Pufferfish Privacy Mechanisms for Correlated Data," ACM SIGMOD International Conference on Management of Data (SIGMOD), 2017.
Xi Wu, Fengan Li, Arun Kumar, Kamalika Chaudhuri, Somesh Jha and Jeff Naughton, "Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics," ACM SIGMOD International Conference on Management of Data (SIGMOD), 2017.
Yizhen Wang, Somesh Jha and Kamalika Chaudhuri, "Analyzing the Robustness of Nearest Neighbors to Adversarial Examples," International Conference on Machine Learning (ICML), 2018.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

A great deal of data analysis is currently carried out on sensitive data, such as medical information, financial records, and search logs, and it has been shown that models trained on this kind of data can leak sensitive information about the people in the training set. The goal of this project is to design, analyze, and implement novel machine learning algorithms that do not leak such information. Specifically, we aim to design algorithms that enforce a rigorous guarantee of differential privacy, which ensures that the participation of a single person in the dataset does not change the probability of any outcome by much.
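
Stated formally (the standard definition, given here for concreteness): a randomized algorithm A is epsilon-differentially private if, for every pair of datasets D and D' differing in one person's record and every set of outcomes S,

    Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D') ∈ S].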

The project has led to the development of private algorithms for a large number of tasks. This includes generic private algorithmic techniques, and private machine learning algorithms for classification, stochastic optimization and Bayesian inference. Additionally, we have branched out beyond differential privacy, and designed mechanisms that offer privacy on correlated data -- such as those encountered in location privacy applications.
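
To give a flavor of the stochastic optimization line of work, the sketch below shows gradient perturbation, one common way to make stochastic gradient descent differentially private (a hypothetical, simplified sketch, not the exact algorithm of any publication above): each gradient is norm-clipped to bound its sensitivity and then perturbed with Gaussian noise before the update is applied.

    import numpy as np

    def noisy_sgd_step(w, grad, clip_norm, noise_scale, lr):
        """One differentially private SGD step: clip the gradient to bound
        its sensitivity, then add Gaussian noise before updating."""
        norm = np.linalg.norm(grad)
        if norm > clip_norm:
            grad = grad * (clip_norm / norm)   # enforce ||grad|| <= clip_norm
        noise = np.random.normal(0.0, noise_scale * clip_norm, size=grad.shape)
        return w - lr * (grad + noise)

Clipping caps how much any one example can influence a step, so the added Gaussian noise can be calibrated to yield a formal privacy guarantee over the whole training run.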

The project has also partially supported three graduate students, including one woman; one of the students has already completed her PhD, and the other two are on their way to graduation. Support from the project has enabled the students to attend conferences and give talks on their work. Finally, in addition to multiple invited talks, support from the project enabled the PI to give two invited tutorials on privacy at major signal processing and machine learning venues.

 


Last Modified: 08/29/2019
Modified by: Kamalika Chaudhuri
