
NSF Org: CNS Division Of Computer and Network Systems
Recipient:
Initial Amendment Date: August 18, 2017
Latest Amendment Date: February 26, 2019
Award Number: 1717950
Award Instrument: Standard Grant
Program Manager: James Joshi (CNS Division Of Computer and Network Systems, CSE Directorate for Computer and Information Science and Engineering)
Start Date: September 1, 2017
End Date: August 31, 2021 (Estimated)
Total Intended Award Amount: $498,624.00
Total Awarded Amount to Date: $498,624.00
Funds Obligated to Date:
History of Investigator:
Recipient Sponsored Research Office: 1001 EMMET ST N, CHARLOTTESVILLE, VA, US 22903-4833, (434) 924-4270
Sponsor Congressional District:
Primary Place of Performance: P.O. Box 400195, Charlottesville, VA, US 22904-4195
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): Secure & Trustworthy Cyberspace
Primary Program Source:
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
Individuals and organizations can frequently benefit from combining their data to learn collective models. However, combining data to enable multi-party learning is often not possible: it may be prohibited by privacy policies, or a business may consider it too risky to expose its own data to others. In addition, high-dimensional data are prevalent in modern data-driven applications, and learning from high-dimensional data owned by different organizations is even more challenging because of the bias introduced by high-dimensional machine learning methods. The overarching goal of this project is to address these challenges by developing methods that enable a group of mutually distrusting parties to collaborate securely on high-dimensional machine learning, producing a joint model without exposing their own data. The project enables owners of sensitive data to jointly learn models across their datasets without exposing that data, while providing meaningful privacy guarantees. It produces open-source software tools and has many important societal applications, including analyzing electronic health records across multiple hospitals to identify medical correlations that could not be found by any individual hospital.
The key to multi-party high-dimensional machine learning is finding an efficient way to produce an accurate aggregate model that reflects all of the data by combining local models developed independently on individual datasets; a minimal sketch of this aggregation pattern appears below. The strategy of this project is to combine two emerging research directions: distributed machine learning, which distributes machine learning algorithms across hosts and produces an aggregate model by combining multiple local models; and secure multi-party computation, which enables a group of mutually distrusting parties to jointly compute a function without leaking information about their private inputs or any intermediate results. The project also incorporates differential privacy mechanisms into multi-party high-dimensional learning, which further protects the individual data points held by each party. The results of this research have the potential to impact both the machine learning and security research communities. The education plan includes developing open course materials that integrate privacy and machine learning, and providing research-based training opportunities for both undergraduate and graduate students in computer science, systems engineering, and medical informatics. It actively involves underrepresented groups in research projects and trains a new generation of interdisciplinary researchers.
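To make the aggregation pattern concrete, the sketch below (in Python) shows the simplest form of the idea: each party trains a local model, and an aggregator averages the clipped local parameters and adds calibrated Gaussian noise in the style of a differential privacy mechanism. This is a hypothetical illustration, not the project's released code: the function name, the clip_norm and noise_multiplier parameters, and the choice to show the aggregation in the clear (rather than inside a secure multi-party computation, as the project proposes) are all assumptions made for readability.

    import numpy as np

    def dp_federated_average(local_params, clip_norm=1.0,
                             noise_multiplier=1.1, rng=None):
        """Average per-party parameter vectors with clipping and Gaussian noise.

        Illustrative sketch only: in the setting this project targets, this
        computation would run inside a secure multi-party computation protocol
        so that no party (including the aggregator) sees another party's
        parameters.
        """
        rng = np.random.default_rng() if rng is None else rng
        clipped = []
        for p in local_params:
            norm = np.linalg.norm(p)
            # Clip each party's contribution to bound its influence
            # (the sensitivity of the average).
            clipped.append(p * min(1.0, clip_norm / max(norm, 1e-12)))
        avg = np.mean(clipped, axis=0)
        # Gaussian noise calibrated to the clipped sensitivity gives a
        # differential privacy guarantee for the released aggregate.
        sigma = noise_multiplier * clip_norm / len(local_params)
        return avg + rng.normal(0.0, sigma, size=avg.shape)

    # Example: three parties each contribute a locally trained parameter vector.
    parties = [np.random.randn(10) for _ in range(3)]
    joint_model = dp_federated_average(parties)

Moving the noise addition inside the secure computation, as the project describes, means no single party ever holds the exact (un-noised) average, which is what allows the privacy guarantee to cover intermediate results as well as the released model.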
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
When machine learning is used to train models on sensitive data, there are two risks: the sensitive training data may be exposed directly, especially when models must be trained on data from more than one data owner, and the trained model that is then released may reveal sensitive aspects of the training data. This project advances scientific understanding of privacy in machine learning settings where training is done on sensitive data owned by different organizations, and those organizations collaborate securely to jointly learn a model without exposing their own data.
This project developed new methods for using cryptographic techniques for multi-party computation so that joint models can be learned without centralizing the data. It developed methods for incorporating noise directly within the secure computation to provide a formal privacy guarantee that bounds the risk that the learned joint model will reveal sensitive information about any individual's training data. The results of the project enabled performance improvements in secure distributed learning, including in challenging non-convex learning settings, and produced new algorithms for secure multi-party machine learning. The project also developed new empirical methods for analyzing the inference risks of a released model, providing better ways to estimate the risks of realistic inference attacks on released models and introducing a new attack method that demonstrates inference risks in settings where previous attacks would be ineffective. The tools developed for this project have been released as open-source code and used by other researchers in academia and industry, and the scientific results have been presented in research papers at top conferences in machine learning, security, and privacy, in keynote and invited talks at conferences and workshops, and to government policy-makers.
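As a rough illustration of the kind of empirical inference-risk analysis described above (a hypothetical sketch, not the project's released attack code), the following shows the simplest form of a membership inference test: a record whose loss under the released model is unusually low is more likely to have been part of the training data. The scikit-learn-style predict_proba interface, the function names, and the threshold calibration are assumptions made for illustration.

    import numpy as np

    def loss_based_membership_scores(model, X, y):
        """Per-example cross-entropy loss under a released classifier.

        Lower loss suggests the example is more likely a training member.
        Assumes a scikit-learn style `predict_proba(X)` interface and
        integer class labels in `y`.
        """
        probs = model.predict_proba(X)
        eps = 1e-12
        return -np.log(probs[np.arange(len(y)), y] + eps)

    def membership_guesses(model, X, y, threshold):
        # Guess "member" when the loss falls below a threshold that an
        # attacker would calibrate on data known to be non-members.
        return loss_based_membership_scores(model, X, y) < threshold

Measuring how often such guesses succeed on held-out members and non-members gives an empirical estimate of a released model's leakage, which is the style of evaluation the outcomes report refers to.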
Last Modified: 11/22/2021
Modified by: David E Evans