Award Abstract # 1717950
SaTC: CORE: Small: Multi-Party High-dimensional Machine Learning with Privacy

NSF Org: CNS (Division of Computer and Network Systems)
Recipient: RECTOR & VISITORS OF THE UNIVERSITY OF VIRGINIA
Initial Amendment Date: August 18, 2017
Latest Amendment Date: February 26, 2019
Award Number: 1717950
Award Instrument: Standard Grant
Program Manager: James Joshi
CNS, Division of Computer and Network Systems
CSE, Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2017
End Date: August 31, 2021 (Estimated)
Total Intended Award Amount: $498,624.00
Total Awarded Amount to Date: $498,624.00
Funds Obligated to Date: FY 2017 = $498,624.00
History of Investigator:
  • David Evans (Principal Investigator)
    evans@virginia.edu
  • Quanquan Gu (Co-Principal Investigator)
  • Quanquan Gu (Former Principal Investigator)
  • David Evans (Former Co-Principal Investigator)
Recipient Sponsored Research Office: University of Virginia Main Campus
1001 EMMET ST N
CHARLOTTESVILLE
VA  US  22903-4833
(434)924-4270
Sponsor Congressional District: 05
Primary Place of Performance: University of Virginia
P. O. Box 400195
Charlottesville
VA  US  22904-4195
Primary Place of Performance Congressional District: 05
Unique Entity Identifier (UEI): JJG6HU8PA4S5
Parent UEI:
NSF Program(s): Secure & Trustworthy Cyberspace
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 025Z, 7434, 7923
Program Element Code(s): 806000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Individuals and organizations can frequently benefit from combining their data to learn collective models. However, combining data to enable multi-party learning is often not possible: it may be prohibited by privacy policies, or a business may consider exposing its own data to others too risky. In addition, high-dimensional data are prevalent in modern data-driven applications, and learning from high-dimensional data owned by different organizations is even more challenging because of the bias introduced by high-dimensional machine learning methods. The overarching goal of this project is to address these challenges by developing methods that enable a group of mutually distrusting parties to collaborate securely, applying high-dimensional machine learning methods to produce a joint model without exposing their own data. This project enables owners of sensitive data to jointly learn models across their datasets, without exposing that data and with meaningful privacy guarantees. It produces open source software tools and has many important societal applications, including analyzing electronic health records across multiple hospitals to identify medical correlations that could not be found by any individual hospital.


The key to multi-party high-dimensional machine learning is finding an efficient way to produce an accurate aggregate model that reflects all of the data, by combining local models developed independently on individual datasets. The strategy of this project is to combine two emerging research directions: distributed machine learning, which distributes learning algorithms across hosts and produces an aggregate model by combining multiple local models; and secure multi-party computation, which enables a group of mutually distrusting parties to jointly compute a function without leaking information about their private inputs or any intermediate results. The project also incorporates mechanisms based on differential privacy into multi-party high-dimensional learning, further protecting the individual data points held by each party (a minimal sketch of this combination follows below). The results of this research have the potential to impact both the machine learning and security research communities. The education plan includes developing open course materials that integrate privacy and machine learning, and providing research-based training opportunities for undergraduate and graduate students in computer science, systems engineering, and medical informatics. The project actively involves underrepresented groups in research and trains a new generation of interdisciplinary researchers.
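To make the strategy concrete, here is a minimal, hypothetical sketch (not the project's actual protocol) of how the pieces fit together: each party additively secret-shares its locally trained model parameters, the shares are summed so that no single party's model is ever revealed, and Gaussian noise is added before reconstruction so that only a differentially private average is released. The names additive_shares and secure_dp_average and the noise scale sigma are illustrative; a real deployment would share values over a finite ring and calibrate sigma to the query's sensitivity and the desired (epsilon, delta) guarantee.

import numpy as np

rng = np.random.default_rng(0)

def additive_shares(vec, n_parties):
    # Split `vec` into n_parties random shares that sum to `vec`.
    # Real MPC protocols share over a finite ring; plain floats are
    # used here only to keep the sketch short.
    shares = [rng.normal(size=vec.shape) for _ in range(n_parties - 1)]
    shares.append(vec - sum(shares))
    return shares

def secure_dp_average(local_models, sigma):
    # Each party secret-shares its local model parameters; any single
    # share (or partial sum of shares) reveals nothing on its own.
    n = len(local_models)
    all_shares = [additive_shares(m, n) for m in local_models]
    # Party j locally sums the j-th share it received from every party.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) for j in range(n)]
    total = sum(partial_sums)  # equals the exact sum of the local models
    # Noise added before release makes the published average
    # differentially private (sigma is an illustrative placeholder).
    noised = total + rng.normal(scale=sigma, size=total.shape)
    return noised / n

# Toy usage: three parties, each holding a 5-dimensional local model.
models = [rng.normal(size=5) for _ in range(3)]
print(secure_dp_average(models, sigma=0.1))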

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Jayaraman, Bargav and Wang, Lingxiao. "Distributed Learning without Distress: Privacy-Preserving Empirical Risk Minimization." Advances in Neural Information Processing Systems, 2018.
Chen, Jinghui and Gu, Quanquan. "RayS: A Ray Searching Method for Hard-label Adversarial Attack." ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020.
Evans, David and Kolesnikov, Vladimir and Rosulek, Mike. "A Pragmatic Introduction to Secure Multi-Party Computation." Foundations and Trends® in Privacy and Security, v.2, 2017. doi:10.1561/3300000019
Jayaraman, Bargav and Evans, David. "Evaluating Differentially Private Machine Learning in Practice." USENIX Security Symposium, 2019.
Jayaraman, Bargav and Wang, Lingxiao and Knipmeyer, Katherine and Gu, Quanquan and Evans, David. "Revisiting Membership Inference Under Realistic Assumptions." Proceedings on Privacy Enhancing Technologies, v.2021, 2021. doi:10.2478/popets-2021-0031
Suri, Anshuman and Evans, David. "Formalizing Distribution Inference Risks." Workshop on Theory and Practice of Differential Privacy, 2021.
Wang, Bao and Gu, Quanquan and Boedihardjo, March and Wang, Lingxiao and Barekat, Farzin and Osher, Stanley J. "DP-LSSGD: A Stochastic Optimization Method to Lift the Utility in Privacy-Preserving ERM." Mathematical and Scientific Machine Learning Conference, 2020.
Wang, Lingxiao and Gu, Quanquan. "A Knowledge Transfer Framework for Differentially Private Sparse Learning." Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
Wang, Lingxiao and Gu, Quanquan. "Differentially Private Iterative Gradient Hard Thresholding for Sparse Learning." 28th International Joint Conference on Artificial Intelligence, 2019.
Wang, Lingxiao and Jayaraman, Bargav and Evans, David and Gu, Quanquan. "Efficient Privacy-Preserving Stochastic Nonconvex Optimization." International Conference on Uncertainty in Artificial Intelligence (UAI), 2023.
Zhang, Xiao and Chen, Jinghui and Gu, Quanquan and Evans, David. "Understanding the Intrinsic Robustness of Image Distributions using Conditional Generative Models." International Conference on Artificial Intelligence and Statistics, 2020.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

When machine learning is used to train models on sensitive data, there are two risks: the sensitive training data may be exposed directly, especially when models must be trained on data from more than one data owner, and the trained model that is then released may reveal sensitive aspects of the training data. This project advances scientific understanding of privacy in machine learning settings where training is done on sensitive data owned by different organizations, and those organizations securely collaborate to jointly learn a model without exposing their own data.

This project developed new methods for using cryptographic techniques for multi-party computation, enabling joint models to be learned without centralizing the data. It developed methods for incorporating noise directly within the secure computation, providing a formal privacy guarantee that bounds the risk that the learned joint model will reveal sensitive information about any individual's training data. These results enabled performance improvements in secure distributed learning, including in challenging non-convex settings, and produced practical algorithms for secure multi-party machine learning.

The project also developed new empirical methods for analyzing the inference risks of a released model, providing better ways to estimate the risks of realistic inference attacks on released models, and developed a new attack method demonstrating that inference risks remain in settings where previous attacks would be ineffective. The tools developed for this project have been released as open source code and used by other researchers in academia and industry. The scientific results have been presented in research papers at top conferences in machine learning, security, and privacy; in keynote and invited talks at conferences and workshops; and to government policy-makers.
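As an illustration of the kind of inference-risk analysis described above, the following is a minimal, hypothetical sketch of a loss-threshold membership-inference test, in the spirit of the project's membership-inference evaluations but not its actual attack code: an attacker who can observe the model's loss on an example guesses that the example was a training-set member when the loss is low. The function name, threshold values, and loss distributions below are invented for illustration.

import numpy as np

rng = np.random.default_rng(1)

def threshold_membership_attack(member_losses, nonmember_losses, threshold):
    # Guess 'member' whenever the model's loss on an example falls
    # below the threshold, then report the attacker's true-positive
    # rate, false-positive rate, and advantage (TPR - FPR), a common
    # membership-inference metric.
    tpr = float(np.mean(member_losses < threshold))
    fpr = float(np.mean(nonmember_losses < threshold))
    return tpr, fpr, tpr - fpr

# Synthetic losses: training-set members tend to have lower loss
# because the model has fit them; these distributions are made up.
member_losses = rng.gamma(shape=2.0, scale=0.3, size=1000)
nonmember_losses = rng.gamma(shape=2.0, scale=0.5, size=1000)

for t in (0.3, 0.6, 0.9):
    tpr, fpr, adv = threshold_membership_attack(member_losses, nonmember_losses, t)
    print(f"threshold={t:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}  advantage={adv:.2f}")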

Last Modified: 11/22/2021
Modified by: David E Evans
