Award Abstract # 1408874
TWC: Medium: Collaborative: Broker Leads for Privacy-Preserving Discovery in Health Information Exchange

NSF Org: CNS
Division Of Computer and Network Systems
Recipient: TRUSTEES OF INDIANA UNIVERSITY
Initial Amendment Date: August 26, 2014
Latest Amendment Date: August 26, 2014
Award Number: 1408874
Award Instrument: Standard Grant
Program Manager: Shannon Beck
CNS
 Division Of Computer and Network Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2014
End Date: September 30, 2019 (Estimated)
Total Intended Award Amount: $360,000.00
Total Awarded Amount to Date: $360,000.00
Funds Obligated to Date: FY 2014 = $360,000.00
History of Investigator:
  • XiaoFeng Wang (Principal Investigator)
    xw7@indiana.edu
  • Haixu Tang (Co-Principal Investigator)
Recipient Sponsored Research Office: Indiana University
107 S INDIANA AVE
BLOOMINGTON
IN  US  47405-7000
(317)278-3473
Sponsor Congressional District: 09
Primary Place of Performance: Indiana University
919 E. 10th Street
BLOOMINGTON
IN  US  47408-3912
Primary Place of Performance Congressional District: 09
Unique Entity Identifier (UEI): YH86RTW2YVJ4
Parent UEI:
NSF Program(s): Secure & Trustworthy Cyberspace
Primary Program Source: 01001415DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7434, 7924
Program Element Code(s): 806000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Support for research on distributed data sets is challenged by stakeholder requirements limiting sharing. Researchers need early stage access to determine whether data sets are likely to contain the data they need. The Broker Leads project is developing privacy-enhancing technologies adapted to this discovery phase of data-driven research. Its approach is inspired by health information exchanges that are based on a broker system where data are held by healthcare providers and collected in distributed queries managed by the broker. Such systems have potential to support public health and biomedical research. The project targets "similar patient queries" where the query is a patient medical record and the response is information about similar patients. Such queries have value for many applications, including developing cohorts for finding institutions for further discussions about joint research.

Broker Leads uses the concept of a "lead" in which data holders provide representative collections of non-identifiable real or synthetic data meeting strong privacy guarantees, e.g., differential privacy. Even though such data may be unsuitable for clinical decision making and scientific discovery because of the transformations applied for privacy protection, they guide a user of a broker lead system to the data sets most likely to be useful for addressing a given similar patient query. These data sets can then be used with other privacy-protecting strategies, such as secure multiparty computation or restrictive data use agreements ensuring adequate data protection. In addition to providing practical and well-analyzed strategies for early stages of research on healthcare data, this project will provide new insights into practical issues with privacy technology in end-to-end applications.
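
As a rough illustration of how a differentially private lead could point a researcher toward promising data sets, the sketch below has each data holder release a noisy count of records close to the query, and a broker rank the sites by that count. This is not the project's actual method: the Hamming-distance similarity measure, the function names, and the site names are all assumptions made for illustration, and the Laplace mechanism is simply one standard way to make a counting query differentially private.

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverse transform sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # Guard against log(0) at the boundary u == -0.5.
    magnitude = max(1.0 - 2.0 * abs(u), 1e-300)
    return -scale * math.copysign(1.0, u) * math.log(magnitude)

def hamming(a, b):
    """Number of positions at which two equal-length records differ."""
    return sum(x != y for x, y in zip(a, b))

def noisy_match_count(records, query, max_distance, epsilon, rng):
    """Count records within max_distance of the query, plus Laplace noise.

    Adding or removing one record changes the count by at most 1, so
    noise of scale 1/epsilon makes this single release epsilon-DP.
    """
    true_count = sum(1 for r in records if hamming(r, query) <= max_distance)
    return true_count + laplace_noise(1.0 / epsilon, rng)

def rank_sites(sites, query, max_distance, epsilon, seed=None):
    """Hypothetical broker step: order sites by their noisy match counts."""
    # A seeded generator is for reproducible demos only; a real deployment
    # would need a cryptographically secure noise source.
    rng = random.Random(seed)
    leads = {
        name: noisy_match_count(records, query, max_distance, epsilon, rng)
        for name, records in sites.items()
    }
    return sorted(leads, key=leads.get, reverse=True)

if __name__ == "__main__":
    sites = {
        "site_a": ["AAAA", "AAAT", "AATA"],
        "site_b": ["CCCC", "GGGG", "TTTT"],
    }
    print(rank_sites(sites, "AAAA", max_distance=1, epsilon=1.0, seed=7))
```

A smaller epsilon gives stronger privacy but a noisier ranking, which fits the abstract's point that leads only guide the user to a data set and are not themselves suitable for clinical or scientific conclusions.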

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Bu, D., Wang, X. and Tang, H. "Real-time Protection of Genomic Data Sharing in Beacon Services." AMIA Summits on Translational Science Proceedings, 2018, p.45
G. Chen, W. Wang, T. Chen, S. Chen, Y. Zhang, X. Wang, T. Lai, D. Lin "Racing in Hyperspace: Closing Hyper-Threading Side Channels on SGX with Contrived Data Races" the 39th IEEE Symposium on Security and Privacy (IEEE S&P), 2018
H. Tang, X. Jiang, X. Wang, S. Wang, H. Sofia, D. Fox, K. Lauter, B. A. Malin, A. Telenti, L. Xiong and L. Ohno-Machado "Protecting Genomic Data Analytics in the Cloud: State of the Art and Opportunities" BMC Medical Genomics, 2016, p.63
I. Hagestedt, Y. Zhang, M. Humbert, P. Berrang, H. Tang, X. Wang and M. Backes "MBeacon: Privacy-Preserving Beacons for DNA Methylation Data" Proceedings of the Network and Distributed System Security Symposium (NDSS), 2019
J. L. Raisaro, F. Tramer, Z. Ji, D. Bu, Y. Zhao, K. Carey, D. Lloyd, H. Sofia, D. Baker, P. Flicek, S. Shringarpure, C. Bustamante, S. Wang, X. Jiang, L. Ohno-Machado, H. Tang, X. Wang, J. Hubaux "Addressing Beacon Re-Identification Attacks: Quantification and Mitigation of Privacy Risks" Journal of the American Medical Informatics Association (JAMIA), 2017
S. Li, N. Bandeira, X. Wang, and H. Tang "On the privacy risks of sharing clinical proteomics data" the 2016 Joint Summit of American Medical Informatics Association, 2016
Wang, S., Jiang, X., Tang, H., Wang, X., Bu, D., Carey, K., Dyke, S.O., Fox, D., Jiang, C., Lauter, K., Malin, B., Sofia, H., Telenti, A., Wang, L., Wang, W. and Ohno-Machado, L. "A community effort to protect genomic data sharing, collaboration and outsourcing." NPJ Genomic Medicine, v.2, 2017, p.22
W. Wang, G. Chen, X. Pan, Y. Zhang, X. Wang, V. Bindschaedler, H. Tang, C.A. Gunter "Leaky Cauldron on the Dark Land: Understanding Memory Side-Channel Hazards in SGX" the 24th ACM Conference on Computer and Communications Security (CCS'17), 2017
X. Wang, Y. Huang, Y. Zhao, H. Tang, X. Wang and D. Bu "Efficient Genome-Wide, Privacy-Preserving Similar Patient Query based on Private Edit Distance" 22nd ACM Conference on Computer and Communications Security, 2015
Yongan Zhao, XiaoFeng Wang, Xiaoqian Jiang, Lucila Ohno-Machado, Haixu Tang "Choosing Blindly but Wisely: Differentially Private Solicitation of DNA Datasets for Disease Marker Discovery" Journal of the American Medical Informatics Association (JAMIA), v.22, 2015, p.100, doi:10.1136/amiajnl-2014-003043

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project contributed models and techniques to protect the privacy of shared data. Many of the case studies were based on leads generated by brokers from biomedical data, but the results are applicable to all types of data and covered a wide range of techniques.


A key area of investigation was the ability to support early stages of research, which are often characterized by a need for exploration in which researchers do not know the details of the hypotheses they will find most interesting. The project developed techniques for measuring the privacy protections of synthetic data that can be studied in a flexible manner while still being mathematically assured of protecting the privacy of the parties on which the synthetic data was based. One new technique developed in the project used “seed-based” synthetic generation that creates synthetic data based on a mixture of traits of subjects. Another new technique concerned how to measure membership privacy based on established privacy models and machine learning testing strategies.


Machine learning is a fundamental component of modern data analytics on biomedical data.  The project carried out the first investigation of distributed, collaborative learning from the data privacy perspective.  This multi-year study showed that (a) modern machine learning models may reveal the sensitive data used to train them, and (b) this leakage is exacerbated in collaborative learning scenarios.  The project also demonstrated that these potential privacy violations are rooted in how today’s machine learning frameworks and pipelines operate on data, and proposed new methods for mitigating threats to individual privacy.  These results open the road to secure, privacy-preserving, distributed machine learning.


The project also studied techniques to support broker-lead-based, privacy-preserving data sharing and applied this protection to several kinds of biomedical data. More specifically, the project developed techniques for secure similar patient queries, using approximation to simplify otherwise complicated protection tasks. The project demonstrated weaknesses in beacon-based sharing and built more effective protection from leads by adding noise to the responses to data users' queries, achieving differential privacy. This protection was shown to work on different kinds of biomedical data, not only human genomes but also DNA methylation data. It was also shown that the side effects introduced by the noise could be addressed using trusted execution environments, which offer an efficient and secure channel for evaluating the utility of data before sharing. The project demonstrated the potential of lead-based secure data sharing that leverages different confidential computing technologies. These results will continue to foster the biomedical data protection community through the high-impact iDASH genome privacy competition.
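
To make the idea of adding noise to beacon responses concrete, here is a minimal sketch of a beacon that answers "is this variant present?" through randomized response. It is an assumption-laden illustration, not the mechanism the project built: the `NoisyBeacon` class, its parameters, and the variant identifiers are hypothetical, and randomized response is just one simple way to make a yes/no answer differentially private.

```python
import math
import random

class NoisyBeacon:
    """Hypothetical beacon whose yes/no answers are randomly flipped."""

    def __init__(self, genomes, epsilon, seed=None):
        # genomes: iterable of sets of variant identifiers, one per individual.
        self._variants = set().union(*genomes)
        # Flipping with probability 1 / (1 + e^epsilon) bounds the ratio of
        # answer probabilities by e^epsilon whenever one individual's data
        # changes the true answer, which is epsilon-DP for a single query.
        self.flip_prob = 1.0 / (1.0 + math.exp(epsilon))
        # A seeded generator is for reproducible demos only.
        self._rng = random.Random(seed)

    def query(self, variant):
        """Return the possibly flipped answer to a presence query."""
        truth = variant in self._variants
        if self._rng.random() < self.flip_prob:
            return not truth
        return truth

if __name__ == "__main__":
    cohort = [{"rs1", "rs2"}, {"rs2", "rs3"}]
    beacon = NoisyBeacon(cohort, epsilon=math.log(3), seed=42)
    print(beacon.flip_prob)      # about 0.25 for epsilon = ln 3
    print(beacon.query("rs2"))   # true answer is "present", may be flipped
```

Each query spends privacy budget: under basic composition, k independent queries cost k times epsilon in total, so a practical service would also need to limit or account for repeated queries. The flipped answers are the kind of utility cost that the outcomes report says trusted execution environments can help offset.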


 


Last Modified: 12/30/2019
Modified by: Xiaofeng Wang

Please report errors in award information by writing to: awardsearch@nsf.gov.
