Award Abstract # 1343976
SCH: EXP: Collaborative Research: Privacy-Preserving Framework for Publishing Electronic Healthcare Records

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: GEORGE WASHINGTON UNIVERSITY (THE)
Initial Amendment Date: September 13, 2013
Latest Amendment Date: September 8, 2017
Award Number: 1343976
Award Instrument: Standard Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
(703)292-7347
IIS
Division of Information & Intelligent Systems
CSE
Directorate for Computer and Information Science and Engineering
Start Date: January 1, 2014
End Date: December 31, 2018 (Estimated)
Total Intended Award Amount: $269,505.00
Total Awarded Amount to Date: $269,505.00
Funds Obligated to Date: FY 2013 = $269,505.00
History of Investigator:
  • Nan Zhang (Principal Investigator)
    zhang.nan@ufl.edu
  • Xiuzhen Cheng (Co-Principal Investigator)
  • Xiuzhen Cheng (Former Principal Investigator)
  • Xiuzhen Cheng (Former Co-Principal Investigator)
Recipient Sponsored Research Office: George Washington University
1918 F ST NW
WASHINGTON
DC  US  20052-0042
(202)994-0728
Sponsor Congressional District: 00
Primary Place of Performance: George Washington University
801 22nd St NW
Washington
DC  US  20052-0001
Primary Place of Performance Congressional District: 00
Unique Entity Identifier (UEI): ECR5E2LU5BL6
Parent UEI:
NSF Program(s): Smart and Connected Health
Primary Program Source: 01001314DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 8018, 8061
Program Element Code(s): 801800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

This project builds a novel privacy-preserving framework, comprising new algorithms and software tools, to: 1) evaluate the effectiveness of current identifier-suppression techniques for Electronic Healthcare Record (EHR) data; and 2) de-identify and anonymize EHR data to protect personal information without significantly reducing the utility of the data for secondary analysis. The proposed techniques are designed to prevent privacy violations through re-identification and to facilitate the secondary use, sharing, publishing, and exchange of healthcare data without risking a breach of protected health information (PHI). The framework applies constraint-based privacy-preserving techniques that are aware of ICD-9-CM (International Classification of Diseases, Ninth Revision, Clinical Modification) codes to EHRs, in order to eliminate the threat of identifying an individual when research data are reused. These techniques can be readily adapted to other types of healthcare databases to ensure privacy and prevent re-identification of published data. The project produces new algorithms and tools for identifying privacy leakage and protecting personal information in EHRs, with the goal of improving healthcare data publishing, and its privacy-preserving techniques point toward a new type of healthcare science for EHRs. The project also advances engineering by showing how to integrate biomedical domain knowledge into a computationally advanced quantitative framework for preserving the privacy of published EHRs.

The Health Insurance Portability and Accountability Act (HIPAA) has established protocols and industry standards to protect the confidentiality of PHI. However, our results demonstrate that, even for health data that meet HIPAA requirements, the risk of re-identification is not completely eliminated. By identifying the security vulnerabilities inherent in the HIPAA standards, our research develops a more rigorous security standard that greatly improves privacy protection by applying state-of-the-art algorithms.
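The abstract does not specify how its ICD-9-CM-aware techniques work. As one illustration of the general idea only, the minimal sketch below generalizes diagnosis codes one level up the ICD-9-CM hierarchy (full code to category) and checks whether a toy table then satisfies k-anonymity, a standard anonymity criterion under which every combination of quasi-identifiers must occur at least k times. The records, the choice of quasi-identifier columns, and the functions `generalize_icd9`, `generalize`, `min_class_size`, and `is_k_anonymous` are hypothetical; this is a sketch under those assumptions, not the project's algorithm.

```python
from collections import Counter

# Hypothetical toy records: (age band, sex, ICD-9-CM diagnosis code).
# All three columns are treated as quasi-identifiers for this illustration.
RECORDS = [
    ("40-49", "F", "250.01"),
    ("40-49", "F", "250.02"),
    ("40-49", "F", "250.00"),
    ("50-59", "M", "410.71"),
    ("50-59", "M", "410.11"),
    ("50-59", "M", "410.90"),
]


def generalize_icd9(code: str) -> str:
    """Truncate an ICD-9-CM code to its category.

    Numeric and V codes have three-character categories ("250.01" -> "250",
    "V45.81" -> "V45"); E codes have four ("E880.9" -> "E880"). Dotless codes
    such as "25001" are handled by slicing the leading characters.
    """
    head = code.split(".")[0]
    return head[:4] if head.upper().startswith("E") else head[:3]


def generalize(records):
    """Replace each diagnosis code with its ICD-9-CM category."""
    return [(age, sex, generalize_icd9(dx)) for age, sex, dx in records]


def min_class_size(records) -> int:
    """Size of the smallest group of records sharing identical quasi-identifiers."""
    return min(Counter(records).values())


def is_k_anonymous(records, k: int) -> bool:
    """True if every quasi-identifier combination occurs at least k times."""
    return min_class_size(records) >= k


if __name__ == "__main__":
    print(is_k_anonymous(RECORDS, 2))              # False: every raw record is unique
    print(is_k_anonymous(generalize(RECORDS), 3))  # True: two classes of size 3
```

Generalization trades utility for privacy: the fifth digit of a 250.xx code, for instance, distinguishes diabetes type and control status, and that detail is lost once the three diabetes records collapse to "250".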

The developed privacy-preserving framework has significant implications for the future of US healthcare data publishing and related applications. The transition from paper records to EHRs has accelerated significantly since the passage of the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009, which provides monetary incentives for the "meaningful use" of EHRs. As a result, the quality and quantity of healthcare databases have risen sharply, renewing the public's fear of a breach of the privacy of their medical information. This research is therefore important not only for facilitating EHR data publishing but also for supporting the broader development and adoption of EHRs. On the educational front, this project supports the development of novel educational tools for building new courses and laboratory classes in healthcare, data privacy, data mining, and a wide range of applications. These tools enhance current methods for teaching data privacy and data mining, and their biomedical and healthcare applications provide a compelling context for learning computational algorithms. The project involves both undergraduate and graduate students at the three participating institutions, and the PIs make a strong effort to engage minority graduate and undergraduate students in research activities to increase their exposure to cutting-edge research.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 16)
A. Asudeh, A. Nazi, N. Zhang, G. Das "Efficient Computation of Regret-ratio Minimizing Set: A Compact Maxima Representative" Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD) , 2017
A. Nazi, S. Thirumuruganathan, V. Hristidis, N. Zhang, G. Das "Answering Complex Queries in an Online Community Network" Proceedings of the International AAAI Conference on Web and Social Media (ICWSM) , 2015 , p.662
De Wang, Feiping Nie, Heng Huang "Fast Robust Non-negative Matrix Factorization for Large-Scale Data Clustering" 25th International Joint Conference on Artificial Intelligence (IJCAI 2016) , 2016 , p.2104
Feiping Nie, Heng Huang "Subspace Clustering via New Discrete Group Structure Constrained Low-Rank Model" 25th International Joint Conference on Artificial Intelligence (IJCAI 2016) , 2016 , p.1874
H. Yan, Z. Gong, N. Zhang, T. Huang, H. Zhong, J. Wei "Crawling Hidden Objects with kNN Queries" IEEE Transactions on Knowledge and Data Engineering (TKDE) , v.28 , 2016 , p.912
J. Lin, W. Yu, N. Zhang, X. Yang, H. Zhang, W. Zhao "A Survey on Internet of Things: Architecture, Enabling Technologies, Security and Privacy, and Applications" IEEE Internet of Things Journal , 2017
L. O'Neill, F. Dexter, N. Zhang "The Risks to Patient Privacy from Publishing Data from Clinical Anesthesia Studies" Anesthesia & Analgesia , v.122 , 2016 , p.2017
Peng Li, Heng Huang "Deep Learning Based Natural Language Processing System for Clinical Information Identification from Clinical Notes and Pathology Reports" The 10th International Workshop on Semantic Evaluation Joint with NAACL2016 , 2016
S. Wang, R. Bie, F. Zhao, N. Zhang, X. Cheng "Security in Wearable Communications" IEEE Network , v.30 , 2016 , p.61
W. Liu, M. F. Rahman, S. Thirumuruganathan, N. Zhang, G. Das "Aggregate Estimations over Location Based Services" Proceedings of the VLDB Endowment (PVLDB) , v.8 , 2015 , p.1334
Xiaoqian Wang, Feiping Nie, Heng Huang "Structured Doubly Stochastic Matrix for Graph Based Clustering" 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2016) , 2016 , p.1245

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Our research findings are highly relevant to the current debate over health information privacy, and they call into question the efficacy of the state-of-the-practice anonymization framework. For example, we demonstrated how publishing public health data, even after applying state-of-the-practice data anonymization techniques, could lead to breaches of patient privacy. Based on these findings, in an article published in the journal Anesthesia & Analgesia, we proposed an editorial policy for anesthesia journals that would significantly reduce the likelihood of a privacy breach while supporting the goal of a transparent research process.
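One common way to quantify this kind of risk is to count how many released records are unique on their quasi-identifiers, i.e., attributes such as age, sex, and dates that an attacker could link to outside sources. The minimal sketch below computes the uniqueness rate and the worst-case and average "prosecutor" risk (the re-identification probability when the attacker knows the target is in the released data). The rows and function names are hypothetical. None of the fields shown is among the 18 identifiers that HIPAA's Safe Harbor method requires removing, yet one record is still unique. This is an illustration of the general measurement, not presented as the method used in the Anesthesia & Analgesia article.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical released rows: (age in years, sex, year of surgery, procedure).
Row = Tuple[str, str, str, str]

ROWS: List[Row] = [
    ("34", "F", "2014", "cesarean delivery"),
    ("34", "F", "2014", "cesarean delivery"),
    ("61", "M", "2014", "CABG"),
    ("61", "M", "2014", "CABG"),
    ("88", "F", "2015", "hip fracture repair"),
]


def uniqueness_rate(rows: List[Row]) -> float:
    """Fraction of records whose quasi-identifier combination appears exactly once."""
    sizes = Counter(rows)
    return sum(1 for r in rows if sizes[r] == 1) / len(rows)


def max_prosecutor_risk(rows: List[Row]) -> float:
    """Worst-case re-identification probability: 1 / (smallest class size)."""
    return 1 / min(Counter(rows).values())


def avg_prosecutor_risk(rows: List[Row]) -> float:
    """Mean of 1 / (class size) over all records (equals #classes / #records)."""
    sizes = Counter(rows)
    return sum(1 / sizes[r] for r in rows) / len(rows)


if __name__ == "__main__":
    print(uniqueness_rate(ROWS))      # 0.2
    print(max_prosecutor_risk(ROWS))  # 1.0
    print(avg_prosecutor_risk(ROWS))  # 0.6
```

A single unique record is enough to drive the worst-case risk to 1.0, which is why averages alone can understate the exposure of rare patients.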

Another interesting finding from our research concerns the linking of healthcare data with publicly available information, such as data from social media. Contrary to the conventional wisdom that privacy leakage in this case is caused mostly by a user's own posts, we found that a significant amount of privacy disclosure on social media, especially of attributes that are often linkable to healthcare data, is caused by the activities of a user's social ties. Our research not only demonstrated the existence of such "peer disclosure" but also identified notable differences between the identity elements revealed through self-disclosure and those revealed through peer disclosure.

Our research also examines the inherent tradeoff between privacy protection and data utility. For example, we studied how applying a state-of-the-practice data anonymization technique and a state-of-the-art differential privacy technique affects researchers' ability to detect evidence of health disparity in the privacy-preserved data. Our results highlight the complex challenges facing privacy protection in health disparity research: the data elements essential to such research might lead to privacy disparity and, if not handled carefully, cause further harm to underserved populations.
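The report does not name the differential privacy technique it studied. The sketch below uses the Laplace mechanism, a standard differential privacy mechanism for counting queries, to show one source of the utility problem described above: with a fixed privacy budget epsilon, the noise added to a count has the same absolute scale (1/epsilon) regardless of group size, so the relative error is roughly 1/(epsilon * n) and falls hardest on small subgroups. The privacy budget, the group sizes, and the function names are hypothetical assumptions, not values from the project.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    # expovariate(rate) has mean 1 / rate, so rate = 1 / scale gives mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query, whose L1 sensitivity is 1."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_relative_error(true_count: int, epsilon: float,
                        trials: int = 20000, seed: int = 0) -> float:
    """Average |noisy - true| / true over repeated noisy releases of one count."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(dp_count(true_count, epsilon, rng) - true_count) / true_count
    return total / trials


if __name__ == "__main__":
    eps = 0.1  # hypothetical privacy budget
    # Hypothetical subgroup sizes: a large majority group and a small minority group.
    for group, n in [("majority", 10_000), ("minority", 50)]:
        print(group, round(mean_relative_error(n, eps), 4))
```

With epsilon = 0.1, the expected absolute noise equals its scale, 1/epsilon = 10, so the mean relative error should come out near 0.001 for a group of 10,000 but near 0.2 for a group of 50. Any disparity estimate that compares rates across such groups inherits the much noisier count from the smaller group.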

Last Modified: 03/22/2019
Modified by: Nan Zhang
