Award Abstract # 1116991
TC: Small: Provably Private Microdata Publishing

NSF Org: CNS
Division Of Computer and Network Systems
Recipient: PURDUE UNIVERSITY
Initial Amendment Date: August 18, 2011
Latest Amendment Date: August 18, 2011
Award Number: 1116991
Award Instrument: Standard Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
 (703)292-7347
CNS
 Division Of Computer and Network Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2011
End Date: August 31, 2016 (Estimated)
Total Intended Award Amount: $439,227.00
Total Awarded Amount to Date: $439,227.00
Funds Obligated to Date: FY 2011 = $439,227.00
History of Investigator:
  • Ninghui Li (Principal Investigator)
Recipient Sponsored Research Office: Purdue University
2550 NORTHWESTERN AVE # 1100
WEST LAFAYETTE
IN  US  47906-1332
(765)494-1055
Sponsor Congressional District: 04
Primary Place of Performance: Purdue University
2550 NORTHWESTERN AVE # 1100
WEST LAFAYETTE
IN  US  47906-1332
Primary Place of Performance
Congressional District:
04
Unique Entity Identifier (UEI): YRXVL4JYCEF5
Parent UEI: YRXVL4JYCEF5
NSF Program(s): TRUSTWORTHY COMPUTING
Primary Program Source: 01001112DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7795, 7923
Program Element Code(s): 779500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Data is a key resource in this information age. The availability of data, however, often causes privacy concerns. Many data sharing scenarios require data be anonymized for privacy protection. Most existing data anonymization techniques, however, satisfy only weak privacy notions that rely on particular assumptions about the adversaries, and provide inadequate protection. In recent years, the elegant notion of differential privacy has gradually been accepted as the privacy notion of choice for answering statistical queries. Most research on differential privacy, however, focuses on answering interactive queries, and there are several negative results on publishing microdata while satisfying differential privacy. Regardless, many data sharing scenarios require sharing of microdata, and research is needed to bridge this gap.

This project aims at bridging the gap between the elegant notion of differential privacy, and the practical difficulty of publishing microdata while preserving utility. Building on the preliminary results of the PI on using random sampling together with "safe" k-anonymization to satisfy differential privacy, this project aims at advancing the state of the art of both scientific understanding and specific techniques for privacy-preserving microdata publishing. Research activities include developing (1) Practical anonymization methods that can be proven to satisfy differential privacy, while capable of handling high-dimensional data; (2) Relaxations of differential privacy that are more suitable for microdata publishing; (3) Privacy theory and techniques that are easily applied to a family of data sanitization algorithms called localized algorithms, enabling the usage of input perturbation techniques for provably-private microdata publishing; (4) Privacy notions and techniques for publishing social network data and network trace data.

Advances in data anonymization techniques will benefit the society by providing a better balance between the need to release data to serve public interest and the need to protect individuals' privacy. This project also involves developing a graduate seminar course on data privacy, and supports two graduate students.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Dong Su, Jianneng Cao, Ninghui Li, Elisa Bertino, Hongxia Jin "Differentially Private k-Means Clustering" ACM Conference on Data and Application Security and Privacy , 2016 , p.26 10.1145/2857705.2857708
Wahbeh H. Qardaji, Weining Yang, Ninghui Li "Understanding Hierarchical Methods for Differentially Private Histograms" PVLDB , v.6 , 2013 , p.1954
Wei-Yen Day, Ninghui Li, Min Lyu "Publishing Graph Degree Distribution with Node Differential Privacy" ACM SIGMOD , 2016 , p.123 10.1145/2882903.2926745

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project has developed new data privacy notions as well as better techniques for privacy-preserving data publishing and analysis.  This project demonstrated the relationship between two major privacy notions: k-anonnymity and differential privacy.  This project also resulted in the development of the membership privacy framework, which defines privacy as preventing any adversary from learning that an individual's data is included in the input dataset.  The membership privacy framework generalizes differential privacy, makes the undelying assumptions of differential privacy explicit, and points out ways to relax the notion of differential privacy in a principled way so that more useful information can be learned from the data. 

This project also resulted in several state of the art techniques for performing data analysis tasks while satisfying differential privacy.  While providing the same level of privacy guarantees, some algorithms can perform analysis tasks more accurately than others.  This project has resulted in algorithms for publishing histograms of low-dimensional datasets, frequent itemset mining and publishing item counts for transactional datasets, k-means clustering, classification using decesion trees, logistic regression, and Support Vector Machines, answering marginal queries, and publishing graph node degree distribution.  Algorithms introduced in the beginning of the project (published in 2013) were still found to be the best-performing ones by a study in 2016 conducted by other researchers who compared a large number of competing algorithms, even though this area has been under intensive study in recent years.  

This also project also supported the study of two PhD students.  One just defended his PhD dissertation and joined IBM T.J.Watson Research Center.  Another plans to defend in December 2016, and will join Google Inc.


Last Modified: 10/21/2016
Modified by: Ninghui Li

Please report errors in award information by writing to: awardsearch@nsf.gov.

Print this page

Back to Top of page