
NSF Org: | CNS Division Of Computer and Network Systems |
Recipient: |
|
Initial Amendment Date: | May 6, 2010 |
Latest Amendment Date: | April 29, 2011 |
Award Number: | 0964094 |
Award Instrument: | Standard Grant |
Program Manager: | Sylvia Spengler; sspengle@nsf.gov; (703)292-7347; CNS Division Of Computer and Network Systems; CSE Directorate for Computer and Information Science and Engineering |
Start Date: | May 1, 2010 |
End Date: | April 30, 2014 (Estimated) |
Total Intended Award Amount: | $873,125.00 |
Total Awarded Amount to Date: | $889,125.00 |
Funds Obligated to Date: | FY 2011 = $16,000.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: | 101 COMMONWEALTH AVE AMHERST MA US 01003-9252 (413)545-0698 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 101 COMMONWEALTH AVE AMHERST MA US 01003-9252 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Special Projects - CNS, TRUSTWORTHY COMPUTING |
Primary Program Source: | 01001112DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
The goal of this research project is to enable statistical analysis and knowledge discovery on networks without violating the privacy of participating entities. Network data sets record the structure of computer, communication, social, or organizational networks, but they often contain highly sensitive information about individuals. The availability of network data is crucial for analyzing, modeling, and predicting the behavior of networks.
The team's approach is based on model-based generation of synthetic data, in which a model of the network is released under strong privacy conditions and samples from that model are studied directly by analysts. Output perturbation techniques are used to privately compute the parameters of popular network models. The resulting "noisy" model parameters are released, satisfying a strong, quantifiable privacy guarantee, but still preserving key properties of the networks. Analysts can use the released models to sample individual networks or to reason about properties of the implied ensemble of networks.
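The idea of output perturbation followed by sampling can be illustrated with a minimal sketch. It is not the project's algorithm: it assumes the simplest possible network model (an Erdős–Rényi random graph with a single density parameter), edge-level differential privacy, and the standard Laplace mechanism. The function names `laplace_noise`, `private_edge_density`, and `sample_graph` are hypothetical and chosen only for this illustration.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_edge_density(edges, num_nodes, epsilon, rng):
    """Return a noisy edge density and the standard deviation of the added noise.

    Assumes edge-level privacy: neighboring graphs differ in exactly one edge,
    so the edge count has sensitivity 1 and Laplace noise of scale 1/epsilon
    is sufficient for epsilon-differential privacy.
    """
    max_edges = num_nodes * (num_nodes - 1) // 2
    scale = 1.0 / epsilon
    noisy_count = len(edges) + laplace_noise(scale, rng)
    # Clamping is post-processing and does not weaken the privacy guarantee.
    density = min(1.0, max(0.0, noisy_count / max_edges))
    # A Laplace variable with scale b has standard deviation sqrt(2) * b.
    density_std = math.sqrt(2.0) * scale / max_edges
    return density, density_std


def sample_graph(num_nodes, density, rng):
    """Sample a synthetic undirected graph: each possible edge appears with probability `density`."""
    return {
        (u, v)
        for u in range(num_nodes)
        for v in range(u + 1, num_nodes)
        if rng.random() < density
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # A toy "sensitive" graph on 6 nodes (illustrative data only).
    true_edges = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)}
    density, std = private_edge_density(true_edges, num_nodes=6, epsilon=1.0, rng=rng)
    synthetic = sample_graph(6, density, rng)
    print(f"released density: {density:.3f} (noise std: {std:.3f})")
    print(f"synthetic graph has {len(synthetic)} edges")
```

Only the noisy density leaves the data owner. The analyst works with graphs sampled from the released model and can use the reported noise standard deviation to judge how far the released parameter may lie from the true one.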
By synthesizing versions of networks that would otherwise remain hidden, this research can advance the study of topics such as disease transmission, network resiliency, and fraud detection. The project will result in publicly available privacy tools, a repository for derived models and sample networks, and contributions to workforce development in the field of information assurance. The experimental research is linked to educational efforts including undergraduate involvement in research through a Research Experience for Undergraduates site, as well as interdisciplinary seminars.
For further information, see the project website:
http://dbgroup.cs.umass.edu/private-network-data
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project has developed a set of algorithms for protecting personal privacy while supporting the release of networked data sets. Data privacy research has most commonly focused on tabular data, in which an individual is described by a set of attributes contained in a single record. Networked data poses a special challenge because it describes a graph in which an edge relation represents connections, interactions, or communication between named nodes. Protecting privacy is more complicated for this type of data: revealing the properties of connected individuals may constitute a dangerous disclosure, and revealing information about one individual is more likely to lead to inferences about other connected individuals.
This project has developed conceptual and technological advancements for modeling networked data sets under the rigorous model of differential privacy. Our approach is based on the model-based generation of synthetic data, in which a model of the networked data set is released under strong privacy conditions and samples from that model are studied directly by analysts. The data received by analysts must be perturbed or distorted to preserve privacy; however, analysts receive measures of estimated error along with the synthesized data. The main contributions include the following:
- We developed algorithms for privately estimating a number of key statistics used with a popular model of network formation (the exponential random graph model). For these statistics, our method allows an analyst to fit this model to the data with improved accuracy.
- We developed a method for constructing synthetic multi-relational data sets (which generalize networked data beyond a single relationship) also with a rigorous privacy guarantee and improved accuracy.
- We investigated foundational issues in the statistical modeling of networked data, developing new modeling approaches that increase correctness and descriptive power.
The project enhanced cyber-security curricula at the undergraduate and graduate levels, added to the cyber-security workforce, and disseminated its results both nationally and internationally.
Last Modified: 07/29/2014
Modified by: Gerome Miklau