
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | August 28, 2012 |
Latest Amendment Date: | August 28, 2012 |
Award Number: | 1218488 |
Award Instrument: | Standard Grant |
Program Manager: | Maria Zemankova, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2012 |
End Date: | August 31, 2016 (Estimated) |
Total Intended Award Amount: | $500,000.00 |
Total Awarded Amount to Date: | $500,000.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 3112 LEE BUILDING COLLEGE PARK MD US 20742-5100 (301)405-6269 |
Sponsor Congressional District: |
|
Primary Place of Performance: | MD US 20742-5141 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Info Integration & Informatics |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
In an era of information overload and big data, there is a pressing need to analyze, protect, prioritize and utilize data. Much of this data is inherently relational; thus, it is crucial to understand the benefits, challenges and potential hazards of exploiting its relational properties. Integrating, cleaning, and linking relational data requires matching and resolving references in the data. At the same time, matching and linking pose significant privacy risks. The proposed work develops a theoretical understanding of entity resolution in network data, with the goal of developing tools and methods that can tell us how easy or difficult it will be to resolve data in different settings. Building on this theory, new entity resolution algorithms will be developed that come with accuracy guarantees and scale to large data sources. These research results will enable more informed data sharing and usage decisions by individuals, industry, and government. Accurate analysis of network data is of utmost importance to science, medicine and national security. Whether studying socioeconomic trends, integrating data from large microarrays, analyzing organized crime or terrorist networks, or mining financial data for corporate misconduct, accurate network data and its associated statistics are crucial. At the same time, understanding how entity resolution affects privacy guarantees, and educating the public about the impact of releasing identifying information, is equally important.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Entity resolution, the problem of determining when two references refer to the same underlying entity, is an important and challenging problem. The problem occurs in science (are these two genes the same? are these the same chemical? the same drug?), business domains (figuring out whether two customers are the same, determining when two products are the same), and security and law-enforcement domains (are these two people the same? are these organizations the same?). Accurate entity resolution is an important first step before any further analysis is done: performing predictive analytics on badly resolved data can lead, and has led, to incorrect results that can have a huge negative impact. In this work, we looked specifically at the challenge of entity resolution in relational data, or network data. Because so many real-world entity resolution problems don't occur in isolation, but co-occur in interesting and complex ways, this is an important but ill-understood setting. Relational entity resolution provides the ability to improve the quality of resolution by making use of structure. However, the theoretical benefits of this setting are poorly understood. In this work, we were able to show theoretical results explaining why relational entity resolution is beneficial, and how the generalization performance of collective classification algorithms in network settings benefits from the relational information. In addition, we studied the application of entity resolution in a variety of practical problems, including entity resolution in knowledge graphs, entity resolution for familial networks, and entity resolution to support visitor stitching (matching users across devices).
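As a rough illustration of what "making use of structure" can mean, the sketch below scores a pair of references by blending attribute similarity (how alike the names are) with relational similarity (how much the two references' neighbors overlap). It is not the method developed under this award; the function names, the weighting parameter `alpha`, the Jaccard overlap measure, and the example co-author data are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Attribute-only similarity between two reference names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def neighbor_similarity(a: set, b: set) -> float:
    """Jaccard overlap of the entities each reference co-occurs with."""
    if not a and not b:
        return 0.0  # no relational evidence either way
    return len(a & b) / len(a | b)


def relational_score(ref_a, ref_b, alpha: float = 0.5) -> float:
    """Blend attribute and relational evidence.

    Each reference is a (name, neighbors) pair. `alpha` is a hypothetical
    weight on the relational term; alpha=0 falls back to names only.
    """
    name_a, nbrs_a = ref_a
    name_b, nbrs_b = ref_b
    return ((1 - alpha) * name_similarity(name_a, name_b)
            + alpha * neighbor_similarity(nbrs_a, nbrs_b))


# Hypothetical author references with their co-authors as neighbors.
r1 = ("J. Smith", {"A. Kumar", "L. Chen"})
r2 = ("J. Smith", {"A. Kumar", "L. Chen", "M. Ortiz"})
r3 = ("J. Smith", {"R. Patel"})

# The names are identical in every pair, so only the relational term
# separates the likely match (r1, r2) from the unlikely one (r1, r3).
score_shared = relational_score(r1, r2)
score_disjoint = relational_score(r1, r3)
print(score_shared > score_disjoint)
```

With names alone, all three references look identical; the shared co-authors are what distinguish the first pair from the second. Collective approaches go further by letting each resolution decision update the neighborhoods used by the others, which this pairwise sketch does not attempt.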
Last Modified: 11/03/2016
Modified by: Lise C Getoor