Award Abstract # 1218488
III: Small: A Theoretical Framework for Practical Entity Resolution in Network Data

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: UNIVERSITY OF MARYLAND, COLLEGE PARK
Initial Amendment Date: August 28, 2012
Latest Amendment Date: August 28, 2012
Award Number: 1218488
Award Instrument: Standard Grant
Program Manager: Maria Zemankova
IIS Division of Information & Intelligent Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2012
End Date: August 31, 2016 (Estimated)
Total Intended Award Amount: $500,000.00
Total Awarded Amount to Date: $500,000.00
Funds Obligated to Date: FY 2012 = $500,000.00
History of Investigator:
  • Lise Getoor (Principal Investigator)
    getoor@soe.ucsc.edu
Recipient Sponsored Research Office: University of Maryland, College Park
3112 LEE BUILDING
COLLEGE PARK
MD  US  20742-5100
(301)405-6269
Sponsor Congressional District: 04
Primary Place of Performance: University of Maryland College Park
MD  US  20742-5141
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): NPU8ULVAAS23
Parent UEI: NPU8ULVAAS23
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01001213DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7923
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

In an era of information overload and big data, there is a pressing need to analyze, protect, prioritize and utilize data. Much of this data is inherently relational; thus, it is crucial to understand the benefits, challenges and potential hazards of exploiting its relational properties. Integrating, cleaning, and linking relational data requires matching and resolving references in the data. At the same time, matching and linking pose significant privacy risks. The proposed work develops a theoretical understanding of entity resolution in network data, with the goal of developing tools and methods that can tell us how easy or difficult it will be to resolve data in different settings. Making use of the theory, new entity resolution algorithms will be developed that offer accuracy guarantees and scale to large data sources. These research results will enable more informed data sharing and usage decisions by individuals, industry, and government. Accurate analysis of network data is of utmost importance to science, medicine and national security. Whether studying socioeconomic trends, integrating data from large microarrays, analyzing organized crime or terrorist networks, or mining financial data for corporate misconduct, accurate network data and its associated statistics are crucial. At the same time, understanding how entity resolution affects privacy guarantees, and educating the public about the impact of releasing identifying information, is equally important.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Bach, Stephen H. and Broecheler, Matthias and Huang, Bert and Getoor, Lise "Hinge-Loss Markov Random Fields and Probabilistic Soft Logic" Journal of Machine Learning Research, 2017
Fakhraei, Shobeir and Huang, Bert and Raschid, Louiqa and Getoor, Lise "Network-Based Drug-Target Interaction Prediction with Probabilistic Soft Logic" IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2014
London, Ben and Huang, Bert and Getoor, Lise "Stability and Generalization in Structured Prediction" Journal of Machine Learning Research, v.17, 2016
Pujara, Jay and Miao, Hui and Getoor, Lise and Cohen, William "Using Semantics & Statistics to Turn Data into Knowledge" AI Magazine, v.36, 2015, p.65-74
Sridhar, Dhanya and Fakhraei, Shobeir and Getoor, Lise "A Probabilistic Approach for Collective Similarity-based Drug-Drug Interaction Prediction" Bioinformatics, 2016

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Entity resolution, the problem of determining when two references refer to the same underlying entity, is an important and challenging problem. The problem occurs in science (are these two genes the same? are these the same chemical? the same drug?), business domains (figuring out whether two customers are the same, determining when two products are the same), and security and law-enforcement domains (are these two people the same? are these organizations the same?). Accurate entity resolution is an important first step before any further analysis is done: performing predictive analytics on badly resolved data can lead (and has led) to incorrect results, which can have a huge negative impact. In this work, we looked specifically at the challenge of entity resolution in relational or network data. Because so many real-world entity resolution problems do not occur in isolation, but co-occur in interesting and complex ways, this is an important but ill-understood setting. Relational entity resolution provides the ability to improve the quality of resolution by making use of structure. However, the theoretical benefits of this setting are poorly understood. In this work, we were able to show theoretical results explaining why relational entity resolution is beneficial, and how the generalization performance of collective classification algorithms in network settings benefits from the relational information. In addition, we studied the application of entity resolution in a variety of practical problems, including entity resolution in knowledge graphs, entity resolution for familial networks, and entity resolution to support visitor stitching (matching users across devices).


Last Modified: 11/03/2016
Modified by: Lise C Getoor

Please report errors in award information by writing to: awardsearch@nsf.gov.
