NSF Award Search: Award # 1320542

Award Abstract # 1320542

III: Small: Managing and Mining Uncertain Graphs

NSF Org:	IIS Division of Information & Intelligent Systems
Recipient:	TRUSTEES OF BOSTON UNIVERSITY
Initial Amendment Date:	August 29, 2013
Latest Amendment Date:	July 18, 2014
Award Number:	1320542
Award Instrument:	Continuing Grant
Program Manager:	Sylvia Spengler sspengle@nsf.gov (703)292-7347 IIS Division of Information & Intelligent Systems CSE Directorate for Computer and Information Science and Engineering
Start Date:	September 1, 2013
End Date:	August 31, 2017 (Estimated)
Total Intended Award Amount:	$500,000.00
Total Awarded Amount to Date:	$500,000.00
Funds Obligated to Date:	FY 2013 = $342,781.00 FY 2014 = $157,219.00
History of Investigator:	George Kollios (Principal Investigator) gkollios@bu.edu Evimaria Terzi (Co-Principal Investigator)
Recipient Sponsored Research Office:	Trustees of Boston University 1 SILBER WAY BOSTON MA US 02215-1703 (617)353-4365
Sponsor Congressional District:	07
Primary Place of Performance:	Boston Univeristy 111 Cummington Mall Boston MA US 02215-2411
Primary Place of Performance Congressional District:	07
Unique Entity Identifier (UEI):	THL6A6JLE1S7
Parent UEI:
NSF Program(s):	Info Integration & Informatics
Primary Program Source:	01001314DB NSF RESEARCH & RELATED ACTIVIT 01001415DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	7364, 7923
Program Element Code(s):	736400
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

Network and graph data appear in many diverse applications such as social networks, biological networks, and mobile ad-hoc networks. In many cases, there is an inherent uncertainty in the available graph data either due to the data collection process or the preprocessing of the data. Furthermore, uncertainty or imprecise information becomes a critical impediment to understanding and effectively utilizing the information contained in such graphs. In this project, we address the problem of managing and mining such uncertain graphs. To do that, we adopt the possible-world semantics to probabilistic and uncertain graphs and within this framework we study algorithms for well-established data-mining problems. Motivated by real-life applications, we focus on specific data-analysis tasks. However, we make our framework general enough so that it can be used for a wide set of tasks and applications. In particular, we develop scalable and efficient algorithms and approaches to address a number of important tasks including nearest neighbor retrieval, clustering and partitioning, finding important nodes and edges, and summarizing large uncertain graphs.

We expect the results of this project to have an impact on several application domains. For example, they can help internet-based and social-media related companies to analyze their data and improve their targeting advertisement policies and practices. This will create opportunities for making these companies viable and help the economy of internet-based business. In medicine, biology and biochemistry, networks play a very important role and many of these networks can be better modeled as uncertain networks. Our project can help analyze these networks and lead to new biological insights. The results of this project are disseminated as follows: (1) we develop publicly available prototypes; (2) we include the results of our work in our classes/lectures; (3) we communicate our results to scientists of computer science and other fields and our industry collaborators through publications and demos. We also actively try to engage in our research graduate and undergraduate students, including women and minorities.

For further information see the web site at: http://www.cs.bu.edu/~gkollios/ugraphs/

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Hong BY, Furtado Araujo MV, Strausbaugh LD, Terzi E, Ioannidou E, Diaz PI "Microbiome profiles in periodontitis in relation to host and disease characteristics" PLoS One , v.10 , 2015 , p.e0127077 10.1371/journal.pone.0127077.

Loreto Abusleme, Amanda Dupuy, Nicola Dutzan, Nora Silva, Joseph Burleson, Linda Strausbaugh, Jorge Gamonal, Evimaria Terzi, and Patricia Diaz "The subgingival microbiome in health and periodontitis" ISME , 2014

Xianrui Meng, Seny Kamara, Kobbi Nissim, George Kollios "GRECS: Graph Encryption for Approximate Shortest Distance Queries" Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security , 2015 10.1145/2810103.2813672

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

In this project, we investigated the relationship between uncertainty and graph databases. Graph databases appear in many applications. For example, in biology and medicine, graphs are used to represent interactions between different biological objects such as genes, proteins, and microbes. Furthermore, these relationships have some uncertainty due to the nature of interactions (sometimes a protein interacts with another one and sometimes not) and the way we collect the data. Another application comes from data management applications, like data cleaning, where we use AI and Machine Learning models to learn the relationship between objects. These models are not always accurate and have some uncertainty (or confidence) associated with them. Therefore, if you represent each object as a vertex in a graph and the relationship as an edge, the resulting graph is an uncertain (or probabilistic graph).

During this project, we have developed a number of important techniques that can help better analyze and query uncertain graphs. Here are the some of our main results: 1) we have developed approaches that can reduce the uncertainty of these uncertain graphs using active learning (choosing which relationship to investigate further in order to reduce the uncertainty), 2) we have developed an approach to compare two vertices from two different graphs and find if they are similar or not, and have developed algorithms that can compute te similarity efficiently, 3) we have developed techniques that can query and estimate distances between vertices in these graphs using encryption and privacy preserving tools so that we can outsource the data in a third party cloud and still be able to do some computation without the third party learning the actual data, and 4) we proposed a technique to identify communities with low tension, where each individual is represented as a vertex and the relationship between two individual is modeled using the opinions expressed by the individual both internally and externally.

Another important outcome of this project was training for graduate and post-graduate students. We had four graduate students supported by this project and one post-doc. Two of these students received their PhDs already and are working in the industry, one more student received his Masters, and another PhD student is graduating soon.

Finally, we collaborated with researchers and practitioners from the medical and public health domain and we used our methods to analyze uncertain graphs in order to improve their methods and approaches. In particular, we helped to analyze and understand better public health datasets and therefore to improve our defense on viral and infectious diseases. Also, we used our techniques and expertise that we developed in his project in order to help researchers to identify families of microbes in the oral cavity, and therefore help identify better methods and medicine to treat oral diseases and improve oral hygiene.

Overall, this project created new techniques in the area of uncertain and probabilistic graphs, promoted the use of uncertain graph analysis for many different applications, and have successfully exhibited how these techniques can be utilized to improve many other areas (e.g., medical and health research and professions) that have the potential to greatly affect and improve our lives.

Last Modified: 01/25/2018
Modified by: George Kollios

Please report errors in award information by writing to: awardsearch@nsf.gov.

Success

Error