
NSF Org: |
IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | August 29, 2013 |
Latest Amendment Date: | July 18, 2014 |
Award Number: | 1320542 |
Award Instrument: | Continuing Grant |
Program Manager: |
Sylvia Spengler
sspengle@nsf.gov (703)292-7347 IIS Division of Information & Intelligent Systems CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2013 |
End Date: | August 31, 2017 (Estimated) |
Total Intended Award Amount: | $500,000.00 |
Total Awarded Amount to Date: | $500,000.00 |
Funds Obligated to Date: |
FY 2014 = $157,219.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
1 SILBER WAY BOSTON MA US 02215-1703 (617)353-4365 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
111 Cummington Mall Boston MA US 02215-2411 |
Primary Place of
Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Info Integration & Informatics |
Primary Program Source: |
01001415DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Network and graph data appear in many diverse applications such as social networks, biological networks, and mobile ad-hoc networks. In many cases, there is an inherent uncertainty in the available graph data either due to the data collection process or the preprocessing of the data. Furthermore, uncertainty or imprecise information becomes a critical impediment to understanding and effectively utilizing the information contained in such graphs. In this project, we address the problem of managing and mining such uncertain graphs. To do that, we adopt the possible-world semantics to probabilistic and uncertain graphs and within this framework we study algorithms for well-established data-mining problems. Motivated by real-life applications, we focus on specific data-analysis tasks. However, we make our framework general enough so that it can be used for a wide set of tasks and applications. In particular, we develop scalable and efficient algorithms and approaches to address a number of important tasks including nearest neighbor retrieval, clustering and partitioning, finding important nodes and edges, and summarizing large uncertain graphs.
We expect the results of this project to have an impact on several application domains. For example, they can help internet-based and social-media related companies to analyze their data and improve their targeting advertisement policies and practices. This will create opportunities for making these companies viable and help the economy of internet-based business. In medicine, biology and biochemistry, networks play a very important role and many of these networks can be better modeled as uncertain networks. Our project can help analyze these networks and lead to new biological insights. The results of this project are disseminated as follows: (1) we develop publicly available prototypes; (2) we include the results of our work in our classes/lectures; (3) we communicate our results to scientists of computer science and other fields and our industry collaborators through publications and demos. We also actively try to engage in our research graduate and undergraduate students, including women and minorities.
For further information see the web site at: http://www.cs.bu.edu/~gkollios/ugraphs/
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note:
When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external
site maintained by the publisher. Some full text articles may not yet be available without a
charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from
this site.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
In this project, we investigated the relationship between uncertainty and graph databases. Graph databases appear in many applications. For example, in biology and medicine, graphs are used to represent interactions between different biological objects such as genes, proteins, and microbes. Furthermore, these relationships have some uncertainty due to the nature of interactions (sometimes a protein interacts with another one and sometimes not) and the way we collect the data. Another application comes from data management applications, like data cleaning, where we use AI and Machine Learning models to learn the relationship between objects. These models are not always accurate and have some uncertainty (or confidence) associated with them. Therefore, if you represent each object as a vertex in a graph and the relationship as an edge, the resulting graph is an uncertain (or probabilistic graph).
During this project, we have developed a number of important techniques that can help better analyze and query uncertain graphs. Here are the some of our main results: 1) we have developed approaches that can reduce the uncertainty of these uncertain graphs using active learning (choosing which relationship to investigate further in order to reduce the uncertainty), 2) we have developed an approach to compare two vertices from two different graphs and find if they are similar or not, and have developed algorithms that can compute te similarity efficiently, 3) we have developed techniques that can query and estimate distances between vertices in these graphs using encryption and privacy preserving tools so that we can outsource the data in a third party cloud and still be able to do some computation without the third party learning the actual data, and 4) we proposed a technique to identify communities with low tension, where each individual is represented as a vertex and the relationship between two individual is modeled using the opinions expressed by the individual both internally and externally.
Another important outcome of this project was training for graduate and post-graduate students. We had four graduate students supported by this project and one post-doc. Two of these students received their PhDs already and are working in the industry, one more student received his Masters, and another PhD student is graduating soon.
Finally, we collaborated with researchers and practitioners from the medical and public health domain and we used our methods to analyze uncertain graphs in order to improve their methods and approaches. In particular, we helped to analyze and understand better public health datasets and therefore to improve our defense on viral and infectious diseases. Also, we used our techniques and expertise that we developed in his project in order to help researchers to identify families of microbes in the oral cavity, and therefore help identify better methods and medicine to treat oral diseases and improve oral hygiene.
Overall, this project created new techniques in the area of uncertain and probabilistic graphs, promoted the use of uncertain graph analysis for many different applications, and have successfully exhibited how these techniques can be utilized to improve many other areas (e.g., medical and health research and professions) that have the potential to greatly affect and improve our lives.
Last Modified: 01/25/2018
Modified by: George Kollios
Please report errors in award information by writing to: awardsearch@nsf.gov.