
NSF Org: |
IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | September 5, 2017 |
Latest Amendment Date: | September 5, 2017 |
Award Number: | 1741129 |
Award Instrument: | Standard Grant |
Program Manager: |
Sylvia Spengler
sspengle@nsf.gov (703)292-7347 IIS Division of Information & Intelligent Systems CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2017 |
End Date: | August 31, 2023 (Estimated) |
Total Intended Award Amount: | $599,249.00 |
Total Awarded Amount to Date: | $599,249.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
140 COMMONWEALTH AVE CHESTNUT HILL MA US 02467-3800 (617)552-8000 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
140 Commonwealth Avenue Chestnut Hill MA US 02467-3800 |
Primary Place of
Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Big Data Science &Engineering |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Representations of real-world phenomena as graphs (a.k.a. networks) are ubiquitous, ranging from social and information networks, to technological, biological, chemical, and brain networks. Many graph mining tasks -- including clustering, anomaly detection, nearest neighbor, similarity search, pattern recognition, and transfer learning -- require a distance measure between graphs to be computed efficiently. The existing distance measures between graphs leave a lot to be desired. They are overwhelmingly based on heuristics. Many do not scale to graphs with millions of nodes; others do not satisfy the metric properties of non-negativity, positive definiteness, symmetry, and triangle inequality. This project studies a formal mathematical foundation covering a family of graph distances that overcome these limitations, focusing on real-world applications in biology and social network analysis. It also provides a universal methodology for parallelizing the computation of graph distance metrics within this family over massive graphs with millions of nodes, and scaling it over cloud computing resources.
This project studies, designs, and evaluates graph distances that satisfy the following six properties: (1) They are scalable -- i.e., they are strictly subquadratic in runtime and achieve a speedup when computed in parallel. (2) They are metrics -- i.e., they satisfy
non-negativity, positive definiteness, symmetry, and triangle inequality. (3) They are discriminative, as measured by comparisons to the "chemical distance", which finds the optimal mapping between two graphs that minimizes edge discrepancies. (4) They are statistically
robust -- i.e., they have confidence intervals. (5) They can incorporate auxiliary information available on nodes and links. (6) They are interpretable to subject matter experts. Rather than providing a single metric, this project explores a family of such graph distance metrics. It also provides a universal methodology, using the Alternating Directions Method of Multipliers (ADMM), to parallelizing the computation of graph distance metrics within this family over massive graphs with millions of nodes. The proposed metrics are evaluated over massive real-world graphs using Apache Spark on a cloud computing infrastructure.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note:
When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external
site maintained by the publisher. Some full text articles may not yet be available without a
charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from
this site.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
1. Intellectual Merit:
1.1. Tractable Graph Distances
We studied generalizations of the so-called chemical and Chartrand-Kubiki-Shultz (CKS) distances that (a) satisfy the metric property and (b) are tractable, in that they can becomputed either by solving a convex optimization problem, or by a polynomial time algorithm. Inparticular, we proposed relaxations over the Birkhoff polytope and the Stiefel manifold. We also soughtways of incorporating node features of in these metrics. We also proposed and implemented a massivelyparallel implementation framework for their computation via the alternating direction’s method ofmultipliers (ADMM) and implemented this over Apache Spark.
1.2. Multi-distances
We studied generazations of graph distances to graph multi-distances. The first take as input two graphs and output a non-negative real number which measures the closeness of two graphs from each other, while the later take as input many graphs and output a non-negative real number. Our graph multi-distances satisfy a generalization of the properties of metrics to multiple elements, the most important one being a generalization of the triangle inequality. In addition to satisfying a generalized metric properties, these multi-distances provide a joint alignment between multiple graphs that satisfy alignment consistency. We show that our multi-distances can be relaxed to convex optimization problems, without losing the generalized metric property.
2. Broader Impact
Two postdoctoral students and one undergraduate student were involved in theproject during its duration. Research findings were disseminated in 2 conference and 2 journal publications. Results were further disseminated through several tutorials organized by the PIs,as well as through invited seminars and keynotes at academic institutions and companies in the US as well as internationally. Our code was made publicly available and shared via github repositories.
Last Modified: 12/26/2023
Modified by: Jose Bento
Please report errors in award information by writing to: awardsearch@nsf.gov.