Award Abstract # 1741129
BIGDATA: F: Collaborative Research: Design and Computation of Scalable Graph Distances in Metric Spaces: A Unified Multiscale Interpretable Perspective

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: TRUSTEES OF BOSTON COLLEGE
Initial Amendment Date: September 5, 2017
Latest Amendment Date: September 5, 2017
Award Number: 1741129
Award Instrument: Standard Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
 (703)292-7347
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2017
End Date: August 31, 2023 (Estimated)
Total Intended Award Amount: $599,249.00
Total Awarded Amount to Date: $599,249.00
Funds Obligated to Date: FY 2017 = $599,249.00
History of Investigator:
  • Jose Bento (Principal Investigator)
    bentoayr@bc.edu
Recipient Sponsored Research Office: Boston College
140 COMMONWEALTH AVE
CHESTNUT HILL
MA  US  02467-3800
(617)552-8000
Sponsor Congressional District: 04
Primary Place of Performance: Boston College
140 Commonwealth Avenue
Chestnut Hill
MA  US  02467-3800
Primary Place of Performance
Congressional District:
04
Unique Entity Identifier (UEI): MJ3JH8CRJBZ7
Parent UEI:
NSF Program(s): Big Data Science &Engineering
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7433, 8083
Program Element Code(s): 808300
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Representations of real-world phenomena as graphs (a.k.a. networks) are ubiquitous, ranging from social and information networks, to technological, biological, chemical, and brain networks. Many graph mining tasks -- including clustering, anomaly detection, nearest neighbor, similarity search, pattern recognition, and transfer learning -- require a distance measure between graphs to be computed efficiently. The existing distance measures between graphs leave a lot to be desired. They are overwhelmingly based on heuristics.  Many do not scale to graphs with millions of nodes; others do not satisfy the metric properties of non-negativity, positive definiteness, symmetry, and triangle inequality. This project studies a formal mathematical foundation covering a family of graph distances that overcome these limitations, focusing on real-world applications in biology and social network analysis. It also provides a universal methodology for parallelizing the computation of graph distance metrics within this family over massive graphs with millions of nodes, and scaling it over cloud computing resources.

This project studies, designs, and evaluates graph distances that satisfy the following six properties: (1) They are scalable -- i.e., they are strictly subquadratic in runtime and achieve a speedup when computed in parallel. (2) They are metrics -- i.e., they satisfy
non-negativity, positive definiteness, symmetry, and triangle inequality. (3) They are discriminative, as measured by comparisons to the "chemical distance", which finds the optimal mapping between two graphs that minimizes edge discrepancies. (4) They are statistically
robust -- i.e., they have confidence intervals. (5) They can incorporate auxiliary information available on nodes and links. (6) They are interpretable to subject matter experts. Rather than providing a single metric, this project explores a family of such graph distance metrics. It also provides a universal methodology, using the Alternating Directions Method of Multipliers (ADMM), to parallelizing the computation of graph distance metrics within this family over massive graphs with millions of nodes. The proposed metrics are evaluated over massive real-world graphs using Apache Spark on a cloud computing infrastructure.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Bento, José and Ioannidis, Stratis "A Family of Tractable Graph Distances" Proceedings of the 2018 SIAM International Conference on Data Mining , 2018 Citation Details
Jia, Bei and Ray, Surjyendu and Safavi, Sam and Bento, Jose "Efficient Projection onto the Perfect Phylogeny Model" Advances in Neural Information Processing Systems 31 (NIPS 2018) , 2018 Citation Details
Safavi, Sam and Joshi, Bikash and França, Guilherme and Bento, José "An Explicit Convergence Rate for Nesterov?s Method from SDP" 2016 IEEE International Symposium on Information Theory , 2018 Citation Details
Safavi, Sam and Joshi, Bikash and França, Guilherme and Bento, José "An Explicit Convergence Rate for Nesterov?s Method from SDP" 2018 IEEE International Symposium on In Information Theory , 2018 Citation Details
Yang, Laurence and Saunders, Michael A. and Lachance, Jean-Christophe and Palsson, Bernhard O. and Bento, José "Estimating Cellular Goals from High-Dimensional Biological Data" Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , 2019 10.1145/3292500.3330775 Citation Details

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

1. Intellectual Merit:

 

1.1. Tractable Graph Distances

We studied generalizations of the so-called chemical and Chartrand-Kubiki-Shultz (CKS) distances that (a) satisfy the metric property and (b) are tractable, in that they can becomputed either by solving a convex optimization problem, or by a polynomial time algorithm. Inparticular, we proposed relaxations over the Birkhoff polytope and the Stiefel manifold. We also soughtways of incorporating node features of in these metrics. We also proposed and implemented a massivelyparallel implementation framework for their computation via the alternating direction’s method ofmultipliers (ADMM) and implemented this over Apache Spark. 

1.2. Multi-distances

We studied generazations of graph distances to graph multi-distances. The first take as input two graphs and output a non-negative real number which measures the closeness of two graphs from each other, while the later take as input many graphs and output a non-negative real number. Our graph multi-distances satisfy a generalization of the properties of metrics to multiple elements, the most important one being a generalization of the triangle inequality. In addition to satisfying a generalized metric properties, these multi-distances provide a joint alignment between multiple graphs that satisfy alignment consistency. We show that our multi-distances can be relaxed to convex optimization problems, without losing the generalized metric property.

 

2. Broader Impact 

 

Two postdoctoral students and one undergraduate student were involved in theproject during its duration. Research findings were disseminated in 2 conference and 2 journal publications. Results were further disseminated through several tutorials organized by the PIs,as well as through invited seminars and keynotes at academic institutions and companies in the US as well as internationally. Our code was made publicly available and shared via github repositories.


Last Modified: 12/26/2023
Modified by: Jose Bento

Please report errors in award information by writing to: awardsearch@nsf.gov.

Print this page

Back to Top of page