Award Abstract # 1934979
HDR TRIPODS: Institute for the Foundations of Graph and Deep Learning

NSF Org: CCF Division of Computing and Communication Foundations
Recipient: THE JOHNS HOPKINS UNIVERSITY
Initial Amendment Date: September 13, 2019
Latest Amendment Date: January 24, 2023
Award Number: 1934979
Award Instrument: Continuing Grant
Program Manager: Christopher Stark
CCF Division of Computing and Communication Foundations
CSE Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2019
End Date: September 30, 2023 (Estimated)
Total Intended Award Amount: $1,500,000.00
Total Awarded Amount to Date: $1,500,000.00
Funds Obligated to Date: FY 2019 = $1,000,000.00
FY 2021 = $500,000.00
History of Investigator:
  • Rene Vidal (Principal Investigator)
    vidalr@upenn.edu
  • Carey Priebe (Co-Principal Investigator)
  • Raman Arora (Co-Principal Investigator)
  • Enrique Mallada (Co-Principal Investigator)
  • Mauro Maggioni (Former Co-Principal Investigator)
Recipient Sponsored Research Office: Johns Hopkins University
3400 N CHARLES ST
BALTIMORE
MD  US  21218-2608
(443)997-1898
Sponsor Congressional District: 07
Primary Place of Performance: Johns Hopkins University
MD  US  21218-2686
Primary Place of Performance Congressional District: 07
Unique Entity Identifier (UEI): FTMTDMBR29C7
Parent UEI: GS4PNKTRNKL3
NSF Program(s): TRIPODS Transdisciplinary Research In Principles Of Data Science, HDR-Harnessing the Data Revolution
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVITIES
01002122DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 047Z, 062Z
Program Element Code(s): 041Y00, 099Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Classical data-analysis methods were based on mathematical, physical, or statistical models for the data-generation process, which were developed under the assumption that the data were relatively clean and collected for a specific task. Over the past few decades, advances in data acquisition have led to massive, noisy, high-dimensional datasets, which were not necessarily collected for a specific task. This has led to the emergence of data-driven methods, such as deep learning, which use massive amounts of labeled data to learn 'black-box' models that do not provide an explicit description of the process being modeled. Such data-driven methods have led to dramatic improvements in the performance of pattern-recognition systems for applications in computer vision and speech recognition for which massive amounts of labeled data can be generated. However, existing models are not very interpretable, and their predictions are not robust to adversarial perturbations. Moreover, there are many applications in science and engineering where data labeling is extremely costly, and the ability to interpret model predictions and produce estimates of uncertainty is essential. To address these challenges, a TRIPODS Institute on the Theoretical Foundations of Data Science will be created at Johns Hopkins University. The goals of the institute will be to (1) develop the foundations for the next generation of data-analysis methods, which will integrate model-based and data-driven approaches, (2) foster interactions among data scientists through a monthly seminar series, semester-long research themes, an annual research symposium, and a summer research school and workshop on the foundations of data science, and (3) create new undergraduate and graduate curricula on the foundations of data science.

The institute brings together a multidisciplinary team of mathematicians, statisticians, theoretical computer scientists, and electrical engineers with expertise in the foundations of machine learning, deep learning, statistical learning and inference on graphs, optimization, approximation theory, signal processing, dynamical systems and controls, to develop the foundations for the next generation of data-analysis methods, which will integrate model-based and data-driven approaches. In particular, the institute will focus on studying the foundations of deep neural models (e.g., feedforward networks, recurrent networks, generative adversarial networks) and generative models of structured data (e.g., graphical models, random graphs, dynamical systems), with the ultimate goal of arriving at integrated models that are more interpretable, robust to perturbations, and learnable with minimal supervision. The goals of the Phase I Institute will be to (1) study generalization, optimization and approximation properties of feedforward networks, (2) develop the foundations of statistical inference and learning on and of graphs, and (3) study the integration of deep networks and graphs for learning maps between structured datasets.

This project is part of the National Science Foundation's Harnessing the Data Revolution (HDR) Big Idea activity.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 13)
Castellano, Agustin and Min, Hancheng and Bazerque, Juan A. and Mallada, Enrique "Reinforcement Learning with Almost Sure Constraints" Proceedings of The 4th Annual Learning for Dynamics and Control Conference, v.168, 2022
Guthrie, James and Kobilarov, Marin and Mallada, Enrique "Closed-Form Minkowski Sum Approximations for Efficient Optimization-Based Collision Avoidance" Proceedings of the American Control Conference, 2022 https://doi.org/10.23919/ACC53348.2022.9867524
Kaba, Mustafa Devrim and You, Chong and Robinson, Daniel R. and Mallada, Enrique and Vidal, Rene "A Nullspace Property for Subspace-Preserving Recovery" Proceedings of Machine Learning Research, v.139, 2021
Lawrence, Liam S. and Simpson-Porco, John W. and Mallada, Enrique "Linear-Convex Optimal Steady-State Control" IEEE Transactions on Automatic Control, v.66, 2021 https://doi.org/10.1109/TAC.2020.3044275
Little, Anna and Maggioni, Mauro and Murphy, James M. "Path-Based Spectral Clustering: Guarantees, Robustness to Outliers, and Fast Algorithms" Journal of Machine Learning Research, 2020
Min, Hancheng and Mallada, Enrique "Learning coherent clusters in weakly-connected network systems" Proceedings of The 5th Annual Learning for Dynamics and Control Conference, PMLR, 2023
Min, Hancheng and Mallada, Enrique "Spectral clustering and model reduction for weakly-connected coherent network systems" American Control Conference, 2023 https://doi.org/10.23919/ACC55779.2023.10156212
Min, Hancheng and Paganini, Fernando and Mallada, Enrique "Accurate Reduced-Order Models for Heterogeneous Coherent Generators" American Control Conference, 2021 https://doi.org/10.23919/ACC50511.2021.9483031
Min, Hancheng and Paganini, Fernando and Mallada, Enrique "Accurate Reduced-Order Models for Heterogeneous Coherent Generators" IEEE Control Systems Letters, v.5, 2021 https://doi.org/10.1109/LCSYS.2020.3043733
Min, Hancheng and Tarmoun, Salma and Vidal, Rene and Mallada, Enrique "On the Explicit Role of Initialization on the Convergence and Implicit Bias of Overparametrized Linear Networks" Proceedings of Machine Learning Research, 2021
Pates, Richard and Ferragut, Andres and Pivo, Elijah and You, Pengcheng and Paganini, Fernando and Mallada, Enrique "Respect the Unstable: Delays and Saturation in Contact Tracing for Disease Control" SIAM Journal on Control and Optimization, v.60, 2022 https://doi.org/10.1137/20M1377825

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Classical data analysis methods were based on mathematical, physical, or statistical models for the data-generation process, which were developed under the assumption that the data were relatively clean and collected for a specific task. Over the past few decades, advances in data acquisition have led to massive, noisy, high-dimensional datasets, which were not necessarily collected for a specific task. This has led to the emergence of data-driven methods, such as deep learning, which use massive amounts of labeled data to learn 'black-box' models that do not provide an explicit description of the process being modeled. Such data-driven methods have led to dramatic improvements in the performance of pattern recognition systems for applications in computer vision and speech recognition for which massive amounts of labeled data can be generated. However, existing models are not very interpretable, and their predictions are not robust to adversarial perturbations. Moreover, there are many applications in science and engineering where data labeling is extremely costly, and the ability to interpret model predictions and produce estimates of uncertainty is essential.

To address these challenges, this TRIPODS project supported the creation of the Mathematical Institute for Data Science (MINDS) at the Johns Hopkins University. The institute brings together a multidisciplinary team of mathematicians, statisticians, theoretical computer scientists, and electrical engineers with expertise in the foundations of machine learning, deep learning, statistical learning and inference on graphs, optimization, approximation theory, signal processing, dynamical systems and controls, to develop the foundations for the next generation of data-analysis methods, which will integrate model-based and data-driven approaches. In particular, the institute focuses on studying the foundations of deep neural models (e.g., feedforward networks, recurrent networks, generative adversarial networks) and generative models of structured data (e.g., graphical models, random graphs, dynamical systems), with the ultimate goal of arriving at integrated models that are more interpretable, robust to perturbations, and learnable with minimal supervision. In addition, the institute fosters interactions among data scientists through a monthly seminar series, an annual research symposium, and the organization of conferences and workshops. The institute also helped create a new MSc in Data Science program at the Johns Hopkins University.

 


Last Modified: 04/13/2024
Modified by: Rene Vidal

Please report errors in award information by writing to: awardsearch@nsf.gov.
