
NSF Org: | CCF Division of Computing and Communication Foundations |
Recipient: | JOHNS HOPKINS UNIVERSITY |
Initial Amendment Date: | September 13, 2019 |
Latest Amendment Date: | January 24, 2023 |
Award Number: | 1934979 |
Award Instrument: | Continuing Grant |
Program Manager: | Christopher Stark, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | October 1, 2019 |
End Date: | September 30, 2023 (Estimated) |
Total Intended Award Amount: | $1,500,000.00 |
Total Awarded Amount to Date: | $1,500,000.00 |
Funds Obligated to Date: | FY 2021 = $500,000.00 |
History of Investigator: | Rene Vidal (Principal Investigator) |
Recipient Sponsored Research Office: | JOHNS HOPKINS UNIVERSITY, 3400 N CHARLES ST, BALTIMORE, MD 21218-2608, US; (443) 997-1898 |
Sponsor Congressional District: | |
Primary Place of Performance: | Johns Hopkins University, Baltimore, MD 21218-2686, US |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | TRIPODS Transdisciplinary Research in Principles of Data Science, HDR-Harnessing the Data Revolution |
Primary Program Source: | 01002122DB NSF RESEARCH & RELATED ACTIVITIES |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Classical data-analysis methods were based on mathematical, physical, or statistical models for the data-generation process, which were developed under the assumption that the data were relatively clean and collected for a specific task. Over the past few decades, advances in data acquisition have led to massive, noisy, high-dimensional datasets, which were not necessarily collected for a specific task. This has led to the emergence of data-driven methods, such as deep learning, which use massive amounts of labeled data to learn 'black-box' models that do not provide an explicit description of the process being modeled. Such data-driven methods have led to dramatic improvements in the performance of pattern-recognition systems for applications in computer vision and speech recognition for which massive amounts of labeled data can be generated. However, existing models are not very interpretable, and their predictions are not robust to adversarial perturbations. Moreover, there are many applications in science and engineering where data labeling is extremely costly, and the ability to interpret model predictions and produce estimates of uncertainty is essential. To address these challenges, a TRIPODS Institute on the Theoretical Foundations of Data Science will be created at Johns Hopkins University. The goals of the institute will be to (1) develop the foundations for the next generation of data-analysis methods, which will integrate model-based and data-driven approaches, (2) foster interactions among data scientists through a monthly seminar series, semester-long research themes, an annual research symposium, and a summer research school and workshop on the foundations of data science, and (3) create new undergraduate and graduate curricula on the foundations of data science.
The institute brings together a multidisciplinary team of mathematicians, statisticians, theoretical computer scientists, and electrical engineers with expertise in the foundations of machine learning, deep learning, statistical learning and inference on graphs, optimization, approximation theory, signal processing, dynamical systems and controls, to develop the foundations for the next generation of data-analysis methods, which will integrate model-based and data-driven approaches. In particular, the institute will focus on studying the foundations of deep neural models (e.g., feedforward networks, recurrent networks, generative adversarial networks) and generative models of structured data (e.g., graphical models, random graphs, dynamical systems), with the ultimate goal of arriving at integrated models that are more interpretable, robust to perturbations, and learnable with minimal supervision. The goals of the Phase I Institute will be to (1) study generalization, optimization and approximation properties of feedforward networks, (2) develop the foundations of statistical inference and learning on and of graphs, and (3) study the integration of deep networks and graphs for learning maps between structured datasets.
This project is part of the National Science Foundation's Harnessing the Data Revolution (HDR) Big Idea activity.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Classical data analysis methods were based on mathematical, physical, or statistical models for the data-generation process, which were developed under the assumption that the data were relatively clean and collected for a specific task. Over the past few decades, advances in data acquisition have led to massive, noisy, high-dimensional datasets, which were not necessarily collected for a specific task. This has led to the emergence of data-driven methods, such as deep learning, which use massive amounts of labeled data to learn 'black-box' models that do not provide an explicit description of the process being modeled. Such data-driven methods have led to dramatic improvements in the performance of pattern recognition systems for applications in computer vision and speech recognition for which massive amounts of labeled data can be generated. However, existing models are not very interpretable, and their predictions are not robust to adversarial perturbations. Moreover, there are many applications in science and engineering where data labeling is extremely costly, and the ability to interpret model predictions and produce estimates of uncertainty is essential.
To address these challenges, this TRIPODS project supported the creation of the Mathematical Institute for Data Science (MINDS) at the Johns Hopkins University. The institute brings together a multidisciplinary team of mathematicians, statisticians, theoretical computer scientists, and electrical engineers with expertise in the foundations of machine learning, deep learning, statistical learning and inference on graphs, optimization, approximation theory, signal processing, dynamical systems and controls, to develop the foundations for the next generation of data-analysis methods, which will integrate model-based and data-driven approaches. In particular, the institute focuses on studying the foundations of deep neural models (e.g., feedforward networks, recurrent networks, generative adversarial networks) and generative models of structured data (e.g., graphical models, random graphs, dynamical systems), with the ultimate goal of arriving at integrated models that are more interpretable, robust to perturbations, and learnable with minimal supervision. In addition, the institute fosters interactions among data scientists through a monthly seminar series, an annual research symposium, and the organization of conferences and workshops. The institute also helped create a new MSc in Data Science program at Johns Hopkins University.
Last Modified: 04/13/2024
Modified by: Rene Vidal