Award Abstract # 2217062
Collaborative Research: EnCORE: Institute for Emerging CORE Methods in Data Science

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: TRUSTEES OF THE UNIVERSITY OF PENNSYLVANIA, THE
Initial Amendment Date: July 28, 2022
Latest Amendment Date: September 16, 2024
Award Number: 2217062
Award Instrument: Continuing Grant
Program Manager: Phillip Regalia
pregalia@nsf.gov
(703)292-2981
CCF Division of Computing and Communication Foundations
CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2022
End Date: August 31, 2027 (Estimated)
Total Intended Award Amount: $1,839,905.00
Total Awarded Amount to Date: $1,558,384.00
Funds Obligated to Date: FY 2022 = $729,140.00
FY 2023 = $735,404.00
FY 2024 = $93,840.00
History of Investigator:
  • Hamed Hassani (Principal Investigator)
    hassani@seas.upenn.edu
  • George Pappas (Co-Principal Investigator)
  • Rajiv Gandhi (Co-Principal Investigator)
  • Eric Tchetgen Tchetgen (Co-Principal Investigator)
  • Aaron Roth (Co-Principal Investigator)
Recipient Sponsored Research Office: University of Pennsylvania
3451 WALNUT ST STE 440A
PHILADELPHIA
PA  US  19104-6205
(215)898-7293
Sponsor Congressional District: 03
Primary Place of Performance: University of Pennsylvania
220 S 33rd Street
Philadelphia
PA  US  19104-6389
Primary Place of Performance Congressional District: 03
Unique Entity Identifier (UEI): GM1XX56LEP58
Parent UEI: GM1XX56LEP58
NSF Program(s): TRIPODS Transdisciplinary Rese,
HDR-Harnessing the Data Revolu
Primary Program Source: 01002223DB NSF RESEARCH & RELATED ACTIVIT
01002324DB NSF RESEARCH & RELATED ACTIVIT
01002425DB NSF RESEARCH & RELATED ACTIVIT
01002526DB NSF RESEARCH & RELATED ACTIVIT
01002627DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 048Z, 062Z, 075Z, 079Z, 9102
Program Element Code(s): 041Y00, 099Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.041, 47.049, 47.070

ABSTRACT

The proliferation and growing popularity of data-driven decision making have fueled the rapid emergence of data science as a new scientific discipline. Data science is seen as a key enabler of future businesses, technologies, and healthcare that can transform all aspects of socioeconomic life. Its fast adoption, however, often comes with ad hoc implementation of techniques, with suboptimal and sometimes unfair and potentially harmful results. The time is ripe to develop principled approaches that lay solid foundations for data science. This is particularly challenging because real-world data is highly complex, with intricate structures, unprecedented scale, rapidly evolving characteristics, noise, and implicit biases. Addressing these challenges requires a concerted effort across multiple scientific disciplines, such as statistics for robust decision making under uncertainty; mathematics and electrical engineering for enabling data-driven optimization beyond the worst case; theoretical computer science and machine learning for new algorithmic paradigms to deal with dynamic and sensitive data in an ethical way; and basic sciences to bring the technical developments to the forefront of health sciences and society. The proposed institute for emerging CORE methods in data science (EnCORE) brings together a diverse team of researchers spanning the aforementioned disciplines from the University of California, San Diego; the University of Texas at Austin; the University of Pennsylvania; and the University of California, Los Angeles. It presents an ambitious vision to transform the landscape of the four CORE pillars of data science: C for complexities of data, O for optimization, R for responsible learning, and E for education and engagement. Along with its transformative research vision, the institute fosters a bold plan for outreach and broadening participation by engaging students of diverse backgrounds at all levels, from K-12 to postdocs and junior faculty.
The project aims to reach a broad demographic of students by offering collaborative courses across its partner universities and a flexible co-mentorship plan for truly multidisciplinary research. By regularly organizing workshops, summer schools, and seminars, the project aims to engage the entire scientific community and become the new nexus of research and education on the foundations of data science. To bring the fruits of theoretical development to practice, EnCORE will work continuously with industry partners and domain scientists, and will forge strong connections with other National Science Foundation Harnessing the Data Revolution institutes across the nation.

EnCORE as an institute embodies intellectual merit that has the potential to lead ground-breaking research to shape the foundations of data science in the United States. Its research mission is organized around three themes. The first theme, on data complexity, addresses the complex characteristics of data such as massive size, huge feature space, rapid changes, variety of sources, implicit dependence structures, arbitrary outliers, and noise. A major overhaul of the core concepts of algorithm design is needed, with a holistic view of different computational complexity measures. In the face of noise and outliers, uncertainty estimation is both necessary and difficult, because the data is dynamic and changing. Data heterogeneity poses major challenges even in basic classification tasks. The structural relationships hidden inside such data are crucial to understanding and processing it, and to downstream data analysis tasks such as those in visualization and neuroscience. The second theme of EnCORE aims to transform the classical area of optimization, where adaptive methods and human intervention can lead to major advances. It plans to revisit the foundations of distributed optimization to include heterogeneity, robustness, safety, and communication, and to address statistical uncertainty due to distributional shift in dynamic data in control and reinforcement learning. The third and final theme of EnCORE proposes to build the foundations of responsible learning. Applications of machine learning in human-facing systems are severely hampered when the learned models are hard for users to understand and reproduce, may give biased outcomes, are easily manipulated by an adversary, or reveal sensitive information. Thus, interpretability, reproducibility, fairness, privacy, and robustness must be incorporated into any data-driven decision making.
The team's experience in and dedication to mentoring and outreach, collaborative curriculum design, a socially aware responsible research program, extensive institute activities, and industrial partnerships will pave the way for a substantial broader impact for EnCORE. Summer schools with year-long mentoring will take place in three states, reaching a broad demographic. Joint courses with hybrid and fully online offerings will be developed. Drawing on prior experience of running the Thinkabit Lab, which has impacted over 74,000 K-12 students so far, EnCORE will embark on an ambitious and thoughtful outreach program to improve the representation of under-represented groups and help create a future workforce that is diverse, responsible, and solidly grounded in data science.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Acharya, Krishna and Arunachaleswaran, Eshwar Ram and Kannan, Sampath and Roth, Aaron and Ziani, Juba "Wealth Dynamics Over Generations: Analysis and Interventions" IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2023. https://doi.org/10.1109/SaTML54575.2023.00013
Bastani, Osbert and Gupta, Varun and Jung, Christopher and Noarov, Georgy and Ramalingam, Ramya and Roth, Aaron "Practical Adversarial Multivalid Conformal Prediction" Advances in Neural Information Processing Systems, 2022.
Bechavod, Yahav and Roth, Aaron "Individually Fair Learning with One Sided Feedback" International Conference on Machine Learning, 2023.
Dick, Travis and Dwork, Cynthia and Kearns, Michael and Liu, Terrance and Roth, Aaron and Vietri, Giuseppe and Wu, Zhiwei Steven "Confidence-ranked reconstruction of census microdata from published statistics" Proceedings of the National Academy of Sciences, v.120, 2023. https://doi.org/10.1073/pnas.2218605120
Globus-Harris, Ira and Harrison, Declan and Kearns, Michael and Roth, Aaron and Sorrell, Jessica "Multicalibration as Boosting for Regression" International Conference on Machine Learning, 2023.
Jung, Christopher and Noarov, Georgy and Ramalingam, Ramya and Roth, Aaron "Batch Multivalid Conformal Prediction" International Conference on Learning Representations (ICLR), 2023.
Lee, Daniel and Noarov, Georgy and Pai, Mallesh and Roth, Aaron "Online Minimax Multiobjective Optimization: Multicalibeating and Other Applications" Advances in Neural Information Processing Systems, 2022.
Noarov, Georgy and Roth, Aaron "The Statistical Scope of Multicalibration", 2023.
Noarov, Georgy and Roth, Aaron "The Statistical Scope of Multicalibration" International Conference on Machine Learning, 2023.
Roth, Aaron and Tolbert, Alexander and Weinstein, Scott "Reconciling Individual Probability Forecasts" ACM Conference on Fairness Accountability and Transparency, 2023. https://doi.org/10.1145/3593013.3593980

Please report errors in award information by writing to: awardsearch@nsf.gov.
