Award Abstract # 2217069
Collaborative Research: EnCORE: Institute for Emerging CORE Methods in Data Science

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: UNIVERSITY OF TEXAS AT AUSTIN
Initial Amendment Date: July 28, 2022
Latest Amendment Date: September 16, 2024
Award Number: 2217069
Award Instrument: Continuing Grant
Program Manager: Phillip Regalia
pregalia@nsf.gov
(703)292-2981
CCF Division of Computing and Communication Foundations
CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2022
End Date: August 31, 2027 (Estimated)
Total Intended Award Amount: $2,572,294.00
Total Awarded Amount to Date: $2,302,193.00
Funds Obligated to Date: FY 2022 = $986,186.00
FY 2023 = $1,045,905.00
FY 2024 = $270,102.00
History of Investigator:
  • Sujay Sanghavi (Principal Investigator)
    sanghavi@mail.utexas.edu
  • Shuchi Chawla (Co-Principal Investigator)
  • Rachel Ward (Co-Principal Investigator)
  • Purnamrita Sarkar (Co-Principal Investigator)
  • Rachel Ward (Former Principal Investigator)
  • Sujay Sanghavi (Former Co-Principal Investigator)
Recipient Sponsored Research Office: University of Texas at Austin
110 INNER CAMPUS DR
AUSTIN
TX  US  78712-1139
(512)471-6424
Sponsor Congressional District: 25
Primary Place of Performance: University of Texas at Austin
Austin
TX  US  78759-5316
Primary Place of Performance Congressional District: 37
Unique Entity Identifier (UEI): V6AFQPN18437
Parent UEI:
NSF Program(s): TRIPODS Transdisciplinary Rese,
HDR-Harnessing the Data Revolu
Primary Program Source: 01002223DB NSF RESEARCH & RELATED ACTIVIT
01002324DB NSF RESEARCH & RELATED ACTIVIT
01002425DB NSF RESEARCH & RELATED ACTIVIT
01002526DB NSF RESEARCH & RELATED ACTIVIT
01002627DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 048Z, 062Z, 075Z, 079Z, 9102
Program Element Code(s): 041Y00, 099Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.041, 47.049, 47.070

ABSTRACT

The proliferation and growing popularity of data-driven decision making have fueled the rapid emergence of data science as a new scientific discipline. Data science is seen as a key enabler of future businesses, technologies, and healthcare that can transform all aspects of socioeconomic life. Its fast adoption, however, often comes with ad hoc implementations of techniques that yield suboptimal, and sometimes unfair and potentially harmful, results. The time is ripe to develop principled approaches that lay solid foundations for data science. This is particularly challenging because real-world data is highly complex, with intricate structures, unprecedented scale, rapidly evolving characteristics, noise, and implicit biases. Addressing these challenges requires a concerted effort across multiple scientific disciplines: statistics for robust decision making under uncertainty; mathematics and electrical engineering for enabling data-driven optimization beyond the worst case; theoretical computer science and machine learning for new algorithmic paradigms to deal with dynamic and sensitive data in an ethical way; and basic sciences to bring the technical developments to the forefront of health sciences and society. The proposed institute for emerging CORE methods in data science (EnCORE) brings together a diverse team of researchers spanning the aforementioned disciplines from the University of California San Diego, the University of Texas at Austin, the University of Pennsylvania, and the University of California Los Angeles. It presents an ambitious vision to transform the landscape of the four CORE pillars of data science: C for complexities of data, O for optimization, R for responsible learning, and E for education and engagement. Along with its transformative research vision, the institute pursues a bold plan for outreach and broadening participation by engaging students of diverse backgrounds at all levels, from K-12 to postdocs and junior faculty.
The project aims to reach a wide demographic of students by offering collaborative courses across its partner universities and a flexible co-mentorship plan for truly multidisciplinary research. Through regularly organized workshops, summer schools, and seminars, the project aims to engage the entire scientific community and to become the new nexus of research and education on the foundations of data science. To bring the fruits of theoretical development to practice, EnCORE will continuously work with industry partners and domain scientists, and will forge strong connections with other National Science Foundation Harnessing the Data Revolution institutes across the nation.

EnCORE as an institute embodies intellectual merit that has the potential to lead ground-breaking research to shape the foundations of data science in the United States. Its research mission is organized around three themes. The first theme, on data complexity, addresses the complex characteristics of data such as massive size, huge feature space, rapid changes, variety of sources, implicit dependence structures, arbitrary outliers, and noise. A major overhaul of the core concepts of algorithm design is needed, with a holistic view of different computational complexity measures. In the face of noise and outliers, uncertainty estimation is necessary, yet it is difficult because the data is dynamic and changing. Data heterogeneity poses major challenges even in basic classification tasks. The structural relationships hidden inside such data are crucial for understanding and processing it, and for downstream data analysis tasks such as those in visualization and neuroscience. The second theme of EnCORE aims to transform the classical area of optimization, where adaptive methods and human intervention can lead to major advances. It plans to revisit the foundations of distributed optimization to include heterogeneity, robustness, safety, and communication, and to address statistical uncertainty due to distributional shift in dynamic data in control and reinforcement learning. The third and final theme of EnCORE proposes to build the foundations of responsible learning. Applications of machine learning in human-facing systems are severely hampered when the learned models are hard for users to understand and reproduce, may give biased outcomes, are easily manipulated by an adversary, or reveal sensitive information. Thus, interpretability, reproducibility, fairness, privacy, and robustness must be incorporated in any data-driven decision making.
The team's experience and dedication to mentoring and outreach, collaborative curriculum design, a socially aware and responsible research program, extensive institute activities, and industrial partnerships would pave the way for a substantial broader impact for EnCORE. Summer schools with year-long mentoring will take place in three states, reaching a large demographic. Joint courses with hybrid and fully online offerings will be developed. Drawing on prior experience of running the Thinkabit Lab, which has reached over 74,000 K-12 students so far, EnCORE will embark on an ambitious and thoughtful outreach program to improve the representation of under-represented groups and help create a future workforce that is diverse, responsible, and has solid foundations in data science.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Chawla, S. and Rezvan, R. and Teng, Y. and Tzamos, C. "Buy Many Mechanisms for Many Unit Demand Buyers" Conference on Web and Internet Economics, 2023 https://doi.org/10.1007/978-3-031-48974-7_2
Chawla, Shuchi and Christou, Dimitrios "Online Time Windows TSP with Predictions", 2024
Chawla, Shuchi and Gergatsouli, Evangelia and McMahan, Jeremy and Tzamos, Christos "Approximating Pandora's Box with Correlations" International Conference on Approximation Algorithms for Combinatorial Optimization Problems, 2023
Chawla, Shuchi and Sheridan, Kristin "Composition of nested embeddings with an application to outlier removal", 2024
Jalan, Akhil and Chakrabarti, Deepayan and Sarkar, Purnamrita "Incentive-Aware Models of Financial Networks" Operations Research, 2024 https://doi.org/10.1287/opre.2022.0678
Kumar, Syamantak and Sarkar, Purnamrita "Streaming PCA for Markovian Data", 2024
Raoof, Negin and Rout, Litu and Daras, Giannis and Sanghavi, Sujay and Caramanis, Constantine and Shakkottai, Sanjay and Dimakis, Alex "Infilling Score: A Pretraining Data Detection Algorithm for Large Language Models", 2025

Please report errors in award information by writing to: awardsearch@nsf.gov.
