Award Abstract # 2217033
Collaborative Research: EnCORE: Institute for Emerging CORE Methods in Data Science

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: UNIVERSITY OF CALIFORNIA, LOS ANGELES
Initial Amendment Date: July 28, 2022
Latest Amendment Date: September 16, 2024
Award Number: 2217033
Award Instrument: Continuing Grant
Program Manager: Phillip Regalia
pregalia@nsf.gov
(703)292-2981
CCF Division of Computing and Communication Foundations
CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2022
End Date: August 31, 2027 (Estimated)
Total Intended Award Amount: $948,911.00
Total Awarded Amount to Date: $852,940.00
Funds Obligated to Date: FY 2022 = $365,413.00
FY 2023 = $391,556.00
FY 2024 = $95,971.00
History of Investigator:
  • Raghu Meka (Principal Investigator)
    raghum@cs.ucla.edu
  • Alyson Fletcher (Co-Principal Investigator)
Recipient Sponsored Research Office: University of California-Los Angeles
10889 WILSHIRE BLVD STE 700
LOS ANGELES
CA  US  90024-4200
(310)794-0102
Sponsor Congressional District: 36
Primary Place of Performance: University of California Los Angeles
404 Westwood Plaza
Los Angeles
CA  US  90095-8357
Primary Place of Performance Congressional District: 36
Unique Entity Identifier (UEI): RN64EPNH8JC6
Parent UEI:
NSF Program(s): TRIPODS Transdisciplinary Rese,
HDR-Harnessing the Data Revolu
Primary Program Source: 01002223DB NSF RESEARCH & RELATED ACTIVIT
01002324DB NSF RESEARCH & RELATED ACTIVIT
01002425DB NSF RESEARCH & RELATED ACTIVIT
01002526DB NSF RESEARCH & RELATED ACTIVIT
01002627DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 048Z, 062Z, 075Z, 079Z, 9102
Program Element Code(s): 041Y00, 099Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.041, 47.049, 47.070

ABSTRACT

The proliferation of data-driven decision making, and its increased popularity, has fueled the rapid emergence of data science as a new scientific discipline. Data science is seen as a key enabler of future businesses, technologies, and healthcare that can transform all aspects of socioeconomic lives. Its fast adoption, however, often comes with ad hoc implementation of techniques with suboptimal, and sometimes unfair and potentially harmful, results. The time is ripe to develop principled approaches to lay solid foundations of data science. This is particularly challenging as real-world data is highly complex with intricate structures, unprecedented scale, rapidly evolving characteristics, noise, and implicit biases. Addressing these challenges requires a concerted effort across multiple scientific disciplines such as statistics for robust decision making under uncertainty; mathematics and electrical engineering for enabling data-driven optimization beyond worst case; theoretical computer science and machine learning for new algorithmic paradigms to deal with dynamic and sensitive data in an ethical way; and basic sciences to bring the technical developments to the forefront of health sciences and society. The proposed institute for emerging CORE methods in data science (EnCORE) brings together a diverse team of researchers spanning the aforementioned disciplines from the University of California San Diego, University of Texas Austin, University of Pennsylvania, and the University of California Los Angeles. It presents an ambitious vision to transform the landscape of the four CORE pillars of data science: C for complexities of data, O for optimization, R for responsible learning, and E for education and engagement. Along with its transformative research vision, the institute fosters a bold plan for outreach and broadening participation by engaging students of diverse backgrounds at all levels from K-12 to postdocs and junior faculty.
The project aims to impact a wide demographic of students by offering collaborative courses across its partner universities and a flexible co-mentorship plan for truly multidisciplinary research. With regular organization of workshops, summer schools, and seminars, the project aims to engage the entire scientific community and become the new nexus of research and education on foundations of data science. To bring the fruit of theoretical development to practice, EnCORE will continuously work with industry partners and domain scientists, and will forge strong connections with other National Science Foundation Harnessing the Data Revolution institutes across the nation.

EnCORE as an institute embodies intellectual merit that has the potential to lead ground-breaking research to shape the foundations of data science in the United States. Its research mission is organized around three themes. The first theme on data complexity addresses the complex characteristics of data such as massive size, huge feature space, rapid changes, variety of sources, implicit dependence structures, arbitrary outliers, and noise. A major overhaul of the core concepts of algorithm design is needed with a holistic view of different computational complexity measures. In the presence of noise and outliers, uncertainty estimation is both necessary and difficult because the data is dynamic and changing. Data heterogeneity poses major challenges even in basic classification tasks. The structural relationships hidden inside such data are crucial to understanding and processing it, and to downstream data analysis tasks such as those in visualization and neuroscience. The second theme of EnCORE aims to transform the classical area of optimization where adaptive methods and human intervention can lead to major advances. It plans to revisit the foundations of distributed optimization to include heterogeneity, robustness, safety, and communication; and address statistical uncertainty due to distributional shift in dynamic data in control and reinforcement learning. The third and final theme of EnCORE proposes to build the foundations of responsible learning. Applications of machine learning in human-facing systems are severely hampered when the learned models are hard for users to understand and reproduce, may give biased outcomes, are easily changeable by an adversary, and reveal sensitive information. Thus, interpretability, reproducibility, fairness, privacy, and robustness must be incorporated in any data-driven decision making.
The team's experience in and dedication to mentoring and outreach, together with collaborative curriculum design, a socially aware and responsible research program, extensive institute activities, and industrial partnerships, would pave the way for a substantial broader impact for EnCORE. Summer schools with year-long mentoring will take place in three states, involving a large demographic. Joint courses with hybrid and fully online offerings will be developed. Building on prior experience running the Thinkabit Lab, which has impacted over 74,000 K-12 students so far, EnCORE will embark on an ambitious and thoughtful outreach program to improve the representation of under-represented groups and help create a future workforce that is diverse, responsible, and has solid foundations in data science.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 18)
Abboud, Amir and Fischer, Nick and Kelley, Zander and Lovett, Shachar and Meka, Raghu "New Graph Decompositions and Combinatorial Boolean Matrix Multiplication Algorithms", 2024 https://doi.org/10.1145/3618260.3649696
Awasthi, Pranjal and Dikkala, Nishant and Kamath, Pritish and Meka, Raghu "Learning Neural Networks with Sparse Activations" Journal of machine learning research, 2024
Azar, Golara Ahmadi and Emami, Melika and Fletcher, Alyson and Rangan, Sundeep "Learning Embedding Representations in High Dimensions", 2024 https://doi.org/10.1109/CISS59072.2024.10480173
Azar, Golara Ahmadi and Hu, Qin and Emami, Melika and Fletcher, Alyson and Rangan, Sundeep and Atashzar, S Farokh "A Deep Learning Sequential Decoder for Transient High-Density Electromyography in Hand Gesture Recognition Using Subject-Embedded Transfer Learning" IEEE Sensors Journal, v.24, 2024 https://doi.org/10.1109/JSEN.2024.3377247
Badih Ghazi, Pritish Kamath "On User-Level Private Convex Optimization" International Conference on Machine Learning, 2023
Becker, Evan and Pandit, Parthe and Rangan, Sundeep and Fletcher, Alyson K "Local Convergence of Gradient Descent-Ascent for Training Generative Adversarial Networks", 2023 https://doi.org/10.1109/IEEECONF59524.2023.10476957
Chandrasekaran, Gautam and Klivans, Adam and Kontonis, Vasilis and Meka, Raghu and Stavropoulos, Konstantinos "Smoothed Analysis for Learning Concepts with Low Intrinsic Dimension" Journal of machine learning research, 2024
Ghazi, Badih and Kamath, Pritish and Kumar, Ravi and Manurangsi, Pasin and Meka, Raghu and Zhang, Chiyuan "On Convex Optimization with Semi-Sensitive Features" Journal of machine learning research, 2024
Ghazi, Badih and Kamath, Pritish and Kumar, Ravi and Manurangsi, Pasin and Meka, Raghu and Zhang, Chiyuan "User-Level Differential Privacy With Few Examples Per User" Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, 2023
Hu, Qin and Azar, Golara Ahmadi and Fletcher, Alyson and Rangan, Sundeep and Atashzar, S Farokh "ViT-MDHGR: Cross-day Reliability and Agility in Dynamic Hand Gesture Prediction via HD-sEMG Signal Decoding" IEEE Journal of Selected Topics in Signal Processing, 2024 https://doi.org/10.1109/JSTSP.2024.3402340
Jonathan Kelner, Frederic Koehler "Lower Bounds on Randomly Preconditioned Lasso via Robust Sparse Designs" Advances in Neural Information Processing Systems 35 (NeurIPS 2022), v.35, 2022

