Award Abstract # 1633631
NRT-DESE: Team Science for Integrative Graduate Training in Data Science and Physical Science

NSF Org: DGE
Division Of Graduate Education
Recipient: UNIVERSITY OF CALIFORNIA IRVINE
Initial Amendment Date: September 9, 2016
Latest Amendment Date: September 9, 2016
Award Number: 1633631
Award Instrument: Standard Grant
Program Manager: Vinod Lohani
DGE Division Of Graduate Education
EDU Directorate for STEM Education
Start Date: September 15, 2016
End Date: August 31, 2021 (Estimated)
Total Intended Award Amount: $2,967,150.00
Total Awarded Amount to Date: $2,967,150.00
Funds Obligated to Date: FY 2016 = $2,967,150.00
History of Investigator:
  • Padhraic Smyth (Principal Investigator)
    smyth@ics.uci.edu
  • Pierre Baldi (Co-Principal Investigator)
  • James Randerson (Co-Principal Investigator)
  • Daniel Whiteson (Co-Principal Investigator)
  • Maritza Campo (Co-Principal Investigator)
Recipient Sponsored Research Office: University of California-Irvine
160 ALDRICH HALL
IRVINE
CA  US  92697-0001
(949)824-7295
Sponsor Congressional District: 47
Primary Place of Performance: University of California-Irvine
4216 Bren Hall
Irvine
CA  US  92697-0001
Primary Place of Performance Congressional District: 47
Unique Entity Identifier (UEI): MJC5FCYQTPE6
Parent UEI: MJC5FCYQTPE6
NSF Program(s): NSF Research Traineeship (NRT)
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVIT
04001617DB NSF Education & Human Resource
Program Reference Code(s): 026Z, 7433, 9179, SMET
Program Element Code(s): 199700
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.076

ABSTRACT

Massively parallel computers simulate data about molecular phenomena at previously unimaginable scales, satellites scan the planet capturing vast sets of measurements about ecosystem health, and particle accelerators generate tremendous amounts of data revealing fundamental properties of the smallest building blocks of matter, all with potentially broad societal benefits in areas such as drug discovery, energy conservation, and materials science. Fully realizing these benefits will require a workforce with the technical skills to extract useful information from massive scientific data sets, calling for new approaches to graduate student training that emphasize expertise in data-driven science. This National Science Foundation Research Traineeship (NRT) award to the University of California Irvine (UCI) will tackle this challenge by creating a training ecosystem composed of leading UCI, national-laboratory, and private-sector researchers across particle physics, earth science, chemistry, statistics, and machine learning, all bound together by expertise in the emerging Science of Team Science. The project anticipates training over sixty (60) MS and PhD students, including twenty (20) funded trainees, from diverse backgrounds in computational statistics, machine learning, earth science, particle physics, synthetic chemistry, and team science. After graduation, students from this program will have both the technical and team-science skills to be leaders in the emerging field of data-driven science, and to participate in and lead interdisciplinary research teams at national laboratories, in academia, and in industry labs.

The research agenda of the program seeks to create the foundation from which bridges can be built between the traditional scientific route of building interpretable models based on physical principles and data-driven modeling approaches that can provide high fidelity predictions but may lack clear interpretability in terms of the underlying science. The program will involve a number of interrelated research themes across multiple disciplines in the information and physical sciences, including machine learning (e.g. temporal and spatial data modeling, multi-scale models, deep learning, and scalable learning algorithms), particle and astroparticle physics (e.g. accelerator based experiments), earth systems science (e.g. reducing ecosystem response prediction uncertainties), and chemistry (e.g. prediction of physical properties of small molecules). A significant aspect of the program is an emphasis on team science as a core theme. Students will collaborate in small interdisciplinary research teams consisting of students and faculty with different disciplinary skills, and will take part in team-science workshops leading to student-led development of a team-science certificate in years 3 to 5 of the program. Summer internships for student participants, at both national and industry research laboratories, will serve to reinforce the students' academic training via participation in large-scale interdisciplinary data science research projects.

The NSF Research Traineeship (NRT) Program is designed to encourage the development and implementation of bold, new, potentially transformative models for STEM graduate education training. The Traineeship Track is dedicated to effective training of STEM graduate students in high-priority interdisciplinary research areas, through a comprehensive traineeship model that is innovative, evidence-based, and aligned with changing workforce and research needs.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 32)
Collado, Julian and Bauer, Kevin and Witkowski, Edmund and Faucett, Taylor and Whiteson, Daniel and Baldi, Pierre "Learning to isolate muons" Journal of High Energy Physics, v.2021, 2021 https://doi.org/10.1007/JHEP10(2021)200
Beucler, Tom and Pritchard, Michael and Rasp, Stephan and Ott, Jordan and Baldi, Pierre and Gentine, Pierre "Enforcing Analytic Constraints in Neural Networks Emulating Physical Systems" Physical Review Letters, v.126, 2021 https://doi.org/10.1103/PhysRevLett.126.098302
Coffield, Shane R. and Graff, Casey A. and Chen, Yang and Smyth, Padhraic and Foufoula-Georgiou, Efi and Randerson, James T. "Machine learning to predict final fire size at the time of ignition" International Journal of Wildland Fire, v.28, 2019 https://doi.org/10.1071/WF19023
Collado, Julian and Howard, Jessica N. and Faucett, Taylor and Tong, Tony and Baldi, Pierre and Whiteson, Daniel "Learning to identify electrons" Physical Review D, v.103, 2021 https://doi.org/10.1103/PhysRevD.103.116028
Faucett, Taylor and Thaler, Jesse and Whiteson, Daniel "Mapping machine-learned physics into a human-readable space" Physical Review D, v.103, 2021 https://doi.org/10.1103/PhysRevD.103.036020
Graff, Casey A. and Coffield, Shane R. and Chen, Yang and Foufoula-Georgiou, Efi and Randerson, James T. and Smyth, Padhraic "Forecasting Daily Wildfire Activity Using Poisson Regression" IEEE Transactions on Geoscience and Remote Sensing, v.58, 2020 https://doi.org/10.1109/TGRS.2020.2968029
Griffiths, Hannah M. and Eggleton, Paul and Hemming-Schroeder, Nicole and Swinfield, Tom and Woon, Joel S. and Allison, Steven D. and Coomes, David A. and Ashton, Louise A. and Parr, Catherine L. "Carbon flux and forest dynamics: Increased deadwood decomposition in tropical rainforest treefall canopy gaps" Global Change Biology, v.27, 2021 https://doi.org/10.1111/gcb.15488
Heidbrink, W W and Garcia, A and Boeglin, W and Salewski, M "Phase-space sensitivity (weight functions) of 3 MeV proton diagnostics" Plasma Physics and Controlled Fusion, v.63, 2021 https://doi.org/10.1088/1361-6587/abeda0
Hertel, Lars and Collado, Julian and Sadowski, Peter and Ott, Jordan and Baldi, Pierre "Sherpa: Robust hyperparameter optimization for machine learning" SoftwareX, v.12, 2020 https://doi.org/10.1016/j.softx.2020.100591
Jalalvand, Azarakhsh and Kaptanoglu, Alan A. and Garcia, Alvin V. and Nelson, Andrew O. and Abbate, Joseph and Austin, Max E. and Verdoolaege, Geert and Brunton, Steven L. and Heidbrink, William W. and Kolemen, Egemen "Alfvén eigenmode classification based on ECE diagnostics at DIII-D using deep recurrent neural networks" Nuclear Fusion, v.62, 2021 https://doi.org/10.1088/1741-4326/ac3be7
Kadish, Dora and Mood, Aaron D. and Tavakoli, Mohammadamin and Gutman, Eugene S. and Baldi, Pierre and Van Vranken, David L. "Methyl Cation Affinities of Canonical Organic Functional Groups" The Journal of Organic Chemistry, v.86, 2021 https://doi.org/10.1021/acs.joc.0c02327

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Modern scientific research is increasingly data-driven, with huge volumes of scientific data being generated by telescopes, particle accelerators, satellites, and more. As a result, it is critically important that future generations of scientists have expertise in analyzing data. As part of a five-year Machine Learning and Physical Sciences (MAPS) graduate training program at the University of California, Irvine, 51 PhD students working at the intersection of physical and information sciences were supported in their graduate research by funding from the National Science Foundation Research Traineeship program. In particular, students received training and mentoring with a specific emphasis on developing and harnessing new techniques from data science and machine learning for data-driven scientific discovery.

In the natural sciences, students from areas such as particle physics, climate science, and chemistry gained skills in topics such as machine learning, algorithms, and statistics; and engaged in graduate thesis projects with a significant data science component. On the information science side, students from computer science and statistics focused on particular disciplines within the physical sciences as the application area for their graduate thesis work. A significant additional aspect of the program was an emphasis on student communication and leadership skills in interdisciplinary team science.  


The program produced a cohort of graduate students who have the data science skills to explore new research directions at the interface of information science and physical sciences. In addition, these students have both the technical and team-science skills to be leaders in the emerging field of data science, and to participate in and lead interdisciplinary research and engineering teams at national laboratories, in academia, and in private industry.


From a research perspective, the program contributed to fundamental new knowledge in the scientific disciplines of particle and astrophysics, synthetic chemistry, and earth and climate science. In addition, the program contributed to the development of new techniques and algorithms in machine learning and data science. The PhD students in the program produced over 50 peer-reviewed research papers on their research, as well as a variety of publicly available open-source software and research datasets.


In addition, students in the program were actively engaged in community outreach activities during the five years of the program, particularly in terms of increasing awareness about university study and research careers among K-12 students in disadvantaged communities in Southern California.


Last Modified: 12/21/2021
Modified by: Padhraic Smyth
