Award Abstract # 1829740
CyberTraining: CIU: The LSST Data Science Fellowship Program

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: NORTHWESTERN UNIVERSITY
Initial Amendment Date: June 29, 2018
Latest Amendment Date: June 29, 2018
Award Number: 1829740
Award Instrument: Standard Grant
Program Manager: Ashok Srinivasan
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: August 1, 2018
End Date: July 31, 2023 (Estimated)
Total Intended Award Amount: $499,251.00
Total Awarded Amount to Date: $499,251.00
Funds Obligated to Date: FY 2018 = $499,251.00
History of Investigator:
  • Adam Miller (Principal Investigator)
    amiller@northwestern.edu
Recipient Sponsored Research Office: Northwestern University
633 CLARK ST
EVANSTON
IL  US  60208-0001
(312)503-7955
Sponsor Congressional District: 09
Primary Place of Performance: Northwestern University
2145 Sheridan Road
Evanston
IL  US  60208-3112
Primary Place of Performance Congressional District: 09
Unique Entity Identifier (UEI): EXZVPWZBLUE8
Parent UEI:
NSF Program(s): CyberTraining - Training-based, SPECIAL PROGRAMS IN ASTRONOMY
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 026Z, 062Z, 1207, 7361, 9179
Program Element Code(s): 044Y00, 121900
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

This National Science Foundation (NSF) Training-based Workforce Development for Advanced Cyberinfrastructure award supplements graduate education in astronomy by providing in-depth training in the skills necessary to make scientific discoveries using big data. Ongoing and future surveys, such as the NSF's flagship optical telescope project, the Large Synoptic Survey Telescope (LSST), are producing data at an unprecedented rate. The sheer size of these data sets requires new working practices: sophisticated computational software and data mining procedures are necessary to fully exploit the rich information present in the data. However, these skills are not typically a core component of the astronomy and astrophysics graduate curriculum. The LSST Data Science Fellowship Program (DSFP) supplements traditional educational programs by training students in a variety of data science methods to work with and ultimately analyze big data. DSFP students are selected from a wide variety of universities using an innovative admissions procedure that increases the participation of students from underrepresented groups. Furthermore, DSFP students are trained in science communication and receive a certification in teaching data science so they can tutor peers and lead training workshops in the material learned as part of the program. The project serves the national interest, as stated by the National Science Foundation's mission: to promote the progress of science, by training the next generation of astronomers to have the computing skills necessary to derive scientific insights from the largest telescopic surveys that have ever been conducted.

DSFP students attend six week-long sessions over the course of two years as part of their program training. Each session is hosted by a different institution and focuses on a single topic; topics include the basics of managing and building code, statistics, machine learning, scalable programming, data management, image processing, visualization, and science communication. This curriculum empowers trainees to ask broader questions of their data, prepares them for the technical challenges associated with LSST, and exposes them to the tools and methods necessary to advance fundamental science research. Student participants spread the adoption of data science tools, methods, and resources via the aforementioned teaching workshops, fostering new pathways to discovery in the broader research community. Students must work in collaborative groups, which, in conjunction with their science communication training, enhances their leadership and mentoring skills. To reach a broad audience, all materials developed as part of the program are made available to the public, and a guide for converting the material into a semester-long course at the undergraduate or graduate level is provided. This program prepares students for success in a wide range of careers, providing education in data science methodologies, domain-specific considerations, and professional skill development in research, teaching, and communication.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The Vera C. Rubin Observatory is the flagship ground-based telescope project for the National Science Foundation (NSF) over the next ~decade. The Rubin Observatory is currently being constructed in Chile, with operations set to begin in 2025. The Legacy Survey of Space and Time (LSST), to be conducted by the Rubin Observatory during its first decade of operations, will image the entire visible night sky to incredible depth every few nights. LSST will produce a massive, unprecedented data set. The traditional graduate curriculum in astronomy and astrophysics does not include courses that cover modern data science methods, which are necessary to handle the data volume that will be produced by the Rubin Observatory and other future astronomical surveys. This Training-based Workforce Development for Advanced Cyberinfrastructure award provided continued funding for the LSST Data Science Fellowship Program (DSFP), which supplements graduate education in astronomy by providing in-depth training in the skills necessary to make scientific discoveries using big data.

The DSFP was established in 2016 as a pilot program to train graduate students in data science methods that are essential for the analysis of petabyte-scale data sets from LSST and other upcoming surveys. NSF funding enabled the continuation of the DSFP beyond its initial pilot program status (NSF funding provided student support from 2018-2023). Three new cohorts of students were admitted during this time, with 15 new fellows admitted in each of 2018, 2019, and 2021 (no students were admitted during 2020 due to the onset of the COVID-19 pandemic).

Students admitted to the DSFP participate in six separate training sessions, each one week long, over the course of two years. Each session focuses on a single data-science topic, while building on lessons from prior sessions, with the goal of seeding experts in data science throughout the astronomy and astrophysics community. Major themes for the individual sessions include: software engineering, scalable programming, and databases; statistics; data visualization; machine learning; image processing; and time-series analysis. This curriculum empowers trainees to ask broader questions of their data, prepares them for the technical challenges associated with LSST, and exposes them to the tools and methods necessary to advance fundamental science research. This is done via hands-on exercises in which the students analyze real astronomical data using state-of-the-art software, including, in some instances, the software being developed for the analysis of Rubin observations.

In addition to learning about data science methods, students are trained in science communication, thus empowering them to tutor their peers (and often, their advisors) at their home institutions in DSFP material. With our working emphasis on pair coding, students develop the skills to form collaborations, setting them up to be future leaders in the field.

All materials developed as part of the DSFP are made freely available to the public. This program prepares students for success in a wide range of careers, providing education in data science methodologies, domain-specific considerations, and professional skill development in research, teaching, and communication. Of the 45 fellows supported by the NSF, 18 have completed their PhD, while another 23 are still working towards graduation. Of the 18 who have graduated, 14 currently hold postdoctoral positions, while another two are working in astronomy, but not as postdocs. Of the fellows who are no longer working towards a PhD, three have industry jobs, while the other is working in education. During their time in the program (i.e., for the ~two-year period when they were fellows), the 45 NSF-supported students contributed to 253 refereed publications, 43 of which were first-author papers.

The DSFP has had a broad impact, not only via the direct training of 45 fellows across three cohorts, but also through the proliferation of data science knowledge throughout the astronomy and astrophysics community, both via training seminars taught by our students, and the dissemination of our learning materials. The DSFP has served the national interest, as stated by the NSF's mission: to promote the progress of science, by training the next generation of astronomers to have the computing skills necessary to derive scientific insights from the largest telescopic surveys that have ever been conducted.


Last Modified: 10/27/2023
Modified by: Adam A Miller

Please report errors in award information by writing to: awardsearch@nsf.gov.
