Award Abstract # 1550476
Collaborative Research: SI2-SSI: Swift/E: Integrating Parallel Scripted Workflow into the Scientific Software Ecosystem

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: UNIVERSITY OF NOTRE DAME DU LAC
Initial Amendment Date: September 13, 2016
Latest Amendment Date: September 13, 2016
Award Number: 1550476
Award Instrument: Standard Grant
Program Manager: Amy Walton
awalton@nsf.gov
 (703)292-4538
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2016
End Date: September 30, 2019 (Estimated)
Total Intended Award Amount: $81,000.00
Total Awarded Amount to Date: $81,000.00
Funds Obligated to Date: FY 2016 = $81,000.00
History of Investigator:
  • Mitchell Wayne (Principal Investigator)
    mwayne@nd.edu
Recipient Sponsored Research Office: University of Notre Dame
940 GRACE HALL
NOTRE DAME
IN  US  46556-5708
(574)631-7432
Sponsor Congressional District: 02
Primary Place of Performance: University of Notre Dame
940 Grace Hall
Notre Dame
IN  US  46556-5708
Primary Place of Performance Congressional District: 02
Unique Entity Identifier (UEI): FPU6XGFXMBE9
Parent UEI: FPU6XGFXMBE9
NSF Program(s): Software Institutes
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7433, 8004, 8009, 9102
Program Element Code(s): 800400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Science and engineering research increasingly relies on repeated execution of a complex series of steps (i.e., workflows) to form hypotheses, conduct experiments, analyze results, and refine theory. Computation is often essential throughout the workflow, and in such cases software can improve productivity by managing the computational and data workflow. Swift is one such open-source workflow system that has been developed and widely used in diverse areas ranging from materials simulations and climate modeling to neuroscience and genomics. This project extends the capabilities of Swift by integrating it with other software systems that enable collaboration, usability, maintainability, and productivity. The new ecosystem, Swift/E, will enable scientists and engineers to more productively create and run computational workflow campaigns of larger scale, and to debug, execute, adapt, and disseminate them faster and more easily than has been possible to date. These workflows embody and communicate the computational methods specific to each domain of scientific inquiry. Swift/E achieves community engagement and extensive productivity benefits for a large user community through an integrated program of research, education, and software dissemination. The project engages and serves science and engineering communities by creating patterns of practice for building and sharing reusable workflow libraries, and by training students, educators, and researchers in their use. To advance the education of the next generation of computationally trained scientists, Swift/E powers a network of NSF-supported "e-Labs" that teach the concepts of collaborative parallel computational science at high school and undergraduate levels, reaching over a thousand students annually.

The open-source Swift/E "ecosystem" integrates Swift with several scientific software elements that play a major role in the national and global cyberinfrastructure of today. These elements are: Swift for the parallel scripting of scientific workflow; Globus for data cataloging, management, and high-speed wide-area transport; the Web-based Galaxy workflow portal for workflow composition, execution, and collaborative sharing; Jupyter for the interactive development, testing, debugging, and assembly of high level programming and workflow languages; Python and R for productively expressing high-level computational logic; and "git" and related tools and Web portals for revision control, code dissemination and sharing, and for the collaborative engagement of developers. Swift's implicitly parallel programming language is minimal and compact. Swift provides a facility for embedding other scripting languages (currently Python, R, Julia and Tcl) into its runtime environment. This project merges newer extreme-scale "Swift/T" capabilities with the flexible and portable original "Swift/K" version to make the core Swift/E software element more powerful and flexible while lowering its ongoing support cost. Swift/E enhances usability by extending Swift's troubleshooting and inter-language integration facilities. And with enhanced and innovative workflow sharing archives, new training materials, and a sustained program for user support and self-sustaining and expanding community engagement, the Swift/E project engages, supports, and sustains a large global science and engineering user base.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

As part of its mission to promote professional development among America's high school science educators, QuarkNet routinely provides educational materials to high school science teachers designed to enhance their lesson plans with guided inquiries into real scientific experiments and research projects.  Working with the Parsl group, we prepared a set of interactive, web-based notebooks that guide students through the analysis of cosmic ray muon data collected by QuarkNet's network of researchers and educators.  These notebooks combine important lessons on the physics of cosmic rays with insight into the computational methods and tools used to draw sound, data-based conclusions from their study.

Intellectual Merits

Contemporary scientific research is inseparable from advanced computation and data analysis, making it increasingly vital for high school science educators to be able to form connections between the important scientific concepts they teach and the data analysis skillsets needed to apply them in real-world contexts.  Without these connections, students may find themselves underprepared for careers in scientific inquiry and for civic participation in science-based public policy discussions.

QuarkNet helped create and currently maintains a set of webapps called e-Labs that take real data from contemporary scientific experiments and present it in a form tailored for guided analysis in the high school classroom.  These e-Labs focus solely on science learning goals, and they lack any components dedicated to programming or data science education.  Parsl, a Python library for parallelizing scientific code and managing data analysis workflows, allows us to introduce just such a component in a format amenable to illuminating basic principles of scientific programming and data science that are appropriate to the high school science curriculum.
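
To give a sense of the kind of parallel analysis pattern mentioned above, here is a minimal sketch written with Python's standard concurrent.futures module. It is not Parsl code and is not taken from the project's notebooks; it only illustrates the general idea of submitting independent analysis tasks and collecting their results through futures. The detector names, count values, and function names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: muon counts per one-minute interval for three detectors.
COUNTS = {
    "detector_a": [52, 48, 50, 54],
    "detector_b": [40, 44, 42, 38],
    "detector_c": [61, 59, 60, 60],
}


def mean_rate(counts):
    """Average number of counts per interval for one detector."""
    return sum(counts) / len(counts)


def analyze(all_counts):
    """Submit one independent task per detector, then gather the results."""
    with ThreadPoolExecutor() as pool:
        # Each submit() returns a future immediately; the tasks may run concurrently.
        futures = {name: pool.submit(mean_rate, c) for name, c in all_counts.items()}
        # result() blocks until the corresponding task has finished.
        return {name: f.result() for name, f in futures.items()}


rates = analyze(COUNTS)
for name, rate in sorted(rates.items()):
    print(f"{name}: {rate:.1f} counts/min")
```

Because each detector's calculation is independent of the others, the tasks can run in any order; only the final gathering step waits on all of them.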

Over the lifetime of the partnership between QuarkNet and Parsl, we expanded one of the cosmic-ray-based data analyses of the QuarkNet e-Labs into a set of Jupyter Notebooks that retain the e-Labs' lessons on the physics of cosmic rays while providing additional insight into the data analysis code and data manipulation techniques it employs.  The notebooks further demonstrate to students how these tools and techniques are generalized by researchers studying real physics data.

In creating the notebooks, we collaborated with teachers and a mentor within QuarkNet's network of educators in order to ensure that the Notebooks address high-school level learning goals.  We tested their use with a group of high school students performing summer research at the Notre Dame QuarkNet center.  The results of these efforts indicate that the Notebooks are appropriate for use with high-school level activities such as special classroom projects, honors projects for AP Physics and Computer Science students, and extracurricular workshops and clubs.

Broader Impacts

Scientific programming and data science are areas of education that are frequently underserved in the high school classroom despite the long-recognized importance of these skillsets in the modern technological workforce.  Integrating such computation-oriented skills into an existing STEM curriculum without negatively affecting students' more fundamental science education, however, is a non-trivial challenge.

The Parsl-based cosmic ray data notebooks described here represent an ideal method for delivering a high-school level introduction to programming and data science in a manner that enhances, rather than distracts from, teachers' existing science-based lesson plans.  The Jupyter Notebook platform makes these notebooks accessible via web browser to any classroom, with only minimal existing knowledge required of instructors and students.  Thus, these notebooks and any future work that follows their model will be able to provide accessible instruction in important computational skills by integrating authentic research data and state-of-the-art programming tools into high school STEM education.

 


Last Modified: 11/16/2019
Modified by: Mitchell R Wayne
