Award Abstract # 1550588
Collaborative Research: SI2-SSI: Swift/E: Integrating Parallel Scripted Workflow into the Scientific Software Ecosystem

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: UNIVERSITY OF CHICAGO
Initial Amendment Date: September 13, 2016
Latest Amendment Date: September 1, 2021
Award Number: 1550588
Award Instrument: Standard Grant
Program Manager: Varun Chandola
OAC
 Office of Advanced Cyberinfrastructure (OAC)
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2016
End Date: September 30, 2022 (Estimated)
Total Intended Award Amount: $2,749,446.00
Total Awarded Amount to Date: $2,770,286.00
Funds Obligated to Date: FY 2016 = $2,749,446.00
FY 2019 = $20,840.00
History of Investigator:
  • Kyle Chard (Principal Investigator)
    chard@uchicago.edu
  • Michael Wilde (Co-Principal Investigator)
  • Daniel Katz (Co-Principal Investigator)
  • Dmitry Karpeev (Co-Principal Investigator)
  • Ravi Madduri (Co-Principal Investigator)
  • Justin Wozniak (Co-Principal Investigator)
  • Michael Wilde (Former Principal Investigator)
Recipient Sponsored Research Office: University of Chicago
5801 S ELLIS AVE
CHICAGO
IL  US  60637-5418
(773)702-8669
Sponsor Congressional District: 01
Primary Place of Performance: University of Chicago
5735 S. Ellis Av.
Chicago
IL  US  60637-1403
Primary Place of Performance Congressional District: 01
Unique Entity Identifier (UEI): ZUE9HKT2CLC9
Parent UEI: ZUE9HKT2CLC9
NSF Program(s): Information Technology Researc,
Software Institutes
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVIT
01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7433, 8004, 8009, 9102, CL10
Program Element Code(s): 164000, 800400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Science and engineering research increasingly relies on repeated execution of a complex series of steps (i.e., workflows) to form hypotheses, conduct experiments, analyze results, and refine theory. Computation is often essential throughout the workflow, and in such cases software can improve productivity by managing the computational and data workflow. Swift is one such open-source workflow system that has been developed and widely used in diverse areas ranging from materials simulations and climate modeling to neuroscience and genomics. This project extends the capabilities of Swift by integrating it with other software systems that enable collaboration, usability, maintainability, and productivity. The new ecosystem, Swift/E, will enable scientists and engineers to more productively create and run computational workflow campaigns of larger scale, and to debug, execute, adapt, and disseminate them faster and more easily than has been possible to date. These workflows embody and communicate the computational methods specific to each domain of scientific inquiry. Swift/E achieves community engagement and extensive productivity benefits for a large user community through an integrated program of research, education, and software dissemination. The project engages and serves science and engineering communities by creating patterns of practice for building and sharing reusable workflow libraries, and by training students, educators, and researchers in their use. To advance the education of the next generation of computationally trained scientists, Swift/E powers a network of NSF-supported "e-Labs" that teach the concepts of collaborative parallel computational science at high school and undergraduate levels, reaching over a thousand students annually.

The open-source Swift/E "ecosystem" integrates Swift with several scientific software elements that play a major role in the national and global cyberinfrastructure of today. These elements are: Swift for the parallel scripting of scientific workflow; Globus for data cataloging, management, and high-speed wide-area transport; the Web-based Galaxy workflow portal for workflow composition, execution, and collaborative sharing; Jupyter for the interactive development, testing, debugging, and assembly of high-level programming and workflow languages; Python and R for productively expressing high-level computational logic; and "git" and related tools and Web portals for revision control, code dissemination and sharing, and for the collaborative engagement of developers. Swift's implicitly parallel programming language is minimal and compact. Swift provides a facility for embedding other scripting languages (currently Python, R, Julia, and Tcl) into its runtime environment. This project merges newer extreme-scale "Swift/T" capabilities with the flexible and portable original "Swift/K" version to make the core Swift/E software element more powerful and flexible while lowering its ongoing support cost. Swift/E enhances usability by extending Swift's troubleshooting and inter-language integration facilities. With enhanced and innovative workflow sharing archives, new training materials, and a sustained program for user support and self-sustaining and expanding community engagement, the Swift/E project engages, supports, and sustains a large global science and engineering user base.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 27)
Ahmed AE, Heldenbrand J, Asmann Y, Fadlelmola FM, Katz DS, Kendig K, et al. "Managing genomic variant calling workflows with Swift/T" PLoS ONE , v.14 , 2019
AS Villarreal, Y Babuji, T Uram, DS Katz, K Chard, K Heitmann "Extreme Scale Survey Simulation with Python Workflows" eScience , 2021
Baughman, Matt and Caton, Simon and Haas, Christian and Chard, Ryan and Wolski, Rich and Foster, Ian and Chard, Kyle "Deconstructing the 2017 Changes to AWS Spot Market Pricing" 10th Workshop on Scientific Cloud Computing , 2019 10.1145/3322795.3331465
Baughman, Matt and Chard, Ryan and Ward, Logan and Pitt, Jason and Chard, Kyle and Foster, Ian "Profiling and Predicting Application Performance on the Cloud" 11th IEEE/ACM International Conference on Utility and Cloud Computing (UCC) , 2018 10.1109/UCC.2018.00011
Chard, Ryan and Babuji, Yadu and Li, Zhuozhao and Skluzacek, Tyler and Woodard, Anna and Blaiszik, Ben and Foster, Ian and Chard, Kyle "funcX: A Federated Function Serving Fabric for Science" Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing , 2020 https://doi.org/10.1145/3369583.3392683
K Chard, Y Babuji, A Woodard, B Clifford, Z Li, M Hategan, I Foster, D. Katz, K. Chard "Productive Parallel Programming with Parsl" ACM SIGAda Ada Letters , v.40 , 2020
Krogstad, M. J., Gehring, P. M., Rosenkranz, S., Osborn, R., Ye, F., Liu, Y., Ruff, J. P. C., Chen, W., Wozniak, J. M., Luo, H., Chmaissem, O., Ye, Z., and Phelan, D. "The relation of local order to material properties in relaxor ferroelectrics" Nature Materials , v.17 , 2018
Kyle Chard, Yadu Babuji, Anna Woodard, Ben Clifford, Zhuozhao Li, Mihael Hategan, Ian Foster, Mike Wilde, Daniel S Katz "Productive Parallel Programming with Parsl" ACM SIGAda Ada Letters , 2021 https://doi.org/10.1145/3463478.3463486
Li, Zhuozhao and Chard, Ryan and Ward, Logan and Chard, Kyle and Skluzacek, Tyler J. and Babuji, Yadu and Woodard, Anna and Tuecke, Steven and Blaiszik, Ben and Franklin, Michael J. and Foster, Ian "DLHub: Simplifying publication, discovery, and use of machine learning models in science" Journal of Parallel and Distributed Computing , v.147 , 2021 https://doi.org/10.1016/j.jpdc.2020.08.006
Ozik, J., Collier, N. T., Wozniak, J. M., Macal, C., and An, G "Extreme-scale dynamic exploration of a distributed agent-based model with the EMEWS framework" IEEE Transactions on Computational Social Systems , v.5 , 2018

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project has supported the development of Parsl, an open source parallel programming library designed to address the distributed and parallel computing challenges inherent in modern research, such as the need to assemble programs from various components, to execute workflows across heterogeneous computers, and to scale workloads to utilize large high-performance computing clusters. Parsl allows for 1) straightforward composition of programs in Python and 2) scalable parallel and distributed execution on heterogeneous cyberinfrastructure. The Parsl model is simple: users apply Python decorators to standard Python functions that wrap Python code or external scripts, applications, and binaries. These decorators express opportunities for parallelism in standard Python programs. Parsl implements an extensible execution model in which programs can be executed on local or remote resources via two abstractions: executors and providers. Executors manage the way tasks are executed on computers, for example via pilot jobs or MPI jobs. Parsl supports three internal executors and four external community-contributed executors. Providers abstract the process of provisioning resources, for example via batch schedulers or cloud APIs.
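The decorator-based model described above can be illustrated with a minimal sketch that uses only the Python standard library. It is not Parsl's API: the `parallel_app` decorator, the `_pool` thread pool standing in for an executor, and the `square`, `total`, and `sum_of_squares` functions are hypothetical names chosen for illustration. The sketch assumes only what the report states, namely that decorating an ordinary function marks it as a task that may run in parallel.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import wraps

# Hypothetical stand-in for an "executor": a local thread pool.
# Real executors (pilot jobs, MPI jobs, and so on) are far richer than this.
_pool = ThreadPoolExecutor(max_workers=4)


def parallel_app(func):
    """Hypothetical decorator: calling the wrapped function returns a Future."""

    @wraps(func)
    def submit(*args):
        def run():
            # Resolve any Future arguments first, so tasks can be chained
            # by passing the result of one call into another.
            resolved = [a.result() if isinstance(a, Future) else a for a in args]
            return func(*resolved)

        return _pool.submit(run)

    return submit


@parallel_app
def square(x):
    return x * x


@parallel_app
def total(*values):
    return sum(values)


def sum_of_squares(numbers):
    # The square() calls do not depend on each other, so they may run concurrently.
    squares = [square(n) for n in numbers]
    # total() is handed futures and produces its result once all are complete.
    return total(*squares).result()
```

Calling `sum_of_squares([1, 2, 3, 4])` submits four independent `square` tasks and one dependent `total` task. This toy version blocks a worker thread while it waits on its inputs, which is acceptable for a small example but would not scale; a production system would presumably track dependencies without tying up workers.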

The open source Parsl project, started in 2016, now has 73 global contributors, far exceeding the size of the funded development team. Parsl, which released production version 1.0 in 2020 and is now releasing new versions weekly, has been downloaded more than 1.6 million times and has been used extensively across national cyberinfrastructure, campus research computing centers, clouds, and even at the edge. Parsl is used by individual researchers, large research consortia, and in industry. Users span domains such as astrophysics, biology, materials science, and many others. In addition to end users, Parsl is also built into platforms such as QC Archive in quantum chemistry, and is a user of tools contributed by others such as WorkQueue and RADICAL-Pilot. Parsl has had significant impact in broad science domains, for example being used to produce the most interconnected simulated sky survey in preparation for analysis of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST), to conduct one of the largest single batch imputations ever performed on 474k subjects in the Million Veterans Program, and to search for potential COVID-19 therapeutics in a search space of billions of candidate molecules. 

The Parsl project has reached a broad community of users and contributors via the annual ParslFest meeting, with dozens of presentations outlining novel use of Parsl in myriad research domains. The project has engaged more than two dozen high school, undergraduate, and graduate students, with many contributing to the open source code and using Parsl in various application domains. These students have been equipped with crucial parallel and distributed computing skills for their future careers.

Last Modified: 02/08/2023
Modified by: Kyle Chard

Please report errors in award information by writing to: awardsearch@nsf.gov.
