Award Abstract # 1664162
SI2-SSI: Pegasus: Automating Compute and Data Intensive Science

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: UNIVERSITY OF SOUTHERN CALIFORNIA
Initial Amendment Date: May 8, 2017
Latest Amendment Date: May 8, 2017
Award Number: 1664162
Award Instrument: Standard Grant
Program Manager: Varun Chandola
OAC
 Office of Advanced Cyberinfrastructure (OAC)
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: May 15, 2017
End Date: September 30, 2023 (Estimated)
Total Intended Award Amount: $2,500,000.00
Total Awarded Amount to Date: $2,500,000.00
Funds Obligated to Date: FY 2017 = $2,500,000.00
History of Investigator:
  • Ewa Deelman (Principal Investigator)
    deelman@isi.edu
  • Miron Livny (Co-Principal Investigator)
Recipient Sponsored Research Office: University of Southern California
3720 S FLOWER ST FL 3
LOS ANGELES
CA  US  90033
(213)740-7762
Sponsor Congressional District: 34
Primary Place of Performance: University of Southern California
4676 Admiralty Way, Suite 1001
Marina del Rey
CA  US  90292-6601
Primary Place of Performance Congressional District: 36
Unique Entity Identifier (UEI): G88KLJR3KYT5
Parent UEI:
NSF Program(s): Software Institutes
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7433, 8009, 8004
Program Element Code(s): 800400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

This project addresses the ever-growing gap between the capabilities offered by on-campus and off-campus cyberinfrastructures (CI) and the ability of researchers to effectively harness these capabilities to advance scientific discovery. Faculty and students on campuses struggle to extract knowledge from data that does not fit on their laptops or cannot be processed by an Excel spreadsheet, and they find it difficult to manage their computations efficiently. The project sustains and enhances the Pegasus Workflow Management System, which enables scientists to orchestrate and run data- and compute-intensive computations on diverse distributed computational resources. Enhancements focus on the automation capabilities that Pegasus provides to support workflows handling large data sets, as well as on usability improvements that lower the barrier to its adoption. This effort expands the reach of the advanced capabilities provided by Pegasus to researchers from a broader spectrum of disciplines that range from gravitational-wave physics to bioinformatics, and from earth science to materials science.

For more than 15 years the Pegasus Workflow Management System has been designed, implemented and supported to provide abstractions that enable scientists to focus on structuring their computations without worrying about the details of the target CI. To support these workflow abstractions Pegasus provides automation capabilities that seamlessly map workflows onto target resources, sparing scientists the overhead of managing the data flow, job scheduling, fault recovery and adaptation of their applications. Automation enables the delivery of services that consider criteria such as time-to-solution and that take into account the efficient use of resources, the throughput of tasks, and data transfer requests. The power of these abstractions was demonstrated in 2015 when Pegasus was used by an international collaboration to harness a diverse set of resources and to manage compute- and data-intensive workflows that confirmed the existence of gravitational waves, as predicted by Einstein's theory of relativity. Experience from working with diverse scientific domains (astronomy, bioinformatics, climate modeling, earthquake science, gravitational-wave physics, and materials science) uncovers opportunities for further automation of scientific workflows. This project addresses these opportunities through innovation in the following areas: automation methods to include resource provisioning ahead of and during workflow execution, data-aware job scheduling algorithms, and data sharing mechanisms in high-throughput environments. To support a broader group of "long-tail" scientists, effort is devoted to usability improvements as well as outreach, education, and training activities. The proposed work includes the implementation and evaluation of advanced frameworks, algorithms, and methods that enhance the power of automation in support of data-intensive science.
These enhancements are delivered as dependable software tools integrated with Pegasus so that they can be evaluated in the context of real-life applications and computing environments. The data-aware focus targets new classes of applications executing in high-throughput and high-performance environments. Pegasus has been adopted by researchers from a broad spectrum of disciplines that range from gravitational-wave physics to bioinformatics, and from earth science to materials science. It provides and enhances access to national CI such as OSG and XSEDE, and as part of this work it will be deployed within Chameleon and Jetstream to provide broader access to NSF's CI investments. Through usability improvements, engagement with CI and community platform providers such as HubZero and Cyverse, and educational, training, and tutorial activities, this project broadens the set of researchers that leverage automation for their work. Collaboration with the Gateways Institute ensures that Pegasus interfaces are suitable for vertical integration within science gateways and seamlessly support new scientific communities.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 24)
Callaghan, Scott and Juve, Gideon and Vahi, Karan and Maechling, Philip J. and Jordan, Thomas H. and Deelman, Ewa "rvGAHP: push-based job submission using reverse SSH connections" Proceedings of the 12th Workshop on Workflows in Support of Large-Scale Science, 2017 https://doi.org/10.1145/3150994.3151003
Casanova, Henri and Deelman, Ewa and Gesing, Sandra and Hildreth, Michael and Hudson, Stephen and Koch, William and Larson, Jeffrey and McDowell, Mary Ann and Meyers, Natalie and Navarro, John-Luke and Papadimitriou, George and Tanaka, Ryan and Taylor, Ia "Emerging Frameworks for Advancing Scientific Workflows Research, Development, and Education" 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), 2021 https://doi.org/10.1109/WORKS54523.2021.00015
Coleman, Tainã and Casanova, Henri and Pottier, Loïc and Kaushik, Manav and Deelman, Ewa and Ferreira da Silva, Rafael "WfCommons: A framework for enabling scientific workflow research and development" Future Generation Computer Systems, v.128, 2022 https://doi.org/10.1016/j.future.2021.09.043
da Silva, Rafael Ferreira and Callaghan, Scott and Deelman, Ewa "On the use of burst buffers for accelerating data-intensive scientific workflows" WORKS '17 Proceedings of the 12th Workshop on Workflows in Support of Large-Scale Science, 2017 https://doi.org/10.1145/3150994.3151000
da Silva, Rafael Ferreira and Casanova, Henri and Orgerie, Anne-Cécile and Tanaka, Ryan and Deelman, Ewa and Suter, Frédéric "Characterizing, Modeling, and Accurately Simulating Power and Energy Consumption of I/O-intensive Scientific Workflows" Journal of Computational Science, 2020 https://doi.org/10.1016/j.jocs.2020.101157
Deelman, Ewa and Ferreira da Silva, Rafael and Vahi, Karan and Rynge, Mats and Mayani, Rajiv and Tanaka, Ryan and Whitcup, Wendy and Livny, Miron "The Pegasus workflow management system: Translational computer science in practice" Journal of Computational Science, 2020 https://doi.org/10.1016/j.jocs.2020.101200
Deelman, Ewa and Vahi, Karan and Rynge, Mats and Mayani, Rajiv and da Silva, Rafael Ferreira and Papadimitriou, George and Livny, Miron "The Evolution of the Pegasus Workflow Management Software" Computing in Science & Engineering, v.21, 2019 https://doi.org/10.1109/MCSE.2019.2919690
Do, Tu Mai and Pottier, Loïc and Ferreira da Silva, Rafael and Caíno-Lores, Silvina and Taufer, Michela and Deelman, Ewa "Performance assessment of ensembles of in situ workflows under resource constraints" Concurrency and Computation: Practice and Experience, 2022 https://doi.org/10.1002/cpe.7111
Do, Tu Mai and Pottier, Loïc and Yildiz, Orcun and Vahi, Karan and Krawczuk, Patrycja and Peterka, Tom and Deelman, Ewa "Accelerating Scientific Workflows on HPC Platforms with In Situ Processing" 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2022 https://doi.org/10.1109/CCGrid54584.2022.00009
E. Deelman, T. Peterka "The future of scientific workflows" International Journal of High Performance Computing Applications, v.32, 2018
Hamed, Ahmed Abdeen and Jonczyk, Jakub and Alam, Mohammad Zaiyan and Deelman, Ewa and Lee, Byung Suk "Mining Literature-Based Knowledge Graph for Predicting Combination Therapeutics: A COVID-19 Use Case" IEEE International Conference on Knowledge Graph (ICKG), 2022 https://doi.org/10.1109/ICKG55886.2022.00018

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project funded the development and user support for the Pegasus workflow management system. Pegasus automates the execution of computational workflows on heterogeneous cyberinfrastructure including campus resources, clouds, as well as high-performance and high-throughput computing systems. Pegasus pioneered the use of planning in scientific workflow systems. It enables users to focus on their science by describing their workflows in a resource-independent way. Pegasus takes that description and automatically maps the tasks onto heterogeneous resources, determines the necessary data transfers between tasks, and optimizes the workflow for performance and reliability. The result is an executable workflow that includes compute job submit scripts and data management jobs for the target cyberinfrastructure. Pegasus has a notion of the submit host, from which the system submits jobs to multiple distributed resources within the national cyberinfrastructure ecosystem. Pegasus workflows are easy to compose using Python APIs and Jupyter Notebook interfaces and are portable across heterogeneous cyberinfrastructure.
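
The planning step described above can be illustrated with a small, self-contained Python sketch. It is a toy model only and does not use the Pegasus Python API or reproduce the Pegasus planner: the names Task and plan, the site names, and the file names are all hypothetical, and the task-to-site mapping is supplied by hand rather than chosen automatically. The sketch takes a resource-independent description (tasks and the files they read and write), infers task order from the files, and inserts a data transfer step wherever an input is not already at the site where a task will run.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Task:
    """Resource-independent task: a name plus the files it reads and writes."""
    name: str
    inputs: tuple = ()
    outputs: tuple = ()


def plan(tasks, site_of, data_location):
    """Turn an abstract workflow into an ordered list of executable steps.

    tasks         -- iterable of Task (hypothetical abstract description)
    site_of       -- task name -> site chosen for it (assumed given here)
    data_location -- file name -> site where the file initially resides
    """
    by_name = {t.name: t for t in tasks}
    producer = {f: t.name for t in tasks for f in t.outputs}
    # A task depends on whichever task produces one of its input files.
    deps = {
        t.name: {producer[f] for f in t.inputs if f in producer}
        for t in tasks
    }
    location = dict(data_location)
    steps = []
    for name in TopologicalSorter(deps).static_order():
        task, site = by_name[name], site_of[name]
        for f in task.inputs:
            if location[f] != site:
                # Input is elsewhere: add a data management step first.
                steps.append(("transfer", f, location[f], site))
                location[f] = site
        steps.append(("compute", name, site))
        for f in task.outputs:
            location[f] = site
    return steps


# Hypothetical two-task workflow: preprocess, then analyze.
tasks = [
    Task("preprocess", inputs=("raw.dat",), outputs=("clean.dat",)),
    Task("analyze", inputs=("clean.dat",), outputs=("result.dat",)),
]
steps = plan(
    tasks,
    site_of={"preprocess": "campus_cluster", "analyze": "hpc_system"},
    data_location={"raw.dat": "submit_host"},
)
for step in steps:
    print(step)
```

In this sketch the user-facing description never mentions sites; moving either task to a different site changes only the site_of mapping, and the transfer steps are recomputed accordingly.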

As part of this project, Pegasus enabled a wide variety of researchers to make scientific breakthroughs. Researchers from the Southern California Earthquake Center have used Pegasus to generate state-of-the-art physics-based seismic hazard maps of California. These maps can inform how the next generation of civil infrastructure needs to be designed and built. They can help insurance companies assess earthquake risk and can enable disaster planners to adequately prepare for significant earthquakes. Pegasus provided community resources to create better soybeans and enabled policymakers to make decisions about land and water usage. Pegasus powers CASA weather radar workflows, providing time-critical information during severe weather events in the Dallas-Fort Worth area. The Event Horizon Telescope project uses it to simulate black holes, and other astronomers use it to generate mosaics of the galactic plane. Pegasus was also adapted to enable real-time experimental data analysis. For example, Pegasus performs real-time 3D reconstruction of biological samples as scientists run their experiments at USC's Cryo-EM facility, greatly improving result quality.

This project supported these and other applications by enhancing Pegasus' capabilities, enabling it to support novel cyberinfrastructure, and providing hands-on support to scientists.

Last Modified: 01/21/2024
Modified by: Ewa Deelman

Please report errors in award information by writing to: awardsearch@nsf.gov.