
NSF Org: |
OAC Office of Advanced Cyberinfrastructure (OAC) |
Recipient: |
|
Initial Amendment Date: | May 8, 2017 |
Latest Amendment Date: | May 8, 2017 |
Award Number: | 1664162 |
Award Instrument: | Standard Grant |
Program Manager: |
Varun Chandola
OAC Office of Advanced Cyberinfrastructure (OAC) CSE Directorate for Computer and Information Science and Engineering |
Start Date: | May 15, 2017 |
End Date: | September 30, 2023 (Estimated) |
Total Intended Award Amount: | $2,500,000.00 |
Total Awarded Amount to Date: | $2,500,000.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
3720 S FLOWER ST FL 3 LOS ANGELES CA US 90033 (213)740-7762 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
4676 Admiralty Way, Suite 1001 Marina del Rey CA US 90292-6601 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Software Institutes |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
This project addresses the ever-growing gap between the capabilities offered by on-campus and off-campus cyberinfrastructures (CI) and the ability of researchers to effectively harness these capabilities to advance scientific discovery. Faculty and students on campuses struggle to extract knowledge from data that does not fit on their laptops or cannot be processed by an Excel spreadsheet, and they find it difficult to efficiently manage their computations. The project sustains and enhances the Pegasus Workflow Management System, which enables scientists to orchestrate and run data- and compute-intensive computations on diverse distributed computational resources. Enhancements focus on the automation capabilities provided by Pegasus to support workflows handling large data sets, as well as on usability improvements that lower the barrier to its adoption. This effort expands the reach of the advanced capabilities provided by Pegasus to researchers from a broader spectrum of disciplines that range from gravitational-wave physics to bioinformatics, and from earth science to material science.
For more than 15 years the Pegasus Workflow Management System has been designed, implemented and supported to provide abstractions that enable scientists to focus on structuring their computations without worrying about the details of the target CI. To support these workflow abstractions, Pegasus provides automation capabilities that seamlessly map workflows onto target resources, sparing scientists the overhead of managing the data flow, job scheduling, fault recovery and adaptation of their applications. Automation enables the delivery of services that consider criteria such as time-to-solution and that take into account efficient use of resources, task throughput, and data transfer requests. The power of these abstractions was demonstrated in 2015, when Pegasus was used by an international collaboration to harness a diverse set of resources and to manage compute- and data-intensive workflows that confirmed the existence of gravitational waves, as predicted by Einstein's theory of relativity. Experience from working with diverse scientific domains - astronomy, bioinformatics, climate modeling, earthquake science, gravitational-wave physics and material science - has uncovered opportunities for further automation of scientific workflows. This project addresses these opportunities through innovation in the following areas: automation methods to include resource provisioning ahead of and during workflow execution, data-aware job scheduling algorithms, and data sharing mechanisms in high-throughput environments. To support a broader group of "long-tail" scientists, effort is devoted to usability improvements as well as outreach, education, and training activities. The proposed work includes the implementation and evaluation of advanced frameworks, algorithms, and methods that enhance the power of automation in support of data-intensive science.
These enhancements are delivered as dependable software tools integrated with Pegasus so that they can be evaluated in the context of real-life applications and computing environments. The data-aware focus targets new classes of applications executing in high-throughput and high-performance environments. Pegasus has been adopted by researchers from a broad spectrum of disciplines that range from gravitational-wave physics to bioinformatics, and from earth science to material science. It provides and enhances access to national CI such as OSG and XSEDE, and as part of this work it will be deployed within Chameleon and Jetstream to provide broader access to NSF's CI investments. Through usability improvements, engagement with CI and community platform providers such as HubZero and Cyverse, combined with educational, training, and tutorial activities, this project broadens the set of researchers that leverage automation for their work. Collaboration with the Gateways Institute assures that Pegasus interfaces are suitable for vertical integration within science gateways and seamlessly support new scientific communities.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project funded the development and user support for the Pegasus workflow management system. Pegasus automates the execution of computational workflows on heterogeneous cyberinfrastructure including campus resources, clouds, as well as high-performance and high-throughput computing systems. Pegasus pioneered the use of planning in scientific workflow systems. It enables users to focus on their science by describing their workflows in a resource-independent way. Pegasus takes that description and automatically maps the tasks onto heterogeneous resources, determines the necessary data transfers between tasks, and optimizes the workflow for performance and reliability. The result is an executable workflow that includes compute job submit scripts and data management jobs for the target cyberinfrastructure. Pegasus has a notion of the submit host, from which the system submits jobs to multiple distributed resources within the national cyberinfrastructure ecosystem. Pegasus workflows are easy to compose using Python APIs and Jupyter Notebook interfaces and are portable across heterogeneous cyberinfrastructure.
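The planning step described above can be illustrated with a toy sketch. This is not the Pegasus API or its planner; it is a minimal, self-contained illustration that assumes an acyclic workflow, a site already chosen for each task, and known locations for all pre-existing input files. The names `Task`, `plan`, `site_of`, and `initial_location`, as well as the task, file, and site names, are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A resource-independent task: a name plus the files it reads and writes."""
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def plan(tasks, site_of, initial_location):
    """Turn an abstract workflow into an ordered list of executable jobs.

    tasks            -- Task objects, in any order (assumed to form a DAG)
    site_of          -- hypothetical mapping: task name -> chosen execution site
    initial_location -- mapping: pre-existing file name -> site that holds it
    """
    # Track which sites hold a copy of each file.
    location = {f: {s} for f, s in initial_location.items()}
    # A file's producer is the task that lists it as an output.
    producers = {out: t.name for t in tasks for out in t.outputs}
    by_name = {t.name: t for t in tasks}
    jobs, done = [], set()

    def visit(name):
        if name in done:
            return
        task = by_name[name]
        # Schedule producers first so data dependencies are respected.
        for f in task.inputs:
            if f in producers:
                visit(producers[f])
        site = site_of[name]
        # Insert a data transfer job for each input not yet at this site.
        for f in task.inputs:
            if site not in location[f]:
                src = sorted(location[f])[0]
                jobs.append(("transfer", f, src, site))
                location[f].add(site)
        jobs.append(("compute", name, site))
        for f in task.outputs:
            location.setdefault(f, set()).add(site)
        done.add(name)

    for t in tasks:
        visit(t.name)
    return jobs


# Hypothetical two-step workflow, deliberately listed out of order.
tasks = [
    Task("analyze", inputs=["clean.dat"], outputs=["result.dat"]),
    Task("preprocess", inputs=["raw.dat"], outputs=["clean.dat"]),
]
site_of = {"preprocess": "campus_cluster", "analyze": "cloud"}
jobs = plan(tasks, site_of, {"raw.dat": "submit_host"})
for job in jobs:
    print(job)
```

In this sketch the user only states what each task reads and writes; the ordering of compute jobs and the two transfer jobs (from the submit host to the campus cluster, then from the campus cluster to the cloud) are derived automatically. A real planner would also choose the sites and apply performance and reliability optimizations, which are omitted here.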
As part of this project, Pegasus enabled a wide variety of researchers to make scientific breakthroughs. Researchers from the Southern California Earthquake Center have used Pegasus to generate state-of-the-art physics-based seismic hazard maps of California. These maps can inform how the next generation of civil infrastructure needs to be designed and built. They can help insurance companies assess earthquake risk and can enable disaster planners to adequately prepare for significant earthquakes. Pegasus provided community resources to create better soybeans and enabled policymakers to make decisions about land and water usage. Pegasus powers CASA weather radar workflows, providing time-critical information during severe weather events in the Dallas-Fort Worth area. The Event Horizon Telescope project uses it to simulate black holes, and other astronomers use it to generate mosaics of the galactic plane. Pegasus was also adapted to enable real-time experimental data analysis. For example, Pegasus does real-time 3D reconstruction of biological samples as scientists run their experiments at USC’s Cryo-EM facility, greatly improving result quality.
This project supported these and other applications by enhancing Pegasus' capabilities, enabling it to support novel cyberinfrastructure, and providing hands-on support to scientists.
Last Modified: 01/21/2024
Modified by: Ewa Deelman