
NSF Org: | OAC Office of Advanced Cyberinfrastructure (OAC) |
Initial Amendment Date: | August 22, 2016 |
Latest Amendment Date: | August 22, 2016 |
Award Number: | 1642053 |
Award Instrument: | Standard Grant |
Program Manager: | Rob Beverly, OAC Office of Advanced Cyberinfrastructure (OAC), CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2016 |
End Date: | August 31, 2020 (Estimated) |
Total Intended Award Amount: | $290,000.00 |
Total Awarded Amount to Date: | $290,000.00 |
Recipient Sponsored Research Office: | 3720 S FLOWER ST FL 3 LOS ANGELES CA US 90033 (213)740-7762 |
Primary Place of Performance: | 4676 Admiralty Way, Suite 1001 Marina del Rey CA US 90292-6611 |
NSF Program(s): | Cybersecurity Innovation |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Scientists use computer systems to analyze and store their scientific data, sometimes in a complex process that spans multiple machines. This process can be tedious and error-prone, which has led to the development of software known as a "workflow management system". Workflow management systems let scientists describe their process in a human-friendly way; the software then handles the details of the processing for them, automating tedious and repetitive steps and handling errors. One popular workflow management system is Pegasus, which over the past three years was used by scientists to run over 700,000 workflows in domains including astronomy, bioinformatics, earthquake science, gravitational-wave physics, ocean science, and neuroscience. The "Scientific Workflow Integrity with Pegasus" project enhances Pegasus with additional security features. The scientist's description of the desired work is protected from tampering, and the data processed by Pegasus is checked to ensure that it has not been accidentally or maliciously modified. This protection is achieved with cryptographic techniques that ensure data integrity. These changes allow scientists, and our society, to be more confident in scientific findings based on collected data.
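The abstract does not name the cryptographic technique used. One common way to make a workflow description tamper-evident, shown here as an illustrative assumption rather than the project's documented design, is a keyed hash (HMAC-SHA256) computed when the workflow is submitted and checked before it runs. The key and the helper names `sign_description` and `verify_description` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key held by the workflow submitter. A real system would
# load it from a key store rather than hard-coding it in source.
SECRET_KEY = b"example-key-do-not-use"


def sign_description(description: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the workflow description."""
    return hmac.new(key, description, hashlib.sha256).hexdigest()


def verify_description(description: bytes, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_description(description, key)
    return hmac.compare_digest(expected, tag)


workflow = b"job1: analyze input.dat -> result.dat\n"
tag = sign_description(workflow, SECRET_KEY)

print(verify_description(workflow, tag, SECRET_KEY))  # True

# Any change to the description, even a single file name, breaks the tag.
tampered = workflow.replace(b"input.dat", b"other.dat")
print(verify_description(tampered, tag, SECRET_KEY))  # False
```

A plain unkeyed hash would catch accidental changes, but anyone able to alter the description could simply recompute it; the secret key is what guards against deliberate modification.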
The Scientific Workflow Integrity with Pegasus project strengthens cybersecurity controls in the Pegasus Workflow Management System to provide assurances about the integrity of computational scientific methods. These strengthened controls enhance both Pegasus' handling of science data and its orchestration of software-defined networks and infrastructure. The result is increased trust in computational science and greater assurance that the science can be reproduced, because scientists can validate that data has not changed since a workflow completed and that the results of multiple workflows are consistent. Pegasus was chosen because of its popularity in the scientific community for automating computation and data management. For example, LIGO, the NSF-funded gravitational-wave physics project, recently used the Pegasus Workflow Management System to structure and execute the analyses that confirmed and quantified its historic detection of a gravitational wave, verifying a prediction Einstein made 100 years earlier. The project has established collaborations with LIGO and with other key NSF infrastructure providers and science projects to ensure that its results are broadly applicable.
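Validating that data has not changed since a workflow completed can be sketched, assuming a SHA-256 digest of every output file is recorded at completion, as a manifest that is rebuilt and compared later. The same manifests can be compared across runs to check consistency. The functions `build_manifest`, `changed_since`, and `inconsistent` are hypothetical illustrations, not Pegasus interfaces.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(output_dir: Path) -> dict[str, str]:
    """Map each file under output_dir (relative path) to its SHA-256 digest."""
    return {
        p.relative_to(output_dir).as_posix(): file_digest(p)
        for p in sorted(output_dir.rglob("*"))
        if p.is_file()
    }


def changed_since(manifest: dict[str, str], output_dir: Path) -> list[str]:
    """Recorded files that are now missing or whose digest no longer matches."""
    current = build_manifest(output_dir)
    return sorted(name for name, digest in manifest.items()
                  if current.get(name) != digest)


def inconsistent(run_a: dict[str, str], run_b: dict[str, str]) -> list[str]:
    """Files that appear in either run's manifest with differing digests."""
    return sorted(name for name in run_a.keys() | run_b.keys()
                  if run_a.get(name) != run_b.get(name))


# Usage: record a manifest at completion, then check it later.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "result.dat").write_bytes(b"42\n")
    recorded = build_manifest(out)
    print(changed_since(recorded, out))  # []
    (out / "result.dat").write_bytes(b"43\n")
    print(changed_since(recorded, out))  # ['result.dat']
    print(inconsistent(recorded, build_manifest(out)))  # ['result.dat']
```

Comparing manifests across runs only works for outputs that are byte-for-byte deterministic; files that embed timestamps or nondeterministic ordering would need a domain-specific comparison instead.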
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Scientists use computer systems to analyze and store their scientific data, sometimes in a complex process that spans multiple machines in different geographic locations. It has been observed that during this process scientific data is sometimes unintentionally modified or corrupted, and the errors can go undetected, so that corrupt data becomes part of the scientific record. When such errors occur, scientific computations can fail, increasing computational cost through reruns; worse, results can be corrupted in ways that are not apparent to the scientist, producing invalid scientific results. Technologies such as TCP checksums, encrypted transfers, checksum validation, RAID, and erasure coding provide data-correctness assurances at different levels, but they may be insufficient for large data volumes and may not cover a workflow end to end, leaving gaps in which data corruption can occur undetected.
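To see why transport-level checks can leave gaps, consider the 16-bit ones' complement checksum used by TCP (computed as described in RFC 1071). Because it is a sum of 16-bit words, reordering those words leaves it unchanged, and with only 16 bits a corruption that scrambles a packet at random has roughly a 1 in 65,536 chance of producing a matching checksum, which adds up over large data volumes. The sketch below is a simplified version applied to a plain byte string (not a real TCP segment with its pseudo-header); it shows a swapped-word corruption that the checksum misses and a SHA-256 digest catches.

```python
import hashlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add the next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF


original = b"ABCDEFGH"
# Corrupt the data by swapping the first two 16-bit words: b"AB" <-> b"CD".
corrupted = original[2:4] + original[0:2] + original[4:]

# The word sum is order-independent, so the checksum misses the corruption...
print(internet_checksum(original) == internet_checksum(corrupted))  # True
# ...while a cryptographic digest detects it.
print(hashlib.sha256(original).digest() == hashlib.sha256(corrupted).digest())  # False
```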
The Scientific Workflow Integrity with Pegasus (SWIP) project tackled the problem of detecting data errors that can occur during a scientific processing workflow and provided methods for reporting these errors to scientists. The solutions were integrated into Pegasus, a popular workflow management system that lets scientists describe complex scientific processes in a user-friendly way and handles the details of processing for them. Pegasus manages scientific computations for groups ranging from small research teams to large scientific collaborations such as the Laser Interferometer Gravitational-Wave Observatory (LIGO). To validate our approach, we developed software called “Chaos Jungle” that intentionally injects errors into computer systems, in either the network or the storage systems. This software let us test our solutions by simulating faulty computer systems and determining whether we could detect data integrity errors under those conditions.
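The detect-and-report approach, and the error injection used to test it, can be illustrated with a small simulation. This is a hypothetical sketch, not the actual interface of Chaos Jungle or Pegasus: `flaky_transfer` stands in for a network or storage hop that silently flips one bit, and `transfer_with_verification` compares each received copy against a SHA-256 digest recorded at the source, retrying a bounded number of times and counting the integrity errors it would report.

```python
from __future__ import annotations

import hashlib
import random


def flaky_transfer(data: bytes, rng: random.Random, flip_prob: float) -> bytes:
    """Hypothetical network/storage hop: with probability flip_prob it
    silently flips one randomly chosen bit (the injected error)."""
    if data and rng.random() < flip_prob:
        buf = bytearray(data)
        bit = rng.randrange(len(buf) * 8)
        buf[bit // 8] ^= 1 << (bit % 8)
        return bytes(buf)
    return data


def transfer_with_verification(data: bytes, expected_sha256: str,
                               rng: random.Random, flip_prob: float,
                               max_attempts: int = 3) -> tuple[bytes, int]:
    """Repeat the transfer until the received bytes match the digest recorded
    at the source. Returns the data and how many integrity errors were seen."""
    errors = 0
    for _ in range(max_attempts):
        received = flaky_transfer(data, rng, flip_prob)
        if hashlib.sha256(received).hexdigest() == expected_sha256:
            return received, errors
        errors += 1  # a real system would log and report each failure
    raise RuntimeError(f"integrity check failed {errors} times")


rng = random.Random(1)
payload = b"simulated output record\n" * 1000
digest = hashlib.sha256(payload).hexdigest()

# flip_prob=1.0 corrupts every attempt, so the check must fire each time.
try:
    transfer_with_verification(payload, digest, rng, flip_prob=1.0)
except RuntimeError as exc:
    print(exc)  # integrity check failed 3 times

# flip_prob=0.0 injects nothing, so the first attempt verifies.
data, errs = transfer_with_verification(payload, digest, rng, flip_prob=0.0)
print(errs)  # 0
```

Forcing `flip_prob` to 1.0 is the kind of controlled fault that makes it possible to confirm the integrity check actually fires, rather than assuming it would.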
Our research methods and solutions were also deployed and validated on national computing resources, such as the Open Science Grid (OSG), with exemplar scientific applications from gravitational-wave physics, earthquake science, and bioinformatics. A subset of active Pegasus users have opted in to share detailed workflow provenance data with the Pegasus development team. From that data, we determined that, as of this report, Pegasus has detected integrity errors, and protected users from them, in 299 data transfer instances. The solutions developed in the SWIP project allow scientists, and our society, to be more confident in scientific findings based on collected data. Our PEARC19 publication summarizing this work received the Phil Andrews Most Transformative Contribution Award as well as the Best Paper award in the "Advanced Research Computing Software and Applications" track. We also applied our work to PhysiCell, an open source cell simulation framework used in modeling SARS-CoV-2, the virus that causes COVID-19, to improve its data integrity and reproducibility.
Last Modified: 09/02/2020
Modified by: Ewa Deelman