Award Abstract # 1541450
CC*DNI DIBBS: Merging Science and Cyberinfrastructure Pathways: The Whole Tale

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: UNIVERSITY OF ILLINOIS
Initial Amendment Date: March 3, 2016
Latest Amendment Date: January 23, 2022
Award Number: 1541450
Award Instrument: Cooperative Agreement
Program Manager: Alejandro Suarez
alsuarez@nsf.gov
 (703)292-7092
OAC
 Office of Advanced Cyberinfrastructure (OAC)
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: March 1, 2016
End Date: February 28, 2023 (Estimated)
Total Intended Award Amount: $4,986,951.00
Total Awarded Amount to Date: $5,887,240.00
Funds Obligated to Date: FY 2016 = $4,986,951.00
FY 2019 = $293,559.00
FY 2020 = $304,774.00
FY 2021 = $301,956.00
History of Investigator:
  • Bertram Ludaescher (Principal Investigator)
    ludaesch@illinois.edu
  • Victoria Stodden (Co-Principal Investigator)
  • Niall Gaffney (Co-Principal Investigator)
  • Matthew Turk (Co-Principal Investigator)
  • Kyle Chard (Co-Principal Investigator)
Recipient Sponsored Research Office: University of Illinois at Urbana-Champaign
506 S WRIGHT ST
URBANA
IL  US  61801-3620
(217)333-2187
Sponsor Congressional District: 13
Primary Place of Performance: University of Illinois at Urbana-Champaign
506 S. Wright Street
Urbana
IL  US  61801-3620
Primary Place of Performance Congressional District: 13
Unique Entity Identifier (UEI): Y8CWNJRCNN91
Parent UEI: V2PHZ2CSCH63
NSF Program(s): Data Cyberinfrastructure
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVIT
01001920DB NSF RESEARCH & RELATED ACTIVIT
01002021DB NSF RESEARCH & RELATED ACTIVIT
01002122DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7433, 8048, 8084
Program Element Code(s): 772600
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Scholarly publications today are still mostly disconnected from the underlying data and code used to produce the published results and findings, despite increasing recognition of the need to share all aspects of the research process. As data become more open and transportable, a second layer of research output has emerged, linking research publications to the associated data, possibly along with their provenance. This trend is rapidly being followed by a new third layer: communicating the process of inquiry itself by sharing a complete computational narrative that links method descriptions with executable code and data, thereby introducing a new era of reproducible science and accelerated knowledge discovery. In the Whole Tale (WT) project, all of these components are linked and accessible from scholarly publications. The third layer is broad, encompassing numerous research communities through science pathways (e.g., in astronomy, life and earth sciences, materials science, social science), and deep, using interconnected cyberinfrastructure pathways and shared technologies.

The goal of this project is to strengthen the second layer of research output, and to build a robust third layer that integrates all parts of the story, conveying the holistic experience of reproducible scientific inquiry by (1) exposing existing cyberinfrastructure through popular frontends, e.g., digital notebooks (IPython, Jupyter), traditional scripting environments, and workflow systems; (2) developing the necessary 'software glue' for seamless access to different backend capabilities, including from DataNet federations and Data Infrastructure Building Blocks (DIBBs) projects; and (3) enhancing the complete data-to-publication lifecycle by empowering scientists to create computational narratives in their usual programming environments, enhanced with new capabilities from the underlying cyberinfrastructure (e.g., identity management, advanced data access and provenance APIs, and Digital Object Identifier-based data publications). The technologies and interfaces will be developed and stress-tested using a diverse set of data types, technical frameworks, and early adopters across a range of science domains.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Adam Brinckman, Kyle Chard, Niall Gaffney, Mihael Hategan, Matthew B. Jones, Kacper Kowalik, Sivakumar Kulasekaran, Bertram Ludäscher, Bryce D. Mecum, Jarek Nabrzyski, Victoria Stodden, Ian J. Taylor, Matthew J. Turk, Kandace Turner "Computing environments for reproducibility: Capturing the 'Whole Tale'" Future Generation Computer Systems , 2017 https://doi.org/10.1016/j.future.2017.12.029
Adam Brinckman, Kyle Chard, Niall Gaffney, Mihael Hategan, Matthew B. Jones, Kacper Kowalik, Sivakumar Kulasekaran, Bertram Ludäscher, Bryce D. Mecum, Jarek Nabrzyski, Victoria Stodden, Ian J. Taylor, Matthew J. Turk, Kandace Turner "Computing environments for reproducibility: Capturing the 'Whole Tale'" Future Generation Computer Systems , 2018 https://doi.org/10.1016/j.future.2017.12.029
Adam Brinckman, Kyle Chard, Niall Gaffney, Mihael Hategan, Matthew B. Jones, Kacper Kowalik, Sivakumar Kulasekaran, Bertram Ludäscher, Bryce Mecum, Jaroslaw Nabrzyski, Victoria Stodden, Ian Taylor, Matthew Turk, and Kandace Turner "The Whole Tale: Merging Science and Cyberinfrastructure Pathways" Poster presented at Globus World/NDS Workshop , 2017
K Chard, N Gaffney, M Jones, K Kowalik, B Ludaescher, T McPhillips, J Nabrzyski, V Stodden, I Taylor, T Thelen, M Turk, C Willis "Application of BagIt-Serialized Research Object Bundles for Packaging and Re-execution of Computational Analyses" Workshop on Research Objects 2019 (RO2019) , 2019 https://doi.org/10.5281/zenodo.3381754
K. Chard, N. Gaffney, M. Jones, K. Kowalik, B. Ludäscher, J. Nabrzyski, V. Stodden, I. Taylor, M. Turk, and C. Willis "Implementing Computational Reproducibility in the Whole Tale Environment" P-RECS '19: Proceedings of the 2nd International Workshop on Practical Reproducible Evaluation of Computer Systems , 2019 https://doi.org/10.1145/3322790.3330594
Kyle Chard, Niall Gaffney, Mihael Hategan, Kacper Kowalik, Bertram Ludaescher, Timothy McPhillips, Jarek Nabrzyski, Victoria Stodden, Ian Taylor, Thomas Thelen, Matthew Turk, Craig Willis "Toward Enabling Reproducibility for Data-Intensive Research using the Whole Tale Platform" ParCo 2019 : Parallel Computing Conference , 2019
McPhillips, Timothy M; Willis, Craig; Gryk, Michael R.; Nunez-Corrales, Santiago; Ludäscher, Bertram "Reproducibility by Other Means: Transparent Research Objects" Workshop on Research Objects 2019 (RO2019) , 2019 https://doi.org/10.5281/zenodo.3382423
McPhillips TM, Thelen T, Willis C, Kowalik K, Jones MB, Ludäscher B "CPR-A Comprehensible Provenance Record for Verification Workflows in Whole Tale" 8th and 9th International Provenance and Annotation Workshop, IPAW 2020 + IPAW 2021, LNCS , v.12839 , 2021 , p.263 https://doi.org/10.1007/978-3-030-80960-7_23

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Communities across the sciences are increasingly concerned about the transparency and reproducibility of results obtained by computational means. Through new policies and community norms, researchers are now regularly required to provide the underlying data and software used to produce published results and findings. The Whole Tale project is addressing this challenge through the creation of an open-source cloud-based platform designed to simplify the creation, publication, and verification of transparent and reproducible computational research. 

Used by thousands of researchers, students, and educators, Whole Tale implements best practices in support of computational transparency and reproducibility. Through integration with popular research repositories (e.g., DataONE, Dataverse, Zenodo, and OpenICPSR), support for widely used analysis environments (e.g., JupyterLab, RStudio, MATLAB, and Stata), and containerization techniques, researchers can easily connect the data, code, and computational environment used to obtain findings to scientific publications. Standards-based packages, called tales, are published to supported archival repositories, enabling exploration, verification, and re-use. The recorded run feature enables capture of provenance information, further increasing trust in findings by demonstrating that the provided data and software were actually used to obtain reported results.

The Whole Tale team currently operates an open-access service using the NSF Jetstream2 cloud (https://dashboard.wholetale.org). The system can also be deployed on anything from a single server to a scalable multi-node cluster to meet the needs of specific research teams or communities.  Documentation, training, and outreach materials are publicly available.

The Whole Tale project is developing infrastructure designed to address key challenges related to trust in the integrity of published research. Although the initial grant is completed, development of the platform is ongoing. Because Whole Tale is an open-source project, community contributions are always welcome.

Last Modified: 06/29/2023
Modified by: Bertram Ludaescher

Please report errors in award information by writing to: awardsearch@nsf.gov.
