Award Abstract # 1639759
EarthCube Building Blocks: Collaborative Proposal: GeoTrust: Improving Sharing and Reproducibility of Geoscience Applications

NSF Org: RISE
Integrative and Collaborative Education and Research (ICER)
Recipient: DEPAUL UNIVERSITY
Initial Amendment Date: September 16, 2016
Latest Amendment Date: July 9, 2020
Award Number: 1639759
Award Instrument: Standard Grant
Program Manager: Eva Zanzerkia
RISE
 Integrative and Collaborative Education and Research (ICER)
GEO
 Directorate for Geosciences
Start Date: September 1, 2016
End Date: August 31, 2021 (Estimated)
Total Intended Award Amount: $783,999.00
Total Awarded Amount to Date: $783,999.00
Funds Obligated to Date: FY 2016 = $783,999.00
History of Investigator:
  • Tanu Malik (Principal Investigator)
    tanu.malik@depaul.edu
  • Ian Foster (Co-Principal Investigator)
Recipient Sponsored Research Office: DePaul University
1 E JACKSON BLVD
CHICAGO
IL  US  60604-2287
(312)362-7388
Sponsor Congressional District: 07
Primary Place of Performance: DePaul University
243 South Wabash Avenue
Chicago
IL  US  60604-2287
Primary Place of Performance Congressional District: 07
Unique Entity Identifier (UEI): MNZ8KMRWTDB6
Parent UEI:
NSF Program(s): EarthCube
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7433
Program Element Code(s): 807400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.050

ABSTRACT

The lack of scientific reproducibility (the ability to independently verify the work of other scientists) continues to be a critical barrier to achieving the vision of cross-disciplinary science. Federal agencies and publishers increasingly require and incentivize scientists to establish, at a minimum, the computational reproducibility of their experiments. To comply, scientists must connect the descriptions of experiments in scholarly publications with the underlying data and code used to produce the published results and findings. In practice, however, computational reproducibility is hard to achieve because it entails isolating the necessary and sufficient computational artifacts and then preserving those artifacts in a standard way for later re-execution. Both isolation and preservation are challenging, in large part because of the complexity of existing software and systems, as well as implicit dependencies, resource distribution, and the shifting compatibility of systems that evolve over time, all of which conspire to break the reproducibility of an experiment. The goal of the GeoTrust project is to understand the research lifecycle of scientific experiments, from conception to publication, and to establish a framework that improves their reproducibility.

GeoTrust will develop sandboxing-based systems and tools that help scientists effectively isolate the computational artifacts associated with an experiment, use languages and semantics to preserve those artifacts, and re-execute or reproduce experiments by deploying the artifacts and changing datasets, algorithms, models, environments, and so on. This reproducibility framework will be adopted by and integrated within the community infrastructures of three geoscience sub-disciplines: Hydrology, Solid Earth, and Space Science. Using cross-disciplinary science use cases from these sub-disciplines, and engaging independent evaluators, we will assess the effectiveness of the framework in achieving reproducibility of computational experiments. Finally, verified results will be associated with "stamps of reproducibility", establishing community recognition of computational experiments. The framework will be developed as an EarthCube capability, with software developed and released according to EarthCube requirements. Early adopters across other geoscience sub-disciplines will be continually sought.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Q. Pham, T. Malik, D. H. Ton That, A. Youngdahl "Improving Reproducibility of Distributed Computational Experiments" 1st ACM HPDC Workshop on Practical Reproducible Evaluation of Computer Systems, 2018, https://doi.org/10.1145/3214239.3214241
Y. Nakamura, T. Malik "Efficient Provenance Alignment in Reproduced Executions" USENIX Theory and Practice of Provenance, 2020
Z. Yuan, D. H. Ton That, S. Kothari, G. Fils, T. Malik "Utilizing Provenance in Reusable Research Objects" Informatics, v.5, 2018, https://doi.org/10.3390/informatics5010014
B. T. Essawy, J. L. Goodall, W. Zell, D. Voce, M. M. Morsy, J. Sadler, Z. Yuan, T. Malik "Integrating scientific cyberinfrastructures to improve reproducibility in computational hydrology: Example for HydroShare and GeoTrust" Environmental Modelling & Software, v.105, 2018, https://doi.org/10.1016/j.envsoft.2018.03.025
J. Chuah, M. Deeds "Documenting Computing Environments for Reproducible Experiments" Parallel Computing: Technology Trends, 2020, https://doi.org/10.3233/APC200106
N. N. Manne, S. Satpati "CHEX: Multiversion Replay with Ordered Checkpoints" Proceedings of the VLDB Endowment, v.15, 2022, https://doi.org/10.14778/3514061.3514075
D. H. Ton That, G. Fils, Z. Yuan, T. Malik "Sciunits: Reusable Research Objects" IEEE 13th International Conference on e-Science, 2017, https://doi.org/10.1109/eScience.2017.51
A. Youngdahl, D. H. Ton That "SciInc: A Container Runtime for Incremental Recomputation" IEEE 15th International Conference on e-Science, 2019, https://doi.org/10.1109/eScience.2019.00040
D. H. Ton That, A. Youngdahl "Using provenance for generating automatic citations" Proceedings of the 10th USENIX Conference on Theory and Practice of Provenance, 2018, https://doi.org/10.5555/3319379.3319381

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The key outcomes of this project are as follows:

1. Developed Sciunit, a novel containerization mechanism for achieving computational reproducibility. Sciunit combines automatic encapsulation with provenance and namespace isolation to achieve different levels of computational reproducibility. A production-ready version of Sciunit is available at http://sciunit.run.

2. Demonstrated the effectiveness of this mechanism in a wide range of geoscience disciplines: solid earth, space science, and hydrology. This work led to 4 journal papers (3 accepted and 1 pending) in geoscience and computational domains, 6 conference papers, and 10 AGU and EGU presentations, along with numerous presentations at EarthCube All Hands Meetings and NSF-funded workshops.

3. Increased awareness and investigation of the issues affecting computational reproducibility. The PI's work led to additional funding from the EarthCube program for deeper integration of Sciunit within community cyberinfrastructure (CI), and from the prestigious NSF CAREER program for developing formal and systematic approaches to guaranteeing reproducibility in complex computational and data science applications.

4. The project provided the funds to support and train 1 postdoc (minority), 4 research engineers (3 minority), and 8 Master's-level graduate students over a period of five years. 

5. All primary and secondary repositories created for this project are open-source and accessible via https://bitbucket.org/geotrust/. The primary project is available at: https://bitbucket.org/geotrust/sciunit2/src/master/

6. The PI served on the EarthCube Nominations Committee for 2 years, participated in several EarthCube governance meetings, and attended the EarthCube All Hands Meeting every year.

 


Last Modified: 12/30/2021
Modified by: Tanu Malik

