
NSF Org: RISE Integrative and Collaborative Education and Research (ICER)
Recipient:
Initial Amendment Date: September 16, 2016
Latest Amendment Date: September 16, 2016
Award Number: 1639655
Award Instrument: Standard Grant
Program Manager: Eva Zanzerkia, RISE Integrative and Collaborative Education and Research (ICER), GEO Directorate for Geosciences
Start Date: September 1, 2016
End Date: August 31, 2019 (Estimated)
Total Intended Award Amount: $140,000.00
Total Awarded Amount to Date: $140,000.00
Funds Obligated to Date:
History of Investigator:
Recipient Sponsored Research Office: 1000 OLD MAIN HL, LOGAN, UT, US 84322-1000, (435) 797-1226
Sponsor Congressional District:
Primary Place of Performance: Utah Water Research Laboratory, Logan, UT, US 84322-8200
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): EarthCube
Primary Program Source:
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.050
ABSTRACT
Scientific reproducibility -- the ability to independently verify the work of other scientists -- continues to be a critical barrier to achieving the vision of cross-disciplinary science. Federal agencies and publishers increasingly mandate and incentivize scientists to, at a minimum, establish computational reproducibility of scientific experiments. To comply, scientists must connect descriptions of scientific experiments in scholarly publications with the underlying data and code used to produce the published results and findings. In practice, however, computational reproducibility is hard to achieve because it entails isolating the necessary and sufficient computational artifacts and then preserving those artifacts in a standard way for later re-execution. Both isolation and preservation present challenges, in large part due to the complexity of existing software and systems, as well as the implicit dependencies, distributed resources, and shifting compatibility of systems that evolve over time -- all of which conspire to break the reproducibility of an experiment. The goal of the GeoTrust project is to understand the research lifecycle of scientific experiments from conception to publication and to establish a framework that improves their reproducibility.
GeoTrust will develop sandboxing-based systems and tools that help scientists effectively isolate the computational artifacts associated with an experiment, use languages and semantics to preserve those artifacts, and re-execute/reproduce experiments by redeploying the artifacts while varying datasets, algorithms, models, environments, etc. This reproducibility framework will be adopted by and integrated within the community infrastructures of three geoscience sub-disciplines: Hydrology, Solid Earth, and Space Science. Using cross-disciplinary science use cases from these sub-disciplines, and engaging independent evaluators, we will assess the effectiveness of the framework in achieving reproducibility of computational experiments. Finally, verified results will be associated with "stamps of reproducibility", establishing community recognition of computational experiments. The framework will be developed as an EarthCube capability, with software developed and released per EarthCube requirements. Early adopters across other geoscience sub-disciplines will be continually sought.
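The isolate-preserve-re-execute idea above can be pictured with a minimal sketch. This is an illustration of the concept only, not GeoTrust's implementation: it records a manifest of content hashes and the interpreter version for an experiment's artifact directory, so a later re-execution can check that it is running against the same inputs.

```python
import hashlib
import platform
from pathlib import Path


def snapshot(artifact_dir: str) -> dict:
    """Record a SHA-256 hash for every artifact file, plus the
    interpreter version, as a simple preservation manifest."""
    manifest = {"python": platform.python_version(), "artifacts": {}}
    for path in sorted(Path(artifact_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest["artifacts"][str(path.relative_to(artifact_dir))] = digest
    return manifest


def verify(artifact_dir: str, manifest: dict) -> list:
    """Return names of artifacts that are missing or changed since the
    manifest was taken; an empty list means the inputs still match."""
    current = snapshot(artifact_dir)["artifacts"]
    return [
        name for name, digest in manifest["artifacts"].items()
        if current.get(name) != digest
    ]
```

A real framework must also capture code, software dependencies, and the execution environment; hashing input data is only the first, simplest layer of that.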
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The establishment of knowledge in science and engineering rests on the concept of reproducibility. An important question for any study is: are the findings reproducible? Can they be recreated independently, by other researchers, in another laboratory, experimental setting, or set of observations? Findings reported by a research study need to be replicable and should be validated through independent testing before becoming generally accepted as fact or theory. Multiple problems in the reporting of research, including hydrology and water resources research, impede its reproducibility: (1) are the data and models necessary to replicate the modeling and analyses reported in the study available? (2) Are the methods and procedures described sufficiently for them to be reproduced? (3) Are the computer programs and their computing environment dependencies available in a way that allows the computations to be rerun? This project addressed these problems through enhancements to the HydroShare platform and the Sciunit software.
HydroShare is a data and model repository operated by the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) to advance hydrologic science by enabling individual researchers to more easily and freely share the products of their research: not just the scientific publication summarizing a study, but also the data and models used to create it. HydroShare accepts data from anybody and supports the Findable, Accessible, Interoperable, and Reusable (FAIR) principles that are being widely promoted by research organizations, funders, and scientific publishers. HydroShare comprises two sets of functionality: (1) a repository for users to share and publish data and models in a variety of formats, and (2) tools (web apps) that can act on content in HydroShare and provide web-based access to compute capability. One such connected web application is a JupyterHub platform (CUAHSI JupyterHub) that uses Jupyter notebooks to record and document the computational steps involved in a research or modeling analysis.
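HydroShare content can also be reached programmatically through its public REST API, which is how connected web apps act on repository resources. The sketch below builds the science-metadata endpoint URL for a resource and optionally fetches it; the `hsapi` endpoint path is an assumption based on HydroShare's API conventions, not a detail from this award, and the fetch requires network access and a real resource ID.

```python
from urllib.request import urlopen

HYDROSHARE = "https://www.hydroshare.org"


def scimeta_url(resource_id: str) -> str:
    """Assumed hsapi endpoint for a resource's science metadata."""
    return f"{HYDROSHARE}/hsapi/resource/{resource_id}/scimeta/"


def fetch_scimeta(resource_id: str) -> bytes:
    """Network call: download the metadata document for a resource."""
    with urlopen(scimeta_url(resource_id), timeout=30) as resp:
        return resp.read()
```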
Sciunit is software for creating self-contained, annotated containers that describe and package computational experiments. As part of this project, the Sciunit software was installed on the CUAHSI JupyterHub so that it can be used both to (1) capture the analysis sequences and computational dependencies of analyses being performed by a researcher, and (2) reproduce on this platform the computations encoded in a Sciunit created elsewhere. Reproducing hydrologic analyses served as a use case that drove improvements to the usability of the Sciunit software for these purposes.
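The capture step can be illustrated with a minimal sketch. This is not how Sciunit itself works (it packages experiments by monitoring their execution); it only shows the underlying idea of recording which dependencies an analysis used, at which versions, so a later run can flag environment drift.

```python
import sys


def capture_dependencies() -> dict:
    """Record versions of currently loaded top-level modules,
    approximating the dependency list a re-execution would need."""
    deps = {}
    for name, module in list(sys.modules.items()):
        if "." in name:  # skip submodules; keep top-level packages only
            continue
        version = getattr(module, "__version__", None)
        if version is not None:
            deps[name] = version
    return deps


def check_dependencies(required: dict) -> list:
    """Return (name, wanted, found) tuples for missing or mismatched
    dependencies; an empty list means the environment matches."""
    current = capture_dependencies()
    return [
        (name, wanted, current.get(name))
        for name, wanted in required.items()
        if current.get(name) != wanted
    ]
```

In the real tool, capture and replay happen transparently when the researcher runs their analysis, with no changes to the analysis code itself.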
The intellectual merit of this research is that it has advanced computational knowledge and its application to the reproduction of hydrology and water resources analyses, through the capture and encoding of computational dependencies using Sciunit and Jupyter notebooks. As a data and model repository, HydroShare helps solve the data and model availability problem (1) above. Jupyter notebooks stored in HydroShare and executable on JupyterHub platforms (which are increasingly widely available) document and describe methods and procedures, helping overcome impediment (2). However, the computational steps documented in a Jupyter notebook will only work if the libraries and platform dependencies used in their creation are present on the platform where they are being reproduced. The Sciunit software addresses this dependency impediment (3).
This work has broad impact for hydrology and water resources analyses through its deployment on CUAHSI HydroShare, which makes it widely available to the hydrology research community. With over 3,500 users and over 8,000 model and data resources, the combination of Sciunit and HydroShare will bring improved reproducibility tools and best practices to a broad and diverse community of geoscientists. Beyond hydrology, the methods and tools developed in this project have the potential to be extended to other areas of geoscience.
Last Modified: 12/27/2019
Modified by: David G Tarboton