
NSF Org: | RISE Integrative and Collaborative Education and Research (ICER) |
Recipient: | |
Initial Amendment Date: | September 23, 2019 |
Latest Amendment Date: | September 23, 2019 |
Award Number: | 1928366 |
Award Instrument: | Standard Grant |
Program Manager: | Sean Kennan, skennan@nsf.gov, (703)292-7575, RISE Integrative and Collaborative Education and Research (ICER), GEO Directorate for Geosciences |
Start Date: | September 15, 2019 |
End Date: | August 31, 2023 (Estimated) |
Total Intended Award Amount: | $616,613.00 |
Total Awarded Amount to Date: | $616,613.00 |
Funds Obligated to Date: | |
History of Investigator: | |
Recipient Sponsored Research Office: | 21 N PARK ST STE 6301 MADISON WI US 53715-1218 (608)262-3822 |
Sponsor Congressional District: | |
Primary Place of Performance: | 21 N Park St Madison WI US 53715-1218 |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | EarthCube |
Primary Program Source: | |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.050 |
ABSTRACT
The long-term sustainability of federally funded research depends on the discovery, accessibility and reuse of data. However, data and research products are often stored in different locations, which makes it challenging to find and integrate related data. This project supports the discovery of related but distributed research products for Earth science and natural history data. Researchers will have a way to link data resources, add context, or provide additional information about data, software and publications. Researchers use this system to create annotations that link resources using unique identifiers. Over time, these links connect to create a network of data resources. This project will support the development of the underlying database, a user interface for access and discovery, and a number of documentation tools and workshops to help support the ongoing development and sustainability of the Throughput database. The broader impacts of this project include the engagement of early career researchers and the better sharing of data and other research products in the Earth sciences.
Improved discoverability of data, metadata and services is a need shared across the geosciences. A barrier that increases "time-to-science" in Earth Science research is the difficulty of integrating individual observations, concepts, data models, and statistical techniques across subfields. The Throughput Annotation Engine (TAE) offers a solution to the challenge of managing interdisciplinary workflows by providing multiple points of entry to access, annotate and interact with data, and to link code, data, publications, or other elements to one another. The TAE will support adoption as part of a system of systems, linking outside users to data repositories, and providing data stewards a degree of flexibility in deciding how to manage new information: whether to incorporate information in annotations into their data models, or to access and report annotations using the Throughput API. Throughput improves credit for data, software and documentation by providing a mechanism to track use and implementation across a range of publications and online resources, and will provide a new set of citation tools based on FORCE11 recommendations. Linked annotations in the TAE support the generation of metrics for research infrastructure beyond standard publication metrics. The Cookbook will allow individuals and communities to identify programmatic workflows and link them to community data resources. The Cookbook will provide researchers with access to community best practices and, by leveraging ORCID user credentials, to individuals engaged in the development of documentation and workflows. Throughput offers a user-centered solution to deepening and densifying connections among the many nodes in the emerging linked data ecosystem of scientists, data, software services, journals, and funders.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The Throughput Project catalyzed a number of groups to work together to address issues in legacy data management and data stewardship, and to develop tools and concepts to manage linked data across Earth Science data resources. The current data resource landscape is highly fragmented. Advances in metadata standards mean that we can now more clearly identify linked resources (physical samples, derived chemical measurements, publications, educational materials); however, technical implementation lags.
Throughput was able to show a broad landscape of research projects on shared code platforms such as GitHub, and the links these resources have to active research data repositories. Throughput supported pushes for well-qualified metadata in the tephra data community, and was an early partner and participant in EarthCube’s ScienceOnSchema project, which has led to published metadata standards that have been adopted and implemented by associated projects (e.g., the Neotoma Paleoecology Database).
Products
Throughput has produced a published article (Thomer et al., 2023) that explores different standards of practice within the development and deployment of code repositories. The bulk of the work by Throughput was in software development, including EarthCube online workshops (https://doi.org/10.5281/zenodo.6423500), a documented RESTful API (https://doi.org/10.5281/zenodo.7908786), a tool to extract location metadata from published papers (https://doi.org/10.5281/zenodo.10611652) and a tool to recommend journal articles for inclusion within data repositories (https://doi.org/10.5281/zenodo.10611693).
Documentation and Education
The development of web interfaces, database backends and open APIs resulted in documentation and educational outcomes that have supported other projects, and provided training to a broad range of researchers at multiple venues, including two Geological Society of America early-career workshops, training for multiple graduate and post-graduate students, and informal code sharing among projects working with user authentication and graph databases. Throughput resulted in clear mappings from W3C standards for annotations to graph database implementations (https://github.com/throughput-ec/w3c-alignment), clearly coded workflows to search for instances of data resource use in GitHub repositories (https://github.com/throughput-ec/github_scrapers), and documentation for the development of single page applications using Vue.js (https://github.com/throughput-ec/throughput_vue). In addition, Throughput worked with EarthCube to produce online workshop materials (https://github.com/throughput-ec/ec_workshops_py) that have continued to be delivered, providing a web-based platform for teaching programming to students distributed around the world.
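As an illustration of the idea of mapping W3C-style annotations onto a graph, the following minimal Python sketch builds annotation records shaped like the W3C Web Annotation Data Model and derives a network of resources that share an annotation. It is not the Throughput schema or API: the function names (`make_annotation`, `build_link_network`), the `example.org` identifiers, and the placeholder ORCID are all hypothetical, and an in-memory dictionary stands in for a graph database.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder creator identifier (hypothetical, not a real ORCID record).
CREATOR = "https://orcid.org/0000-0000-0000-0000"


def make_annotation(annotation_id, note, targets, creator=CREATOR):
    """Return a dict shaped like a W3C Web Annotation with several targets."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        "creator": creator,
        "body": {"type": "TextualBody", "value": note},
        "target": sorted(targets),
    }


def build_link_network(annotations):
    """Connect every pair of resources that appear in the same annotation."""
    network = defaultdict(set)
    for annotation in annotations:
        for left, right in combinations(annotation["target"], 2):
            network[left].add(right)
            network[right].add(left)
    return network


# Hypothetical identifiers for a dataset, a code repository, and a paper.
DATASET = "https://example.org/dataset/1"
CODE = "https://example.org/code/analysis"
PAPER = "https://example.org/paper/2"

annotations = [
    make_annotation("https://example.org/anno/1",
                    "This script analyses the dataset.", [DATASET, CODE]),
    make_annotation("https://example.org/anno/2",
                    "This paper reports results from the script.", [CODE, PAPER]),
]

network = build_link_network(annotations)
for resource in sorted(network):
    print(resource, "->", sorted(network[resource]))
```

In this sketch the code repository ends up linked to both the dataset and the paper, which shows how individual annotations can accumulate into a network of related resources.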
Legacy
The Throughput Database of linked projects, publications and data repositories no longer has supported infrastructure to maintain an online web presence, but the database itself is available for public download. All code repositories are hosted on GitHub as public data resources with permissive licenses and will be available in perpetuity.
Last Modified: 02/16/2024
Modified by: Simon Goring
Please report errors in award information by writing to: awardsearch@nsf.gov.