Award Abstract # 1540994
EarthCubeIA: Collaborative Proposal: Building Interoperable Cyberinfrastructure (CI) at the Interface between Paleogeoinformatics and Bioinformatics

NSF Org: RISE Integrative and Collaborative Education and Research (ICER)
Recipient: KENT STATE UNIVERSITY
Initial Amendment Date: July 30, 2015
Latest Amendment Date: July 30, 2015
Award Number: 1540994
Award Instrument: Standard Grant
Program Manager: Eva Zanzerkia
  RISE Integrative and Collaborative Education and Research (ICER)
  GEO Directorate for Geosciences
Start Date: September 1, 2015
End Date: August 31, 2017 (Estimated)
Total Intended Award Amount: $78,000.00
Total Awarded Amount to Date: $78,000.00
Funds Obligated to Date: FY 2015 = $78,000.00
History of Investigator:
  • Alison Smith (Principal Investigator)
    alisonjs@kent.edu
Recipient Sponsored Research Office: Kent State University
1500 HORNING RD
KENT
OH  US  44242-0001
(330)672-2070
Sponsor Congressional District: 14
Primary Place of Performance: Kent State University
OH  US  44242-0001
Primary Place of Performance Congressional District: 14
Unique Entity Identifier (UEI): KXNVA7JCC5K6
Parent UEI:
NSF Program(s): EarthCube
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7433
Program Element Code(s): 807400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.050

ABSTRACT

Paleontologists provide data about the past distribution and diversity of life. These data are useful both to geologists, because they can help determine the age of rocks, reconstruct past environments, and constrain models of the Earth system, and to biologists interested in the evolutionary history of organisms and the behavior of ecological systems during past global changes. Currently, data about fossils are dispersed across thousands of scientific publications and dozens of small to large databases, only some of which are publicly available via the Internet. Even publicly available databases can be difficult to access, because each stores different kinds of data with different conventions, requiring researchers to harmonize their searches and outputs database by database. This project brings together six paleobiological databases so that they share a single set of Internet-based commands by which researchers and the public can easily access fossil records from all of Earth history. By coordinating with other emerging efforts in geological and biological data sharing, best practices, and protocols, we ensure that data will be freely available to all, enabling new scientific syntheses and discovery, more powerful educational opportunities, and general exploration of the history of life on Earth.

The paleobiological sciences sit at the nexus between the geosciences and the biosciences, with close interdependencies in both domains. Within the geosciences, information about the past spatiotemporal distribution of organisms, species, and assemblages of species is essential to a wide array of allied disciplines: to sedimentologists and economic geologists studying facies relationships and employing biostratigraphic controls for correlating rock strata, to structural geologists and geophysicists seeking biogeographic constraints on reconstructions of former tectonic plate positions, to paleoclimatologists extracting paleoclimatic signals from paleoecological data, and to earth system modelers seeking to understand how biospheric dynamics have shaped, and continue to shape, the history of the Earth-Life system. Within the biosciences, the fossil record is essential for understanding how contemporary ecological systems are shaped by historical legacies of slow-acting processes, for testing climate-driven models of species distribution and diversity that are being used to project the impacts of 21st-century climate change, for constraining phylogenetic models of species divergence and rates of evolution, and for understanding the fundamental drivers of biodiversity (i.e., species extinctions and originations). In an era of global change, when stewarding biodiversity is an urgent societal concern, conservation biologists, global change ecologists, and earth system scientists are all looking to the past to study the behavior of the Earth-Life system during rapid transitions. Paleobiological data are currently served by a wide array of databases that vary in structure, composition, temporal scale, and types of data and metadata. Conducting "global" or holistic analyses of the paleobiological record therefore requires retrieving data from many of these databases, each of which must be queried separately for the types of data needed.

The purpose of this project is to make six different paleobiological databases interoperable, so that these and other databases can be queried through a common Application Programming Interface (API). Toward that end, five key records of North American Pleistocene lakes will be uploaded and made available through this integrative project. This project also will increase the interoperability between these paleobiological resources and contemporary databases of species distributions and diversity, enabling continuous time-series analyses (e.g., of biodiversity) from the beginning of life on Earth to today. Integration of the paleobiological databases with databases of the stratigraphic record (Macrostrat) will enhance the value of both types of data. New R packages will facilitate retrieval and analysis of data from all of the databases. Finally, this proposal establishes a Paleobiological Data Consortium, consisting of leaders of cyberinfrastructure resources in the paleobiosciences and allied disciplines, with the goal of sharing best practices and protocols among the geoinformatic and bioinformatic communities.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Project outcomes Report for the General Public:

EarthCubeIA: Collaborative Proposal: Building Interoperable Cyberinfrastructure (CI) at the Interface between Paleogeoinformatics and Bioinformatics

Here, I report on my portion of a very large collaborative project to develop interoperability among high-use, open-access paleoecological databases, including the Neotoma Database (www.neotomadb.org) and the Paleobiology Database (PBDB) (https://training.paleobiodb.org/#/). An overview of the entire project can be found at the Earth-Life Consortium (www.earthlifeconsortium.org).

The portion of the project reported here involves the Neotoma Database. Neotoma is an open-access, community-curated repository of paleoecological data covering the most recent 3 million years of Earth’s history. The framework for the database was developed with NSF funding beginning in 2009, but the data have been gathered over many decades by many scientists. The database is currently dominated by continental samples, although some marine samples exist. It presently houses 3.8 million observations, more than 17,000 datasets, and more than 9,200 sites. The data consist of fossil pollen, vertebrates, diatoms, ostracodes, macroinvertebrates, plant macrofossils, insects, testate amoebae, and geochronological data, as well as the recently added organic biomarkers, stable isotopes, and specimen-level data. Considering the entire project, of which this portion is just one piece, these paleobiological data provide the fourth dimension (time) needed to understand the processes that govern the diversity and distribution of life. These data are increasingly needed by specialists in many areas to better understand how events and processes occurring today affect biota, where they can live, and how their communities can change.

This component of the project involved preparing and analyzing lake sediment cores containing fossil ostracodes (aquatic microcrustaceans used in environmental and hydrologic reconstructions) and uploading the resulting data to the database. Fossil data from five key North American Pliocene to Pleistocene sites, ranging in age from 3.3 million to 130,000 years, have been prepared, analyzed, and uploaded: Butte Valley, CA; Owens Lake, CA; Lake Bonneville/Great Salt Lake, UT; Bear Lake, UT/ID; and Raymond Basin, IL. In addition, ostracode records from late Pleistocene to Holocene cores from three more sites have been uploaded: Crystal Lake, IL; Elk Lake, MN; and Kenosee Lake, Saskatchewan. These cores contain records important for understanding species distribution and diversity during times of rapid climate transition, as well as the hydrologic response to climate forcing.

Thus, ostracode data from eight sites, with multiple cores spanning the Pliocene through the Holocene, are now available together with appropriate age models and taxonomic links that support biogeographic, paleoenvironmental, and paleohydrologic uses. These cores are very important for understanding past surface-water changes in the western United States, an important source of information for future planning as well as for biogeographic and environmental research.

Broader Impacts: These funds supported one undergraduate Geology major at Kent State University who began training in the lab in Spring 2016 and then became a graduate student working toward the M.S. in Geology. He has been trained in paleolimnology, sample preparation, ostracode identification, computer programming in R, searching and harvesting the Neotoma database by developing R programs and application programming interface (API) tools, and interpreting paleolimnologic data. In addition, the student presented research results at two professional meetings and assisted with a Neotoma database workshop for international scientists at a professional meeting.

The data described above are now publicly accessible through the Neotoma database (www.neotomadb.org), and research results have been presented in published abstracts and meeting presentations. In addition, a database workshop extended the use of these data to international scientists.

Intellectual Merit: For this piece of the project, we now have, for the first time, a publicly accessible source (Neotomadb.org) of the ostracode fossil record for major western U.S. paleolimnologic sites that can be used to correlate and explore environmental, ecological, hydrological, and climatic shifts in the region over the past few million years. These data provide a critical framework for understanding the response of surface water on the landscape to paleoenvironmental and paleoclimatic shifts, and they help us better prepare for future changes in the western U.S.


Last Modified: 11/27/2017
Modified by: Alison J Smith

Please report errors in award information by writing to: awardsearch@nsf.gov.
