
Award Abstract # 1541029
EarthCube IA: Collaborative Proposal: LinkedEarth: Crowdsourcing Data Curation & Standards Development in Paleoclimatology

NSF Org: RISE
Integrative and Collaborative Education and Research (ICER)
Recipient: UNIVERSITY OF SOUTHERN CALIFORNIA
Initial Amendment Date: July 28, 2015
Latest Amendment Date: July 28, 2015
Award Number: 1541029
Award Instrument: Standard Grant
Program Manager: Eva Zanzerkia
RISE
 Integrative and Collaborative Education and Research (ICER)
GEO
 Directorate for Geosciences
Start Date: September 1, 2015
End Date: August 31, 2019 (Estimated)
Total Intended Award Amount: $684,779.00
Total Awarded Amount to Date: $684,779.00
Funds Obligated to Date: FY 2015 = $684,779.00
History of Investigator:
  • Julien Emile-Geay (Principal Investigator)
    julieneg@usc.edu
  • Yolanda Gil (Co-Principal Investigator)
Recipient Sponsored Research Office: University of Southern California
3720 S FLOWER ST FL 3
LOS ANGELES
CA  US  90033
(213)740-7762
Sponsor Congressional District: 34
Primary Place of Performance: University of Southern California
3651 Trousdale Pkwy, ZHS 117
Los Angeles
CA  US  90089-0740
Primary Place of Performance Congressional District: 37
Unique Entity Identifier (UEI): G88KLJR3KYT5
Parent UEI:
NSF Program(s): EarthCube
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7433
Program Element Code(s): 807400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.050

ABSTRACT

Natural climate variability significantly modulates anthropogenic global warming, and only paleoclimate observations can adequately constrain it. Moreover, such observations are most powerful when many records are brought together to provide a spatial understanding of past variability. However, there is currently no universal way to share paleoclimate data between users or machines, hindering integration and synthesis. Large-scale, international, paleoclimate data syntheses have a long and successful history, but have been needlessly labor-intensive. Because (1) paleoclimate data curation requires expert knowledge, (2) top-down data management approaches are ineffectual, and (3) existing infrastructure does not foster standardization, there is a critical need for a flexible platform enabling crowdsourced data curation and standards development. The platform will be combined with editorial and community-driven processes, resulting in a system that has the potential to engage a broad user base in geoscientific data curation. The proposed framework will lower barriers to participation in the geosciences, enabling more "dark data" to join the public domain using community-sanctioned protocols. The pilot project will facilitate the work of hundreds of paleoclimate scientists, accelerating scientific discovery and the dissemination of its results to society.

Semantic wikis provide a simple, intuitive interface to semantic languages and infrastructure that build on open Web architecture. Like traditional wikis, they enable the collaborative authoring of content. Secure access and time-stamped content also enable the tracking of changes and the accountability of users, as well as moderation capabilities by community members of recognized expertise. In contrast to traditional wikis, semantic wikis allow contributors to assign meaning to their content, specifying relationships between the objects they describe. This enables artificial intelligence reasoners to parse, process and translate these data into more useful forms. The technology is well-proven, scalable, and completely transparent to the user, requiring no computer science knowledge or more sophisticated technology than a web browser. The LinkedEarth Wiki will automatically translate this information into Linked Open Data, a universal format to share data across the Web. To demonstrate this concept's broad applicability across paleoclimate science, the project's target community is the PAGES2k consortium, an international collaboration dedicated to the climate of the Common Era. Social technologies will be developed to power collective curation, standards development and quality control by the community itself. The project will demonstrate applicability to other paleogeosciences, serving as a potential template for other geoscientific disciplines.
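As a rough illustration of what "assigning meaning" and translating to Linked Open Data can look like, the sketch below stores statements as subject-predicate-object triples and writes them out in N-Triples syntax. It is not the LinkedEarth Wiki's implementation: the namespace http://example.org/paleo# and the terms hasArchiveType and collectedFrom are hypothetical names chosen for this example.

```python
from typing import NamedTuple

# Hypothetical namespace; the real project vocabulary is not reproduced here.
BASE = "http://example.org/paleo#"


class Triple(NamedTuple):
    subject: str             # URI of the thing being described
    predicate: str           # URI of the relationship
    obj: str                 # URI of another thing, or a plain text value
    is_literal: bool = False


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    lines = []
    for t in triples:
        if t.is_literal:
            # Escape backslashes and quotes so the literal stays well-formed.
            text = t.obj.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{text}"'
        else:
            obj = f"<{t.obj}>"
        lines.append(f"<{t.subject}> <{t.predicate}> {obj} .")
    return "\n".join(lines)


triples = [
    Triple(BASE + "dataset/coral01", BASE + "hasArchiveType", "coral", True),
    Triple(BASE + "dataset/coral01", BASE + "collectedFrom", BASE + "site/island01"),
]
print(to_ntriples(triples))
```

Because each statement names its relationship explicitly, a program can follow collectedFrom from a dataset to a site without any knowledge of how the wiki page was laid out.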

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Emile-Geay, J., D. Khider, N. McKay, Y. Gil, D. Garijo, and V. Ratnakar "LinkedEarth: supporting paleoclimate data standards and crowd curation" Past Global Change Magazine , v.26 , 2018 , p.62 10.22498/pages.26.2.62
Khider, D., J. Emile-Geay, N. P. McKay, Y. Gil, D. Garijo, V. Ratnakar, M. Alonso-Garcia, S. Bertrand, O. Bothe, P. Brewer, A. Bunn, M. Chevalier, L. Comas-Bru, A. Csank, E. Dassie, K. DeLong, T. Felis, P. Francus, A. Frappier, W. Gray, S. Goring, L. Jonk "PaCTS 1.0: A Crowdsourced Reporting Standard for Paleoclimate Data" Paleoceanography and Paleoclimatology , v.34 , 2019 10.1029/2019PA003632
PAGES 2k Consortium (Emile-Geay, J., McKay, N., Kaufman, D., von Gunten, L., Wang, J., Anchukaitis, K., Abram, N., Addison, J., Curran, M., Evans, M., Henley, B., Hao, Z., Martrat, B., McGregor, H., Neukom, R., Pederson, G., Stenni, B., Thirumalai, K., W "A global multiproxy database for temperature reconstructions of the Common Era" Scientific Data , 2017 , p.170088 10.1038/sdata.2017.88
PAGES2k consortium (Emile-Geay, McKay leads) "A global multiproxy database for temperature reconstructions of the Common Era" Scientific Data (Nature Publishing Group) , v.4 , 2017 , p.170088 10.1038/sdata.2017.88
Yolanda Gil, Daniel Garijo, Varun Ratnakar, Deborah Khider, Julien Emile-Geay, Nicholas McKay "A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Metadata Annotations" Proceedings of the Sixteenth International Semantic Web Conference (ISWC), Vienna, Austria , 2017 , p.231 10.1007/978-3-319-68204-4_24

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The LinkedEarth project (Fig 1) has moved paleoclimatology forward into the digital age, promoting the concept of FAIR data (Findable, Accessible, Interoperable, Reusable). LinkedEarth has done so using two main avenues: (1) it has initiated or furthered standards to store, publish, and exchange paleoclimate data in digital form; (2) it has advanced the notion of crowdsourced paleoclimate content, allowing data generators to curate their own and others' data, using open standards and technologies.


The basic premise of LinkedEarth is that data generators are best positioned to describe and digitally publish their data. Therefore, they should be given more agency in this process, and given tools and guidelines to make their data interoperable. We thus built a platform (http://wiki.linked.earth) that would enable paleoclimatologists to interact with data in an intuitive way, resulting in standardized datasets that are (by construction) extensible, interoperable, and discoverable. LinkedEarth accomplished this by developing data standards, which entails three things: (1) a standard terminology, to prevent ambiguity; (2) standard practices, which codify the information that is essential to long-term reuse and (3) a standard format for archival and exchange. The latter is emerging, in the form of Linked Paleo Data (LiPD; Fig 2), so LinkedEarth only had to contend with the first two parts. 

Standardizing terminology was accomplished by means of an ontology (http://linked.earth/ontology/). An ontology is a formal representation of the knowledge common to a scholarly field. Here, it allows unambiguous definitions of common terms describing a paleoclimate dataset, as well as the relationships among these terms. Ontologies are necessary to organize information so machines can take advantage of digitally archived data. Ontologies need to be sufficiently rigid so that dependent applications can rely on their structure being stable over time, yet sufficiently flexible to accommodate growth and evolution. Our ontology maps closely to LiPD, which serves as its data model. Extensibility was achieved via a new technology, the LinkedEarth platform (Gil et al., 2017). At its core, it is a semantic wiki, similar to other wikis like Wikipedia, but based on the LinkedEarth ontology. The LinkedEarth wiki tracks changes and attributes them to authenticated contributors. The wiki facilitates extensions by allowing users to edit the non-core aspects of the ontology: they can define new classes or properties, create or change definitions, start discussions with other users, or request modifications to the core ontology when sufficient consensus emerges. This flexibility is essential to accommodate advances in techniques and interpretations, and allow users to deprecate outdated terms.
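The split between a stable core and an editable periphery can be sketched in a few lines. This is a minimal model under stated assumptions, not the LinkedEarth platform's code: the Term and Ontology classes, the user names, and the terms "Dataset" and "GrowthLayer" are all hypothetical. It shows direct edits to non-core terms, change requests for core terms, deprecation, and a change log attributed to contributors.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    definition: str
    core: bool = False        # core terms are protected from direct edits
    deprecated: bool = False


@dataclass
class Ontology:
    terms: dict = field(default_factory=dict)
    history: list = field(default_factory=list)    # (user, action, term name)
    requests: list = field(default_factory=list)   # pending core-change requests

    def add_term(self, user, name, definition):
        """Define a new non-core term and record who added it."""
        if name in self.terms:
            raise ValueError(f"term already exists: {name}")
        self.terms[name] = Term(name, definition)
        self.history.append((user, "add", name))

    def edit_definition(self, user, name, definition):
        """Edit a non-core term directly; queue a request for a core term."""
        term = self.terms[name]
        if term.core:
            self.requests.append((user, name, definition))
            self.history.append((user, "request", name))
            return False
        term.definition = definition
        self.history.append((user, "edit", name))
        return True

    def deprecate(self, user, name):
        """Mark a non-core term as outdated without deleting it."""
        term = self.terms[name]
        if term.core:
            raise PermissionError(f"core term needs a change request: {name}")
        term.deprecated = True
        self.history.append((user, "deprecate", name))


onto = Ontology()
onto.terms["Dataset"] = Term("Dataset", "A set of paleoclimate measurements.", core=True)
onto.add_term("alice", "GrowthLayer", "A single layer in a layered archive.")
applied = onto.edit_definition("bob", "Dataset", "A proposed new definition.")
onto.deprecate("alice", "GrowthLayer")
print(applied, onto.history)
```

Keeping deprecated terms in place rather than deleting them means applications that still reference an old term can resolve it, which is one way to reconcile stability with growth.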

Standardizing practices was accomplished via a community activity, resulting in a reporting standard (PaCTS; Khider et al., 2019: https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019PA003632). This consensus-building enterprise was facilitated by the LinkedEarth platform, including working groups, discussions, and polling. PaCTS is now being considered for adoption by major journals in the field.

Because LinkedEarth datasets are based on LiPD, they benefit from the entire LiPD research ecosystem (Fig 2). This makes LinkedEarth-hosted data inherently interoperable. Lastly, the semantic part of LinkedEarth means that datasets are broadcast to the web using standard schemas, which make them discoverable by various search engines, including Google. While the LinkedEarth platform contains many hundreds of datasets, it was never meant to compete with much larger efforts like NOAA’s WDS-Paleo or PANGAEA. Instead, LinkedEarth has served as an ideas lab to incubate standards technologies. It is hoped that further cyber-paleo efforts, particularly EarthCube, will re-use and repurpose these resources to further serve the paleosciences, and better connect them to the rest of the geosciences.
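One common way to broadcast a dataset description to search engines is to embed a schema.org "Dataset" record as JSON-LD in the dataset's web page. The report does not say which schemas LinkedEarth uses, so schema.org is an assumption here, and the dataset name, description, URL, and keywords below are placeholders rather than real LinkedEarth records.

```python
import json


def dataset_jsonld(name, description, url, keywords):
    """Build a schema.org Dataset description as a plain dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
    }


def as_script_tag(record):
    """Wrap the record in the script element used to embed JSON-LD in HTML."""
    body = json.dumps(record, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


record = dataset_jsonld(
    name="Example coral record",                      # placeholder values
    description="Illustrative paleoclimate dataset entry.",
    url="http://example.org/dataset/coral01",
    keywords=["paleoclimate", "coral"],
)
print(as_script_tag(record))
```

A crawler that understands this vocabulary can index the name, description, and keywords directly, without parsing the human-readable page.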



 


Last Modified: 11/27/2019
Modified by: Julien Emile-Geay

