
NSF Org: |
OAC Office of Advanced Cyberinfrastructure (OAC) |
Recipient: |
|
Initial Amendment Date: | August 26, 2014 |
Latest Amendment Date: | December 19, 2014 |
Award Number: | 1448821 |
Award Instrument: | Standard Grant |
Program Manager: |
Rajiv Ramnath
OAC Office of Advanced Cyberinfrastructure (OAC) CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2014 |
End Date: | February 29, 2016 (Estimated) |
Total Intended Award Amount: | $299,964.00 |
Total Awarded Amount to Date: | $299,964.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
1111 FRANKLIN ST FL 8 OAKLAND CA US 94607-5201 (510)987-9850 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
415 20th Street, 4th Floor Oakland CA US 94612-2901 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): |
SciSIP-Sci of Sci Innov Policy, Software Institutes, STAR Metrics |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
The research community has been calling for solutions to data discovery and for ways to capture more broadly the value of the work at the core of a researcher's scholarly pursuit. The University of California Curation Center at the California Digital Library will work with PLOS and DataONE to address what metrics are needed to capture the activity surrounding research data in a valid and credible way. This collaborative team will prototype a suite of metrics that track and measure data use, "data-level metrics" (DLM), which will measure the broad range of activity surrounding the reach and use of data as a research output. The project will augment the existing scholarly cyberinfrastructure, which currently is focused on journal articles, and introduce data as a valued scholarly output into the framework. Once the impact of the research is exposed, data metrics will create incentives that support data sharing and usage, increasing the velocity of information dissemination across a wide range of disciplines.
The team will build a reference model for data metrics that relies on automatic tracking and is informed by in-depth field analyses of data use and engagement practices. They will test mechanisms for automatically tracking data activity and explore ways in which the dynamic data harvested can be delivered to drive data discovery as well as support reporting needs for funders and institutions. DLM data will provide a clear and growing picture of the activity around the dissemination and reach of research data. As this activity is linked to other research entities and objects in the research information ecosystem, an expansive portrait of the dynamics of data use and reuse will emerge, which can inform the community's understanding of what impact means for research data as a scholarly output. In the comprehensive ecosystem of identifiable, trackable research data, these metrics tools will become essential to data-rich science.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The Making Data Count project (MDC), a collaborative effort of the California Digital Library (CDL) at the University of California, DataCite, DataONE, the National Center for Ecological Analysis and Synthesis (NCEAS) at UC Santa Barbara, and the Public Library of Science (PLOS), has developed an innovative prototypical system and service for aggregating and presenting metrics documenting the use, reach, and impact of research data.
The evolution of information technology has enabled new forms of data-driven and computational research, which continually reinforce the central importance of research data to the scholarly enterprise. Barrier-free access to relevant data is critical to that enterprise in order to facilitate replicability, maintain integrity, reduce needless duplication of effort, and promote synergistic investigation and scholarly advancement. However, more widespread data publication and reuse is dependent, in part, on the presence of appropriate incentive mechanisms to encourage widespread participation. For the scholarly literature, the traditional tools of reference and citation provide an effective means for asserting authorship, attributing credit, and measuring impact. In the MDC project we have applied similar principles and procedures to datasets, so that they now can be dealt with as first-class research outputs alongside the literature.
An initial round of surveys and focus group sessions with representative members of the research and data center communities provided valuable insights regarding current attitudes and practices with respect to data sharing, as well as identifying use cases and requirements for tracking and aggregating data citations and data reuse. This led to the development of a prototype service for data-level metrics, or DLM (https://dlm.datacite.org/), based on an extension of the open source Lagotto platform (http://mdc.lagotto.io/) originally developed by PLOS for tracking article-level metrics (ALM).
By assuring appropriate recognition for the scholarly contributions represented by research data, MDC is encouraging a beneficial culture of open data sharing and reuse, and enabling scholarly innovation and progress. While data citation and download counts currently have the most currency amongst the research community, the MDC project team felt it important also to include alternative measures of data activity, such as social references, as these are anticipated to grow in future importance as legitimate channels of scholarly communication. To do this, the DLM service tracks dataset citations and references from 14 sources, including data repositories (DataONE network), ejournal publishers and aggregators (BioMed Central, DataCite, Nature, PLOS), citation managers (CiteULike, Mendeley), and social media platforms (Facebook, Reddit, ScienceSeeker, Twitter, Wikipedia, WordPress). Aggregating across these diverse sources provides a more complete picture of data use and impact.
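As an illustration of the aggregation described above, the following is a minimal Python sketch that rolls per-source event counts up into the four source categories named in the report. It is not the Lagotto or DLM implementation: the function names (`group_of`, `aggregate`), the category keys, the `(dataset_id, source, count)` tuple layout, and the example identifier are all assumptions made for this sketch.

```python
from collections import Counter

# Source names are taken from the report; the category keys are
# hypothetical labels chosen for this sketch.
SOURCE_GROUPS = {
    "repository": {"DataONE"},
    "publisher": {"BioMed Central", "DataCite", "Nature", "PLOS"},
    "citation_manager": {"CiteULike", "Mendeley"},
    "social": {"Facebook", "Reddit", "ScienceSeeker", "Twitter",
               "Wikipedia", "WordPress"},
}


def group_of(source):
    """Return the category a source belongs to, or 'other' if unlisted."""
    for group, members in SOURCE_GROUPS.items():
        if source in members:
            return group
    return "other"


def aggregate(events):
    """Sum event counts per dataset and per source category.

    events: iterable of (dataset_id, source, count) tuples (assumed layout).
    Returns a dict mapping dataset_id to a Counter keyed by category.
    """
    totals = {}
    for dataset_id, source, count in events:
        totals.setdefault(dataset_id, Counter())[group_of(source)] += count
    return totals


if __name__ == "__main__":
    sample = [
        ("doi:10.5555/example", "PLOS", 2),      # placeholder identifier
        ("doi:10.5555/example", "Nature", 1),
        ("doi:10.5555/example", "Twitter", 5),
    ]
    print(aggregate(sample))
```

Keeping the counts separated by category, rather than collapsing them into one number, lets citations and social references be reported side by side without implying they carry equal weight.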
As part of the MDC project, the DataONE network, managing over 150,000 datasets held in 27 independent data repositories, has enhanced its technical infrastructure to process, anonymize, and export usage metrics. These metrics are consistent with COUNTER standards, facilitating consistent, credible, and meaningful comparisons. Since much reuse of datasets is mediated through analytical systems such as IPython, MATLAB, and R, DataONE statistics are also available in a form that does not filter out automated retrievals...
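The idea of exporting usage in both a filtered and an unfiltered form can be sketched in Python as follows. This is a sketch under stated assumptions, not DataONE's pipeline and not the COUNTER rule set: the record fields (`dataset`, `ip`, `user_agent`), the marker list, the salted-hash anonymization, and the function names are all hypothetical choices made for illustration.

```python
import hashlib
from collections import Counter

# Hypothetical user-agent markers for automated clients; a real service
# would rely on a maintained list rather than this short tuple.
AUTOMATED_MARKERS = ("bot", "crawler", "spider", "curl", "python-requests")


def anonymize(ip, salt):
    """Replace an IP address with a truncated, salted SHA-256 digest."""
    digest = hashlib.sha256((salt + ip).encode("utf-8")).hexdigest()
    return digest[:16]


def is_automated(user_agent):
    """Flag a request whose user agent contains any assumed marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in AUTOMATED_MARKERS)


def summarize(records, salt):
    """Count retrievals per dataset with and without automated clients.

    records: iterable of dicts with 'dataset', 'ip', 'user_agent' keys
    (an assumed log layout).
    Returns (filtered_counts, unfiltered_counts, exported_records).
    """
    filtered, unfiltered, exported = Counter(), Counter(), []
    for rec in records:
        automated = is_automated(rec["user_agent"])
        unfiltered[rec["dataset"]] += 1      # every retrieval counts here
        if not automated:
            filtered[rec["dataset"]] += 1    # interactive retrievals only
        exported.append({
            "dataset": rec["dataset"],
            "client": anonymize(rec["ip"], salt),  # raw IP is never exported
            "automated": automated,
        })
    return filtered, unfiltered, exported
```

Reporting both tallies keeps script-driven retrievals, such as those issued from an analysis environment, visible in the unfiltered count while leaving the filtered count comparable across repositories.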