
NSF Org: | DBI Division of Biological Infrastructure |
Recipient: |
|
Initial Amendment Date: | July 27, 2010 |
Latest Amendment Date: | December 22, 2010 |
Award Number: | 0960535 |
Award Instrument: | Standard Grant |
Program Manager: | Peter McCartney, DBI Division of Biological Infrastructure, BIO Directorate for Biological Sciences |
Start Date: | August 1, 2010 |
End Date: | July 31, 2015 (Estimated) |
Total Intended Award Amount: | $1,640,289.00 |
Total Awarded Amount to Date: | $1,640,289.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 1033 MASSACHUSETTS AVE STE 3 CAMBRIDGE MA US 02138-5366 (617)495-5501 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 1033 MASSACHUSETTS AVE STE 3 CAMBRIDGE MA US 02138-5366 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | ADVANCES IN BIO INFORMATICS, Cross-BIO Activities |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.074 |
ABSTRACT
Harvard University is awarded a grant to develop a networked solution to enable annotation of distributed biological collection data and to share assertions about their quality or usability. Internet inquiries that are posed to multiple datasets may yield varying results depending on the suitability or quality of the targeted data. In some cases it might be possible to inquire of experts or software agents that can assist in determining the fitness for use; in other cases such experts or agents might already have recorded an assessment of the data. However, that information is not typically available to the originator of the query. The proposed system will make these value-added assertions accessible to the end users of biodiversity datasets.
The Filtered Push network uses natural science collections as a reference implementation for a cyberinfrastructure with which any community can render an expert opinion about the quality of data and about the fitness for use of a data set or a subset of records. The emergent knowledge base of the Filtered Push network lets interested parties get immediate or historical access to these annotations, filtered by criteria that express their interests. The network can also automatically execute scientific workflows triggered by expert commentary, by the introduction or discovery of new data, or by a change in scientific viewpoints. As with the annotations, the outputs of such workflows can be distributed to interested parties, whether software or human. Filtered Push networks therefore allow for continuous quality control by the scientific community, based on human expertise, statistical or logical machine reasoning, or advances in the domain science itself. The Filtered Push project maintains a wiki at http://www.etaxonomy.org/mw/FilteredPush. This project is part of a 10-year effort to digitize and mobilize the scientific information associated with biological specimens held in U.S. research collections. The images and digitized data from this project will be integrated into the online national resource as outlined in the community strategic plan available at http://digbiocol.files.wordpress.com/2010/05/digistratplanfinaldraft.pdf.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
A typical natural-science collection specimen or observation is associated with information about who, what, where, when and how the specimen was collected and/or observed in nature. For hundreds of years, this information was captured on handwritten or typed labels associated with the specimens or in catalogs, but now these data are in high demand in digital form for use in scientific research. Digitization is typically performed by the data owners, who then freely distribute the data to relevant aggregators of information. This process inevitably generates errors, and it is typically limited in the types and quantity of information captured (Figure 1). Consequently, data quality assessment is essential to determine the applicability, or fitness, of the data for a specific research purpose. Data quality can also be enhanced by using semi-automated tools that check for errors by comparing digitized data against authoritative sources, that suggest corrections, and that allow additions or improvements to data records based on expert opinion. A further complication is that such edits can be made by anyone, anywhere, and yet this knowledge is rarely communicated back to the data owner, especially in a useful, standardized form. The FilteredPush (FP) project has begun to address these challenges by producing online tools for improving the fitness for use of distributed data through computer analysis and annotation, and through human review of data quality annotations.
Intellectual Merit:
The FilteredPush team developed a novel suite of tools that use structured annotations to propose data edits. These tools extend those used for commenting about information on web pages. The annotations may be recorded in a variety of clients and “pushed” back to the data owners. Data curators serve as the gatekeepers (“filters”) who evaluate proposed edits to records in their database and update records accordingly. Annotations can be contributed by experts or non-experts; where multiple opinions exist, they become conversations. The concept of data annotation was embraced by the Biodiversity Information Standards (TDWG) body, which facilitated interoperability of FP with AnnoSys, a European implementation. Scientific workflow software for data quality control (Kepler-Kuration, FP-Akka) was developed to assess and recommend changes within natural-science collection datasets. These workflows include detailed provenance information that promotes transparency and reusability. The structured annotations also facilitate application of semantic web technologies to biodiversity data management.
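The annotate, push, and filter cycle described above can be illustrated with a minimal Python sketch. It is not the FilteredPush implementation: the class names, the fields of `Annotation`, and the routing of annotations by registered taxon of interest are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A proposed edit to one field of one record (hypothetical structure)."""
    record_id: str
    field_name: str
    proposed_value: str
    annotator: str
    taxon: str  # taxonomic group of the annotated record (assumed routing key)


@dataclass
class Curator:
    """A data owner's gatekeeper who registers taxa of interest (assumption)."""
    name: str
    taxa_of_interest: set
    inbox: list = field(default_factory=list)


def push(annotation, curators):
    """Deliver the annotation only to curators whose interests match it."""
    recipients = [c for c in curators if annotation.taxon in c.taxa_of_interest]
    for curator in recipients:
        curator.inbox.append(annotation)
    return [c.name for c in recipients]


def review(curator, database, accept):
    """Apply each inbox annotation that the curator's accept() approves."""
    applied = []
    for ann in curator.inbox:
        if accept(ann):
            database[ann.record_id][ann.field_name] = ann.proposed_value
            applied.append(ann)
    curator.inbox.clear()
    return applied
```

In this sketch the curator's `accept` callback stands in for human review: a proposed edit changes the owner's database only after the gatekeeper approves it, which mirrors the "filter" role described above.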
Broader Impacts:
The FilteredPush project provides both direct and indirect support for several Thematic Collections Networks (TCNs) in the NSF’s Advancing Digitization of Biological Collections program and its coordinating hub, iDigBio. Direct support was provided to the Southwest Collections of Arthropods Network (SCAN), InvertEBase, and New England Vascular Plants (NEVP) TCNs. The support includes FP functionality for creating annotations and registering taxonomic groups of interest to experts, which is embedded in Symbiota, a community collection-management tool and data portal. Data quality reports were also produced for institutions participating in TCNs (Figure 2), which continue to use the tools. Indirect support was provided through further enhancements to Symbiota and contributions to iDigBio hackathons, webinars and workshops. Training was provided to a broad spectrum of users through hands-on workshops and presentations. The project directly involved one undergraduate, three graduate students, and two postdoctoral fellows. Two courses taught at the graduate level incorporated content on data curation workflows.
The project showed that a proposed web-annotation standard of the World Wide Web C...