
NSF Org: | DBI Division of Biological Infrastructure |
Recipient: | |
Initial Amendment Date: | April 9, 2012 |
Latest Amendment Date: | May 23, 2014 |
Award Number: | 1146256 |
Award Instrument: | Continuing Grant |
Program Manager: | Peter McCartney, DBI Division of Biological Infrastructure, BIO Directorate for Biological Sciences |
Start Date: | September 1, 2012 |
End Date: | August 31, 2017 (Estimated) |
Total Intended Award Amount: | $344,919.00 |
Total Awarded Amount to Date: | $344,919.00 |
Funds Obligated to Date: | FY 2013 = $121,357.00; FY 2014 = $90,442.00 |
History of Investigator: | |
Recipient Sponsored Research Office: | 105 JESSUP HALL IOWA CITY IA US 52242-1316 (319)335-2123 |
Sponsor Congressional District: | |
Primary Place of Performance: | 101F MacLean Hall Iowa City IA US 52242-1320 |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | ADVANCES IN BIO INFORMATICS |
Primary Program Source: | 01001314DB NSF RESEARCH & RELATED ACTIVIT; 01001415DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.074 |
ABSTRACT
Collaborative grants have been awarded to the University of Maryland, the University of Iowa and St. Bonaventure University to develop a methodology that exploits the wealth of annotation knowledge, notably Gene Ontology (GO) and Plant Ontology (PO) annotations of Arabidopsis genes. Motivated by the availability of rich and as yet insufficiently tapped collections of gene annotations, the project aims to facilitate the discovery of hidden knowledge that could be the basis of further scientific research. The methodology will extract patterns of interest from annotation graphs (pattern discovery). Literature-based methods will extract sentences that validate the biological meaning underlying these patterns (pattern validation). To demonstrate the methodology, the PattArAn tool (Patterns in Arabidopsis Annotations) will be customized for Arabidopsis. PattArAn will provide the user with a graphical presentation of patterns of Arabidopsis genes and the associated GO and PO controlled vocabulary (CV) terms. Graph data mining techniques and efficient algorithmic solutions to identify dense subgraphs (DSG) and to perform graph summarization (GS) will be developed. Algorithms to mine the literature for sentences relevant to an extracted pattern (referred to as the imprint) will be developed. PattArAn will enable iterative exploration and will incorporate allied steps such as consulting gene function prediction. The project will involve collaboration with biologists for building and refining annotation graphs, and for validating patterns to ensure relevance to their research.
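As a rough illustration of what dense-subgraph discovery on an annotation graph can look like, the sketch below applies greedy peeling, a standard heuristic for the densest-subgraph problem, to a toy gene-to-term graph. It is not the project's DSG algorithm, which the abstract does not specify. The function name `densest_subgraph`, the density measure (edges divided by nodes), and all gene and term identifiers are assumptions made for this example.

```python
from collections import defaultdict


def densest_subgraph(edges):
    """Greedy peeling: repeatedly drop the lowest-degree node and keep the
    intermediate node set with the highest edges-per-node ratio."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = set(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density = n_edges / len(nodes) if nodes else 0.0
    best_nodes = set(nodes)
    while len(nodes) > 1:
        # Ties on degree are broken by name so the result is deterministic.
        u = min(nodes, key=lambda x: (len(adj[x]), x))
        neighbours = adj.pop(u)
        for v in neighbours:
            adj[v].discard(u)
        n_edges -= len(neighbours)
        nodes.remove(u)
        density = n_edges / len(nodes)
        if density > best_density:
            best_density, best_nodes = density, set(nodes)
    return best_nodes, best_density


# Hypothetical annotation edges (gene, controlled-vocabulary term).
# These identifiers are placeholders, not real GO or PO annotations.
edges = [
    ("gene_1", "GO:term_a"), ("gene_1", "PO:term_x"),
    ("gene_2", "GO:term_a"), ("gene_2", "PO:term_x"),
    ("gene_3", "GO:term_a"), ("gene_3", "PO:term_x"),
    ("gene_4", "GO:term_b"), ("gene_4", "PO:term_x"),
    ("gene_5", "PO:term_y"),
]

nodes, density = densest_subgraph(edges)
print(sorted(nodes), round(density, 2))
```

On this toy input the three genes that share both terms form the densest group, which is the kind of pattern a curator might then try to validate against the literature.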
The project makes broad contributions to the Arabidopsis thaliana community. PattArAn may assist Arabidopsis curators in managing GO-PO annotations and may complement existing tools such as Textpresso and AraNet. It can also be used to bootstrap an annotation database for other plant species, provided that their genome sequence information is available. The project offers significant research and educational experiences for graduate students (University of Maryland and University of Iowa) and undergraduate students (St. Bonaventure University). Team members will continue to mentor women and students from under-represented communities, participate in outreach activities, and lead a Journal Club. The outcomes of this research project will be disseminated via biology and bioinformatics venues. More information may be obtained at the project website: https://wiki.umiacs.umd.edu/clip/pattaran/.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The aim of this project was to develop a toolkit called PattArAn to help bioscientists in their research. We developed key PattArAn components for efficiently scanning large collections of published biomedical research, as represented in MEDLINE and in PubMed Central. We developed sentence retrieval algorithms and implemented them in a prototype system called Ferret. This system supports scientific research by allowing scientists to quickly peruse previous findings at the sentence level in order to find confirmatory evidence for an idea, find explanations for unexpected findings, or even come up with new ideas. Ferret presents the retrieved sentences in a grid-like structure called a heat map, which is commonly seen in bioscience papers.
Among the merits of the project, we observe that Ferret and its underlying mechanisms remove some of the 'silo' effect in the progress of science: researchers can efficiently scan publications about species that are outside their domain of interest. Ferret allows the execution of several hundred searches in a single job and in this sense may be viewed as a biomedical search engine on 'steroids'. The value of Ferret has been demonstrated in a number of live biomedical research case studies on Arabidopsis thaliana and on Spirogyra. In each case the end-user scientist was able to interact with the simple heat map interface to select sentences and, from there, documents to read. As one example of a successful case, Ferret processed a set of 300+ genes in a single job and was able to find sentences that confirmed the user's scientific findings in prior research. This is one of the cases that have been acknowledged in recent papers. In formal comparisons with competing systems, Ferret performed remarkably well and demonstrated particular strength in being species-agnostic.
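The sentence-level retrieval and grid presentation described above can be pictured with a minimal sketch. It assumes a simple co-mention rule: a sentence is assigned to a grid cell when it contains both the row's gene name and the column's term. Ferret's actual retrieval algorithms are not described in this report, so the function names `tokenize` and `co_mention_grid`, the matching rule, and the toy sentences (invented placeholders, not real findings) are all assumptions.

```python
import re


def tokenize(text):
    """Lower-case a sentence and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def co_mention_grid(sentences, genes, terms):
    """Map each (gene, term) cell to the indices of sentences mentioning both."""
    grid = {g: {t: [] for t in terms} for g in genes}
    for idx, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        for g in genes:
            if g.lower() not in tokens:
                continue
            for t in terms:
                if t.lower() in tokens:
                    grid[g][t].append(idx)
    return grid


# Invented placeholder sentences; they do not report real results.
sentences = [
    "Expression of geneA was detected in root tissue.",
    "geneA and geneB were both upregulated in the leaf.",
    "No change in geneB was observed in root samples.",
    "The root elongated under low light.",
]
genes = ["geneA", "geneB"]
terms = ["root", "leaf"]

grid = co_mention_grid(sentences, genes, terms)
# Cell counts are the values a heat map would colour.
counts = {g: {t: len(ids) for t, ids in row.items()} for g, row in grid.items()}
for g in genes:
    print(g, [counts[g][t] for t in terms])
```

Because every gene and term pair is checked in one pass over the sentences, many individual searches are answered in a single job, and each cell keeps the sentence indices so a user can move from a count to the supporting text.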
The project has been the basis of direct training for at least nine students; this number includes women and high school students in addition to graduate and undergraduate students. Our algorithms are also introduced to undergraduate and graduate students via projects in courses such as health data analysis. The contributions made by this project help increase the pace of research in the biosciences, and in this regard they also contribute to the broader well-being of our societies. The sentence retrieval algorithms and data are available to other researchers.
Last Modified: 11/15/2017
Modified by: Padmini Srinivasan