Award Abstract # 1146256
COLLABORATIVE: ABI Development: Methodology for Pattern Creation, Imprint Validation, and Discovery from the Annotated Biological Web

NSF Org: DBI
Division of Biological Infrastructure
Recipient: THE UNIVERSITY OF IOWA
Initial Amendment Date: April 9, 2012
Latest Amendment Date: May 23, 2014
Award Number: 1146256
Award Instrument: Continuing Grant
Program Manager: Peter McCartney
DBI Division of Biological Infrastructure
BIO Directorate for Biological Sciences
Start Date: September 1, 2012
End Date: August 31, 2017 (Estimated)
Total Intended Award Amount: $344,919.00
Total Awarded Amount to Date: $344,919.00
Funds Obligated to Date: FY 2012 = $133,120.00
FY 2013 = $121,357.00
FY 2014 = $90,442.00
History of Investigator:
  • Padmini Srinivasan (Principal Investigator)
    padmini-srinivasan@uiowa.edu
Recipient Sponsored Research Office: University of Iowa
105 JESSUP HALL
IOWA CITY
IA  US  52242-1316
(319)335-2123
Sponsor Congressional District: 01
Primary Place of Performance: University of Iowa
101F MacLean Hall
Iowa City
IA  US  52242-1320
Primary Place of Performance Congressional District: 01
Unique Entity Identifier (UEI): Z1H9VJS8NG16
Parent UEI:
NSF Program(s): ADVANCES IN BIO INFORMATICS
Primary Program Source: 01001213DB NSF RESEARCH & RELATED ACTIVIT
01001314DB NSF RESEARCH & RELATED ACTIVIT
01001415DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1165, 9150, 9179
Program Element Code(s): 116500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.074

ABSTRACT

Collaborative grants have been awarded to the University of Maryland, the University of Iowa and St. Bonaventure University to develop a methodology that exploits the wealth of annotation knowledge, notably Gene Ontology (GO) and Plant Ontology (PO) annotations of Arabidopsis genes. Motivated by the availability of rich and as yet insufficiently tapped collections of gene annotations, the project aims to facilitate the discovery of hidden knowledge that could be the basis of further scientific research. The methodology will extract patterns of interest from annotation graphs (pattern discovery). Literature-based methods will extract sentences that validate the biological meaning underlying these patterns (pattern validation). To demonstrate the methodology, the PattArAn tool (Patterns in Arabidopsis Annotations) will be customized for Arabidopsis. PattArAn will provide the user with a graphical presentation of patterns of Arabidopsis genes and their associated GO and PO controlled vocabulary (CV) terms. Graph data mining techniques and efficient algorithmic solutions to identify dense subgraphs (DSG) and to perform graph summarization (GS) will be developed. Algorithms will also be developed to mine the literature for sentences relevant to an extracted pattern (referred to as the imprint). PattArAn will enable iterative exploration and will incorporate allied steps such as consulting gene function prediction. The project will involve collaboration with biologists to build and refine annotation graphs and to validate patterns, ensuring their relevance to the biologists' research.

The project makes broad contributions to the Arabidopsis thaliana community. PattArAn may help Arabidopsis curators manage GO-PO annotations and complement existing tools such as Textpresso and AraNet. It can also be used to bootstrap an annotation database for other plant species, provided that their genome sequence information is available. The project offers significant research and educational experiences for graduate students (University of Maryland and University of Iowa) and undergraduate students (St. Bonaventure University). Team members will continue to mentor women and students from under-represented communities, participate in outreach activities, lead a Journal Club, etc. The outcomes from this research project will be disseminated via biology and bioinformatics venues. More information may be obtained at the project website: https://wiki.umiacs.umd.edu/clip/pattaran/.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Becker, N., Chang, C., Raschid, L., Srinivasan, P., Van de Poel, B., Zhang, X., Zotkina, E. "Mining Sentence and Annotation Evidence for a Cross Genome Study of the Plant Hormone Ethylene." International Conference on Data Integration in the Life Sciences, Springer Lecture Notes in Computer Science, v.9162, 2015, p.251
Srinivasan, P., Zhang, X-N., Bouten, R., Chang, C. "Ferret: a sentence-based literature scanning system." BMC Bioinformatics, v.16, 2015, p.198, doi:10.1186/s12859-015-0630-0

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The aim of this project was to develop a toolkit called PattArAn to help bioscientists in their research. We developed key PattArAn components for efficiently scanning the large collections of published biomedical research represented in MEDLINE and PubMed Central. We developed sentence retrieval algorithms and implemented them in a prototype system called Ferret. Ferret supports scientific research by allowing scientists to quickly peruse previous findings at the sentence level, whether to find confirmatory evidence for an idea, to find explanations for unexpected findings, or even to come up with new ideas. Ferret presents the retrieved sentences in a grid-like structure called a heat map, a display commonly seen in bioscience papers. Among the merits of the project, Ferret and its underlying mechanisms remove some of the 'silo' effect in the progress of science: researchers can efficiently scan publications about species outside their own domain of interest. Ferret allows several hundred searches to be executed in a single job, and in this sense may be viewed as a biomedical search engine on 'steroids'. The value of Ferret has been demonstrated in a number of live biomedical research case studies on Arabidopsis thaliana and on Spirogyra. In each case the end-user scientist was able to interact with the simple heat map interface to select sentences, and from there documents to read. In one successful case, Ferret processed a set of 300+ genes in a single job and found sentences in prior research that confirmed the user's scientific findings. This case is among those acknowledged in recent papers. In formal comparisons with competing systems, Ferret performed remarkably well and showed particular strength in being species-agnostic.
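To make the heat-map idea concrete, here is a minimal Python sketch, not Ferret's actual implementation. It assumes each heat-map cell pairs one query gene with one keyword and holds the sentences that mention both; the function names `sentence_heat_map` and `render`, the case-insensitive whole-word matching rule, and the toy sentences are all hypothetical illustrations.

```python
import re
from typing import Dict, List, Tuple

Grid = Dict[Tuple[str, str], List[str]]


def sentence_heat_map(sentences: List[str], genes: List[str], keywords: List[str]) -> Grid:
    """Map each (gene, keyword) cell to the sentences mentioning both.

    Hypothetical helper: matching is a case-insensitive whole-word regex,
    an assumed simplification rather than Ferret's real matching logic.
    """
    def pattern(term: str) -> "re.Pattern[str]":
        # \b anchors keep e.g. "ETR1" from matching inside "ETR10"
        return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

    gene_pats = {g: pattern(g) for g in genes}
    kw_pats = {k: pattern(k) for k in keywords}
    grid: Grid = {(g, k): [] for g in genes for k in keywords}
    for s in sentences:
        hit_genes = [g for g, p in gene_pats.items() if p.search(s)]
        hit_kws = [k for k, p in kw_pats.items() if p.search(s)]
        # A sentence counts toward every gene/keyword pair it co-mentions
        for g in hit_genes:
            for k in hit_kws:
                grid[(g, k)].append(s)
    return grid


def render(grid: Grid, genes: List[str], keywords: List[str]) -> str:
    """Format sentence counts as a tab-separated grid (rows: genes, columns: keywords)."""
    header = "gene\t" + "\t".join(keywords)
    rows = [g + "\t" + "\t".join(str(len(grid[(g, k)])) for k in keywords) for g in genes]
    return "\n".join([header] + rows)


if __name__ == "__main__":
    # Toy sentences made up for illustration only
    sentences = [
        "ETR1 mutants are insensitive to ethylene.",
        "CTR1 acts downstream of ETR1 in ethylene signaling.",
        "EIN2 is required for ethylene response in roots.",
    ]
    genes = ["ETR1", "CTR1", "EIN2"]
    keywords = ["ethylene", "roots"]
    print(render(sentence_heat_map(sentences, genes, keywords), genes, keywords))
```

In a viewer, each count would be shaded by magnitude, and clicking a cell would list its sentences; keeping the sentences (not just counts) in each cell is what lets a user move from the grid to the underlying documents.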
The project has been the basis of direct training for at least nine students, including women and high school students as well as graduate and undergraduate students. Our algorithms are also introduced to undergraduate and graduate students through projects in courses such as health data analysis. The contributions made by this project help increase the pace of research in the biosciences and, in this regard, also contribute to the broader well-being of our societies. The sentence retrieval algorithms and data are available to other researchers.


Last Modified: 11/15/2017
Modified by: Padmini Srinivasan

Please report errors in award information by writing to: awardsearch@nsf.gov.
