Award Abstract # 1062542
Collaborative research: ABI Development: Ontology-enabled reasoning across phenotypes from evolution and model organisms

NSF Org: DBI
Division of Biological Infrastructure
Recipient: THE UNIVERSITY OF SOUTH DAKOTA
Initial Amendment Date: June 16, 2011
Latest Amendment Date: January 26, 2018
Award Number: 1062542
Award Instrument: Continuing Grant
Program Manager: Jen Weller
DBI Division of Biological Infrastructure
BIO Directorate for Biological Sciences
Start Date: July 1, 2011
End Date: June 30, 2018 (Estimated)
Total Intended Award Amount: $1,817,667.00
Total Awarded Amount to Date: $1,947,267.00
Funds Obligated to Date: FY 2011 = $490,399.00
FY 2012 = $561,932.00
FY 2013 = $472,600.00
FY 2014 = $326,126.00
FY 2015 = $96,210.00
History of Investigator:
  • Paula Mabee (Principal Investigator)
    mabee@battelleecology.org
  • Paul Sereno (Co-Principal Investigator)
  • Monte Westerfield (Co-Principal Investigator)
  • David Blackburn (Co-Principal Investigator)
  • Wasila Dahdul (Co-Principal Investigator)
  • Wasila Dahdul (Former Principal Investigator)
Recipient Sponsored Research Office: University of South Dakota Main Campus
414 E CLARK ST
VERMILLION
SD  US  57069-2307
(605)677-5370
Sponsor Congressional District: 00
Primary Place of Performance: University of South Dakota Main Campus
414 E CLARK ST
VERMILLION
SD  US  57069-2307
Primary Place of Performance Congressional District: 00
Unique Entity Identifier (UEI): U9EDNSCHTBE7
Parent UEI:
NSF Program(s): ADVANCES IN BIO INFORMATICS,
Unallocated Program Costs
Primary Program Source: 01001112DB NSF RESEARCH & RELATED ACTIVIT
01001213DB NSF RESEARCH & RELATED ACTIVIT
01001314DB NSF RESEARCH & RELATED ACTIVIT
01001415DB NSF RESEARCH & RELATED ACTIVIT
01001516DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1165, 1228, 7433, 9150, 9178, 9179
Program Element Code(s): 116500, 919900
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.074

ABSTRACT

Collaborative grants are awarded to the University of South Dakota and the University of North Carolina to develop ontology-driven tools for machine reasoning over large volumes of phenotype data. Human-readable descriptions of "phenotypic" properties such as anatomy and behavior are not well suited to computational analysis. Yet, in evolutionary biology, genetics and development, computational assistance is necessary to discover patterns within the enormous volumes of descriptive phenotype data that are being reported in the literature and in online databases. Ontologies are structured, controlled vocabularies that can be applied to collections of descriptive data to permit logical reasoning. Using the evolutionary transition from fins to limbs as a test system, this project will develop ontology-aware software that allows users to discover similar sets of phenotypes for different taxa or mutant genes within large and diverse datasets. A fast semantic similarity engine will be developed to allow searches for evolutionary transitions and mutant genes characterized by similar phenotypic profiles. An ontological framework for reasoning over homology will be developed to allow rigorous reasoning over evolutionarily diverse lineages. Natural language processing tools will be developed to improve the efficiency of mining phenotype data from the literature and to improve data consistency. This suite of tools will be tested on a large number of skeletal phenotypes from diverse fossil and modern vertebrates. Taxonomic and anatomical ontologies for vertebrates will be augmented and hypotheses of anatomical homology formally encoded. The ontologies and software tools, together with phenotypes extracted from the vertebrate systematic literature, will be integrated in the knowledgebase with genetic and phenotype data from three vertebrate model organisms: zebrafish (Danio rerio), African clawed frog (Xenopus laevis), and mouse (Mus musculus). The knowledgebase will be exposed to generic reasoners using semantic web standards. The system will be validated by its success in retrieving candidate genes for the well-studied vertebrate fin-limb transition and other major events in skeletal evolution.
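
To make the idea of a semantic similarity search concrete, here is a minimal sketch in Python. It is not the project's engine: the abstract does not say which similarity metric is used, so the Jaccard index over ontology subsumers is an assumption, and the tiny `IS_A` hierarchy, the term names, and the function names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy "is_a" hierarchy (child term -> parent terms).
# Illustrative only; not taken from any real anatomy ontology.
IS_A: Dict[str, Set[str]] = {
    "pectoral fin": {"paired fin"},
    "pelvic fin": {"paired fin"},
    "paired fin": {"appendage"},
    "forelimb": {"limb"},
    "hindlimb": {"limb"},
    "limb": {"appendage"},
    "appendage": set(),
}


def subsumers(term: str) -> Set[str]:
    """Return the term itself plus all of its ancestors in the toy hierarchy."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, ()))
    return seen


def profile_closure(profile: Set[str]) -> Set[str]:
    """Expand a phenotype profile (a set of terms) with every subsuming term."""
    closure: Set[str] = set()
    for term in profile:
        closure |= subsumers(term)
    return closure


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard index over the subsumer closures of two profiles (assumed metric)."""
    closure_a, closure_b = profile_closure(a), profile_closure(b)
    union = closure_a | closure_b
    return len(closure_a & closure_b) / len(union) if union else 0.0


def rank_profiles(query: Set[str], candidates: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Rank named candidate profiles by similarity to the query, best first."""
    scored = [(name, jaccard_similarity(query, prof)) for name, prof in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In this sketch, a query profile for a taxon could be ranked against profiles attached to genes by calling `rank_profiles(taxon_profile, gene_profiles)`. Because the profiles are compared through shared ancestors, two profiles can score above zero even when they share no term directly.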

The evolutionary breadth of the test data requires the development of a rigorous framework for reasoning over hypotheses of homology. Another goal is to develop and evaluate natural language processing tools for efficiently capturing ontological descriptions of phenotype from the descriptions available in the published literature. The suite of tools will be validated by recovering developmental genetic pathways that underlie the evolutionary transition from fin to limb in vertebrates, and refined by iterative testing with domain bioinformaticians on the project and biologists from the broader user community.
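
One simple way to picture reasoning over homology hypotheses is sketched below in Python. It treats each homology assertion as merging two anatomical terms into one group and then retrieves genes annotated to any term in the same group as a query term. This is a deliberate simplification and an assumption, not the framework the project describes; the class, function, term, and gene names are hypothetical.

```python
from typing import Dict, List, Set


class HomologyGroups:
    """Union-find over anatomy terms; each asserted homology merges two groups."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def find(self, term: str) -> str:
        """Return the representative term of the group containing `term`."""
        self._parent.setdefault(term, term)
        root = term
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited term directly at the root.
        while self._parent[term] != root:
            next_term = self._parent[term]
            self._parent[term] = root
            term = next_term
        return root

    def assert_homologous(self, a: str, b: str) -> None:
        """Record a homology hypothesis between two terms."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def are_homologous(self, a: str, b: str) -> bool:
        """True if the two terms fall in the same homology group."""
        return self.find(a) == self.find(b)


def candidate_genes(
    query_terms: Set[str],
    gene_annotations: Dict[str, Set[str]],
    groups: HomologyGroups,
) -> List[str]:
    """Genes annotated to a term in the same homology group as any query term."""
    query_roots = {groups.find(term) for term in query_terms}
    hits = [
        gene
        for gene, terms in gene_annotations.items()
        if any(groups.find(term) in query_roots for term in terms)
    ]
    return sorted(hits)
```

A real framework would also need to track the evidence for each hypothesis and would not necessarily treat homology as a plain equivalence relation; the sketch only shows why encoding homology lets a fin phenotype in one lineage match a limb phenotype in another.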


A broad community of users will participate throughout the lifecycle of this project in the development of community standards and resources for the interoperability and computability of phenotypic knowledge. This will be achieved through workshops, usability testing sessions, and coordination with key research networks. Stakeholder ownership will be enhanced by rapid and open release of a variety of products that we anticipate to be of immediate and enduring value to the greater biology community, including tools for streamlining data curation and performing large-scale semantic similarity searches, high quality vertebrate taxonomy and anatomy ontologies, and standards for reasoning over homology. We will provide a unique training environment for students, postdocs and summer interns, including Native Americans through outreach at the University of South Dakota and minority and female students through a collaboration with Project Exploration at the University of Chicago. Project progress and outcomes will be disseminated through both traditional and online outlets for scholarly communication (including blog posts and mailing lists); the primary web presence will be at https://www.phenoscape.org/wiki/.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 87)
Arighi, C, et al. (including Dahdul, W, Mabee, P, Cui, H, Haendel, M, Balhoff, J) "An overview of the BioCreative 2012 Workshop Track III: Interactive text mining task" Database , v.2013 , 2013 http://dx.doi.org/10.1093/database/bas056
Balhoff, JP "Scowl: a Scala DSL for programming with the OWL API" The Journal of Open Source Software , 2016 doi:10.21105/joss.00023
Balhoff, JP "The Phenoscape Knowledgebase: tools and APIs for computing across phenotypes from evolutionary diversity and model organisms" Proceedings of the International Conference on Biological Ontology , 2016 doi:10.1101/071951
Balhoff, JP, Dahdul, WM, Dececchi, TA, Lapp, H, Mabee, PM, and Vision, TJ "Annotation of phenotypic diversity: decoupling data curation and ontology curation using Phenex" Journal of Biomedical Semantics , v.5 , 2014 , p.45 http://dx.doi.org/10.1186/2041-1
Balhoff, JP, Dahdul, WM, Kothari, CR, Lapp, H, Lundberg, JG, Mabee, PM, Midford, PE, Westerfield, M, and Vision, TJ "Phenex: Ontological annotation of phenotypic diversity" PLoS ONE , v.5 , 2010 , p.e10500 10.1371/journal.pone.0010500
Balhoff, JP, Dececchi, TA, Mabee, PM, Lapp, H "Presence-absence reasoning for evolutionary phenotypes" Bio-ontologies SIG at ISMB , 2014 http://arxiv.org/abs/1410.3862
Balhoff, JP, I Mikó, MJ Yoder, PL Mullins, and AR Deans "A semantic model for species description, applied to the ensign wasps (Hymenoptera: Evaniidae) of New Caledonia" Systematic Biology , v.62 , 2013 , p.639 http://dx.doi.org/10.1093/sysbio/syt028

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The study of how the observable features of organisms, i.e., their phenotypes, result from the complex interplay between genetics, development, and the environment is central to research in biology. Differences in the way that phenotypes of biodiverse species are described in text, however, have made it difficult to use computer-based methods for analysis. Further, they have meant that researchers from other areas, such as genetics, cannot readily access these data or use them in combination with their own. The overall aim of the Phenoscape initiative (www.phenoscape.org) is to develop the capacity for large-scale computational analysis of those phenotypes that distinguish species in nature. The focus, and intellectual merit, of this specific project has been to develop the capacity for large-scale analysis of phenotypes using a particular test case, the fin-to-limb transition, a time of major structural change as the first terrestrial vertebrates evolved from their aquatic ancestors. This was chosen owing to the large literature on both the phenotypic changes observed in the fossil record and on genes that may have been involved in those changes.

This project developed broadly applicable methods to mark up text-based phenotypes with ontologies (standardized terms with built-in logic) so that they could be computed upon and then analyzed for their similarities and differences. The research increased the speed and scalability of computation, resulting in rigorous and user-friendly software that takes a set of phenotypes for a particular organism or genotype and retrieves statistically similar sets of phenotypes. The software integrates the logical structure of the ontologies along with statements of homology, phenotypes of biodiverse organisms, and phenotypes and associated genes of biomedical models in the Phenoscape Knowledgebase. This knowledgebase integrates data on over 5,000 vertebrate taxa and 20,000 evolutionary characters with hundreds of thousands of phenotypes from genes and gene expression from three vertebrate model organisms (zebrafish (Danio rerio), frog (Xenopus laevis), and mouse (Mus musculus)) and human. Experiments were conducted to determine how best to combine information about homology, or the descent of anatomical structures from a common ancestor, with ontology-annotated phenotype data. Software was developed to link the phenotypes from different species of vertebrate animals to the phenotypes and genes from mutant model organisms and vice versa. A user of the knowledgebase can thus compute the genes that might be involved in the evolutionary changes that produced the differences between species. As a capstone, an experiment was conducted to retrieve candidate genes from the knowledgebase for the vertebrate fin-to-limb transition based on the phenotypes of the relevant fossil taxa. The system successfully recovered most of the set of candidate genes proposed in the literature, while at the same time suggesting refinements to the expert predictions.
Because phenotype text mark-up is time-consuming for humans and thus a major limitation in scaling up the approach, the project developed the first expert-curated dataset to be used as a benchmark or ‘gold standard’ in assessing the efficacy of machine-based phenotype mark-up. Project assessments using the benchmark point toward ways to better design software to assist human curators, and the use of the gold standard phenotype set will allow training and assessment of new tools to improve phenotype mark-up accuracy at scale. Finally, the project developed tools to automatically combine independently published data sets into large matrices of presence/absence phenotypes for taxa. The tools used the logic of ontologies to infer the presence or absence of a phenotype, and thus made available significant new data for biological research.
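
The presence/absence inference described above can be illustrated with a small Python sketch. The report does not spell out the inference rules, so the two rules used here are assumptions: if a part is present then the structures containing it are present, and if a structure is absent then its parts are absent. The `PART_OF` table, the taxon labels, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical toy "part_of" relations (part -> whole). Illustrative only.
PART_OF: Dict[str, str] = {
    "digit": "manus",
    "manus": "forelimb",
}

# A matrix maps taxon -> {anatomy term -> True (present) / False (absent)}.
Matrix = Dict[str, Dict[str, bool]]


def wholes(term: str) -> List[str]:
    """All structures that transitively contain `term`."""
    found: List[str] = []
    while term in PART_OF:
        term = PART_OF[term]
        found.append(term)
    return found


def parts(term: str) -> List[str]:
    """All structures that are transitively part of `term`."""
    found: List[str] = []
    stack = [term]
    while stack:
        current = stack.pop()
        for part, whole in PART_OF.items():
            if whole == current:
                found.append(part)
                stack.append(part)
    return found


def infer_row(asserted: Dict[str, bool]) -> Dict[str, bool]:
    """Fill empty cells for one taxon from its asserted presence/absence states."""
    inferred = dict(asserted)
    for term, present in asserted.items():
        # Presence propagates up to wholes; absence propagates down to parts.
        implied = wholes(term) if present else parts(term)
        for other in implied:
            # Direct assertions win; only cells that are still empty are filled.
            inferred.setdefault(other, present)
    return inferred


def merge_matrices(datasets: Iterable[Matrix]) -> Matrix:
    """Combine independently published matrices, then infer additional cells."""
    merged: Matrix = {}
    for dataset in datasets:
        for taxon, states in dataset.items():
            merged.setdefault(taxon, {}).update(states)
    return {taxon: infer_row(states) for taxon, states in merged.items()}
```

This sketch does not detect conflicts between sources (a later dataset simply overwrites an earlier one for the same cell), which a real tool would need to report rather than hide.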

This research is one of the first to demonstrate the feasibility and utility of ontologies for data integration and knowledge discovery in evolutionary biology, as well as their limitations and needs. The knowledgebase, ontologies, and over 30 publications with associated datasets and software are openly available for reuse by the scientific community. Contributions made by this project to community anatomy, quality, taxonomy, and evidence ontologies and related resources are being leveraged by the broader biological and biomedical communities. Additional broader impacts from this project include training of undergraduate students, graduate students and postdocs at multiple U.S. institutions in the areas of biocuration, software development, and semantic methods, and support for programmers and curators at several institutions. Broader impacts also include several workshops and symposia on these emerging techniques for a research audience. The project developed partnerships with over 15 other organizations and institutions.

 

 


Last Modified: 09/13/2018
Modified by: Paula M Mabee
