Award Abstract # 1062404
COLLABORATIVE RESEARCH: ABI Development: Ontology-enabled reasoning across phenotypes from evolution and model organisms

NSF Org: DBI
Division of Biological Infrastructure
Recipient: UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL
Initial Amendment Date: June 16, 2011
Latest Amendment Date: December 14, 2015
Award Number: 1062404
Award Instrument: Continuing Grant
Program Manager: Peter McCartney
DBI
 Division of Biological Infrastructure
BIO
 Directorate for Biological Sciences
Start Date: July 1, 2011
End Date: June 30, 2017 (Estimated)
Total Intended Award Amount: $1,305,787.00
Total Awarded Amount to Date: $1,305,787.00
Funds Obligated to Date: FY 2011 = $351,409.00
FY 2012 = $385,469.00
FY 2013 = $281,143.00
FY 2014 = $287,766.00
History of Investigator:
  • Todd Vision (Principal Investigator)
    tjv@bio.unc.edu
  • Hilmar Lapp (Co-Principal Investigator)
Recipient Sponsored Research Office: University of North Carolina at Chapel Hill
104 AIRPORT DR STE 2200
CHAPEL HILL
NC  US  27599-5023
(919)966-3411
Sponsor Congressional District: 04
Primary Place of Performance: University of North Carolina at Chapel Hill
104 AIRPORT DR STE 2200
CHAPEL HILL
NC  US  27599-5023
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): D3LHU66KBLD5
Parent UEI: D3LHU66KBLD5
NSF Program(s): ADVANCES IN BIO INFORMATICS
Primary Program Source: 01001112DB NSF RESEARCH & RELATED ACTIVIT
01001213DB NSF RESEARCH & RELATED ACTIVIT
01001314DB NSF RESEARCH & RELATED ACTIVIT
01001415DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1165, 1228, 9178, 9179
Program Element Code(s): 116500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.074

ABSTRACT

Collaborative grants are awarded to the University of South Dakota and the University of North Carolina to develop ontology-driven tools for machine reasoning over large volumes of phenotype data. Human-readable descriptions of "phenotypic" properties such as anatomy and behavior are not well-suited to computational analysis. Yet, in evolutionary biology, genetics and development, computational assistance is necessary to discover patterns within the enormous volumes of descriptive phenotype data that are being reported in the literature and in online databases. Ontologies are structured, controlled vocabularies that can be applied to collections of descriptive data so that logical reasoning can be used on them. Using the evolutionary transition from fins to limbs as a test system, this project will develop ontologically-aware software that allows users to discover similar sets of phenotypes for different taxa or mutant genes within large and diverse datasets. A fast semantic similarity engine will be developed to allow searches for evolutionary transitions and mutant genes characterized by similar phenotypic profiles. An ontological framework for reasoning over homology will be developed to allow rigorous reasoning over evolutionarily diverse lineages. Natural language processing tools will be developed to improve the efficiency of mining phenotype data from the literature and to improve data consistency. This suite of tools will be tested on a large number of skeletal phenotypes from diverse fossil and modern vertebrates. Taxonomic and anatomical ontologies for vertebrates will be augmented and hypotheses of anatomical homology formally encoded. The ontologies and software tools, together with phenotypes extracted from the vertebrate systematic literature, will be integrated in the knowledgebase with genetic and phenotype data from three vertebrate model organisms: zebrafish (Danio rerio), African clawed frog (Xenopus laevis), and mouse (Mus musculus). The knowledgebase will be exposed to generic reasoners using semantic web standards. The system will be validated by its success in retrieving candidate genes for the well-studied vertebrate fin-limb transition and other major events in skeletal evolution.

The evolutionary breadth of the test data requires the development of a rigorous framework for reasoning over hypotheses of homology. Another goal is to develop and evaluate natural language processing tools for efficiently capturing ontological descriptions of phenotype from the descriptions available in the published literature. The suite of tools will be validated by recovering developmental genetic pathways that underlie the evolutionary transition from fin to limb in vertebrates, and refined by iterative testing with domain bioinformaticians on the project and biologists from the broader user community.


A broad community of users will participate throughout the lifecycle of this project in the development of community standards and resources for the interoperability and computability of phenotypic knowledge. This will be achieved through workshops, usability testing sessions, and coordination with key research networks. Stakeholder ownership will be enhanced by rapid and open release of a variety of products that we expect to be of immediate and enduring value to the greater biology community, including tools for streamlining data curation and performing large-scale semantic similarity searches, high quality vertebrate taxonomy and anatomy ontologies, and standards for reasoning over homology. We will provide a unique training environment for students, postdocs and summer interns, including Native Americans through outreach at the University of South Dakota and minority and female students through a collaboration with Project Exploration at the University of Chicago. Project progress and outcomes will be disseminated through both traditional and online outlets for scholarly communication (including blog posts and mailing lists); the primary web presence will be at https://www.phenoscape.org/wiki/.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 20)
Arighi CN, Carterette B, Cohen KB, Krallinger M, Wilbur WJ, Fey P, Dodson R, Cooper L, Van Slyke CE, Dahdul W, Mabee P, Li D, Harris B, Gillespie M, Jimenez S, Roberts P, Matthews L, Becker K, Drabkin H, Bello S, Licata L, Chatr-aryamontri A, Schaeffer ML, "An overview of the BioCreative 2012 Workshop Track III: interactive text mining task." Database , v.2013 , 2013 , p.bas056 10.1093/database/bas056
Balhoff, J., Dahdul, W., Dececchi, T., Lapp, H., Mabee, P., and Vision, T "Annotation of Phenotypic Diversity: Decoupling Data Curation and Ontology Curation Using Phenex" Journal of Biomedical Semantics , v.5 , 2014 , p.45 doi:10.1186/2041-1480-5-45
Balhoff, J.P. "Scowl: a Scala DSL for programming with the OWL API" The Journal of Open Source Software. , 2016 doi:10.21105/joss.00023
Bertone, M.A., I. Mikó, M.J. Yoder, K.C. Seltmann, J.P. Balhoff, and A.R. Deans "Matching arthropod anatomy ontologies to the Hymenoptera Anatomy Ontology: Results from a manual alignment" Database , v.2013 , 2013 , p.bas057 10.1093/database/bas057
Cui, H., Dahdul, W., Dececchi, T.A., Ibrahim, N., Mabee, P., Balhoff, J.P., Gopalakrishnan, H. "CharaParser+EQ: Performance Evaluation without Gold Standard" Proceedings of the 78th Annual Meeting of the Association for Information Science and Technology (ASIS&T) , v.51 , 2015
Dahdul W.M., Balhoff J.P, Blackburn D.C., Diehl A.D., Haendel M.A., Hall B.K., Lapp H., Lundberg J.G., Mungall C.J., Ringwald M., Segerdell E., Van Slyke C.E., Vickaryous M.K., Westerfield M., Mabee P.M. "A unified anatomy ontology of the vertebrate skeletal system." PLoS ONE , v.7 , 2012 , p.e51070 10.1371/journal.pone.0051070
Dahdul, W.M., H. Cui, P.M. Mabee, C.J. Mungall, D. Osumi-Sutherland, R. Walls, and M. Haendel. "Nose to tail, roots to shoots: spatial descriptors for phenotypic diversity in the Biological Spatial Ontology" Journal of Biomedical Semantics , v.5 , 2014 , p.34 doi:10.1186/2041-1480-5-34
Dahdul, W., T.A. Dececchi, N. Ibrahim, H. Lapp, and P. M. Mabee "Moving the mountain: Analysis of the effort required to transform comparative anatomy into computable anatomy." Database , 2015 , p.bav040 10.1093/database/bav040
Deans, Andrew R. AND Lewis, Suzanna E. AND Huala, Eva AND Anzaldo, Salvatore S. AND Ashburner, Michael AND Balhoff, James P. AND Blackburn, David C. AND Blake, Judith A. AND Burleigh, J. Gordon AND Chanet, Bruno AND Cooper, Laurel D. AND Courtot, Mélanie "Finding Our Way through Phenotypes" PLOS Biology , v.13 , 2015 , p.e1002033 doi:10.1371/journal.pbio.1002033
Deans, A.R., et al., (including Balhoff, Blackburn, Cui, Dahdul, Dececchi, Haendel, Ibrahim, Lapp, Mungall, Westerfield, Zorn, & Mabee) "Finding Our Way through Phenotypes" PLOS Biology , v.13 , 2015 , p.e1002033 http://dx.doi.org/10.1371/journal.pbio.1002033
Deans, A.R.; Yoder, M.J.; Balhoff, J.P. "Time to change how we describe biodiversity" Trends in Ecology & Evolution , v.27 , 2011 , p.78 http://dx.doi.org/10.1016/j.tree.2011.11.007

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The phenotype, or the set of observable traits present in an individual organism, is a central object of study for multiple subdisciplines of biology. Despite the importance of phenotypes, the tools for large-scale analysis of phenotype data are relatively crude compared to those for other datatypes, such as DNA sequences and protein structures. This is largely because phenotypes are generally reported in the form of natural language descriptions composed of specialized anatomical terms. The semantics, or meaning, of such descriptions can be rendered computable by the use of ontologies, controlled vocabularies that make explicit the logical relationships between specialized terms. This has been done with success in the genomics community to facilitate the interoperability of databases of mutant phenotypes across different model organisms. The aim of the Phenoscape initiative as a whole is to develop the capacity for large-scale computational analysis of those phenotypes that distinguish species and higher taxa in nature. The focus, and intellectual merit, of this specific project has been to develop the capacity for large-scale analysis of phenotypes for a particular test case, the fin-to-limb transition, a time of major structural change as the first terrestrial vertebrates evolved from their aquatic ancestors. This test case was chosen because there is a large literature both on the phenotypic changes observed in the fossil record and on genes that may have been involved in those changes.

Outcomes of the project include a knowledgebase that integrates data on over 5,000 vertebrate taxa and 20,000 evolutionary characters with genetic and phenotype data from three vertebrate model organisms: zebrafish (Danio rerio), frog (Xenopus laevis), and mouse (Mus musculus). A fast semantic similarity engine, which enables on-the-fly searching of large volumes of data for phenotypes similar to those of a gene or taxon of interest, was developed and implemented within the knowledgebase. Experiments were done to improve the efficiency and accuracy of the rate-limiting human data curation process, resulting in improved software for ontology-annotation of natural language phenotype descriptions and a gold standard of phenotype data for testing future methodological innovations. Openly available ontologies (e.g. for taxa and anatomy terms) were developed, and existing ones were enhanced, to provide the necessary terms for data curation across the diversity of fossil and modern vertebrates. Experiments were conducted to determine how best to combine information about homology, or the descent of anatomical structures from a common ancestor, with ontology-annotated phenotype data. As a capstone, an experiment was conducted to retrieve candidate genes from the knowledgebase for the vertebrate fin-limb transition based on the phenotypes of the relevant fossil taxa. The system successfully recovered most of the set of candidate genes proposed in the literature, while at the same time suggesting refinements to the expert predictions.
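
To illustrate the general idea of ontology-based semantic similarity, the sketch below scores two phenotype profiles by how much their ontology terms overlap once each term is expanded to include its ancestors. It is a minimal illustration under stated assumptions, not the engine built by this project: the toy ontology, its term names, and the choice of a Jaccard score are all hypothetical and chosen only for clarity.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy ontology (not taken from any real anatomy ontology):
# each term maps to the set of its direct parent terms.
PARENTS: Dict[str, Set[str]] = {
    "pectoral fin": {"paired fin"},
    "pelvic fin": {"paired fin"},
    "paired fin": {"paired appendage"},
    "forelimb": {"limb"},
    "hindlimb": {"limb"},
    "limb": {"paired appendage"},
    "paired appendage": {"appendage"},
    "dorsal fin": {"median fin"},
    "median fin": {"appendage"},
    "appendage": set(),
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term together with every term reachable via parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def profile_closure(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Expand a profile (a collection of terms) to include all ancestors."""
    closure: Set[str] = set()
    for term in terms:
        closure |= ancestors(term, parents)
    return closure


def jaccard_similarity(
    profile_a: Iterable[str],
    profile_b: Iterable[str],
    parents: Dict[str, Set[str]],
) -> float:
    """Jaccard index of the two ancestor-expanded profiles (0.0 to 1.0)."""
    closure_a = profile_closure(profile_a, parents)
    closure_b = profile_closure(profile_b, parents)
    union = closure_a | closure_b
    return len(closure_a & closure_b) / len(union) if union else 0.0


if __name__ == "__main__":
    print(jaccard_similarity(["pectoral fin"], ["forelimb"], PARENTS))
    print(jaccard_similarity(["pectoral fin"], ["dorsal fin"], PARENTS))
```

In this toy ontology, "pectoral fin" scores higher against "forelimb" than against "dorsal fin" because the first pair shares the more specific ancestor "paired appendage". A plain string comparison of the term names would not capture that relationship.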

The knowledgebase, ontologies, and over 30 publications with associated datasets and software are openly available for reuse by the scientific community. Broader impacts also include several workshops and symposia on these emerging techniques for a research audience; training of multiple postdocs and graduate students; and research experiences for undergraduates and public high school students.


Last Modified: 10/04/2017
Modified by: Todd J Vision
