Award Abstract # 1661516
Collaborative Research: ABI Innovation: Enabling machine-actionable semantics for comparative analyses of trait evolution

NSF Org: DBI
Division of Biological Infrastructure
Recipient: VIRGINIA POLYTECHNIC INSTITUTE & STATE UNIVERSITY
Initial Amendment Date: August 30, 2017
Latest Amendment Date: August 3, 2021
Award Number: 1661516
Award Instrument: Standard Grant
Program Manager: Peter McCartney
DBI
 Division of Biological Infrastructure
BIO
 Directorate for Biological Sciences
Start Date: September 1, 2017
End Date: August 31, 2022 (Estimated)
Total Intended Award Amount: $172,356.00
Total Awarded Amount to Date: $172,356.00
Funds Obligated to Date: FY 2017 = $172,356.00
History of Investigator:
  • Josef Uyeda (Principal Investigator)
    juyeda@vt.edu
Recipient Sponsored Research Office: Virginia Polytechnic Institute and State University
300 TURNER ST NW
BLACKSBURG
VA  US  24060-3359
(540)231-5281
Sponsor Congressional District: 09
Primary Place of Performance: Virginia Polytechnic Institute
300 Turner Street NW
Blacksburg
VA  US  24060-0405
Primary Place of Performance Congressional District: 09
Unique Entity Identifier (UEI): QDE5UHE5XD16
Parent UEI: X6KEFGLHSJX7
NSF Program(s): ADVANCES IN BIO INFORMATICS
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 9150
Program Element Code(s): 116500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.074

ABSTRACT

The millions of species that inhabit the planet all have distinct biological traits that enable them to successfully compete in or adapt to their ecological niches. Determining accurately how these traits evolved is thus fundamental to understanding Earth's biodiversity, and to predicting how it might change in the future in response to changes in ecosystems. Although sophisticated analytical methods and tools exist for analyzing traits comparatively, applying their full power to the myriad trait observations recorded as natural language descriptions has been hindered by the difficulty of getting these tools to understand even the most basic facts implied by an unstructured free-text statement made by a human observer. The technological arsenal needed to overcome this challenge is now in principle available, thanks to a number of recent breakthroughs in knowledge representation and machine reasoning, but these technologies are challenging enough to deploy, orchestrate, and use that the barriers to effectively exploiting them remain far too high for most tools. This project will create infrastructure that dramatically lowers these barriers, with the goal of giving comparative trait analysis tools easy access to algorithms powered by machine reasoning over, and inference from, the meaning of trait descriptions. Similar to how Google, IBM Watson, and others have enabled developers of smartphone apps to incorporate, with only a few lines of code, complex machine-learning and artificial intelligence capabilities such as sentiment analysis, this project will demonstrate how easy access to knowledge computing opens up new opportunities for analysis, tools, and research. It will do this by addressing three long-standing limitations in comparative studies of trait evolution: recombining trait data, modeling trait evolution, and generating testable hypotheses for the drivers of trait adaptation.

The treasure trove of morphological data published in the literature holds one of the keys to understanding the biodiversity of phenotypes, but exploiting the data in full through modern computational data science analytics remains severely hampered by the steep barriers to connecting the data with the accumulated body of morphological knowledge in a form that machines can readily act on. This project aims to address this barrier by creating a centralized computational infrastructure that affords comparative analysis tools the ability to compute with morphological knowledge through scalable online application programming interfaces (APIs), enabling developers of comparative analysis tools, and therefore their users, to tap into machine reasoning-powered capabilities and data with machine-actionable semantics. By shifting all the heavy lifting to this infrastructure, tools can programmatically obtain, with only a few lines of code, answers to knowledge-based questions that would otherwise require careful study by a human expert, such as objectively and reproducibly assessing the relatedness, independence, and distinctness of characters and character states. To accomplish this, the project will adapt key products and know-how developed by the Phenoscape project, including an integrative knowledgebase of ontology-linked phenotype data, metrics for quantifying the semantic similarity of phenotype descriptions, and algorithms for synthesizing morphological data from published trait descriptions.
To drive development of the computational infrastructure and to demonstrate its enabling value, the project's objectives focus on addressing three concrete long-standing needs for which the difficulty of computing with domain knowledge is the major impediment: (1) computationally synthesizing, calibrating, and assessing morphological trait matrices from across studies; (2) objectively and reproducibly incorporating morphological domain knowledge provided by ontologies into evolutionary models of trait evolution; and (3) generating testable hypotheses for adaptive diversification by incorporating semantic phenotypes into ancestral state reconstruction and identifying domain ontology concepts linked to evolutionary changes in a branch or clade more frequently than expected by chance. In addition, to better prepare evolutionary biologist users and developers of comparative analysis tools for adopting these new capabilities, a domain-tailored short-course on requisite knowledge representation and computational inference technologies will be developed and taught. More information on this project can be found at http://cate.phenoscape.org/.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Porto, Diego S. and Almeida, Eduardo A. and Pennell, Matthew W. "Investigating Morphological Complexes Using Informational Dissonance and Bayes Factors: A Case Study in Corbiculate Bees" Systematic Biology, 2020 https://doi.org/10.1093/sysbio/syaa059
Porto, Diego S. and Dahdul, Wasila M. and Lapp, Hilmar and Balhoff, James P. and Vision, Todd J. and Mabee, Paula M. and Uyeda, Josef "Assessing Bayesian Phylogenetic Information Content of Morphological Data Using Knowledge From Anatomy Ontologies" Systematic Biology, 2022 https://doi.org/10.1093/sysbio/syac022
Tarasov, Sergei and Mikó, István and Yoder, Matthew Jon and Uyeda, Josef C. and Boudinot, Brendon "PARAMO: A Pipeline for Reconstructing Ancestral Anatomies Using Ontologies and Stochastic Mapping" Insect Systematics and Diversity, v.3, 2019 https://doi.org/10.1093/isd/ixz009
Uyeda, Josef C. and Zenil-Ferguson, Rosana and Pennell, Matthew W. and Matzke, Nicholas "Rethinking phylogenetic comparative methods" Systematic Biology, v.67, 2018 https://doi.org/10.1093/sysbio/syy031

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The unique biological traits of organisms enable them to compete, coexist, and adapt to their ecological niches, and determining how these traits have evolved is thus fundamental to understanding Earth's biodiversity. However, organisms adapt to their environments as integrated wholes, not piecewise independent parts. The era of big data in the form of trait databases promises to revolutionize the study of such integrated organismal traits. While a diverse array of statistical methods has been developed for studying trait evolution, the complexity of such multi-trait datasets can require researchers to make unrealistic assumptions and/or apply increasingly complex and unwieldy models. The SCATE (Semantic Comparative Analysis of Trait Evolution) project has sought to overcome these challenges by integrating biological ontologies (computable knowledge graphs of the domain knowledge about biological traits, their definitions, and their interrelationships) into our toolkit for studying the evolution of organismal form and function. These ontologies provide computable knowledge that links together traits through their interrelationships and dependencies and allows researchers to synthesize trait data across species, analyze them as integrated sets of traits in a computationally tractable way, and reconstruct how they have responded to environmental change in the past.

This project builds on the success of its predecessor, the Phenoscape project, which built a computable knowledge base connecting evolutionary phenotypes to biological ontologies. The Phenoscape project pioneered the use of ontologies to enable data synthesis across species by enabling semantic analysis of phenotypic descriptions of species using machine reasoning. The SCATE project extended this work beyond presence-absence characters, constructing synthetic trait matrices across studies that involve other types of trait qualities and thereby greatly increasing the scope of data synthesis. The centralized computational infrastructure built by this project provides researchers with access to annotated organismal trait data and semantic reasoning services through online application programming interfaces (APIs). Furthermore, the project developed new methods not only for data synthesis but also for integrating ontological knowledge into the structure of evolutionary models themselves. These methods have been made available in open-source software packages that enable consistent and biologically realistic methods for modeling the evolution of entire organismal anatomies, and bypass many of the complexities and assumptions limiting previous methods for analyzing multi-trait datasets. These methods can be used to reconstruct ancestral organismal anatomies, detect enrichment of certain suites of traits at different points in evolutionary history, and correlate these changes to potential environmental factors driving such change. Furthermore, these models provide new links toward integrating data and knowledge from developmental biology and genetic studies in model organisms. These methods were widely shared with the research community in the form of scientific publications, software, websites, short courses, and package tutorials.

Just as evolutionary history allows researchers to make sense of biodiversity across species, the project demonstrated that biological ontologies structure evolutionary patterns across traits. Their use in comparative analyses has opened a new set of hypotheses about the relationship between the evolution of organismal form and the environment. Previously, such biological knowledge (the complex definitions, dependencies, and interrelationships among traits) was held largely within the minds of experts. Such dependency on human expertise is a massive scientific bottleneck. By formalizing expert knowledge into computable infrastructure with ontologies connected to biodiversity trait data and developing new methods for their analysis, the SCATE project has provided a road map to hasten the pace of scientific discovery in understanding the evolution of organismal diversity.


Last Modified: 12/30/2022
Modified by: Josef C Uyeda

Please report errors in award information by writing to: awardsearch@nsf.gov.
