
NSF Org: | DBI Division of Biological Infrastructure |
Recipient: |
|
Initial Amendment Date: | August 30, 2017 |
Latest Amendment Date: | August 3, 2021 |
Award Number: | 1661516 |
Award Instrument: | Standard Grant |
Program Manager: | Peter McCartney, DBI Division of Biological Infrastructure, BIO Directorate for Biological Sciences |
Start Date: | September 1, 2017 |
End Date: | August 31, 2022 (Estimated) |
Total Intended Award Amount: | $172,356.00 |
Total Awarded Amount to Date: | $172,356.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
300 TURNER ST NW BLACKSBURG VA US 24060-3359 (540)231-5281 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
300 Turner Street NW Blacksburg VA US 24060-0405 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | ADVANCES IN BIO INFORMATICS |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.074 |
ABSTRACT
The millions of species that inhabit the planet all have distinct biological traits that enable them to successfully compete in or adapt to their ecological niches. Determining accurately how these traits evolved is thus fundamental to understanding Earth's biodiversity, and to predicting how it might change in the future in response to changes in ecosystems. Although sophisticated analytical methods and tools exist for analyzing traits comparatively, applying their full power to the myriad of trait observations recorded in the form of natural language descriptions has been hindered by the difficulty of allowing these tools to understand even the most basic facts implied by an unstructured free-text statement made by a human observer. The technological arsenal needed to overcome this challenge is now in principle available, thanks to a number of recent breakthroughs in the areas of knowledge representation and machine reasoning, but these technologies are challenging enough to deploy, orchestrate, and use that the barrier to exploiting them effectively remains far too high for most tools. This project will create infrastructure that will dramatically reduce this barrier, with the goal of giving comparative trait analysis tools easy access to algorithms powered by machines that reason with, and make inferences from, the meaning of trait descriptions. Similar to how Google, IBM Watson, and others have enabled developers of smartphone apps to incorporate, with only a few lines of code, complex machine-learning and artificial intelligence capabilities such as sentiment analysis, this project will demonstrate how easy access to knowledge computing opens up new opportunities for analysis, tools, and research. It will do this by addressing three long-standing limitations in comparative studies of trait evolution: recombining trait data, modeling trait evolution, and generating testable hypotheses for the drivers of trait adaptation.
The treasure trove of morphological data published in the literature holds one of the keys to understanding the biodiversity of phenotypes, but exploiting the data in full through modern computational data science analytics remains severely hampered by the steep barrier to connecting the data with the accumulated body of morphological knowledge in a form that machines can readily act on. This project aims to address this barrier by creating a centralized computational infrastructure that affords comparative analysis tools the ability to compute with morphological knowledge through scalable online application programming interfaces (APIs), enabling developers of comparative analysis tools, and therefore their users, to tap into machine reasoning-powered capabilities and data with machine-actionable semantics. By shifting all the heavy lifting to this infrastructure, tools can programmatically obtain, with only a few lines of code, answers to knowledge-based questions that would otherwise require careful study by a human expert, such as objectively and reproducibly assessing the relatedness, independence, and distinctness of characters and character states. To accomplish this, the project will adapt key products and know-how developed by the Phenoscape project, including an integrative knowledgebase of ontology-linked phenotype data, metrics for quantifying the semantic similarity of phenotype descriptions, and algorithms for synthesizing morphological data from published trait descriptions.
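As a rough illustration of what a metric for the semantic similarity of two ontology terms can look like, the sketch below computes a Jaccard index over each term's set of subsumers (the term itself plus all of its ancestors). This is one common choice and is not necessarily the metric used by Phenoscape; the toy hierarchy, the term names, and the function names are all hypothetical.

```python
# Toy hierarchy mapping each term to its direct parents.
# Term names are illustrative and not taken from any real ontology.
PARENTS = {
    "pectoral fin": ["paired fin"],
    "pelvic fin": ["paired fin"],
    "paired fin": ["fin"],
    "dorsal fin": ["median fin"],
    "median fin": ["fin"],
    "fin": ["anatomical structure"],
    "anatomical structure": [],
}


def subsumers(term):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def jaccard_similarity(term_a, term_b):
    """Shared subsumers divided by all subsumers of either term."""
    a, b = subsumers(term_a), subsumers(term_b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(jaccard_similarity("pectoral fin", "pelvic fin"))
    print(jaccard_similarity("pectoral fin", "dorsal fin"))
```

In this toy hierarchy the two paired fins share more subsumers with each other than either shares with the dorsal fin, so they score as more similar. A production service would derive the subsumer sets from a reasoner over the full ontology rather than from a hand-written dictionary.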
To drive development of the computational infrastructure and to demonstrate its enabling value, the project's objectives focus on addressing three concrete long-standing needs for which the difficulty of computing with domain knowledge is the major impediment: (1) computationally synthesizing, calibrating, and assessing morphological trait matrices from across studies; (2) objectively and reproducibly incorporating morphological domain knowledge provided by ontologies into models of trait evolution; and (3) generating testable hypotheses for adaptive diversification by incorporating semantic phenotypes into ancestral state reconstruction and identifying domain ontology concepts linked to evolutionary changes in a branch or clade more frequently than expected by chance. In addition, to better prepare evolutionary biologist users and developers of comparative analysis tools for adopting these new capabilities, a domain-tailored short course on requisite knowledge representation and computational inference technologies will be developed and taught. More information on this project can be found at http://cate.phenoscape.org/.
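Objective (3) asks whether an ontology concept is linked to changes on a branch or clade more often than expected by chance. One standard way to frame such a question is a one-sided hypergeometric test, sketched below. The abstract does not state which statistic the project uses, so the choice of test, the function name, and the parameter names are assumptions made for illustration only.

```python
from math import comb


def enrichment_p_value(total_changes, changes_with_concept,
                       branch_changes, branch_with_concept):
    """One-sided hypergeometric p-value.

    Probability of seeing at least `branch_with_concept` concept-linked
    changes among `branch_changes` changes drawn without replacement from
    `total_changes` changes, of which `changes_with_concept` are linked
    to the concept.
    """
    upper = min(changes_with_concept, branch_changes)
    denominator = comb(total_changes, branch_changes)
    numerator = sum(
        comb(changes_with_concept, k)
        * comb(total_changes - changes_with_concept, branch_changes - k)
        for k in range(branch_with_concept, upper + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts: 10 changes on the tree, 5 linked to the concept,
    # 5 changes on the focal branch, all 5 of them linked to the concept.
    print(enrichment_p_value(10, 5, 5, 5))
```

A small p-value suggests the concept is over-represented among changes on the focal branch. In practice, many concepts would be tested at once, so a multiple-testing correction would also be needed.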
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The unique biological traits of organisms enable them to compete, coexist, and adapt to their ecological niches, and determining how these traits have evolved is thus fundamental to understanding Earth's biodiversity. However, organisms adapt to their environments as integrated wholes, not piecewise independent parts. The era of big data in the form of trait databases promises to revolutionize the study of such integrated organismal traits. While a diverse array of statistical methods has been developed for studying trait evolution, the complexity of such multi-trait datasets can require researchers to make unrealistic assumptions and/or apply increasingly complex and unwieldy models. The SCATE (Semantic Comparative Analysis of Trait Evolution) project has sought to overcome these challenges by integrating biological ontologies (computable knowledge graphs of the domain knowledge about biological traits, their definitions, and their interrelationships) into our toolkit for studying the evolution of organismal form and function. These ontologies provide computable knowledge that links together traits through their interrelationships and dependencies and allows researchers to synthesize trait data across species, analyze them as integrated sets of traits in a computationally tractable way, and reconstruct how they have responded to environmental change in the past.
This project builds on the success of its predecessor, the Phenoscape project, which built a computable knowledgebase connecting evolutionary phenotypes to biological ontologies. The Phenoscape project pioneered the use of ontologies to enable data synthesis across species by enabling semantic analysis of phenotypic descriptions of species using machine reasoning. The SCATE project extended this work beyond presence-absence characters, constructing synthetic trait matrices across studies that involve other types of trait qualities and thereby greatly increasing the scope of data synthesis. The centralized computational infrastructure built by this project provides researchers with access to annotated organismal trait data and semantic reasoning services through online application programming interfaces (APIs). Furthermore, the project developed new methods not only for data synthesis but also for integrating ontological knowledge into the structure of evolutionary models themselves. These methods have been made available in open-source software packages that enable consistent and biologically realistic methods for modeling the evolution of entire organismal anatomies, and bypass many of the complexities and assumptions limiting previous methods for analyzing multi-trait datasets. These methods can be used to reconstruct ancestral organismal anatomies, detect enrichment of certain suites of traits at different points in evolutionary history, and correlate these changes to potential environmental factors driving such change. Furthermore, these models provide new links toward integrating data and knowledge from developmental biology and genetic studies in model organisms. These methods were widely shared with the research community in the form of scientific publications, software, websites, short courses, and package tutorials.
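To make the idea of ontology-driven synthesis of presence-absence characters concrete, the sketch below propagates asserted states along part-whole relations: presence of a part implies presence of its whole, and absence of a whole implies absence of its parts. It is a simplified illustration, not the project's implementation; the relation table, structure names, and function name are hypothetical, and conflicting assertions are not handled.

```python
# Hypothetical part-whole relations (part -> whole); names are illustrative.
PART_OF = {
    "fin ray": "fin",
    "fin": "appendage",
}


def infer_presence_absence(asserted):
    """Extend asserted states ("present" or "absent") by part-whole logic.

    Presence of a part implies presence of its whole; absence of a whole
    implies absence of its parts. Existing states are never overwritten.
    """
    inferred = dict(asserted)
    changed = True
    while changed:
        changed = False
        for part, whole in PART_OF.items():
            if inferred.get(part) == "present" and whole not in inferred:
                inferred[whole] = "present"
                changed = True
            if inferred.get(whole) == "absent" and part not in inferred:
                inferred[part] = "absent"
                changed = True
    return inferred


if __name__ == "__main__":
    # One study reports a fin ray; another reports that appendages are absent.
    print(infer_presence_absence({"fin ray": "present"}))
    print(infer_presence_absence({"appendage": "absent"}))
```

Inference of this kind lets observations made at different anatomical levels in different studies populate the same column of a synthetic trait matrix.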
Just as evolutionary history allows researchers to make sense of biodiversity across species, the project demonstrated that biological ontologies structure evolutionary patterns across traits. Their use in comparative analyses has opened a new set of hypotheses about the relationship between the evolution of organismal form and the environment. Previously, such biological knowledge (the complex definitions, dependencies, and interrelationships among traits) was held largely within the minds of experts. Such dependency on human expertise is a massive scientific bottleneck. By formalizing expert knowledge into computable infrastructure with ontologies connected to biodiversity trait data and developing new methods for their analysis, the SCATE project has provided a road map to hasten the pace of scientific discovery in understanding the evolution of organismal diversity.
Last Modified: 12/30/2022
Modified by: Josef C Uyeda