Award Abstract # 1661529
Collaborative Research: ABI Innovation: Enabling machine-actionable semantics for comparative analyses of trait evolution

NSF Org: DBI (Division of Biological Infrastructure)
Recipient: THE UNIVERSITY OF SOUTH DAKOTA
Initial Amendment Date: August 30, 2017
Latest Amendment Date: January 26, 2018
Award Number: 1661529
Award Instrument: Standard Grant
Program Manager: Peter McCartney, DBI (Division of Biological Infrastructure), BIO (Directorate for Biological Sciences)
Start Date: September 1, 2017
End Date: November 30, 2020 (Estimated)
Total Intended Award Amount: $336,493.00
Total Awarded Amount to Date: $336,493.00
Funds Obligated to Date: FY 2017 = $262,952.00
History of Investigator:
  • Wasila Dahdul (Principal Investigator)
    wdahdul@uci.edu
  • Paula Mabee (Co-Principal Investigator)
  • Paula Mabee (Former Principal Investigator)
  • Wasila Dahdul (Former Principal Investigator)
  • Wasila Dahdul (Former Co-Principal Investigator)
Recipient Sponsored Research Office: University of South Dakota Main Campus
414 E CLARK ST
VERMILLION
SD  US  57069-2307
(605)677-5370
Sponsor Congressional District: 00
Primary Place of Performance: University of South Dakota
SD  US  57069-2307
Primary Place of Performance Congressional District: 00
Unique Entity Identifier (UEI): U9EDNSCHTBE7
Parent UEI:
NSF Program(s): ADVANCES IN BIO INFORMATICS
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 9150
Program Element Code(s): 116500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.074

ABSTRACT

The millions of species that inhabit the planet all have distinct biological traits that enable them to successfully compete in or adapt to their ecological niches. Determining accurately how these traits evolved is thus fundamental to understanding Earth's biodiversity, and to predicting how it might change in the future in response to changes in ecosystems. Although sophisticated analytical methods and tools exist for analyzing traits comparatively, applying their full power to the myriad of trait observations recorded in the form of natural language descriptions has been hindered by the difficulty of allowing these tools to understand even the most basic facts implied by an unstructured free-text statement made by a human observer. The technological arsenal needed to overcome this challenge is now in principle available, thanks to a number of recent breakthroughs in the areas of knowledge representation and machine reasoning, but these technologies are challenging enough to deploy, orchestrate, and use that the barrier to exploiting them effectively remains far too high for most tools. This project will create infrastructure that will dramatically reduce this barrier, with the goal of providing comparative trait analysis tools with easy access to algorithms powered by machines that reason with, and make inferences from, the meaning of trait descriptions. Similar to how Google, IBM Watson, and others have enabled developers of smartphone apps to incorporate, with only a few lines of code, complex machine-learning and artificial intelligence capabilities such as sentiment analysis, this project will demonstrate how easy access to knowledge computing opens up new opportunities for analysis, tools, and research. It will do this by addressing three long-standing limitations in comparative studies of trait evolution: recombining trait data, modeling trait evolution, and generating testable hypotheses for the drivers of trait adaptation.

The treasure trove of morphological data published in the literature holds one of the keys to understanding the biodiversity of phenotypes, but exploiting the data in full through modern computational data science analytics remains severely hampered by the steep barriers to connecting the data with the accumulated body of morphological knowledge in a form that machines can readily act on. This project aims to address this barrier by creating a centralized computational infrastructure that affords comparative analysis tools the ability to compute with morphological knowledge through scalable online application programming interfaces (APIs), enabling developers of comparative analysis tools, and therefore their users, to tap into machine reasoning-powered capabilities and data with machine-actionable semantics. By shifting all the heavy lifting to this infrastructure, tools can, with only a few lines of code, programmatically obtain answers to knowledge-based questions that would otherwise require careful study by a human expert, such as objectively and reproducibly assessing the relatedness, independence, and distinctness of characters and character states. To accomplish this, the project will adapt key products and know-how developed by the Phenoscape project, including an integrative knowledgebase of ontology-linked phenotype data, metrics for quantifying the semantic similarity of phenotype descriptions, and algorithms for synthesizing morphological data from published trait descriptions.
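To make the idea of a semantic similarity metric concrete, the following is a minimal sketch of one common approach: the Jaccard similarity between the sets of ontology ancestors of two terms. It is an illustration only, not the metric used by the Phenoscape project. The toy hierarchy, the term names, and the function names `ancestors` and `jaccard_similarity` are all assumptions introduced for this example.

```python
# Hypothetical toy hierarchy: each term maps to its direct parents.
# Term names are illustrative and are not real ontology identifiers.
PARENTS = {
    "pectoral fin": {"paired fin"},
    "pelvic fin": {"paired fin"},
    "paired fin": {"fin"},
    "dorsal fin": {"median fin"},
    "median fin": {"fin"},
    "fin": {"anatomical structure"},
    "anatomical structure": set(),
}


def ancestors(term, parents=PARENTS):
    """Return the term itself plus all of its transitive parents."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def jaccard_similarity(term_a, term_b, parents=PARENTS):
    """Shared ancestors divided by all ancestors of either term."""
    anc_a = ancestors(term_a, parents)
    anc_b = ancestors(term_b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


if __name__ == "__main__":
    # Two paired fins share more ancestors than a paired and a median fin.
    print(jaccard_similarity("pectoral fin", "pelvic fin"))
    print(jaccard_similarity("pectoral fin", "dorsal fin"))
```

In this toy hierarchy the two paired fins score higher than a paired fin compared with a median fin, which is the kind of relatedness judgment a tool could otherwise only obtain from a human expert.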
To drive development of the computational infrastructure and to demonstrate its enabling value, the project's objectives focus on addressing three concrete long-standing needs for which the difficulty of computing with domain knowledge is the major impediment: (1) computationally synthesizing, calibrating, and assessing morphological trait matrices from across studies; (2) objectively and reproducibly incorporating morphological domain knowledge provided by ontologies into models of trait evolution; and (3) generating testable hypotheses for adaptive diversification by incorporating semantic phenotypes into ancestral state reconstruction and identifying domain ontology concepts linked to evolutionary changes in a branch or clade more frequently than expected by chance. In addition, to better prepare evolutionary biologists, both users and developers of comparative analysis tools, for adopting these new capabilities, a domain-tailored short course on the requisite knowledge representation and computational inference technologies will be developed and taught. More information on this project can be found at http://cate.phenoscape.org/.
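Objective (3) asks which ontology concepts are linked to changes in a clade "more frequently than expected by chance". The abstract does not say which statistical test the project uses; one standard way to frame such a question is a one-sided hypergeometric (over-representation) test, sketched below under that assumption. The function name `hypergeom_upper_tail` and all counts in the usage example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, total, annotated, drawn):
    """P(X >= k) for a hypergeometric draw without replacement.

    total     -- all inferred character-state changes on the tree
    annotated -- changes linked to the ontology concept of interest
    drawn     -- changes that fall on the branch or clade being tested
    k         -- changes on that clade that are linked to the concept
    """
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(total - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(total, drawn)


if __name__ == "__main__":
    # Hypothetical counts: 200 changes in total, 20 linked to the concept,
    # 30 on the clade, 8 of which are linked to the concept.
    p_value = hypergeom_upper_tail(k=8, total=200, annotated=20, drawn=30)
    print(f"one-sided p-value: {p_value:.4g}")
```

A small p-value would flag the concept as a candidate for a testable hypothesis about adaptive change in that clade. In practice such a test would be repeated over many concepts and clades, so a correction for multiple testing would also be needed.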

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Mabee, Paula M.; Balhoff, James P.; Dahdul, Wasila M.; Lapp, Hilmar; Mungall, Christopher J.; Vision, Todd J.; Smith, Stephen. "A Logical Model of Homology for Comparative Biology." Systematic Biology, v.69, 2019. https://doi.org/10.1093/sysbio/syz067