Award Abstract # 1458572
Collaborative Research: ABI Development: An open infrastructure to disseminate phylogenetic knowledge

NSF Org: DBI
Division of Biological Infrastructure
Recipient: UNIVERSITY OF MARYLAND, COLLEGE PARK
Initial Amendment Date: April 2, 2015
Latest Amendment Date: March 9, 2020
Award Number: 1458572
Award Instrument: Standard Grant
Program Manager: Peter McCartney
DBI
 Division of Biological Infrastructure
BIO
 Directorate for Biological Sciences
Start Date: July 1, 2015
End Date: March 31, 2020 (Estimated)
Total Intended Award Amount: $447,696.00
Total Awarded Amount to Date: $447,696.00
Funds Obligated to Date: FY 2015 = $447,696.00
History of Investigator:
  • Arlin Stoltzfus (Principal Investigator)
    arlin@umd.edu
Recipient Sponsored Research Office: University of Maryland, College Park
3112 LEE BUILDING
COLLEGE PARK
MD  US  20742-5100
(301)405-6269
Sponsor Congressional District: 04
Primary Place of Performance: Institute for Bioscience and Biotechnology Research
9600 Gudelsky Drive
Rockville
MD  US  20850-3467
Primary Place of Performance Congressional District: 08
Unique Entity Identifier (UEI): NPU8ULVAAS23
Parent UEI: NPU8ULVAAS23
NSF Program(s): ADVANCES IN BIO INFORMATICS
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s): 116500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.074

ABSTRACT

Because phylogenies (trees) showing the evolutionary relationships of species are so useful in bioscience and biotechnology, there has been a major worldwide effort to determine trees for various groups of organisms. This information can be knitted together into a single "Tree of Life" (ToL) covering millions of species, though it is often better to think of expert ToL knowledge as a forest of overlapping source-trees. Because this body of knowledge is complex, rapidly changing, and distributed among many online resources, getting the latest knowledge is a challenge. The focus of this project is to get ToL knowledge into the hands of scientists, educators, and the public, by building a distributed system of internet services that work together with existing NSF-sponsored projects to deliver custom trees to users as quickly and easily as they currently get online driving directions. The resulting system will allow a greater range of life-sciences researchers to ask more complex and challenging questions relating to diversity and biological functions. The project will work directly with educators and with educational resources such as the Encyclopedia of Life, making it easy for millions of users to access the latest scientific knowledge of species relationships.

Phylogenetic trees are useful in all areas of biology, both to organize knowledge by guiding classification and for process-based models that allow scientists to make robust inferences from comparisons of evolved entities (genes, species, etc.). Phylogenetic knowledge is disseminated today via a very large number of idiosyncratic pathways, most of which are not easily traceable. While experts continue expanding Tree of Life (ToL) knowledge, addressing gaps and conflicts, our focus is on dissemination: putting ToL knowledge in the hands of researchers, educators, and the public. The goal of this project is to design and develop an open web-service architecture for ToL delivery, with the functionality necessary to integrate into scientific workflows, including name resolution, tree discovery, subtree extraction, and scaling of trees. The architecture will be designed as a sustainable distributed collection of services and will rely on semantically rich descriptions to facilitate composition of services and to document the tree discovery and reuse process. The project will cultivate this system as a community resource by (1) involving partners, domain experts, and a broader phyloinformatics community in the design process; (2) partnering with other projects to involve them as service providers or consumers; (3) developing innovative clients demonstrating quantitatively important use cases; and (4) staging a hackathon for participants to add services or develop clients. Results of the project will be accessible via www.phylotastic.org.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Van D Nguyen, Thanh H Nguyen, Abu Saleh Md Tayeen, H Dail Laughinghouse IV, Luna L Sánchez-Reyes, Jodie Wiggins, Enrico Pontelli, Dmitry Mozzherin, Brian O'Meara, Arlin Stoltzfus "Phylotastic: Improving Access to Tree-of-Life Knowledge With Flexible, on-the-Fly Delivery of Trees" Evolutionary Bioinformatics, v.16, 2020, p.1 https://doi.org/10.1177/1176934319899384

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

A comprehensive phylogeny of species, i.e., a tree of life covering everything from whales and giant sequoias to invisible microbes, has many uses in research, education and public policy.  For a quarter century, funding agencies have supported scientific projects devoted to "assembling the tree of life."  Yet, accessing the resulting phylogenetic knowledge has required special knowledge, complex software, or long periods of training.  

The Phylotastic project aimed to narrow this accessibility gap by making the task of getting a phylogeny of species as easy as getting driving directions from an online mapping tool.  The Phylotastic web portal (https://portal.phylotastic.org) is a web application accessed through any browser without the need for a login.  The portal obtains and displays a tree using one of three main workflows illustrated in the attached figure: (1) extract names from an electronic resource (e.g., document file, web page) that may contain text other than names; (2) use a user-supplied list of names; and (3) sample species from a user-specified taxon.

Each of these workflows can be executed interactively in a minute or two, yielding a phylogeny scaled according to geologic time, and decorated with thumbnail images of species, illustrated in the attached image of a phylogeny of aquatic mammals. 
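
As a rough illustration of workflow (2), the sketch below takes a user-supplied list of names, matches it against a small reference tree, and returns the induced subtree in Newick format. It is a toy under stated assumptions: the reference tree, the function names (resolve_names, prune, tree_for_names), and the matching rule are hypothetical and written only for this example. They are not the Phylotastic services or their API, and the sketch does no time-scaling and adds no images.

```python
# Toy illustration only: the tree, names, and function names are hypothetical
# and are not part of the Phylotastic services or toolkits.

# A small reference tree as nested tuples; leaves are species names.
REFERENCE_TREE = (
    (("Orcinus orca", "Tursiops truncatus"), "Hippopotamus amphibius"),
    ("Trichechus manatus", "Loxodonta africana"),
)


def leaves(tree):
    """Return all leaf names of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    result = []
    for child in tree:
        result.extend(leaves(child))
    return result


def resolve_names(raw_names, known_names):
    """Match raw names to known names, ignoring case and extra whitespace.

    Names with no match are silently dropped in this sketch.
    """
    lookup = {name.lower(): name for name in known_names}
    resolved = []
    for raw in raw_names:
        key = " ".join(raw.split()).lower()
        if key in lookup:
            resolved.append(lookup[key])
    return resolved


def prune(tree, keep):
    """Return the subtree induced by the names in `keep`, or None if empty."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    pruned = (prune(child, keep) for child in tree)
    kept = [p for p in pruned if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse a node left with a single child
    return tuple(kept)


def to_newick(tree):
    """Serialize a nested-tuple tree as a Newick string without the final ';'."""
    if isinstance(tree, str):
        return tree.replace(" ", "_")
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


def tree_for_names(raw_names):
    """Name list in, Newick subtree out (None if no name matched)."""
    names = resolve_names(raw_names, leaves(REFERENCE_TREE))
    subtree = prune(REFERENCE_TREE, set(names))
    return None if subtree is None else to_newick(subtree) + ";"


if __name__ == "__main__":
    print(tree_for_names(["orcinus  orca", "Trichechus manatus", "Unknown species"]))
```

Run as a script, this prints (Orcinus_orca,Trichechus_manatus); because the unmatched name is dropped and internal nodes left with a single child are collapsed.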

Underlying this functionality is a coordinated system based on modular web services to support on-the-fly delivery of phylogenetic knowledge.  Specifically, more than 30 web services developed by the project are accessible via a common web-services registry (https://registry.phylotastic.org).  Software toolkits (in R and Python) were developed so that others can develop software for delivering phylogeny-related data and services.

In 2019, the project sponsored a workshop for educators teaching at the middle-school, high-school, and college levels (https://www.esciencetools.org/).  Working with scientific experts, the participants ultimately created 12 lesson plans that use online science tools to discover knowledge about phylogeny and biodiversity (https://jwiggi18.github.io/phyloEd/).

With support from the Phylotastic project, the Global Names project made major improvements in computer tools to find species names in scientific texts. The benchmark challenge of indexing the Biodiversity Heritage Library (BHL), with its 50 million pages of content, has been reduced from 45 days to several hours.  Given that the BHL covers roughly 10% of the biodiversity literature, this means that the indexing of *all* of the biodiversity literature is now technically feasible.


Last Modified: 07/23/2020
Modified by: Arlin Stoltzfus

Please report errors in award information by writing to: awardsearch@nsf.gov.
