
NSF Org: | DBI Division of Biological Infrastructure |
Recipient: | University of Illinois at Urbana-Champaign |
Initial Amendment Date: | September 8, 2014 |
Latest Amendment Date: | September 8, 2014 |
Award Number: | 1461364 |
Award Instrument: | Standard Grant |
Program Manager: | Peter McCartney, DBI Division of Biological Infrastructure, BIO Directorate for Biological Sciences |
Start Date: | August 16, 2014 |
End Date: | June 30, 2016 (Estimated) |
Total Intended Award Amount: | $246,332.00 |
Total Awarded Amount to Date: | $246,332.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
506 S WRIGHT ST URBANA IL US 61801-3620 (217)333-2187 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
1901 South First St. Suite A Champaign IL US 61820-7473 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | ADVANCES IN BIO INFORMATICS |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.074 |
ABSTRACT
Rice University, the University of Michigan, and the University of Illinois at Urbana-Champaign are awarded collaborative grants to develop and implement algorithms and software tools for analyzing gene genealogies and inferring species phylogenies from them. A gene genealogy, also known as a gene tree, models how genes are replicated and transmitted from one generation to the next during evolution. A species phylogeny models how species arise and diverge. A species phylogeny is traditionally inferred by a three-step process: (1) a genomic region is sequenced from the set of species under study; (2) a "gene tree" is inferred for that genomic region; and (3) the gene tree is declared to be the species tree. However, recent evolutionary genomic analyses of various groups of organisms have demonstrated that different genomic regions may have evolutionary histories that disagree with each other as well as with that of the species. Further, evolutionary processes such as horizontal gene transfer result in network-like, rather than tree-like, species phylogenies. This joint project will develop accurate computational methods for determining the causes of gene tree discordance and for inferring species phylogenies (trees as well as networks) from gene trees despite their discordance. Special emphasis will be placed on the efficiency of the methods so that they can analyze genome-scale data sets. All methods will be implemented and extensively tested for performance.
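One standard way to quantify how much two gene trees disagree is the Robinson-Foulds distance: the number of bipartitions (splits of the taxa induced by removing an edge) found in one tree but not the other. The sketch below is only an illustration of what "discordance" means, not a method from this project; the nested-tuple tree encoding and the function names `leaves`, `splits`, and `robinson_foulds` are hypothetical, and it assumes both trees are on the same set of species.

```python
from typing import FrozenSet, Set, Union

# Hypothetical encoding: a tree is a leaf name or a tuple of subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Nontrivial bipartitions of the unrooted tree.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so both orientations map to one canonical key.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    result: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> None:
        clade = leaves(node)
        if not isinstance(node, str):
            for child in node:
                walk(child)
        side = clade if ref not in clade else all_leaves - clade
        # Keep only splits with at least two taxa on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            result.add(side)

    walk(tree)
    return result


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Count bipartitions present in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must be on the same leaf set")
    return len(splits(t1) ^ splits(t2))


# Two gene trees that disagree on the placement of B and C.
g1 = ((("A", "B"), "C"), ("D", "E"))
g2 = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(g1, g2))  # 2
```

A distance of zero means the two genomic regions support the same unrooted topology; larger values indicate more conflicting branches.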
All methods developed will be made publicly available in software packages that we have been developing in our respective groups. The material will be integrated into courses that the PIs regularly teach at their institutions. Last but not least, the project will culminate in a two-day workshop, open to students and postdoctoral fellows from around the country, with presentations by the investigators on the methodologies developed as well as hands-on tutorials on using the tools to analyze data.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The Novel Methodologies for Genome-scale Analysis of Multilocus Data project was a joint effort among groups at Stanford University, the University of Illinois at Urbana-Champaign, Rice University, and Linfield College. The project aimed to (1) devise new algorithms for species tree inference; (2) develop new methods for scaling inference algorithms to large genomic data sets; and (3) perform mathematical, simulation-based, and empirical evaluations of the properties of species tree inference algorithms. This NSF grant supported the efforts at the University of Illinois at Urbana-Champaign.
Intellectual Merit: This grant to UIUC supported 30 journal papers, and several software packages were developed and distributed under open-source licenses. Highlights of this effort include:
- ASTRAL, a polynomial-time method for computing a species tree from a set of gene trees that is statistically consistent under the multi-species coalescent model. ASTRAL is very fast and can analyze datasets with up to 1,000 genes and 1,000 species in under a day. ASTRAL was used in the Thousand Plant Transcriptome Project (1KP) to compute a plant phylogeny (Wickett, Mirarab, et al., Proceedings of the National Academy of Sciences (USA) 2014).
- ASTRID, a coalescent-based method for estimating species trees from multiple gene trees that is faster than ASTRAL and often as accurate. ASTRID is statistically consistent under the multi-species coalescent model and runs in polynomial time.
- Statistical binning and its improvement, weighted statistical binning, are techniques for obtaining better gene tree estimates in multi-locus phylogenomic projects. Statistical binning was published in Mirarab et al., Science 2014, and was used to provide a coalescent-based estimate of the avian phylogeny in Jarvis, Mirarab, et al., Science 2014. The weighted statistical binning technique (Bayzid et al., PLOS One 2016) enables statistically consistent coalescent-based species tree estimation pipelines when followed by methods for combining gene trees that are statistically consistent under the multi-species coalescent model.
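ASTRAL searches for the species tree that agrees with the largest number of quartet trees (four-taxon subtrees) induced by the input gene trees. The sketch below only computes that quartet-support score for a given candidate species tree by brute force over all four-taxon subsets; it does not reproduce ASTRAL's constrained dynamic-programming search, and it assumes fully sampled gene trees on the same taxa. The nested-tuple encoding and the names `clades`, `quartet_topology`, and `quartet_score` are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Sequence, Union

# Hypothetical encoding: a tree is a leaf name or a tuple of subtrees.
Tree = Union[str, tuple]
Quartet = FrozenSet[FrozenSet[str]]  # e.g. {{A, B}, {C, D}} means AB|CD


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets below every node (post-order); each defines a bipartition."""
    out: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            clade = frozenset([node])
        else:
            clade = frozenset().union(*(walk(child) for child in node))
        out.append(clade)
        return clade

    walk(tree)
    return out


def quartet_topology(tree_clades: List[FrozenSet[str]],
                     quartet: Sequence[str]) -> Optional[Quartet]:
    """Return the induced quartet ab|cd, or None if it is unresolved."""
    q = frozenset(quartet)
    for clade in tree_clades:
        pair = clade & q
        if len(pair) == 2:  # this edge separates two taxa from the other two
            return frozenset([pair, q - pair])
    return None


def quartet_score(species_tree: Tree, gene_trees: List[Tree]) -> int:
    """Number of (gene tree, quartet) pairs that agree with the species tree."""
    sp_clades = clades(species_tree)
    taxa = sorted(max(sp_clades, key=len))  # root clade holds every taxon
    gene_clades = [clades(g) for g in gene_trees]
    score = 0
    for quartet in combinations(taxa, 4):
        sp_q = quartet_topology(sp_clades, quartet)
        if sp_q is None:
            continue
        for gc in gene_clades:
            if quartet_topology(gc, quartet) == sp_q:
                score += 1
    return score


# Compare two candidate species trees against three gene trees.
t1 = ((("A", "B"), "C"), ("D", "E"))
t3 = ((("A", "C"), "B"), ("D", "E"))
genes = [t1, t1, t3]
print(quartet_score(t1, genes), quartet_score(t3, genes))  # 13 11
```

Under this criterion the candidate `t1` is preferred because it shares more induced quartets with the gene trees. Enumerating every four-taxon subset costs on the order of n^4 per gene tree, which is why a scalable method cannot simply score every quartet of every candidate tree this way.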
Broader Impacts: The grant supported annual no-fee Symposia that presented research results from the grant, and Software Schools that provided training in the use of software supported by this grant. Each year, about 100 students, postdocs, and faculty attended the Symposia and Software Schools, and the grant provided travel awards to many of the attendees. The grant also supported the development of open-source software implementing the methods developed by the project, much of which is available through GitHub.
Human resource development: Siavash Mirarab received a PhD for work on this project supported by this grant (statistical binning, ASTRAL, and other methods) and is now an Assistant Professor at UCSD. His PhD dissertation received an Honorable Mention from the Association for Computing Machinery (ACM) for best dissertation in 2015.
Last Modified: 07/18/2016
Modified by: Tandy Warnow