Award Abstract # 1461364
Collaborative Research: Novel Methodologies for Genome-scale Evolutionary Analysis of Multi-locus data

NSF Org: DBI
Division of Biological Infrastructure
Recipient: UNIVERSITY OF ILLINOIS
Initial Amendment Date: September 8, 2014
Latest Amendment Date: September 8, 2014
Award Number: 1461364
Award Instrument: Standard Grant
Program Manager: Peter McCartney
 DBI Division of Biological Infrastructure
 BIO Directorate for Biological Sciences
Start Date: August 16, 2014
End Date: June 30, 2016 (Estimated)
Total Intended Award Amount: $246,332.00
Total Awarded Amount to Date: $246,332.00
Funds Obligated to Date: FY 2011 = $246,332.00
History of Investigator:
  • Tandy Warnow (Principal Investigator)
    warnow@illinois.edu
Recipient Sponsored Research Office: University of Illinois at Urbana-Champaign
506 S WRIGHT ST
URBANA
IL  US  61801-3620
(217)333-2187
Sponsor Congressional District: 13
Primary Place of Performance: University of Illinois at Urbana-Champaign
1901 South First St. Suite A
Champaign
IL  US  61820-7473
Primary Place of Performance Congressional District: 13
Unique Entity Identifier (UEI): Y8CWNJRCNN91
Parent UEI: V2PHZ2CSCH63
NSF Program(s): ADVANCES IN BIO INFORMATICS
Primary Program Source: 01001112DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1165, 9179
Program Element Code(s): 116500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.074

ABSTRACT

Rice University, the University of Michigan, and the University of Illinois at Urbana-Champaign are awarded collaborative grants to develop and implement algorithms and software tools for the analysis of gene genealogies and the inference of species phylogenies from them. A gene genealogy, also known as a gene tree, models how genes replicate and are transmitted from one generation to the next during evolution. A species phylogeny models how species arise and diverge. A species phylogeny is traditionally inferred by a three-step process: (1) a genomic region from the set of species under study is sequenced; (2) a "gene tree" is inferred for the genomic region; and (3) the gene tree is declared to be the species tree. However, recent evolutionary genomic analyses of various groups of organisms have demonstrated that different genomic regions may have evolutionary histories that disagree with each other as well as with that of the species. Further, evolutionary processes such as horizontal gene transfer result in network-like, rather than tree-like, species phylogenies. This joint project will develop accurate computational methods for determining the causes of gene tree discordance and for inferring species phylogenies (trees as well as networks) from gene trees despite their discordance. Special emphasis will be put on the efficiency of the methods so that they allow for analysis of genome-scale data sets. All methods will be implemented and extensively tested for performance.

All methods developed will be made publicly available in software packages that we have been developing in the respective groups. The material will be integrated into courses that the PIs regularly teach at their respective institutions. Last but not least, the project will culminate with a two-day workshop, open to students and post-doctoral fellows from around the country, with presentations by the investigators on the methodologies developed, as well as hands-on tutorials on using the tools in analyzing data.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 19)
E. Jarvis, S. Mirarab, et al. "Whole Genome Analyses Resolve the Early Branches in the Tree of Life of Modern Birds" Science , 2014
J. Chou, A. Gupta, S. Yaduvanshi, R. Davidson, M. Nute, S. Mirarab and T. Warnow. "A comparative study of SVDquartets and other coalescent-based species tree estimation methods." BMC Genomics , v.16 , 2015 , p.S2 DOI: 10.1186/1471-2164-16-S10-S2
Md Shamsuzzoha Bayzid, Siavash Mirarab, Bastien Boussau, and Tandy Warnow "Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses" PLOS One , 2015 DOI: 10.1371/journal.pone.0129183
N. Nguyen, S. Mirarab, B. Liu, M. Pop, and T. Warnow "TIPP: Taxonomic Identification and Phylogenetic Profiling." Bioinformatics , 2014 DOI: 10.1093/bioinformatics/btu721
N. Nguyen, S. Mirarab, K. Kumar, and T. Warnow "Ultra-large alignments using phylogeny aware profiles" Genome Biology , v.16 , 2015 DOI: 10.1186/s13059-015-0688-z
Pranjal Vachaspati and Tandy Warnow "ASTRID: Accurate Species TRees from Internode Distances" BMC Bioinformatics , v.16 , 2016 DOI: 10.1186/1471-2164-16-S10-S3
R. Davidson, P. Vachaspati, S. Mirarab, and T. Warnow. "Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer." BMC Genomics , v.16 , 2015 , p.S1 DOI: 10.1186/1471-2164-16-S10-S1
Siavash Mirarab, Md. S. Bayzid, B. Boussau, and T. Warnow. "Statistical binning enables an accurate coalescent-based estimation of the avian tree" Science , 2014 , p.1250463
S. Mirarab and T. Warnow. "ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes" Bioinformatics , v.31 , 2015 , p.i44 DOI: 10.1093/bioinformatics/btv234
S. Mirarab, Md. S. Bayzid, and T. Warnow "Evaluating summary methods for multi-locus species tree estimation in the presence of incomplete lineage sorting" Systematic Biology , 2014
S. Mirarab, Md. S. Bayzid, B. Boussau, and T. Warnow "Response to Comment on 'Statistical binning enables an accurate coalescent-based estimation of the avian tree'" Science , v.350 , 2016 , p.171 DOI: 10.1126/science.aaa7719

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The Novel Methodologies for Genome-scale Analysis of Multilocus Data project was a joint effort among groups at Stanford University, the University of Illinois at Urbana-Champaign, Rice University, and Linfield College. The project aims to (1) devise new algorithms for species tree inference; (2) develop new methods for scalability of inference algorithms to large-scale genomic data; and (3) perform mathematical, simulation-based, and empirical evaluations of the properties of species tree inference algorithms. This NSF grant supported the efforts at the University of Illinois at Urbana-Champaign.

Intellectual Merit: Thirty journal papers were supported by this grant to UIUC, and several software packages were developed and distributed under open-source licenses. Highlights of this effort include:

  • ASTRAL, a polynomial-time method for computing a species tree from a set of gene trees that is statistically consistent under the multi-species coalescent model. ASTRAL is very fast and can analyze datasets with up to 1,000 genes and 1,000 species in under a day. ASTRAL was used in the Thousand Plant Transcriptome Project (1KP) to compute a plant phylogeny in Wickett, Mirarab, et al., Proceedings of the National Academy of Sciences (USA) 2014.
  • ASTRID, a coalescent-based method for estimating species trees from multiple gene trees that is faster than ASTRAL and often as accurate. ASTRID is statistically consistent under the multi-species coalescent model and runs in polynomial time.
  • Statistical binning and its improvement, weighted statistical binning, which are techniques for obtaining better estimates of gene trees in a multi-locus phylogenomic project. Statistical binning was published in Mirarab et al., Science 2014, and was used to provide a coalescent-based estimation of the avian phylogeny in Jarvis, Mirarab, et al., Science 2014. The weighted statistical binning technique (Bayzid et al., PLOS One 2015) enables statistically consistent coalescent-based species tree estimation pipelines when followed by methods for combining gene trees that are statistically consistent under the multi-species coalescent model.
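
As a rough illustration of the "internode distances" in ASTRID's name, the sketch below averages leaf-to-leaf path lengths across a set of toy gene trees. It is not the ASTRID implementation: the edge-list tree encoding, the use of edge counts as the distance, and the function names leaf_distances and average_internode_distances are all assumptions made for this example. A distance-based tree-building method would then be applied to the averaged matrix; that step is omitted here.

```python
from collections import defaultdict, deque


def leaf_distances(edges):
    """Path length (in edges) between every pair of leaves of one unrooted tree.

    `edges` is a list of (node, node) pairs; this encoding is an assumption
    chosen for the sketch, not a format used by any particular tool.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    leaves = sorted(n for n in adj if len(adj[n]) == 1)
    dist = {}
    for src in leaves:
        # Breadth-first search from each leaf gives path lengths in a tree.
        seen = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        for dst in leaves:
            if src < dst:
                dist[frozenset((src, dst))] = seen[dst]
    return dist


def average_internode_distances(gene_trees):
    """Average each taxon pair's distance over the gene trees containing both taxa."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for edges in gene_trees:
        for pair, d in leaf_distances(edges).items():
            totals[pair] += d
            counts[pair] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Toy data: two gene trees with topology AB|CD and one discordant tree AC|BD.
# "x" and "y" are the two internal nodes of each four-taxon tree.
tree_ab_cd = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
tree_ac_bd = [("A", "x"), ("C", "x"), ("x", "y"), ("B", "y"), ("D", "y")]

avg = average_internode_distances([tree_ab_cd, tree_ab_cd, tree_ac_bd])
for pair in sorted(avg, key=sorted):
    print(sorted(pair), round(avg[pair], 3))
```

In this toy input, the pairs grouped together by the majority topology (A with B, C with D) end up with the smallest average distances, which is the signal a distance-based method would use to recover AB|CD despite the discordant gene tree.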

Broader Impacts: The grant supported annual (no-fee) Symposia that presented research results from the grant, and Software Schools that provided training in the use of software supported by this grant. Each year, about 100 students, postdocs, and faculty attended the annual Symposia and Software Schools, and the grant provided travel awards to many of the attendees. The grant also supported the development of open-source software for the methods developed by the project, much of which is available through GitHub.

Human resource development: Siavash Mirarab received a PhD for work on this project (statistical binning, ASTRAL, and other methods) supported by this grant and is now an Assistant Professor at UCSD; his PhD dissertation was awarded Honorable Mention by the Association for Computing Machinery (ACM) for best dissertation in 2015.

Last Modified: 07/18/2016
Modified by: Tandy Warnow
