Award Abstract # 2308495
Principled phylogenomic analysis without gene tree estimation

NSF Org: DMS
Division Of Mathematical Sciences
Recipient: UNIVERSITY OF WISCONSIN SYSTEM
Initial Amendment Date: July 19, 2023
Latest Amendment Date: July 19, 2023
Award Number: 2308495
Award Instrument: Standard Grant
Program Manager: Zhilan Feng
zfeng@nsf.gov
(703) 292-7523
DMS Division of Mathematical Sciences
MPS Directorate for Mathematical and Physical Sciences
Start Date: August 1, 2023
End Date: July 31, 2026 (Estimated)
Total Intended Award Amount: $295,263.00
Total Awarded Amount to Date: $295,263.00
Funds Obligated to Date: FY 2023 = $295,263.00
History of Investigator:
  • Sebastien Roch (Principal Investigator)
    roch@math.wisc.edu
Recipient Sponsored Research Office: University of Wisconsin-Madison
21 N PARK ST STE 6301
MADISON
WI  US  53715-1218
(608)262-3822
Sponsor Congressional District: 02
Primary Place of Performance: University of Wisconsin-Madison
21 N PARK ST STE 6301
MADISON
WI  US  53715-1218
Primary Place of Performance Congressional District: 02
Unique Entity Identifier (UEI): LCLSJAGTNZQ7
Parent UEI:
NSF Program(s): OFFICE OF MULTIDISCIPLINARY AC,
STATISTICS,
MATHEMATICAL BIOLOGY
Primary Program Source: 01002324DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 068Z
Program Element Code(s): 125300, 126900, 733400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049

ABSTRACT

This project aims to improve the estimation of species trees from genomic datasets. This estimation is challenging because different genomic regions evolve under processes that make their evolutionary histories (i.e., gene trees) discordant. This issue is exacerbated by widespread gene tree estimation errors in modern phylogenomic analyses. To address this challenge, this project's primary objective is to devise innovative mathematical, statistical, and computational techniques to analyze phylogenomic datasets without relying on gene tree estimation. This approach will produce more reliable species tree estimates in the presence of confounding processes. Species trees provide an evolutionary and comparative context in which many biological questions can be addressed. They play a vital role in understanding gene evolution, estimating divergence dates, detecting adaptation, and studying trait evolution, among other tasks. The developed methods will enhance the precision of biological discoveries based on species trees, advancing research that utilizes phylogenies. The project includes interdisciplinary research training for graduate students as well as the involvement of undergraduate students recruited through local initiatives. New course materials based on the proposed research will be developed for existing graduate courses and made available through the PI's website. The project will leverage connections to NSF-funded interdisciplinary institutes.

It is well established that different regions of a genome can evolve under different gene trees, due to processes such as incomplete lineage sorting, gene duplication and loss, and lateral gene transfer, complicating the estimation of species trees. Many methods that first estimate gene trees and then combine this information to estimate a species tree are known to have good theoretical guarantees, under the assumption that the true gene trees are known. That assumption is not satisfied in practice. Accounting theoretically for gene tree estimation error has proved challenging and few results are available. Building on prior work by the PI on the rigorous study of stochastic processes arising in this phylogenomic context, the proposed research will establish much-needed theoretical foundations for the analysis of multi-locus, multi-site datasets and the estimation of species trees without gene trees, including the development of novel estimators, the derivation of impossibility results and matching finite sample bounds, and the investigation of the effect of intra-locus recombination. This project will also enable the development of statistically rigorous, scalable algorithms. This interdisciplinary research will involve a close integration of applied probability, statistical theory, graph algorithms, and evolutionary biology.

This proposal is jointly funded by the Mathematical Biology and Statistics Programs at the Division of Mathematical Sciences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Hill, Max and Roch, Sebastien and Rodriguez, Jose Israel "Maximum Likelihood Estimation for Unrooted 3-Leaf Trees: An Analytic Solution for the CFN Model" Bulletin of Mathematical Biology, v.86, 2024 https://doi.org/10.1007/s11538-024-01340-x

Please report errors in award information by writing to: awardsearch@nsf.gov.
