Award Abstract # 1248176
Probabilistic Techniques in Mathematical Phylogenetics

NSF Org: DMS
Division Of Mathematical Sciences
Recipient: UNIVERSITY OF WISCONSIN SYSTEM
Initial Amendment Date: July 24, 2012
Latest Amendment Date: July 24, 2012
Award Number: 1248176
Award Instrument: Standard Grant
Program Manager: Tomek Bartoszynski
tbartosz@nsf.gov
 (703)292-4885
DMS
 Division Of Mathematical Sciences
MPS
 Directorate for Mathematical and Physical Sciences
Start Date: May 14, 2012
End Date: July 31, 2014 (Estimated)
Total Intended Award Amount: $91,476.00
Total Awarded Amount to Date: $91,476.00
Funds Obligated to Date: FY 2010 = $91,475.00
History of Investigator:
  • Sebastien Roch (Principal Investigator)
    roch@math.wisc.edu
Recipient Sponsored Research Office: University of Wisconsin-Madison
21 N PARK ST STE 6301
MADISON
WI  US  53715-1218
(608)262-3822
Sponsor Congressional District: 02
Primary Place of Performance: University of Wisconsin-Madison
21 North Park Street
Madison
WI  US  53715-1218
Primary Place of Performance Congressional District: 02
Unique Entity Identifier (UEI): LCLSJAGTNZQ7
Parent UEI:
NSF Program(s): PROBABILITY, MATHEMATICAL BIOLOGY, COFFES
Primary Program Source: 01001011DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s): 126300, 733400, 755200
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049

ABSTRACT

It is proposed to investigate the phylogenetic reconstruction problem from a probabilistic perspective. Inferring the speciation history of a group of organisms is a fundamental problem in evolutionary biology. This history is represented by a phylogeny, i.e., a rooted tree where the leaves correspond to current species and branchings indicate speciation events. The stochastic evolution of molecular sequences on such a phylogeny is an instance of a Markov model on a tree. In this project, the PI will further develop connections between the theory of Gibbs measures on trees and the phylogenetic reconstruction problem. A particular emphasis will be given to models of insertions and deletions with the objective of providing a probabilistic analysis of the multiple sequence alignment problem. Connections to information theory problems will also be considered.
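As an illustration of what "a Markov model on a tree" means here, the following minimal sketch simulates binary sequences on a small rooted tree. It is not taken from the project: the tree shape, the node names, the two-state alphabet, and the single per-edge flip probability are all assumptions chosen for brevity.

```python
import random

# Hypothetical rooted tree: each internal node lists its children.
# Leaves (A, B, C) play the role of current species.
TREE = {"root": ["u", "C"], "u": ["A", "B"]}


def evolve(tree, root, n_sites, flip_prob, rng):
    """Simulate a binary sequence at every node of a rooted tree.

    Each site evolves independently; along every edge a site flips
    with probability flip_prob (an assumed, edge-independent value).
    """
    seqs = {root: [rng.randint(0, 1) for _ in range(n_sites)]}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in tree.get(parent, []):
            # The child's state depends only on its parent's state:
            # this is the Markov property on the tree.
            seqs[child] = [
                1 - s if rng.random() < flip_prob else s
                for s in seqs[parent]
            ]
            stack.append(child)
    return seqs


def leaf_sequences(tree, seqs):
    """Keep only the sequences at the leaves, i.e. the observable data."""
    return {node: s for node, s in seqs.items() if node not in tree}


if __name__ == "__main__":
    all_seqs = evolve(TREE, "root", n_sites=20, flip_prob=0.1,
                      rng=random.Random(0))
    for name, seq in sorted(leaf_sequences(TREE, all_seqs).items()):
        print(name, "".join(map(str, seq)))
```

The reconstruction problem runs in the opposite direction: given only the leaf sequences, infer the tree that produced them.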

Assembling the Tree of Life is a fundamental problem in biology which provides insights into the study of evolution, adaptation, and speciation. Much information about past evolutionary events can be inferred from the analysis of DNA sequence data collected from existing species. A notable feature of the evolution of molecular sequences is the significant role played by randomness. In recent years, probability theory, the mathematical study of randomness, has provided key new insights in assessing the power of statistical methods to reconstruct evolutionary processes in large-scale phylogenetics. The main theme of this project is to further investigate these connections. In particular, more realistic models of evolution will be considered. New phylogenetic analysis techniques will be developed and implemented.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 13)
Alexandr Andoni and Constantinos Daskalakis and Avinatan Hassidim and Sebastien Roch "Global alignment of molecular sequences via ancestral state reconstruction" Stochastic Processes and their Applications , v.122 , 2012 , p.3852 10.1016/j.spa.2012.08.004
Daskalakis, Constantinos and Roch, Sebastien "Alignment-free phylogenetic reconstruction: sample complexity via a branching process analysis" Annals of Applied Probability , v.23 , 2013 , p.693-721 10.1214/12-AAP852
Mossel, E. and Roch, S. and Sly, A. "Robust Estimation of Latent Tree Graphical Models: Inferring Hidden States With Inexact Parameters" Information Theory, IEEE Transactions on , v.59 , 2013 , p.4357-4373 10.1109/TIT.2013.2251927
Mossel, Elchanan and Roch, Sebastien "Identifiability and inference of non-parametric rates-across-sites models on large-scale phylogenies" Journal of Mathematical Biology , v.67 , 2013 , p.767-797 10.1007/s00285-012-0571-4
Mossel, Elchanan and Roch, Sebastien "Phylogenetic mixtures: concentration of measure in the large-tree limit" Annals of Applied Probability , v.22 , 2012 , p.2429 10.1214/11-AAP837
Mossel, Elchanan and Roch, Sebastien and Sly, Allan "On the Inference of Large Phylogenies with Long Branches: How Long Is Too Long?" Bulletin of Mathematical Biology , v.73 , 2011 , p.1627-1644 0092-8240

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The Tree of Life is a standard graphical representation of the evolutionary history of organisms. It plays an important role in many analyses in biology. The Tree of Life is reconstructed using DNA sequences collected from extant species. The basic idea is simple: the farther apart two species are in the tree, the more dissimilar their sequences should be. In order to solve this reconstruction problem accurately and efficiently on large datasets, sophisticated mathematical and computational techniques have been used in recent years. Tools from probability theory have been particularly helpful in that respect because of the key role played by chance in evolutionary processes. The overall goal of this grant was to further develop this fruitful interplay between probability theory and mathematical phylogenetics through several projects extending previous rigorous tools to more realistic biological models of sequence evolution.
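The "basic idea" above can be made concrete with a simple dissimilarity measure. The sketch below computes the fraction of sites at which two sequences differ. It is only an illustration under strong assumptions: the three toy sequences are invented, and the sequences are taken to be already aligned and of equal length, which is precisely the step the indel work described in this report aims to avoid.

```python
def p_distance(seq_a, seq_b):
    """Fraction of sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    differing = sum(a != b for a, b in zip(seq_a, seq_b))
    return differing / len(seq_a)


if __name__ == "__main__":
    # Hypothetical toy data: A and B are meant to be close relatives,
    # C a more distant one.
    species = {
        "A": "ACGTACGT",
        "B": "ACGTACGA",
        "C": "TCGAACTA",
    }
    names = sorted(species)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            print(x, y, p_distance(species[x], species[y]))
```

Distance-based reconstruction methods build on quantities of this kind, typically after a model-based correction for multiple substitutions at the same site.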


Evolution acts on genomes through mutations that are inherited from ancestors. An important class of mutations is known as indels, whereby a nucleotide is inserted or deleted in a DNA sequence. Indels lead to significant challenges in comparing DNA data from multiple species as they cause nucleotides to be “misaligned.” The process of correcting for these nucleotide shifts is called the multiple sequence alignment problem, a notoriously difficult computational problem. Multiple sequence alignment is typically the first step in tree reconstruction algorithms. One major outcome of the current grant is a mathematical proof that this difficult problem can in fact be avoided. A new efficient method for reconstructing trees was developed that works directly on unaligned sequences, and it was shown rigorously to achieve an almost optimal data requirement. In related projects, techniques from probability theory were also employed to study the multiple sequence alignment problem in a proper evolutionary context.


New sequencing technologies have greatly facilitated the production of larger and larger datasets. In that context, representing the evolutionary history of a group of organisms with a single tree is often a simplification. Because different genes have different histories, a more appropriate model is a collection of trees, one for each gene. This approach is relevant for instance if some genes have undergone lateral gene transfers, whereby a gene is inherited from a distant species rather than from a direct ancestor. In that case, the resulting collection of trees can be modeled mathematically as an ensemble, each gene tree being a copy of an underlying species tree subject to a form of combinatorial noise. An outcome of this project has been a mathematical proof that the species tree can be recovered from the noisy gene trees even in the presence of high rates of transfers, as are common for example in some bacterial species. Several other types of gene tree ensembles were also studied successfully.


The reconstruction of the Tree of Life is a fundamental problem in biology which provides insights into the study of evolution, adaptation, and speciation. Through the development and broad dissemination of new algorithms for phylogenetic analysis as well as the training of graduate students, the research activities supported by this grant have helped advance the state of knowledge in evolutionary biology and may in the future contribute to the numerous benefits to society of phylogenetic research such as--just to name a few--the discovery of new life forms for biotechnology, the protection of ecosystems from invasive species, the identification of emerging diseases, or the prediction of disease outbreaks.


Last Modified: 10/29/2014
Modified by: Sebastien Roch

Please report errors in award information by writing to: awardsearch@nsf.gov.
