
NSF Org: | DMS Division Of Mathematical Sciences |
Recipient: |
|
Initial Amendment Date: | July 24, 2012 |
Latest Amendment Date: | July 24, 2012 |
Award Number: | 1248176 |
Award Instrument: | Standard Grant |
Program Manager: | Tomek Bartoszynski, tbartosz@nsf.gov, (703)292-4885, DMS Division Of Mathematical Sciences, MPS Directorate for Mathematical and Physical Sciences |
Start Date: | May 14, 2012 |
End Date: | July 31, 2014 (Estimated) |
Total Intended Award Amount: | $91,476.00 |
Total Awarded Amount to Date: | $91,476.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 21 N PARK ST STE 6301 MADISON WI US 53715-1218 (608)262-3822 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 21 North Park Street Madison WI US 53715-1218 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | PROBABILITY, MATHEMATICAL BIOLOGY, COFFES |
Primary Program Source: |
|
Program Reference Code(s): | |
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.049 |
ABSTRACT
It is proposed to investigate the phylogenetic reconstruction problem from a probabilistic perspective. Inferring the speciation history of a group of organisms is a fundamental problem in evolutionary biology. This history is represented by a phylogeny, i.e., a rooted tree where the leaves correspond to current species and branchings indicate speciation events. The stochastic evolution of molecular sequences on such a phylogeny is an instance of a Markov model on a tree. In this project, the PI will further develop connections between the theory of Gibbs measures on trees and the phylogenetic reconstruction problem. A particular emphasis will be given to models of insertions and deletions with the objective of providing a probabilistic analysis of the multiple sequence alignment problem. Connections to information theory problems will also be considered.
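As an illustration of what a Markov model on a tree means here, the following is a minimal sketch, not the model studied in the project. It assumes a simple substitution-only process in which each site changes independently along each branch with a fixed probability; the tree shape, branch probabilities, sequence length, and the function names `mutate` and `simulate_leaves` are all hypothetical choices made for this example. Insertions and deletions, which the project emphasizes, are deliberately left out.

```python
import random

NUCLEOTIDES = "ACGT"

def mutate(seq, p_sub, rng):
    """Copy seq along one branch; each site is independently replaced by a
    different, uniformly chosen nucleotide with probability p_sub."""
    out = []
    for base in seq:
        if rng.random() < p_sub:
            out.append(rng.choice([b for b in NUCLEOTIDES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate_leaves(node, seq, rng):
    """Evolve seq down a rooted tree and return {leaf_name: sequence}.

    A node is (name, children), where children is a list of
    (child_node, p_sub) pairs; a leaf has an empty children list.
    """
    name, children = node
    if not children:
        return {name: seq}
    leaves = {}
    for child, p_sub in children:
        # The child's sequence depends only on its parent's sequence:
        # this is the Markov property on the tree.
        leaves.update(simulate_leaves(child, mutate(seq, p_sub, rng), rng))
    return leaves

# Hypothetical three-species phylogeny: A and B are close relatives, C is distant.
rng = random.Random(0)
tree = ("root", [
    (("ab", [(("A", []), 0.05), (("B", []), 0.05)]), 0.05),
    (("C", []), 0.30),
])
root_seq = "".join(rng.choice(NUCLEOTIDES) for _ in range(200))
leaves = simulate_leaves(tree, root_seq, rng)
```

The reconstruction problem runs in the opposite direction: given only the sequences in `leaves`, recover the shape of `tree`.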
Assembling the Tree of Life is a fundamental problem in biology which provides insights into the study of evolution, adaptation, and speciation. Much information about past evolutionary events can be inferred from the analysis of DNA sequence data collected from existing species. A notable feature of the evolution of molecular sequences is the significant role played by randomness. In recent years, probability theory, the mathematical study of randomness, has provided key new insights in assessing the power of statistical methods to reconstruct evolutionary processes in large-scale phylogenetics. The main theme of this project is to further investigate these connections. In particular, more realistic models of evolution will be considered. New phylogenetic analysis techniques will be developed and implemented.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The Tree of Life is a standard graphical representation of the evolutionary history of organisms. It plays an important role in many analyses in biology. The Tree of Life is reconstructed using DNA sequences collected from extant species. The basic idea is simple: the farther apart two species are in the tree, the more dissimilar their sequences should be. In order to solve this reconstruction problem accurately and efficiently on large datasets, sophisticated mathematical and computational techniques have been used in recent years. Tools from probability theory have been particularly helpful in that respect because of the key role played by chance in evolutionary processes. The overall goal of this grant was to further develop this fruitful interplay between probability theory and mathematical phylogenetics through several projects extending previous rigorous tools to more realistic biological models of sequence evolution.
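The idea that sequence dissimilarity grows with distance in the tree can be made concrete with a standard textbook estimator. The sketch below is an illustration only and is not taken from the grant's own methods: it assumes the Jukes-Cantor substitution model, under which the expected number of substitutions per site is d = -(3/4) ln(1 - 4p/3), where p is the observed fraction of differing sites. It also assumes the two sequences are already aligned and of equal length. The function names are hypothetical.

```python
import math

def p_distance(a, b):
    """Fraction of sites at which two aligned, equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jc69_distance(a, b):
    """Jukes-Cantor corrected distance (expected substitutions per site).

    The correction accounts for repeated substitutions at the same site,
    which make the raw fraction of differences saturate as species diverge.
    """
    p = p_distance(a, b)
    if p >= 0.75:
        # At or beyond saturation the estimator is undefined; report infinity.
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)
```

Distance-based reconstruction methods build a tree from a matrix of such pairwise estimates. Note the reliance on aligned input: sequences that have undergone insertions or deletions cannot be compared site by site in this way.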
Evolution acts on genomes through mutations that are inherited from ancestors. An important class of mutations is known as indels, whereby a nucleotide is inserted or deleted in a DNA sequence. Indels lead to significant challenges in comparing DNA data from multiple species as they cause nucleotides to be “misaligned.” The process of correcting for these nucleotide shifts is called the multiple sequence alignment problem, a notoriously difficult computational problem. Multiple sequence alignment is typically the first step in tree reconstruction algorithms. One major outcome of the current grant is a mathematical proof that this difficult problem can in fact be avoided. A new efficient method for reconstructing trees was developed that works directly on unaligned sequences, and it was shown rigorously that the method achieves almost optimal data requirements. In related projects, techniques from probability theory were also employed to study the multiple sequence alignment problem in a proper evolutionary context.
New sequencing technologies have greatly facilitated the production of larger and larger datasets. In that context, representing the evolutionary history of a group of organisms with a single tree is often a simplification. Because different genes have different histories, a more appropriate model is a collection of trees, one for each gene. This approach is relevant for instance if some genes have undergone lateral gene transfers, whereby a gene is inherited from a distant species rather than from a direct ancestor. In that case, the resulting collection of trees can be modeled mathematically as an ensemble, each gene tree being a copy of an underlying species tree subject to a form of combinatorial noise. An outcome of this project has been a mathematical proof that the species tree can be recovered from the noisy gene trees even in the presence of high rates of transfers, as are common for example in some bacterial species. Several other types of gene tree ensembles were also studied successfully.
The reconstruction of the Tree of Life is a fundamental problem in biology which provides insights into the study of evolution, adaptation, and speciation. Through the development and broad dissemination of new algorithms for phylogenetic analysis as well as the training of graduate students, the research activities supported by this grant have helped advance the state of knowledge in evolutionary biology. In the future they may contribute to the numerous benefits to society of phylogenetic research such as, to name a few, the discovery of new life forms for biotechnology, the protection of ecosystems from invasive species, the identification of emerging diseases, or the prediction of disease outbreaks.
Last Modified: 10/29/2014
Modified by: Sebastien Roch