
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | January 29, 2016 |
Latest Amendment Date: | January 29, 2019 |
Award Number: | 1553421 |
Award Instrument: | Continuing Grant |
Program Manager: | Sylvia Spengler, sspengle@nsf.gov, (703) 292-7347; IIS Division of Information & Intelligent Systems; CSE Directorate for Computer and Information Science and Engineering |
Start Date: | February 1, 2016 |
End Date: | January 31, 2023 (Estimated) |
Total Intended Award Amount: | $499,576.00 |
Total Awarded Amount to Date: | $499,576.00 |
Funds Obligated to Date: | FY 2017 = $93,774.00; FY 2018 = $101,102.00; FY 2019 = $106,312.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: | 438 WHITNEY RD EXTENSION UNIT 1133, STORRS CT US 06269-9018, (860) 486-3622 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 371 Fairfield Way, Storrs CT US 06269-4155 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Info Integration & Informatics |
Primary Program Source: | 01001718DB NSF RESEARCH & RELATED ACTIVIT; 01001819DB NSF RESEARCH & RELATED ACTIVIT; 01001920DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
The genome of an organism helps to determine its biology. Understanding how different genes evolve and acquire new functions is a fundamental biological problem, and many computational methods have been developed for studying how gene families evolve and change over time in different organisms. These existing methods assume that the gene is the basic unit of evolution and that evolutionary processes such as gene duplication, gene loss, and horizontal gene transfer act on entire genes rather than on parts of genes. However, it is well known that most genes consist of one or more "protein domains," well-characterized functional units that can be independently lost or gained during evolution, and that domain shuffling is one of the primary mechanisms through which genes evolve and gain new functions. Proper inference and accounting of domain-level evolutionary events is therefore crucial to understanding how genes evolve and function. The proposed research will lay the methodological and algorithmic foundations for a novel computational framework that addresses this critical problem. The new computational framework and algorithms will enable more powerful comparative genomic techniques for understanding gene function and biology, and may also contribute to improvements in human health and agriculture. The proposed research will shape future computational advances in the study of domain, gene, and genome evolution, and will also spur the development of more comprehensive computational models in other areas of molecular evolution. The algorithms developed as part of this research will be implemented in a user-friendly software package and made freely available. The project will directly involve two graduate students and up to ten undergraduate students, introduce several high-school students to computer science, bioinformatics, and research, and provide training to many high-school science teachers on the role of computer science in biology.
This project will lead to the development of the first "three-tree" model of domain evolution that explicitly captures the interdependence of domain-, gene-, and species-level evolution. The proposed three-tree computational framework is based on phylogenetic reconciliation, where the goal is to find a most parsimonious joint reconciliation of the given gene trees with the species tree and of the given domain trees with the gene trees. The resulting optimization problems will be solved using various algorithmic techniques including dynamic programming, branch and bound, enumeration and sampling, and local search. The framework will decouple domain-level events from gene-level events and provide a fine-grained view of gene family and domain family evolution that is both more accurate and much easier to interpret. Specific aims include: (i) development of the three-tree computational framework and corresponding algorithms, (ii) enhancing inference accuracy by accounting for multiple optima and domain tree errors, and (iii) extension of the three-tree framework to microbial gene families by allowing for horizontal gene transfer.
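To make the idea of phylogenetic reconciliation concrete, the sketch below shows the classical two-tree case only: a gene tree is mapped onto a species tree by lowest common ancestor (LCA), and a gene-tree node is counted as a duplication when it maps to the same species node as one of its children. This is an illustration under stated assumptions, not the project's three-tree algorithm: the nested-tuple tree encoding, the function names, and the `<species>_<id>` leaf-naming convention are all hypothetical, and losses, transfers, and the joint domain-gene-species optimization are not modeled.

```python
# Minimal LCA-based reconciliation of a binary gene tree with a species tree.
# Assumptions (hypothetical, for illustration only):
#   - trees are nested tuples, leaves are strings
#   - gene-tree leaves are named "<species>_<id>", e.g. "A_1"
#   - only duplications are counted (no losses or transfers)

def leaves(tree):
    """Return the set of leaf labels under a tree node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(species_tree):
    """Return the leaf set of every node of the species tree."""
    result = [leaves(species_tree)]
    if not isinstance(species_tree, str):
        for child in species_tree:
            result.extend(clades(child))
    return result


def lca(species_set, species_clades):
    """Smallest species-tree clade containing species_set (the LCA node)."""
    return min((c for c in species_clades if species_set <= c), key=len)


def reconcile(gene_tree, species_clades):
    """Return (species clade the node maps to, number of duplications below)."""
    if isinstance(gene_tree, str):
        # Hypothetical naming convention: species label precedes the underscore.
        return frozenset([gene_tree.split("_")[0]]), 0
    left, right = gene_tree
    map_left, dup_left = reconcile(left, species_clades)
    map_right, dup_right = reconcile(right, species_clades)
    mapping = lca(map_left | map_right, species_clades)
    # A node is a duplication if it maps to the same species node as a child.
    is_duplication = mapping in (map_left, map_right)
    return mapping, dup_left + dup_right + int(is_duplication)


species_tree = (("A", "B"), "C")
species_clades = clades(species_tree)

congruent_gene_tree = (("A_1", "B_1"), "C_1")
discordant_gene_tree = (("A_1", "B_1"), ("A_2", "C_1"))

_, dups_congruent = reconcile(congruent_gene_tree, species_clades)
_, dups_discordant = reconcile(discordant_gene_tree, species_clades)
print(dups_congruent, dups_discordant)
```

In this toy input, the congruent gene tree needs no duplications, while the discordant one needs a single duplication at its root. The same LCA mapping could in principle be applied a second time, with a domain tree mapped onto a gene tree, but doing the two steps separately is not the same as searching for a most parsimonious joint reconciliation, which is the problem the abstract describes.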
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The overarching goal of this project was to develop novel computational techniques and algorithms for integrating domain-level, gene-level, and species-level evolution, with the aim of transforming biologists' ability to systematically study the evolution of gene families and protein domains. The project specifically focused on the development of a novel, three-tree phylogenetic reconciliation framework, called the Domain-Gene-Species (DGS) reconciliation framework, and on laying its methodological and algorithmic foundations.
To achieve the specific research goals of this project, we developed many new algorithms and computational methods for analyzing gene and subgene/domain evolution, released several new open-source software packages, and applied the new methods to large-scale simulated and real datasets to demonstrate their impact in practice. Some of the significant methodological and algorithmic innovations enabled by this project include (i) development of the first integrated DGS reconciliation framework that explicitly models the interdependence of domain-, gene-, and species-level evolution, (ii) development of effective exact and heuristic algorithms for computing optimal DGS reconciliations, (iii) extensions of the DGS reconciliation framework to account for multiple domains and to allow for horizontal transfer of genes and domains, along with development of associated algorithms, (iv) development of novel computational techniques for handling gene tree error and uncertainty, (v) development of the first phylogenetic simulation framework for simulating subgene/domain level evolution within gene families, (vi) development of a novel computational approach for identifying gene families affected by domain-level or partial gene transfer, and (vii) development of new computational techniques for improving gene family sequence alignment and gene tree accuracy by accounting for domain gain/loss and rearrangement.
The research supported by this project has resulted in 15 peer-reviewed journal and conference publications, one book chapter, and one manuscript that is currently under review for journal publication. Two of the conference publications received best paper awards. In addition, two manuscripts are currently in preparation for journal submission. The project also enabled the development of nine open-source software packages: RANGER-DTL v2.0 for phylogenetic reconciliation; SEADOG, SEADOG-ILP, SEADOG-MD, and SEADOG-Gen for computing Domain-Gene-Species reconciliations; SaGePhy for probabilistic phylogenetic simulation of gene and subgene evolution; RF+ for phylogenetic tree comparison; trippd for identification of gene families affected by partial gene transfer; and virDTL for viral recombination inference. All software packages are freely available from the PI’s lab website: https://compbio.engr.uconn.edu/software/
Many of the methods and software tools mentioned above are the first of their kind and make it possible to study those aspects of domain and gene family evolution that were either difficult or impossible to study previously. The open-source and easy-to-use software packages developed through this project will allow biologists and other researchers to easily apply the new methods to their own datasets.
This project provided invaluable research experience to 5 PhD students and 7 undergraduate students. Of the 5 PhD students, 3 graduated with their PhDs during this project: one started a postdoctoral research position, one returned to their home country to work as a senior algorithm engineer, and one joined the US technology industry. Of the 7 undergraduate students (3 female), 3 are currently pursuing PhDs in computer science at Stanford, UCLA, and Duke, and a fourth will be joining the PhD program in computer science at the University of Maryland in Fall 2023. The project also enabled the training of 24 middle- and high-school science teachers (of whom 19 were female and 6 were African American) from across the US as part of the intensive week-long workshop titled "Bioinformatics: Using computer science to understand life," taught by the PI in the summers of 2016, 2017, 2018, 2019, 2021, and 2022. Participating teachers were trained in the basic principles and techniques of bioinformatics and were provided with simple bioinformatics exercises for use in their own classrooms. Finally, this project also provided research experiences to several Connecticut high-school students.
Last Modified: 05/25/2023
Modified by: Mukul Bansal