
NSF Org: | CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | August 25, 2010 |
Latest Amendment Date: | August 25, 2010 |
Award Number: | 1016995 |
Award Instrument: | Standard Grant |
Program Manager: | Mitra Basu, mbasu@nsf.gov, (703)292-8649, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2010 |
End Date: | August 31, 2014 (Estimated) |
Total Intended Award Amount: | $449,998.00 |
Total Awarded Amount to Date: | $449,998.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
4400 UNIVERSITY DR FAIRFAX VA US 22030-4422 (703)993-2295 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
4400 UNIVERSITY DR FAIRFAX VA US 22030-4422 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Computational Biology |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
The research involves the design and analysis of a framework to compute the spatial arrangements, also known as conformations, in which a protein chain of amino acids is biologically active (that is, in its native state). This is an important step toward understanding protein function. While proteins are central to many biochemical processes, little is known about the millions of protein sequences obtained from organismal genomes.
Intellectual Merit: The intellectual merit of this work lies in the development of a novel computational framework that combines probabilistic exploration with the theory of statistical mechanics to efficiently enhance the sampling of the conformational space near the native state. Low-dimensional projections guide the exploration towards low-energy and geometrically diverse conformations. Additional intellectual merit lies in the incorporation of knowledge and observations emerging from biophysical theory and experiment, such as the use of coarse graining, the relation between energy barrier height and temperature, and the hierarchical organization of tertiary structure. Algorithmic components of the framework will be systematically evaluated for efficiency, accuracy, and how they enhance the sampling of the conformational space near the native state.
Broader Impact: The broader impact of this research will be the creation of a filter that efficiently computes diverse coarse-grained conformations relevant for the protein native state that can then be further refined through detailed biophysical studies. The work lies at the interface between computer science and protein biophysics and can benefit both communities. On the computational side, the work will lead to new algorithms on modeling articulated chains characterized by continuous high-dimensional search spaces and complex energy surfaces. On the biophysical side, the framework will elucidate which aspects of our understanding of proteins allow efficient and accurate modeling. The work will impact both undergraduate and graduate students. New courses are proposed by the investigator as part of efforts to introduce computational biology in the computer science curriculum at George Mason University. The work will be employed as a pedagogic device in courses and educational outreach venues to spawn and maintain interest in computer science, with a particular focus on women and minorities.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Protein molecules are central to our biology. They achieve their biological function by adopting specific three-dimensional shapes or structures with which they are then able to interact with other molecules in the cell. A central goal of genomic sequencing efforts has been to decode sequences of proteins operating in a species' cells. Unfortunately, the ability to understand biological functions of novel proteins is tightly related to our ability to resolve the three-dimensional, biologically-active structures adopted by the decoded sequences of these proteins. Therefore, extracting structural information about a protein from its amino-acid sequence is a central problem in molecular biology. The problem is known as de novo protein structure prediction, with de novo indicating the fact that the process has to be performed from first principles in the absence of other structures of proteins of similar sequence to the target sequence. Computational approaches offer to complement wet-laboratory efforts in this endeavor, promising both speed and accuracy.
This project has advanced algorithmic research to address this problem. Its main contribution has been the design of more powerful algorithms capable of efficiently navigating the vast space of alternative spatial arrangements of a protein's chain of amino acids in search of those arrangements representing the biologically-active structure. The space of such arrangements, also known as conformations, is vast and high-dimensional, as a protein chain is highly deformable and composed of hundreds of amino acids or more. The algorithmic framework put forth by this project essentially conducts a guided navigation of the conformational space, exploiting knowledge that conformations adopted to perform a biological function position atoms so as to maximize favorable interactions and minimize unfavorable ones. Summing these interactions in an energy function allows associating an energy score with a conformation. Thus, de novo structure prediction can be viewed as an optimization problem. Given that the space of conformations is vast and high-dimensional, and the underlying energy surface associated with this space is rich in local minima, baseline optimization algorithms typically underperform. Most suffer from premature convergence to local minima that do not represent the biologically-active structure.
The algorithmic framework that has resulted from this project is able to avoid premature convergence. The framework navigates the conformational space by growing a tree in it. Vertices of the tree are conformations, and the tree grows in iterations. At every iteration, a decision is made on which vertex to expand so as to grow an additional branch terminating with a new vertex added to the tree. The selection mechanism is key to guide the exploration towards both low-energy and geometrically-diverse conformations, thus reconciling these two conflicting optimization objectives. Discretization layers over low-dimensional projections of the conformation space and energy surface are employed to achieve these objectives. Branches in the tree are short Metropolis Monte Carlo (MMC) runs.
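The tree-growing loop described in this paragraph can be illustrated with a small toy sketch. It is not the project's implementation: the energy function, the choice of the first two coordinates as the low-dimensional projection, the grid cell width, the selection weights, and all function names (`toy_energy`, `mmc_branch`, `cell_of`, `tree_search`) are assumptions made only for illustration. A "conformation" here is just a short list of real numbers.

```python
import math
import random

def toy_energy(conf):
    # Hypothetical rugged energy: a quadratic bowl plus cosine ripples,
    # which gives many local minima. Not a protein energy function.
    return sum(x * x + 2.0 * (1.0 - math.cos(3.0 * x)) for x in conf)

def mmc_branch(start, energy_fn, steps, step_size, temperature, rng):
    """Short Metropolis Monte Carlo run; returns (final conformation, its energy)."""
    current = list(start)
    e_cur = energy_fn(current)
    for _ in range(steps):
        proposal = [x + rng.gauss(0.0, step_size) for x in current]
        e_new = energy_fn(proposal)
        delta = e_new - e_cur
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_cur = proposal, e_new
    return current, e_cur

def cell_of(conf, cell_width):
    # Assumed projection: the first two coordinates, binned on a regular grid.
    return (math.floor(conf[0] / cell_width), math.floor(conf[1] / cell_width))

def tree_search(dim=6, iterations=300, cell_width=0.5, seed=0):
    rng = random.Random(seed)
    root = [rng.uniform(-3.0, 3.0) for _ in range(dim)]
    # Each vertex is (conformation, energy, parent index); the root has no parent.
    vertices = [(root, toy_energy(root), None)]
    cells = {cell_of(root, cell_width): [0]}
    for _ in range(iterations):
        keys = list(cells)
        best = [min(vertices[i][1] for i in cells[k]) for k in keys]
        shift = min(best)
        # Assumed weighting: favor cells holding low-energy vertices (first factor)
        # and sparsely populated cells (second factor).
        weights = [math.exp(-(b - shift) / 5.0) / (1 + len(cells[k]))
                   for k, b in zip(keys, best)]
        key = rng.choices(keys, weights=weights, k=1)[0]
        parent = rng.choice(cells[key])
        # Grow one branch: a short MMC run starting from the selected vertex.
        conf, energy = mmc_branch(vertices[parent][0], toy_energy, 20, 0.2, 1.0, rng)
        vertices.append((conf, energy, parent))
        cells.setdefault(cell_of(conf, cell_width), []).append(len(vertices) - 1)
    return vertices, cells
```

Calling `tree_search()` returns the list of vertices and the grid of cells; the lowest-energy vertex can then be read off with `min(vertices, key=lambda v: v[1])`. Dividing the weight by the cell population is one simple way to push the search toward unexplored regions of the projection, and other weightings are equally plausible.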
This project has made general contributions and advanced algorithmic research in stochastic optimization for vast, high-dimensional, and non-linear search spaces. It has proposed that discretization layers can be employed to facilitate gathering statistics over the identified conflicting optimization objectives, which can then be used to formulate probability distribution functions that guide the directions of growth for the search. The concepts put forth in this project are of special relevance to sub-areas of computer science pursuing stochastic optimization, such as evolutionary computation and robotics-inspired motion planning. The latter has served as an inspiration for the tree-based search framework...
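As a rough illustration of how statistics gathered over discretization layers might be turned into a probability distribution, the sketch below bins sampled points first by energy level and then by a geometric grid cell within each level. The two-layer arrangement, the bin widths, the exponential preference for lower energy levels, the inverse-count preference for sparse cells, and the function name `selection_probabilities` are all assumptions chosen for the example, not details taken from the report.

```python
import math
from collections import defaultdict

def selection_probabilities(samples, energy_bin=2.0, cell_width=1.0):
    """samples: list of (energy, (x, y)), where (x, y) is an assumed 2D projection.

    Returns a dict mapping (energy_level, cell) to a selection probability.
    """
    # Layer 1: energy levels. Layer 2: geometric cells within each level.
    levels = defaultdict(lambda: defaultdict(int))
    for energy, (x, y) in samples:
        level = math.floor(energy / energy_bin)
        cell = (math.floor(x / cell_width), math.floor(y / cell_width))
        levels[level][cell] += 1

    lowest = min(levels)
    # Assumed weighting: lower energy levels are exponentially preferred.
    level_w = {lv: math.exp(-(lv - lowest)) for lv in levels}
    z_levels = sum(level_w.values())

    probs = {}
    for lv, cells in levels.items():
        # Assumed weighting: cells holding fewer samples are preferred,
        # which favors geometric diversity within a level.
        cell_w = {c: 1.0 / n for c, n in cells.items()}
        z_cells = sum(cell_w.values())
        for c, w in cell_w.items():
            probs[(lv, c)] = (level_w[lv] / z_levels) * (w / z_cells)
    return probs
```

For example, with `samples = [(0.5, (0.1, 0.1)), (0.7, (0.2, 0.3)), (0.9, (1.5, 0.2)), (5.0, (3.0, 3.0))]`, the low-energy level receives most of the probability mass, and within it the cell containing a single sample is preferred over the cell containing two.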