Award Abstract # 1016995
AF: Small: A Unified Computational Framework to Enhance the Ab-Initio Sampling of Native-Like Protein Conformations

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: GEORGE MASON UNIVERSITY
Initial Amendment Date: August 25, 2010
Latest Amendment Date: August 25, 2010
Award Number: 1016995
Award Instrument: Standard Grant
Program Manager: Mitra Basu
mbasu@nsf.gov
(703)292-8649
CCF Division of Computing and Communication Foundations
CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2010
End Date: August 31, 2014 (Estimated)
Total Intended Award Amount: $449,998.00
Total Awarded Amount to Date: $449,998.00
Funds Obligated to Date: FY 2010 = $449,998.00
History of Investigator:
  • Amarda Shehu (Principal Investigator)
    ashehu@gmu.edu
Recipient Sponsored Research Office: George Mason University
4400 UNIVERSITY DR
FAIRFAX
VA  US  22030-4422
(703)993-2295
Sponsor Congressional District: 11
Primary Place of Performance: George Mason University
4400 UNIVERSITY DR
FAIRFAX
VA  US  22030-4422
Primary Place of Performance Congressional District: 11
Unique Entity Identifier (UEI): EADLFP7Z72E5
Parent UEI: H4NRWLFCDF43
NSF Program(s): Computational Biology
Primary Program Source: 01001011DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 9218, HPCC
Program Element Code(s): 793100
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The research involves the design and analysis of a framework to compute the spatial arrangements, also known as conformations, in which a protein chain of amino acids is biologically-active (in its native state). This is an important goal towards understanding protein function. While proteins are central to many biochemical processes, little is known about millions of protein sequences obtained from organismal genomes.

Intellectual Merit: The intellectual merit of this work lies in the development of a novel computational framework that combines probabilistic exploration with the theory of statistical mechanics to efficiently enhance the sampling of the conformational space near the native state. Low-dimensional projections guide the exploration towards low-energy and geometrically-diverse conformations. Additional intellectual merit lies in the incorporation of knowledge and observations emerging from biophysical theory and experiment, such as the use of coarse graining, relation between energy barrier height and temperature, and hierarchical organization of tertiary structure. Algorithmic components of the framework will be systematically evaluated for efficiency, accuracy, and how they enhance the sampling of the conformational space near the native state.

Broader Impact: The broader impact of this research will be the creation of a filter that efficiently computes diverse coarse-grained conformations relevant to the protein native state, which can then be further refined through detailed biophysical studies. The work lies at the interface between computer science and protein biophysics and can benefit both communities. On the computational side, the work will lead to new algorithms for modeling articulated chains characterized by continuous high-dimensional search spaces and complex energy surfaces. On the biophysical side, the framework will elucidate which aspects of our understanding of proteins allow efficient and accurate modeling. The work will impact both undergraduate and graduate students. New courses are proposed by the investigator as part of efforts to introduce computational biology in the computer science curriculum at George Mason University. The work will be employed as a pedagogic device in courses and educational outreach venues to spawn and maintain interest in computer science, with a particular focus on women and minorities.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Brian Olson and Amarda Shehu "Populating Local Minima in the Protein Conformational Space" IEEE Intl Conf on Bioinf and Biomed (IEEE BIBM) , 2011 , p.114
Brian Olson and Amarda Shehu "Rapid Sampling of Local Minima in Protein Energy Surface and Effective Reduction through a Multi-objective Filter" Proteome Science , v.11 , 2013 , p.1-10
Brian Olson, Irina Hashmi, Kevin Molloy, and Amarda Shehu "Basin Hopping as a General and Versatile Optimization Framework for the Characterization of Biological Macromolecules" Advances in Artificial Intelligence J , v.2012 , 2012 , p.674832 http://dx.doi.org/10.1155/2012/674832
Brian Olson, Kevin Molloy, and Amarda Shehu "Enhancing Sampling of the Conformational Space Near the Protein Native State" Intl. Conference on Bio-inspired Models of Network, Information, and Computing Systems (BIONETICS) , v.0087 , 2010
Brian Olson, Kevin Molloy, S.-Farid Hendi, and Amarda Shehu "Guiding Search in the Protein Conformational Space with Structural Profiles" J Bioinf and Comp Biol , v.10 , 2012 , p.1242005
Brian Olson, S. Farid Hendi, and Amarda Shehu "Protein Conformational Search with Geometric Projections" Comp Struct Biol Workshop (CSBW), IEEE BIBM , 2011 , p.366
Kevin Molloy and Amarda Shehu "Elucidating the Ensemble of Functionally-relevant Transitions in Protein Systems with a Robotics-inspired Method" BMC Structural Biology J , v.13 , 2013 , p.1-15
Kevin Molloy, M. Jennifer Van, Daniel Barbara, and Amarda Shehu. "Exploring Representations of Protein Structure for Automated Remote Homology Detection and Mapping of Protein Structure Space." BMC Bioinformatics , v.15 , 2014 , p.S4 10.1186/1471-2105-15-S8-S4
Kevin Molloy, Sameh Saleh, and Amarda Shehu. "Probabilistic Search and Energy Guidance for Biased Decoy Sampling in Ab-initio Protein Structure Prediction." IEEE/ACM Trans Comp Biol and Bioinf , v.9 , 2013 , p.1-14 http://doi.ieeecomputersociety.org/10.1109/TCBB.2013.29
Brian Olson, Kevin Molloy, and Amarda Shehu "In Search of the Protein Native State with a Probabilistic Sampling Approach" J of Bioinf and Comp Biol , v.9 , 2011 , p.383
Sameh Saleh, Brian Olson, and Amarda Shehu "A population-based evolutionary search approach to the multiple minima problem in de novo protein structure prediction" BMC Structural Biology J , v.13 , 2013 , p.1-15

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Protein molecules are central to our biology. They achieve their biological function by adopting specific three-dimensional shapes or structures with which they are then able to interact with other molecules in the cell. A central goal of genomic sequencing efforts has been to decode sequences of proteins operating in a species' cells. Unfortunately, the ability to understand biological functions of novel proteins is tightly related to our ability to resolve the three-dimensional, biologically-active structures adopted by the decoded sequences of these proteins. Therefore, extracting structural information about a protein from its amino-acid sequence is a central problem in molecular biology. The problem is known as de novo protein structure prediction, with de novo indicating the fact that the process has to be performed from first principles in the absence of other structures of proteins of similar sequence to the target sequence. Computational approaches offer to complement wet-laboratory efforts in this endeavor, promising both speed and accuracy.

This project has advanced algorithmic research to address this problem. Its main contribution has been in designing more powerful algorithms capable of efficiently navigating the vast space of alternative spatial arrangements of a protein's chain of amino acids in search of those arrangements representing the biologically-active structure. The space of such arrangements, also known as conformations, is vast and high-dimensional, as a protein chain is highly deformable and composed of hundreds of amino acids or more. The algorithmic framework put forth by this project essentially conducts a guided navigation of the conformational space, exploiting knowledge that conformations adopted to perform a biological function position atoms so as to maximize favorable interactions and minimize unfavorable ones. Summing these interactions in an energy function allows associating an energy score with a conformation. Thus, de novo structure prediction can be viewed as an optimization problem. Given that the space of conformations is vast and high-dimensional, and the underlying energy surface associated with this space is rich in local minima, baseline optimization algorithms typically underperform. Most suffer from premature convergence to local minima that do not represent the biologically-active structure.
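
To make the idea of an energy score concrete, the following is a minimal sketch and not the energy function used in the project. It assumes a coarse-grained chain with one bead per amino acid and a Lennard-Jones 12-6 term for every pair of non-bonded beads. The names pairwise_energy and conformation_energy and the parameters epsilon and sigma are hypothetical, chosen only for illustration.

```python
import math

def pairwise_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 term (an assumed stand-in, not the project's
    energy function): repulsive at short range, attractive beyond sigma."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)

def conformation_energy(coords):
    """Sum the pairwise term over all pairs of beads that are not
    adjacent along the chain, giving one score per conformation."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 2, n):  # skip bonded neighbours (i, i+1)
            total += pairwise_energy(math.dist(coords[i], coords[j]))
    return total

# Two toy conformations of the same four-bead chain with unit bond lengths.
extended = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
compact = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]

print("extended:", conformation_energy(extended))
print("compact: ", conformation_energy(compact))
```

In this toy setting the compact conformation scores lower than the extended one because more bead pairs sit at favorable distances, which is the sense in which searching for the biologically-active structure becomes minimizing an energy score.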

The algorithmic framework that has resulted from this project is able to avoid premature convergence. The framework navigates the conformational space by growing a tree in it. Vertices of the tree are conformations, and the tree grows in iterations. At every iteration, a decision is made on which vertex to expand so as to grow an additional branch terminating with a new vertex added to the tree. The selection mechanism is key to guiding the exploration towards both low-energy and geometrically-diverse conformations, thus reconciling these two conflicting optimization objectives. Discretization layers over low-dimensional projections of the conformation space and energy surface are employed to achieve these objectives. Branches in the tree are short Metropolis Monte Carlo (MMC) runs.
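
The tree-growing loop described above can be sketched as follows. This is an illustrative toy under stated assumptions, not the project's implementation: conformations are short lists of numbers, the energy function is a made-up rugged surface, the projection is simply the first coordinate, and the weighting rule (prefer low-energy levels and sparsely populated cells) is one plausible choice. All function names and parameter values are hypothetical.

```python
import math
import random

def energy(x):
    """Toy rugged surface standing in for a protein energy function."""
    return sum(xi * xi + 2.0 * math.sin(5.0 * xi) for xi in x)

def mmc_branch(start, steps, temperature, step_size, rng):
    """Short Metropolis Monte Carlo run; returns the final state and its energy."""
    current, e_cur = list(start), energy(start)
    for _ in range(steps):
        trial = [xi + rng.gauss(0.0, step_size) for xi in current]
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if e_trial <= e_cur or rng.random() < math.exp((e_cur - e_trial) / temperature):
            current, e_cur = trial, e_trial
    return current, e_cur

def cell_of(x, e, width=0.5, e_width=1.0):
    """Discretize a 1-D projection (the first coordinate) and the energy."""
    return (math.floor(x[0] / width), math.floor(e / e_width))

def tree_search(root, iterations, seed=0):
    rng = random.Random(seed)
    # Each vertex is (conformation, energy, parent index); the root has no parent.
    tree = [(list(root), energy(root), None)]
    cells = {cell_of(tree[0][0], tree[0][1]): [0]}
    for _ in range(iterations):
        keys = list(cells)
        e_min = min(k[1] for k in keys)
        # Favor cells at low energy levels and cells holding few vertices.
        weights = [math.exp(-(k[1] - e_min)) / len(cells[k]) for k in keys]
        key = rng.choices(keys, weights=weights)[0]
        parent = rng.choice(cells[key])
        conf, e = mmc_branch(tree[parent][0], 20, 1.0, 0.1, rng)
        tree.append((conf, e, parent))
        cells.setdefault(cell_of(conf, e), []).append(len(tree) - 1)
    return tree

tree = tree_search([3.0, 3.0], iterations=200)
best = min(tree, key=lambda vertex: vertex[1])
print("vertices:", len(tree), "lowest energy found:", best[1])
```

Dividing each cell's weight by its population is what pushes the search toward geometric diversity, while the exponential factor pushes it toward low energy; the relative strength of the two is a tuning choice in this sketch.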

This project has made general contributions and advanced algorithmic research in stochastic optimization for vast, high-dimensional, and non-linear search spaces. It has proposed that discretization layers can be employed to facilitate gathering statistics over the identified conflicting optimization objectives, which can then be used to formulate probability distribution functions that guide the directions of growth for the search. The concepts put forth in this project are of special relevance to sub-areas of computer science pursuing stochastic optimization, such as evolutionary computation and robotics-inspired motion planning. The latter has served as an inspiration for the tree-based search framework...

Please report errors in award information by writing to: awardsearch@nsf.gov.