Award Abstract # 1017231
III: Small: Collaborative Research: Analysis of Multi-Dimensional Protein Design Spaces with Pareto Optimization of Experimental Designs

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: TRUSTEES OF DARTMOUTH COLLEGE
Initial Amendment Date: September 16, 2010
Latest Amendment Date: September 16, 2010
Award Number: 1017231
Award Instrument: Standard Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
(703)292-7347
IIS Division of Information & Intelligent Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: September 15, 2010
End Date: August 31, 2014 (Estimated)
Total Intended Award Amount: $331,802.00
Total Awarded Amount to Date: $331,802.00
Funds Obligated to Date: FY 2010 = $331,802.00
History of Investigator:
  • Christopher Bailey-Kellogg (Principal Investigator)
    cbk@cs.dartmouth.edu
Recipient Sponsored Research Office: Dartmouth College
7 LEBANON ST
HANOVER
NH  US  03755-2170
(603)646-3007
Sponsor Congressional District: 02
Primary Place of Performance: Dartmouth College
7 LEBANON ST
HANOVER
NH  US  03755-2170
Primary Place of Performance Congressional District: 02
Unique Entity Identifier (UEI): EB8ASJBCFER9
Parent UEI: T4MWFG59C6R3
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01001011DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7923, 9150
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

In developing variants of natural proteins with improved properties and activities, protein engineers are confronted with large, complex design spaces. The degrees of freedom for producing variants mirror nature but can be specifically targeted experimentally, choosing parent proteins, replacements for some amino acids (site-directed mutation), and locations for crossing over between parents (site-directed recombination). A set of choices, constituting a design, can be evaluated by multiple disparate criteria, including consistency with evolutionary information, energetic favorability with respect to a three-dimensional structure, and incorporation of specific characteristics distinguishing functional subclasses. Unfortunately, the different evaluation metrics may be complementary or even contradictory, and the prior information on which they are based is incomplete, so that the metrics are only more or less accurate in predicting the real-life quality of the designs.

The overall goal of this project is to develop efficient methods to characterize complex protein design spaces and optimize high-quality designs for experimental evaluation. A combinatorial protein engineering approach will be pursued, experimentally constructing a library of related variants and assaying them for properties of interest. Potential scores will evaluate a possible library (without explicitly enumerating its members) with respect to prior information from sequence, structure, and functional subclass. To account for disparate evaluation metrics, design algorithms will focus on the identification of Pareto optimal designs, those for which no other design is as good or better with respect to all desired criteria. To account for incomplete prior information, design algorithms will trade off between exploitation of the prior information and broader exploration of the design space, seeking to identify a diverse set of designs, each with a diverse set of variants. Markov Chain Monte Carlo sampling algorithms will characterize the overall design space by generating choices for the degrees of freedom and evaluating the designs with the potential scores, using the scores and diversity metrics to appropriately explore the space. Exact algorithms will more precisely focus on regions of interest, dividing and conquering the design space and employing combinatorial optimization algorithms to identify Pareto optimal designs.
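As a rough illustration of the sampling idea described above, the following is a minimal Metropolis-style sketch, not the project's actual algorithm. It assumes a design is a list of 0/1 choices (one per candidate mutation site) and that a single scalar score summarizes design quality; the names `metropolis_sample`, `toy_score`, and `temperature` are hypothetical, and the diversity metrics mentioned in the abstract are omitted for brevity.

```python
import math
import random


def toy_score(design):
    """Hypothetical score: mutations at even sites help, at odd sites hurt."""
    return sum(1 if i % 2 == 0 else -1 for i, bit in enumerate(design) if bit)


def metropolis_sample(score, n_sites, n_steps, temperature=1.0, seed=0):
    """Sample designs by flipping one site at a time (Metropolis acceptance)."""
    rng = random.Random(seed)
    design = [rng.randint(0, 1) for _ in range(n_sites)]
    current = score(design)
    samples = []
    for _ in range(n_steps):
        # Propose a neighbor: toggle the mutation choice at one random site.
        site = rng.randrange(n_sites)
        proposal = design.copy()
        proposal[site] = 1 - proposal[site]
        new = score(proposal)
        # Always accept improvements; accept worse designs with
        # probability exp((new - current) / temperature).
        if new >= current or rng.random() < math.exp((new - current) / temperature):
            design, current = proposal, new
        samples.append((tuple(design), current))
    return samples


samples = metropolis_sample(toy_score, n_sites=8, n_steps=200)
best_design, best_score = max(samples, key=lambda pair: pair[1])
print(best_design, best_score)
```

A higher `temperature` accepts more score-lowering moves and so explores more broadly, while a lower one concentrates on high-scoring designs; this loosely mirrors the exploration versus exploitation trade-off the abstract describes.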

The design space approach provides a powerful new mechanism to address protein engineering applications, enabling the engineer to explicitly evaluate and optimize for trade-offs among important criteria and considerations. Interactive tools will help engineers navigate through the regions of interest, visualize designs and perform "what-if" analyses, and compare and contrast Pareto optimal designs. A design space repository will enable sharing of analyses and underlying data. The tools and repository will support protein engineering for a range of activities in the national interest, including biosensors, production of novel biological therapeutics and novel enzymes for green chemical synthesis, energy extraction, and bioremediation. As part of the project, the mechanism will be put to use in the engineering of soluble and robust cytochrome P450s that employ the inexpensive and non-toxic hydrogen peroxide to hydroxylate steroids and multi-ring compounds that mimic estrogenic (feminizing) steroids in the environment without the need for living cells or protein cofactors. Such enzymes would be valuable as tools for chemical synthesis, waste treatment, and bioremediation.

This project provides an ideal venue to impart cross-disciplinary training to students by illustrating how computational techniques can be fruitfully integrated with experimentation in answering important biological questions. Aspects of the project will be used in both undergraduate and graduate courses, from an introductory biology course to an advanced bioinformatics course. The project itself will provide the opportunity for inter-disciplinary research training for graduates and undergraduates, including those from underrepresented groups.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

A.S. Parker, Y. Choi, K.E. Griswold, and C. Bailey-Kellogg "Structure-guided deimmunization of therapeutic proteins" J Comput Biol, v.20, 2013, p.152
D.C. Osipovitch, A.S. Parker, C.D. Makokha, J. Desrosiers, W.C. Kett, L. Moise, C. Bailey-Kellogg, and K.E. Griswold "Design and analysis of immune-evading enzymes for ADEPT therapy" Protein Eng Des Sel, v.25, 2012, p.613
L. He, A.M. Friedman, and C. Bailey-Kellogg "Algorithms for optimizing cross-overs in DNA shuffling" BMC Bioinformatics, 2012
L. He, A.M. Friedman, and C. Bailey-Kellogg "Algorithms for optimizing cross-overs in DNA shuffling" Proc. ACM BCB, 2011
L. He, A.M. Friedman, and C. Bailey-Kellogg "Pareto optimal protein engineering" Proteins, 2012
L. He, A.S. De Groot, A.H. Gutierrez, W.D. Martin, L. Moise, and C. Bailey-Kellogg "Integrated assessment of predicted MHC binding and cross-conservation with self reveals patterns of viral camouflage" BMC Bioinformatics, v.15, 2014, p.S1

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The focus of this project is to develop and apply new methods for uncovering good designs in complex design spaces characterized by high degrees of freedom and multiple quality measures. The motivating context is protein design, where the degrees of freedom are choices of site-directed mutations and site-directed recombination breakpoints, and the quality measures are derived from analysis of protein sequence, structure, and function.  Recognizing that different quality measures may be complementary or even competing, the goal adopted here is to identify the Pareto optimal designs -- those for which no other design is as good or better with respect to all criteria. 
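The Pareto optimality criterion described above can be illustrated with a small brute-force filter. This is only a sketch under stated assumptions: it enumerates every design explicitly (which the project's algorithms are designed to avoid), it assumes higher scores are better on every criterion, and it uses the standard dominance test (at least as good on all criteria and strictly better on at least one). The design names and score values are made up for illustration.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every criterion and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(designs: Dict[str, Scores]) -> List[str]:
    """Return the names of designs that no other design dominates."""
    return [
        name
        for name, scores in designs.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in designs.items()
            if other != name
        )
    ]


# Hypothetical designs scored on two criteria, e.g. (sequence score, structure score).
designs = {
    "A": (0.9, 0.2),
    "B": (0.5, 0.5),
    "C": (0.4, 0.4),  # dominated by B on both criteria
    "D": (0.2, 0.9),
}
print(pareto_front(designs))
```

Designs A, B, and D each represent a different trade-off between the two criteria, so none can be discarded without choosing how to weight them; C is worse than B on both and drops out.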


The project yielded a portfolio of powerful new computational methods for efficiently assessing diverse criteria and optimizing them in different design settings. It also demonstrated the utility of these methods in a wide range of protein design applications, impacting a broad set of life science researchers and engineers. Two general algorithmic approaches were developed to uncover all and only the Pareto optimal designs, without explicitly considering the massive number of designs that they dominate. These algorithms were designed and instantiated for a variety of protein design contexts including optimizing sets of enzymes for a combination of improved activity and overall diversity, optimizing interacting proteins so as to balance strength and specificity of interaction, and optimizing therapeutic proteins in order to mitigate undesired immune recognition while maintaining stability and function. In addition to optimizing individual proteins, the methods were extended to optimize entire combinatorial libraries of proteins and peptides, mixing-and-matching mutations or gene fragments to yield sets of variants that overall are enriched in desired combinations of properties of interest. 

Last Modified: 11/26/2014
Modified by: Christopher J Bailey-Kellogg

Please report errors in award information by writing to: awardsearch@nsf.gov.
