Award Abstract # 1738979
D3SC: EAGER: Collaborative Research: A probabilistic framework for automated force field parameterization from experimental datasets

NSF Org: CHE
Division Of Chemistry
Recipient: SLOAN-KETTERING INSTITUTE FOR CANCER RESEARCH
Initial Amendment Date: June 30, 2017
Latest Amendment Date: June 30, 2017
Award Number: 1738979
Award Instrument: Standard Grant
Program Manager: Evelyn Goldfield
CHE
 Division Of Chemistry
MPS
 Directorate for Mathematical and Physical Sciences
Start Date: August 1, 2017
End Date: July 31, 2019 (Estimated)
Total Intended Award Amount: $179,907.00
Total Awarded Amount to Date: $179,907.00
Funds Obligated to Date: FY 2017 = $179,907.00
History of Investigator:
  • John Chodera (Principal Investigator)
    john.chodera@choderalab.org
Recipient Sponsored Research Office: Sloan Kettering Institute For Cancer Research
1275 YORK AVE
NEW YORK
NY  US  10065-6007
(646)227-3273
Sponsor Congressional District: 12
Primary Place of Performance: Sloan Kettering Institute For Cancer Research
1275 York Avenue
New York
NY  US  10065-6007
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): KUKXRCZ6NZC2
Parent UEI:
NSF Program(s): Chem Thry, Mdls & Cmptnl Mthds,
CESER-Cyberinfrastructure for
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7433, 7916, 8084, 9263
Program Element Code(s): 688100, 768400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049

ABSTRACT

Michael Shirts of the University of Colorado Boulder and John Chodera of the Sloan Kettering Institute are supported by a grant from the Chemical Theory, Models and Computational Methods program in the Division of Chemistry to develop statistical and probabilistic methods for parameterizing molecular force fields through the automated integration of information from large experimental datasets. This project is supported under the Data-Driven Discovery Science in Chemistry (D3SC) Dear Colleague Letter (DCL), and is co-funded by the Cyberinfrastructure for Emerging Science and Engineering Research (CESER) Program in the Office of Advanced Cyberinfrastructure. Force fields are classical approximations to quantum mechanical descriptions of interacting molecules. Because they are several orders of magnitude faster to use in simulations than even approximate quantum approaches, force fields are integral to computational modeling in chemistry, chemical engineering, biophysics, materials science, and soft-matter physics. New and more accurate force fields are needed in order to accelerate drug discovery, biomaterials design, and nanoscale device engineering. Currently, force fields are primarily tuned using quantum chemical calculations and small amounts of experimental data, and rely on optimization methods that often require considerable manual intervention, may not identify optimal solutions, and do not provide a way of characterizing and propagating parameter uncertainty. When experimental data is included in the tuning process, there is currently no systematic way to incorporate information on measurement error to weight the data accordingly. Professors Shirts and Chodera are developing a rigorous Bayesian probabilistic framework and statistical techniques to overcome these problems. 
Their approach is designed to take advantage of large, rich experimental datasets including measurement uncertainty, leverage available data more efficiently, and automate both parameter selection and the choice of functional forms in the mathematical formulation of the force field. Software from the project is being disseminated as open source Python code that can be interfaced to simulation codes such as OpenMM and GROMACS. A new Open Force Field Group, with collaborators from academia, the National Institute of Standards and Technology, and industry, will advance community-driven force field development and applications during the project and beyond.

This project is addressing the challenges of force field parameterization by applying a rigorous Bayesian inference framework to determine force fields that are maximally compatible with experimental datasets. The formalism is being applied initially to organic and aqueous liquid mixtures using the NIST ThermoML Archive, which contains a wide range of thermophysical property measurements and associated measurement errors for thousands of molecules. Specific tasks include (1) developing and evaluating an automated Bayesian force field parameterization framework that scales to large numbers of parameters and large data sets, and (2) using this approach to explore the automated selection of force field functional forms. The Bayesian probabilistic framework developed in this work promises to greatly reduce human effort, maximize force field transferability and generalizability by avoiding over-fitting, and enable the systematic extraction of available information from a given set of experimental data. The probabilistic formulation will allow force fields to be easily extended to accommodate new experimental data in a consistent manner via conditional Bayesian updates, and will provide direct routes for estimating systematic error. Initial tests of the new approach will help resolve important questions on the parameterization of molecular force fields for liquid systems, such as optimal choices of functional forms and combining rules. The same techniques can be later used to determine whether pure fluid thermodynamic properties are sufficient to parameterize fluids to reproduce mixture properties, as measured experimentally by the project, and to assess the importance of including polarization in force fields. Open source software tools will be released as easily-installed interoperable Python modules and online instructive IPython/Jupyter notebooks, and all experimental datasets and parameter sets will be freely disseminated.
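
The weighting of data by measurement error and the conditional Bayesian updates described above can be illustrated with a minimal sketch. It assumes a hypothetical one-parameter toy model (predicted property = theta * x), made-up observations, and a Gaussian error model. None of these come from the project's software, and the grid-based posterior stands in for the sampling methods a real force field with many parameters would need.

```python
import numpy as np

def log_likelihood(theta_grid, x, y_obs, sigma_obs):
    """Gaussian log-likelihood evaluated on a grid of theta values.

    Each residual is scaled by that observation's own measurement
    uncertainty, so imprecise measurements contribute less.
    """
    pred = theta_grid[:, None] * x[None, :]  # toy model (assumption)
    z = (y_obs[None, :] - pred) / sigma_obs[None, :]
    norm = np.log(2.0 * np.pi * sigma_obs**2)
    return -0.5 * np.sum(z**2 + norm[None, :], axis=1)

def bayes_update(log_prior, theta_grid, x, y_obs, sigma_obs):
    """Return the normalized log-posterior density on the grid."""
    log_post = log_prior + log_likelihood(theta_grid, x, y_obs, sigma_obs)
    dtheta = theta_grid[1] - theta_grid[0]
    peak = log_post.max()
    log_norm = peak + np.log(np.sum(np.exp(log_post - peak)) * dtheta)
    return log_post - log_norm

# Hypothetical data: four measurements with their reported uncertainties.
x = np.array([1.0, 2.0, 3.0, 4.0])
y_obs = np.array([2.1, 3.9, 6.3, 7.0])
sigma_obs = np.array([0.1, 0.1, 0.2, 1.0])

theta_grid = np.linspace(0.0, 4.0, 4001)
log_prior = np.full(theta_grid.size, -np.log(4.0))  # flat prior on [0, 4]

# Posterior from all data at once.
log_post_all = bayes_update(log_prior, theta_grid, x, y_obs, sigma_obs)

# Same data in two batches: the first posterior becomes the next prior.
log_post_half = bayes_update(log_prior, theta_grid,
                             x[:2], y_obs[:2], sigma_obs[:2])
log_post_seq = bayes_update(log_post_half, theta_grid,
                            x[2:], y_obs[2:], sigma_obs[2:])

dtheta = theta_grid[1] - theta_grid[0]
post_mean = np.sum(theta_grid * np.exp(log_post_all)) * dtheta
print(f"posterior mean of theta: {post_mean:.3f}")
```

In this toy setting the last observation, whose stated uncertainty is ten times larger than that of the first two, has little influence on the posterior, and updating on the data in two batches gives the same posterior as using all of it at once.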

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Fass, Josh and Sivak, David and Crooks, Gavin and Beauchamp, Kyle and Leimkuhler, Benedict and Chodera, John "Quantifying Configuration-Sampling Error in Langevin Simulations of Complex Molecular Systems" Entropy, v.20, 2018, doi:10.3390/e20050318
Mobley, David L. and Bannan, Caitlin C. and Rizzi, Andrea and Bayly, Christopher I. and Chodera, John D. and Lim, Victoria T. and Lim, Nathan M. and Beauchamp, Kyle A. and Slochower, David R. and Shirts, Michael R. and Gilson, Michael K. and Eastman, Pete "Escaping Atom Types in Force Fields Using Direct Chemical Perception" Journal of Chemical Theory and Computation, v.14, 2018, doi:10.1021/acs.jctc.8b00640
Ross, Gregory A. and Rustenburg, Ariën S. and Grinaway, Patrick B. and Fass, Josh and Chodera, John D. "Biomolecular Simulations under Realistic Macroscopic Salt Conditions" The Journal of Physical Chemistry B, v.122, 2018, doi:10.1021/acs.jpcb.7b11734
Zanette, Camila and Bannan, Caitlin C. and Bayly, Christopher I. and Fass, Josh and Gilson, Michael K. and Shirts, Michael R. and Chodera, John D. and Mobley, David L. "Toward Learned Chemical Perception of Force Field Typing Rules" Journal of Chemical Theory and Computation, v.15, 2018, doi:10.1021/acs.jctc.8b00821

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Molecular mechanics force fields are integral to computational modeling across a variety of disciplines, including chemistry, chemical engineering, biophysics, materials science, and soft-matter physics. In many of these fields, they are the primary tool for investigating molecular hypotheses or for predictive chemical design.

This project aimed to solve a major shortcoming in how these force fields have traditionally been constructed, one that limited their predictive utility. Traditionally, these force fields have been constructed manually, using physical insight from humans, and fit to data using techniques that lacked statistical rigor. As a result, their ability to generalize (predict properties for new systems) or to characterize their uncertainties (provide estimates of how accurate their predictions are) was highly limited.

Intellectual Merit

Traditionally, force fields have assigned parts of molecules to equivalence classes using atom types that were developed using human chemical insight. This project developed a rigorous statistical approach based on reversible-jump Monte Carlo that instead selected these atom types (or more generalized types) in an automated manner, using Bayesian methods that automatically penalized complexity to avoid overfitting and increase generalizability. This approach simultaneously co-optimizes the parameters, automating the construction of accurate models in a manner that outperforms the state of the art. In addition, this Bayesian approach provides an estimate of the uncertainty associated with predictions made with this force field.
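
The way Bayesian methods penalize complexity can be seen in a minimal sketch. This is not the project's reversible-jump Monte Carlo method. It is a simpler stand-in that compares two fixed candidate typings through their marginal likelihoods (evidence), using a hypothetical toy model in which each type has one parameter with a Gaussian prior and the observations carry Gaussian noise. All function names, data values, and the noise and prior widths are made up for illustration.

```python
import numpy as np

def log_evidence(y, sigma, tau):
    """Log marginal likelihood of observations sharing one type parameter.

    Assumed toy model: y_i = mu + noise, noise ~ N(0, sigma^2), with
    prior mu ~ N(0, tau^2). Integrating out mu gives a multivariate
    normal with covariance sigma^2 * I + tau^2 * (matrix of ones).
    """
    n = len(y)
    cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

def log_bayes_factor(y_a, y_b, sigma=0.1, tau=5.0):
    """log[ p(data | one shared type) / p(data | two separate types) ]."""
    one_type = log_evidence(np.concatenate([y_a, y_b]), sigma, tau)
    two_types = log_evidence(y_a, sigma, tau) + log_evidence(y_b, sigma, tau)
    return one_type - two_types

# Hypothetical observations for two groups of atoms.
y_a = np.array([1.0, 1.1, 0.9])
y_b_similar = np.array([1.05, 0.95, 1.0])   # behaves like group A
y_b_distinct = np.array([2.0, 2.1, 1.9])    # clearly different from A

log_bf_similar = log_bayes_factor(y_a, y_b_similar)
log_bf_distinct = log_bayes_factor(y_a, y_b_distinct)

print("similar groups, log Bayes factor (one vs two types):", log_bf_similar)
print("distinct groups, log Bayes factor (one vs two types):", log_bf_distinct)
```

A positive log Bayes factor favors the single shared type and a negative one favors splitting into two types. When the two groups behave alike, the extra parameter of the two-type model is not rewarded, which is the complexity penalty at work; when the groups clearly differ, the data justify the additional type.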

Broader Impacts

In the course of this work, we developed a number of additional products that benefited the field of molecular simulation as a whole.

First, all software produced by this project was released as free and open source software that researchers could immediately reuse in their own work.

Second, to meet the demand for computing high-quality condensed-phase properties to fit to experiment, we developed new theory that allowed us to measure the error in a popular simulation scheme (Langevin dynamics) and so automate the process of selecting an appropriate simulation strategy to meet required goals. This approach can be used generally by the molecular simulation community to produce high-accuracy estimates of physical properties of interest.

Finally, all of the algorithmic and software tools developed for this proposal have been used directly in a new open science / open source effort (the Open Force Field Initiative) to generate iteratively improved biomolecular force fields with improved accuracy, which has produced its first new small molecule force field.


Last Modified: 11/15/2019
Modified by: John D Chodera
