
NSF Org: | CHE Division Of Chemistry |
Initial Amendment Date: | June 30, 2017 |
Latest Amendment Date: | June 30, 2017 |
Award Number: | 1738979 |
Award Instrument: | Standard Grant |
Program Manager: | Evelyn Goldfield, CHE Division Of Chemistry, MPS Directorate for Mathematical and Physical Sciences |
Start Date: | August 1, 2017 |
End Date: | July 31, 2019 (Estimated) |
Total Intended Award Amount: | $179,907.00 |
Total Awarded Amount to Date: | $179,907.00 |
Recipient Sponsored Research Office: | 1275 YORK AVE NEW YORK NY US 10065-6007 (646)227-3273 |
Primary Place of Performance: | 1275 York Avenue New York NY US 10065-6007 |
NSF Program(s): | Chem Thry, Mdls & Cmptnl Mthds, CESER-Cyberinfrastructure for |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.049 |
ABSTRACT
Michael Shirts of the University of Colorado Boulder and John Chodera of the Sloan Kettering Institute are supported by a grant from the Chemical Theory, Models and Computational Methods program in the Division of Chemistry to develop statistical and probabilistic methods for parameterizing molecular force fields through the automated integration of information from large experimental datasets. This project is supported under the Data-Driven Discovery Science in Chemistry (D3SC) Dear Colleague Letter (DCL), and is co-funded by the Cyberinfrastructure for Emerging Science and Engineering Research (CESER) Program in the Office of Advanced Cyberinfrastructure. Force fields are classical approximations to quantum mechanical descriptions of interacting molecules. Because they are several orders of magnitude faster to use in simulations than even approximate quantum approaches, force fields are integral to computational modeling in chemistry, chemical engineering, biophysics, materials science, and soft-matter physics. New and more accurate force fields are needed in order to accelerate drug discovery, biomaterials design, and nanoscale device engineering. Currently, force fields are primarily tuned using quantum chemical calculations and small amounts of experimental data, and rely on optimization methods that often require considerable manual intervention, may not identify optimal solutions, and do not provide a way of characterizing and propagating parameter uncertainty. When experimental data is included in the tuning process, there is currently no systematic way to incorporate information on measurement error to weight the data accordingly. Professors Shirts and Chodera are developing a rigorous Bayesian probabilistic framework and statistical techniques to overcome these problems. 
Their approach is designed to take advantage of large, rich experimental datasets including measurement uncertainty, leverage available data more efficiently, and automate both parameter selection and the choice of functional forms in the mathematical formulation of the force field. Software from the project is being disseminated as open source Python code that can be interfaced to simulation codes such as OpenMM and GROMACS. A new Open Force Field Group, with collaborators from academia, the National Institute of Standards and Technology, and industry, will advance community-driven force field development and applications during the project and beyond.
This project is addressing the challenges of force field parameterization by applying a rigorous Bayesian inference framework to determine force fields that are maximally compatible with experimental datasets. The formalism is being applied initially to organic and aqueous liquid mixtures using the NIST ThermoML Archive, which contains a wide range of thermophysical property measurements and associated measurement errors for thousands of molecules. Specific tasks include (1) developing and evaluating an automated Bayesian force field parameterization framework that scales to large numbers of parameters and large data sets, and (2) using this approach to explore the automated selection of force field functional forms. The Bayesian probabilistic framework developed in this work promises to greatly reduce human effort, maximize force field transferability and generalizability by avoiding over-fitting, and enable the systematic extraction of available information from a given set of experimental data. The probabilistic formulation will allow force fields to be easily extended to accommodate new experimental data in a consistent manner via conditional Bayesian updates, and will provide direct routes for estimating systematic error. Initial tests of the new approach will help resolve important questions on the parameterization of molecular force fields for liquid systems, such as optimal choices of functional forms and combining rules. The same techniques can be later used to determine whether pure fluid thermodynamic properties are sufficient to parameterize fluids to reproduce mixture properties, as measured experimentally by the project, and to assess the importance of including polarization in force fields. Open source software tools will be released as easily-installed interoperable Python modules and online instructive IPython/Jupyter notebooks, and all experimental datasets and parameter sets will be freely disseminated.
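The two ideas in the paragraph above, weighting data by measurement uncertainty and extending a fit through conditional Bayesian updates, can be illustrated with a deliberately simple sketch. It assumes a single parameter that is observed directly with Gaussian noise and a Gaussian prior, so the posterior has a closed form. This is not the project's framework: a real force field maps parameters to properties through simulation. The names `Gaussian` and `update`, the prior, and the measurement values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Gaussian:
    """Gaussian belief about one parameter (mean and standard deviation)."""
    mean: float
    std: float


def update(prior: Gaussian, data: Iterable[Tuple[float, float]]) -> Gaussian:
    """Conjugate Bayesian update from (value, uncertainty) measurements.

    Each measurement is weighted by its precision 1/sigma**2, so noisy
    measurements pull the estimate less than precise ones.
    """
    precision = 1.0 / prior.std ** 2
    weighted_sum = prior.mean * precision
    for value, sigma in data:
        w = 1.0 / sigma ** 2
        precision += w
        weighted_sum += w * value
    return Gaussian(mean=weighted_sum / precision, std=precision ** -0.5)


# Hypothetical prior and measurements: (value, reported uncertainty).
prior = Gaussian(mean=0.0, std=10.0)
batch_1 = [(1.00, 0.10), (1.40, 0.50)]
batch_2 = [(0.95, 0.05)]

# Fitting everything at once ...
all_at_once = update(prior, batch_1 + batch_2)
# ... matches using the first posterior as the prior for new data.
sequential = update(update(prior, batch_1), batch_2)
```

In this toy setting the sequential and all-at-once posteriors agree up to floating-point rounding, which is the property that lets new experimental data be folded in without refitting from scratch.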
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Molecular mechanics force fields are integral to computational modeling across a variety of disciplines, including chemistry, chemical engineering, biophysics, materials science, and soft-matter physics. In many of these fields, they are the primary tool for investigating molecular hypotheses or for predictive chemical design.
This project aimed to solve a major shortcoming in how these force fields have traditionally been constructed, one that has limited their predictive utility. Traditionally, these force fields have been constructed manually, using physical insight from humans, and fit to data using techniques that lacked statistical rigor. As a result, their ability to generalize (predict properties for new systems) and to characterize their uncertainties (provide estimates of how accurate their predictions are) was highly limited.
Intellectual Merit
Traditionally, force fields have assigned parts of molecules to equivalence classes using atom types that were developed using human chemical insight. This project developed a rigorous statistical approach based on reversible-jump Monte Carlo that instead selects these atom types (or more generalized types) in an automated manner, using Bayesian methods that automatically penalize complexity to avoid overfitting and increase generalizability. This approach simultaneously co-optimizes the parameters, automating the construction of accurate models in a manner that outperforms the state of the art. In addition, this Bayesian approach provides an estimate of the uncertainty associated with predictions made with the resulting force field.
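The complexity penalty mentioned above can be seen in a small worked example. The sketch below is not reversible-jump Monte Carlo and not the project's code. It assumes a toy conjugate Gaussian model in which each candidate type has one parameter, and it compares exact marginal likelihoods (evidences) for merging two groups of atoms into one type versus keeping two types. The noise level, prior width, function names, and data values are all hypothetical.

```python
import math
from typing import Sequence, Tuple

SIGMA = 0.1   # assumed measurement noise (hypothetical)
TAU = 1.0     # assumed prior width on each type's parameter (hypothetical)


def log_evidence(values: Sequence[float]) -> float:
    """Log marginal likelihood of values that share ONE unknown parameter.

    Model: theta ~ N(0, TAU^2), each value ~ N(theta, SIGMA^2).
    Integrating theta out gives a closed-form multivariate normal.
    """
    n = len(values)
    s1 = sum(values)
    s2 = sum(v * v for v in values)
    var, prior_var = SIGMA ** 2, TAU ** 2
    log_det = n * math.log(var) + math.log(1.0 + n * prior_var / var)
    quad = (s2 - prior_var * s1 ** 2 / (var + n * prior_var)) / var
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


def compare(group_a: Sequence[float],
            group_b: Sequence[float]) -> Tuple[float, float]:
    """Return (log evidence with one shared type, with two separate types)."""
    one_type = log_evidence(list(group_a) + list(group_b))
    two_types = log_evidence(group_a) + log_evidence(group_b)
    return one_type, two_types


# Hypothetical per-atom observations for two candidate atom groups.
similar = compare([1.00, 1.05, 0.95], [1.02, 0.98, 1.00])
distinct = compare([1.00, 1.05, 0.95], [2.02, 1.98, 2.00])
```

With these numbers the single-type model has the higher evidence when the two groups look alike, and the two-type model wins when they clearly differ. The extra type is adopted only when the data justify it, which is the behavior a trans-dimensional sampler such as reversible-jump Monte Carlo exploits when it moves between models with different numbers of types.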
Broader Impacts
In the course of this work, we developed a number of additional products that benefitted the field of molecular simulation as a whole.
First, all software produced by this project was released as free and open source software that researchers could immediately reuse in their own work.
Second, in order to meet the demands for computing high-quality condensed phase properties to fit to experiment, we developed new theory that allowed us to measure the error in a popular simulation scheme (Langevin dynamics) to automate the process of selecting an appropriate simulation strategy to meet required goals. This approach can be generally used by the molecular simulation community to produce high-accuracy estimates of physical properties of interest.
Finally, all of the algorithmic and software tools developed for this project have been used directly in a new open science / open source effort (the Open Force Field Initiative) to generate iteratively improved biomolecular force fields; that effort has produced its first new small molecule force field.
Last Modified: 11/15/2019
Modified by: John D Chodera