
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | August 30, 2017 |
Latest Amendment Date: | August 27, 2021 |
Award Number: | 1740990 |
Award Instrument: | Standard Grant |
Program Manager: | Almadena Chtchelkanova achtchel@nsf.gov (703)292-7498 IIS Division of Information & Intelligent Systems CSE Directorate for Computer and Information Science and Engineering |
Start Date: | October 1, 2017 |
End Date: | September 30, 2023 (Estimated) |
Total Intended Award Amount: | $497,056.00 |
Total Awarded Amount to Date: | $547,056.00 |
Funds Obligated to Date: | FY 2021 = $50,000.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: | 575 LEXINGTON AVE FL 9 NEW YORK NY US 10022-6145 (646)962-8290 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 1300 York Avenue New York NY US 10065-4896 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Software & Hardware Foundation, Big Data Science & Engineering |
Primary Program Source: | 01002122DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Molecular dynamics simulations, which study the classical time evolution of a molecular system at atomic resolution, are widely recognized in chemistry, materials science, molecular biology, and drug design; they are among the most common simulations run on supercomputers. Next-generation supercomputers will have dramatically higher performance than current systems and will generate more data to analyze (i.e., more and longer molecular dynamics trajectories). The coordination of data generation and analysis cannot rely on manual, centralized approaches as it does now. This interdisciplinary project integrates research from areas such as computer science, structural molecular biosciences, and high performance computing to transform the centralized nature of molecular dynamics analysis into a distributed approach that is predominantly performed in situ. Specifically, this effort combines machine learning and data analytics approaches, workflow management methods, and high performance computing techniques to analyze molecular dynamics data as it is generated, save to disk only what is really needed for future analysis, and annotate molecular dynamics trajectories to drive the next steps in increasingly complex simulation workflows.
The investigators tackle the data analysis challenge of molecular dynamics simulations on next-generation supercomputers by (1) creating new in situ methods to trace molecular events such as conformational changes, phase transitions, or binding events in molecular dynamics simulations at runtime by locally reducing knowledge of high-dimensional molecular organization into a set of relevant structural molecular properties; (2) designing new data representations and extending unsupervised machine learning techniques to accurately and efficiently build an explicit global organization of structural and temporal molecular properties; (3) integrating simulation and analytics into complex workflows for runtime detection of changes in structural and temporal molecular properties; and (4) developing new curriculum material, online courses, and online training material targeting data analytics. The knowledge of molecular structures' transformations that the project harnesses at runtime can be used to steer simulations to more promising areas of the simulation space, identify the data that should be written to congested parallel file systems, and index generated data for retrieval and post-simulation analysis. Supported by this knowledge, molecular dynamics workflows such as replica exchange simulations, Markov state models, and the string method with swarms of trajectories can be executed "from the outside" (i.e., without reengineering the molecular dynamics code).
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Molecular Dynamics (MD) simulations are widely recognized in various fields, such as chemistry, materials science, molecular biology, and drug design. The scope and duration of MD simulations have consistently expanded, making them the most common simulations on petascale computers. For instance, a six-month study in 2022 of the National Science Foundation's computing resources indicated that biomolecular codes, mainly MD codes, accounted for 25.7% of their usage. The shift from petascale to exascale computing has brought unparalleled computational power to MD simulations, allowing the new high-performance computing systems to perform more extensive and longer simulations. This increased capability results in larger datasets from MD simulations, necessitating analysis that keeps pace with the simulations.
This project transforms MD analysis from a centralized to a distributed in situ approach, accommodating a wide array of MD codes and enabling real-time adjustments of MD workflows. Unlike conventional centralized data analytics, which save all trajectory data for post-simulation analysis, this project implements advanced collective variable calculation and annotates MD outputs to guide subsequent stages in complex MD workflows.
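As a rough illustration of the idea of reducing each frame to a collective variable as it is produced and keeping only what is needed, the following minimal Python sketch may help. It is not the project's implementation: the choice of radius of gyration as the collective variable, the equal-mass assumption, the `min_change` threshold, and the names `radius_of_gyration`, `reduce_in_situ`, and `toy_frame` are all assumptions made for this example.

```python
import math


def radius_of_gyration(frame):
    """Collective variable: RMS distance of atoms from their centroid.

    Assumes equal atomic masses; `frame` is a list of (x, y, z) tuples.
    """
    n = len(frame)
    cx = sum(p[0] for p in frame) / n
    cy = sum(p[1] for p in frame) / n
    cz = sum(p[2] for p in frame) / n
    sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in frame)
    return math.sqrt(sq / n)


def reduce_in_situ(frames, min_change=0.1):
    """Yield (step, cv) only when the CV moved at least `min_change`
    away from the last kept value; other frames are never stored."""
    last_kept = None
    for step, frame in enumerate(frames):
        cv = radius_of_gyration(frame)
        if last_kept is None or abs(cv - last_kept) >= min_change:
            last_kept = cv
            yield step, cv


def toy_frame(scale):
    """Hypothetical four-atom frame whose radius of gyration equals `scale`."""
    return [(scale, 0.0, 0.0), (-scale, 0.0, 0.0),
            (0.0, scale, 0.0), (0.0, -scale, 0.0)]


# Stand-in for frames handed over in memory by a running simulation.
frames = (toy_frame(s) for s in (1.0, 1.01, 1.02, 1.5, 1.51, 2.0))
kept = list(reduce_in_situ(frames))
print([step for step, _ in kept])
```

Because `frames` is consumed as a generator, each frame is analyzed once and discarded unless its collective variable differs enough from the last kept value, which mirrors the "save only what is needed" goal in a toy setting.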
The project delivers an in situ data analytics method compatible with popular MD codes without requiring recompilation or script redesign. It captures simulation output in memory in real time, enhancing adaptive sampling. This allows exploration of conformational spaces in systems ranging from simple peptides to complex systems like ribosomes. The solution assesses the efficiency of ensemble trajectories and in situ methods on supercomputers. Annotation-based early termination enables scientists to cover more conformational space with fewer MD steps than traditional methods that lack such termination or steering.
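Annotation-based early termination can be pictured with a small Python sketch. The report does not specify the project's actual annotations or stopping rule, so everything here is an assumption for illustration: the state boundaries, the `patience` criterion (stop once the annotated state is unchanged for several consecutive frames), and the names `annotate` and `run_with_early_termination`.

```python
def annotate(cv, boundaries=(1.0, 2.0)):
    """Map a collective-variable value to a coarse state label.

    The boundaries are hypothetical; the label is the number of
    boundaries the value has reached (0, 1, or 2 here).
    """
    return sum(cv >= b for b in boundaries)


def run_with_early_termination(cv_stream, patience=3):
    """Consume CV values as they are produced and stop once the
    annotation has stayed the same for `patience` consecutive frames."""
    labels = []
    streak = 0
    for cv in cv_stream:
        label = annotate(cv)
        streak = streak + 1 if labels and labels[-1] == label else 1
        labels.append(label)
        if streak >= patience:
            break  # trajectory has settled; remaining steps are skipped
    return labels


# Stand-in for collective-variable values streamed from a running trajectory.
trajectory = iter([0.5, 1.2, 0.8, 2.1, 2.3, 2.2, 2.4, 2.5])
labels = run_with_early_termination(trajectory)
remaining = list(trajectory)  # values that were never consumed
print(labels, remaining)
```

In this toy run the trajectory is cut off after six frames, and the two unconsumed values stand in for MD steps that would not need to be computed; in an ensemble, those saved steps could be spent on trajectories exploring other regions.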
Last Modified: 01/29/2024
Modified by: Harel Weinstein