
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | August 30, 2017 |
Latest Amendment Date: | August 26, 2021 |
Award Number: | 1741040 |
Award Instrument: | Standard Grant |
Program Manager: | Almadena Chtchelkanova, achtchel@nsf.gov, (703)292-7498, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | October 1, 2017 |
End Date: | September 30, 2022 (Estimated) |
Total Intended Award Amount: | $516,000.00 |
Total Awarded Amount to Date: | $616,000.00 |
Funds Obligated to Date: | FY 2021 = $100,000.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: | 3720 S FLOWER ST FL 3 LOS ANGELES CA US 90033 (213)740-7762 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 4676 Admiralty Way, Suite 1001 Marina del Rey CA US 90292-6611 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Software & Hardware Foundation, Big Data Science & Engineering |
Primary Program Source: | 01002122DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Molecular dynamics simulations, which study the classical time evolution of a molecular system at atomic resolution, are widely used in chemistry, materials science, molecular biology, and drug design; they are among the most common simulations run on supercomputers. Next-generation supercomputers will have dramatically higher performance than current systems and will generate more data to analyze (i.e., more and longer molecular dynamics trajectories). The coordination of data generation and analysis cannot rely on manual, centralized approaches as it does now. This interdisciplinary project integrates research from computer science, structural molecular biosciences, and high performance computing to transform the centralized nature of molecular dynamics analysis into a distributed approach that is predominantly performed in situ. Specifically, this effort combines machine learning and data analytics approaches, workflow management methods, and high performance computing techniques to analyze molecular dynamics data as it is generated, save to disk only what is really needed for future analysis, and annotate molecular dynamics trajectories to drive the next steps in increasingly complex simulation workflows.
The investigators tackle the challenge of analyzing molecular dynamics simulation data on next-generation supercomputers by (1) creating new in situ methods to trace molecular events such as conformational changes, phase transitions, or binding events in molecular dynamics simulations at runtime by locally reducing knowledge of high-dimensional molecular organization into a set of relevant structural molecular properties; (2) designing new data representations and extending unsupervised machine learning techniques to accurately and efficiently build an explicit global organization of structural and temporal molecular properties; (3) integrating simulation and analytics into complex workflows for runtime detection of changes in structural and temporal molecular properties; and (4) developing new curriculum material, online courses, and online training material targeting data analytics. The knowledge of molecular structures' transformations that the project captures at runtime can be used to steer simulations to more promising areas of the simulation space, identify the data that should be written to congested parallel file systems, and index generated data for retrieval and post-simulation analysis. Supported by this knowledge, molecular dynamics workflows such as replica exchange simulations, Markov state models, and the string method with swarms of trajectories can be executed "from the outside" (i.e., without reengineering the molecular dynamics code).
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Molecular Dynamics (MD) simulations are widely used in chemistry, materials science, molecular biology, and drug design. The system sizes and time scales accessible to MD simulations have been steadily increasing. Today MD simulations are the most common simulations running on petascale machines. For example, a survey of usage of NSF computing resources over six months of 2022 shows that biomolecular codes, predominantly MD codes, use 25.7% of these resources. The transition from petascale to exascale computing brings unprecedented computing capability to MD simulations. This increased capability of the new generation of high-performance computing (HPC) systems directly translates into the ability to execute many more, and longer, simulations. For MD simulations, this in turn means more data that needs to be analyzed. The analysis must run concurrently with the simulations to keep up with their pace.
This project transformed the centralized nature of MD analysis into a distributed approach that is performed in situ and supports a broad range of MD codes. It can enable on-the-fly tuning of MD workflows. Unlike traditional MD data analytics, which is centralized (i.e., it first generates and saves all the trajectory data to storage and then relies on post-simulation analysis), the project calculated advanced collective variables to analyze data as they were generated and annotated MD outputs to steer the next steps in increasingly complex MD workflows.
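The contrast between post-simulation and in situ analysis can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the project: `toy_trajectory` is a hypothetical stand-in for an MD code that emits frames one at a time, the radius of gyration stands in for the "advanced collective variables" the report mentions, and the `threshold` rule for deciding which frames to keep is invented. The point is only that each frame is analyzed and annotated while it is still in memory, and only a subset is retained.

```python
import math
import random
from typing import Iterable, Iterator, List, Tuple

Frame = List[Tuple[float, float, float]]  # one (x, y, z) tuple per atom


def toy_trajectory(n_frames: int, n_atoms: int, seed: int = 0) -> Iterator[Frame]:
    """Hypothetical stand-in for an MD code: yields frames one at a time."""
    rng = random.Random(seed)
    coords = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(n_atoms)]
    for _ in range(n_frames):
        # Random displacement of every atom; no real physics is modeled here.
        coords = [tuple(c + rng.gauss(0.0, 0.1) for c in atom) for atom in coords]
        yield coords


def radius_of_gyration(frame: Frame) -> float:
    """Simple collective variable: RMS distance of the atoms from their centroid."""
    n = len(frame)
    cx = sum(p[0] for p in frame) / n
    cy = sum(p[1] for p in frame) / n
    cz = sum(p[2] for p in frame) / n
    sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in frame)
    return math.sqrt(sq / n)


def analyze_in_situ(frames: Iterable[Frame], threshold: float) -> List[Tuple[int, float]]:
    """Annotate frames as they arrive; keep (step, cv) only when the
    collective variable has moved by more than `threshold` since the
    last kept frame. Frames that are not kept are never stored."""
    kept: List[Tuple[int, float]] = []
    last_cv = None
    for step, frame in enumerate(frames):
        cv = radius_of_gyration(frame)
        if last_cv is None or abs(cv - last_cv) > threshold:
            kept.append((step, cv))
            last_cv = cv
    return kept


if __name__ == "__main__":
    annotations = analyze_in_situ(toy_trajectory(200, 20), threshold=0.05)
    print(f"kept {len(annotations)} of 200 frames")
```

Because `analyze_in_situ` consumes a generator, the full trajectory never exists in memory or on disk at once, which is the property the in situ approach relies on.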
The project designed an in situ data analytics approach for the most commonly used MD codes. The targeted workflows did not require recompiling any MD code or redesigning any MD script. Instead, the new solutions captured outputs in memory at runtime as they were generated. The project demonstrated these new capabilities in the context of enhanced adaptive sampling, which enabled exploring the conformational space of simple peptides and complex molecular systems such as ribosomes. The proposed solution modeled the execution of an ensemble of trajectories starting from random unfolded states and analyzed the overall throughput obtained using in situ methods and the MD framework on supercomputers. Using annotation-based early termination, scientists can now obtain more extensive coverage of the studied reference conformational space with fewer MD steps than a traditional execution of the MD simulation (i.e., one without any early termination or steering) would use.
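The idea of annotation-based early termination for an ensemble of trajectories can be sketched with a toy model in Python. This is not the project's method: the conformational space is reduced to a hypothetical one-dimensional grid of `n_bins` states, each trajectory is a random walk from a random starting state, and the `patience` rule (stop after that many consecutive steps in already-visited states) is an invented stand-in for the in situ annotations described above. The sketch only shows how coverage and total steps can be tallied with and without early termination.

```python
import random
from typing import Optional, Set, Tuple


def run_trajectory(rng: random.Random, visited: Set[int], max_steps: int,
                   patience: Optional[int], n_bins: int) -> int:
    """Run one toy trajectory and return the number of steps it used.

    `visited` is shared across the ensemble and updated in place.
    `patience=None` disables early termination (traditional execution).
    """
    pos = rng.randrange(n_bins)  # random starting state
    visited.add(pos)
    stale = 0  # consecutive steps spent in already-visited states
    for step in range(1, max_steps + 1):
        pos = min(n_bins - 1, max(0, pos + rng.choice((-1, 1))))
        if pos in visited:
            stale += 1
        else:
            visited.add(pos)
            stale = 0
        if patience is not None and stale >= patience:
            return step  # terminate early: nothing new is being explored
    return max_steps


def run_ensemble(n_traj: int, max_steps: int, patience: Optional[int],
                 n_bins: int = 200, seed: int = 0) -> Tuple[int, int]:
    """Return (number of distinct states covered, total steps spent)."""
    rng = random.Random(seed)
    visited: Set[int] = set()
    total_steps = sum(
        run_trajectory(rng, visited, max_steps, patience, n_bins)
        for _ in range(n_traj)
    )
    return len(visited), total_steps


if __name__ == "__main__":
    for label, patience in (("traditional", None), ("early termination", 5)):
        coverage, steps = run_ensemble(n_traj=20, max_steps=100, patience=patience)
        print(f"{label}: covered {coverage} states in {steps} steps")
```

In a setting like the one the report describes, steps saved by early termination would presumably be spent on new trajectories; the toy model leaves that reallocation out and only reports the two tallies.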
Last Modified: 01/30/2023
Modified by: Ewa Deelman