Award Abstract # 1761990
Spokes: MEDIUM: NORTHEAST: Collaborative: Advancing a Data-Driven Discovery and Rational Design Paradigm in Chemistry

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: THE RESEARCH FOUNDATION FOR THE STATE UNIVERSITY OF NEW YORK
Initial Amendment Date: August 23, 2018
Latest Amendment Date: August 23, 2018
Award Number: 1761990
Award Instrument: Standard Grant
Program Manager: Lin He
lhe@nsf.gov
 (703)292-4956
OAC
 Office of Advanced Cyberinfrastructure (OAC)
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2018
End Date: August 31, 2023 (Estimated)
Total Intended Award Amount: $700,000.00
Total Awarded Amount to Date: $700,000.00
Funds Obligated to Date: FY 2018 = $700,000.00
History of Investigator:
  • Johannes Hachmann (Principal Investigator)
    hachmann@buffalo.edu
  • Alan Aspuru-Guzik (Co-Principal Investigator)
  • Geoffrey Hutchison (Co-Principal Investigator)
  • Marcus Hanwell (Co-Principal Investigator)
Recipient Sponsored Research Office: SUNY at Buffalo
520 LEE ENTRANCE STE 211
AMHERST
NY  US  14228-2577
(716)645-2634
Sponsor Congressional District: 26
Primary Place of Performance: SUNY at Buffalo
612 Furnas Hall
Buffalo
NY  US  14260-4200
Primary Place of Performance Congressional District: 26
Unique Entity Identifier (UEI): LMCJKRFW5R81
Parent UEI: GMZUKXFDJMA9
NSF Program(s): BD Spokes -Big Data Regional I, OFFICE OF MULTIDISCIPLINARY AC, DMR SHORT TERM SUPPORT, PROJECTS
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 028Z, 054Z, 062Z, 7433, 8083
Program Element Code(s): 024Y00, 125300, 171200, 197800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

With support from the Division of Chemistry (CHE) and the Division of Materials Research (DMR) in the Directorate for Mathematical and Physical Sciences (MPS) and the Directorate for Computer & Information Science & Engineering (CISE), this project aims to advance the use of modern data science in chemistry. In particular, the project will advance the field of data-driven chemical research by promoting the use of machine learning and other data mining techniques in the molecular sciences and by fostering and coalescing a community of stakeholders. The work of the project and the community it represents aims to transform chemistry's ability to tackle challenging discovery and design problems. This approach can dramatically accelerate and streamline the process that leads to chemical innovation -- an important factor in economic and technological advancement -- and thus result in an improved return on public and private investments. The project also addresses corresponding questions of training and workforce development needed in chemistry, thus ensuring the US's international competitiveness.

The mission of this project is to assert the role of big data research in the chemical domain, i.e., to promote, enable, and advance the ideas of data-driven discovery and rational design. The project aims to create a community-driven roadmap as well as facilitate concrete solutions that are beyond the scope of the disjointed efforts of its individual stakeholders. The Big Data Hubs and Spokes ecosystem is the ideal framework for realizing this vision and accelerating progress in this high-priority area of research. The effort at hand sets out to implement some of the key findings of the recent NSF Division of Chemistry workshop on Framing the Role of Big Data and Modern Data Science in Chemistry. The four signature initiatives of this Spoke project are (i) the planning, coordination, integration, and consolidation of community-developed software tools for big data research in chemistry as well as the formulation of guidelines, best practices, and standards; (ii) the organization of workshops to build community, to connect solution seekers with solution providers, and to address questions ranging from strategic to technical; (iii) the creation and dissemination of community-developed teaching materials as well as the formulation of course, program, and curricular recommendations for education and workforce development that reflect the changing, data-centric approach in chemical research; and (iv) the provision of access to a shared hardware infrastructure for community data sets, on-site data mining capacity, and the exploration of domain-specific method and hardware issues.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Ferguson, Andrew L. and Hachmann, Johannes and Miller, Thomas F. and Pfaendtner, Jim "The Journal of Physical Chemistry A / B / C Virtual Special Issue on Machine Learning in Physical Chemistry" The Journal of Physical Chemistry A, v.124, 2020 https://doi.org/10.1021/acs.jpca.0c09205
Haghighatlari, Mojtaba and Hachmann, Johannes "Advances of machine learning in molecular modeling and simulation" Current Opinion in Chemical Engineering, v.23, 2019 https://doi.org/10.1016/j.coche.2019.02.009
Vishwakarma, Gaurav and Sonpal, Aditya and Hachmann, Johannes "Metrics for Benchmarking and Uncertainty Quantification: Quality, Applicability, and Best Practices for Machine Learning in Chemistry" Trends in Chemistry, v.3, 2021 https://doi.org/10.1016/j.trechm.2020.12.004

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The primary goal of this project was to help establish the use of modern data science as a new and powerful tool in chemistry and to advance the field of data-driven chemical research (i.e., by adapting artificial intelligence, machine learning, informatics, and data mining techniques for the molecular sciences). The project set out to achieve this goal by (i) community building (i.e., fostering and coalescing a community of stakeholders in data-driven research), (ii) resource pooling, and (iii) resource development. For this purpose, we set up a communication platform for community stakeholders, created resource compilations on (a) software and tools, (b) courses and course materials, and (c) workshop topics of interest, and disseminated these to the stakeholders. The stakeholders included some of the leaders and pioneers of the field, but also featured strategic partners, e.g., the Canadian Accelerator Consortium. The project resulted in advancements of community tools such as the Avogadro and ChemBDDB codes. We also conducted exploratory work on the technical aspects of local data hosting and sharing in the spirit of Science Gateways; however, we concluded that this was not a viable proposition given the scope of this project. The project team published topical reviews and best-practice guides and also organized topical journal issues. The project further resulted in events such as the "Roundtable Discussion: Current State and Future of Data Science in Chemical Engineering" at the 2022 AIChE National Meeting in Phoenix.


Last Modified: 04/05/2024
Modified by: Johannes Hachmann

Please report errors in award information by writing to: awardsearch@nsf.gov.
