Award Abstract # 1751161
CAREER: Building an Advanced Cyberinfrastructure for the Data-Driven Design of Chemical Systems and the Exploration of Chemical Space

NSF Org: Office of Advanced Cyberinfrastructure (OAC)
Recipient: THE RESEARCH FOUNDATION FOR THE STATE UNIVERSITY OF NEW YORK
Initial Amendment Date: February 8, 2018
Latest Amendment Date: February 8, 2018
Award Number: 1751161
Award Instrument: Standard Grant
Program Manager: Juan Li
jjli@nsf.gov
(703) 292-2625
Office of Advanced Cyberinfrastructure (OAC)
Directorate for Computer and Information Science and Engineering (CSE)
Start Date: March 1, 2018
End Date: September 30, 2024 (Estimated)
Total Intended Award Amount: $561,685.00
Total Awarded Amount to Date: $561,685.00
Funds Obligated to Date: FY 2018 = $561,685.00
History of Investigator:
  • Johannes Hachmann (Principal Investigator)
    hachmann@buffalo.edu
Recipient Sponsored Research Office: SUNY at Buffalo
520 LEE ENTRANCE STE 211
AMHERST
NY  US  14228-2577
(716)645-2634
Sponsor Congressional District: 26
Primary Place of Performance: SUNY at Buffalo
612 Furnas Hall
Buffalo
NY  US  14260-4200
Primary Place of Performance Congressional District: 26
Unique Entity Identifier (UEI): LMCJKRFW5R81
Parent UEI: GMZUKXFDJMA9
NSF Program(s): CAREER: Faculty Early Career Development; Chemical Theory, Models and Computational Methods
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 026Z, 062Z, 1045, 8084, 9263
Program Element Code(s): 104500, 688100
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Innovation in chemistry and materials is a key driver of economic development, prosperity, and a rising standard of living. It also offers solutions to pressing problems in energy, environmental sustainability, and resources that shape our society. This research program is designed to boost the chemistry community's capacity to address these challenges by transforming the process that generates such innovation. The research promotes a shift away from trial-and-error searches and towards data-driven discovery and rational design, which combine traditional chemical research with modern data science by introducing tools such as machine learning into the chemical context. This project enables and advances this emerging field by building a cyberinfrastructure that makes data-driven research a viable and widely accessible proposition for the chemistry community, and thereby an integral part of the chemical enterprise. Tools and methods developed in this research provide the means for the large-scale exploration of chemical space and for a better understanding of the hidden mechanisms that determine the behavior of complex chemical systems. These insights can potentially accelerate, streamline, and ultimately transform the chemical development process. The project also tackles the concomitant need to adapt education to this new research landscape in order to adequately equip the next generation of scientists and engineers, to build a competent and skilled workforce for the cutting-edge R&D of the future, and to ensure the competitiveness of US students in the international job market. By promoting minority participation in this promising field, it contributes to a sustained push towards equal opportunity in our society. This project thus promotes the progress of science and advances prosperity and welfare, as stated in NSF's mission.

While there is growing agreement on the value of data-driven discovery and rational design, this approach is still far from being a mainstay of everyday research in the chemistry community. This work addresses three key obstacles: (i) data-driven research is beyond the scope and reach of most chemists due to a lack of available and accessible tools, (ii) many fundamental and practical questions on how to make data science work for chemical research remain unresolved, and (iii) data science is not part of the formal training of chemists, so much of the community lacks the necessary experience and expertise to utilize it. This research centers on the creation of an open, general-purpose software ecosystem that fuses in silico modeling, virtual high-throughput screening, and big data analytics (i.e., the use of machine learning, informatics, and database technology for the validation, mining, and modeling of the resulting data sets) into an integrated research infrastructure. A key consideration is to make this ecosystem as comprehensive, robust, and user-friendly as possible, so that interested researchers can readily employ it without extensive expert knowledge. It also serves as a development platform and testbed for innovation in the underlying methods, algorithms, and protocols, i.e., it allows the community to systematically and efficiently evaluate the utility and performance of different techniques, including new ones introduced as part of this project. A meta machine learning approach is being developed to establish guidelines and best practices that add value to the cyberinfrastructure. The work is driven by concrete molecular design problems, which serve to demonstrate the efficacy of the overall approach.

The educational challenges that arise from the qualitative novelty of data-driven research and its inherent interdisciplinarity are addressed by leveraging a new graduate program in Computational and Data-Enabled Science and Engineering for cross-cutting course and curricular developments, the creation of interactive teaching materials, and a skill-building hackathon initiative. This award is jointly made with the Division of Chemistry's Chemical Theory, Models and Computational Methods Program.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 11)
Afzal, Mohammad Atif and Hachmann, Johannes "Benchmarking DFT approaches for the calculation of polarizability inputs for refractive index predictions in organic polymers" Physical Chemistry Chemical Physics, v.21, 2019, https://doi.org/10.1039/C8CP05492D
Afzal, Mohammad Atif and Hachmann, Johannes "High-Throughput Computational Studies in Catalysis and Materials Research, and Their Impact on Rational Design" Handbook on Big Data and Machine Learning in the Physical Sciences, Vol 1: Big Data Methods in Experimental Materials Discovery, v.1, 2020, https://doi.org/10.1142/9789811204555_0001
Afzal, Mohammad Atif and Haghighatlari, Mojtaba and Ganesh, Sai Prasad and Cheng, Chong and Hachmann, Johannes "Accelerated Discovery of High-Refractive-Index Polyimides via First-Principles Molecular Modeling, Virtual High-Throughput Screening, and Data Mining" The Journal of Physical Chemistry C, v.123, 2019, https://doi.org/10.1021/acs.jpcc.9b01147
Afzal, Mohammad Atif and Sonpal, Aditya and Haghighatlari, Mojtaba and Schultz, Andrew J. and Hachmann, Johannes "A deep neural network model for packing density predictions and its application in the study of 1.5 million organic molecules" Chemical Science, v.10, 2019, https://doi.org/10.1039/C9SC02677K
Ferguson, Andrew and Hachmann, Johannes "Machine learning and data science in materials design: a themed collection" Molecular Systems Design & Engineering, v.3, 2018, https://doi.org/10.1039/C8ME90007H
Ferguson, Andrew L. and Hachmann, Johannes and Miller, Thomas F. and Pfaendtner, Jim "The Journal of Physical Chemistry A / B / C Virtual Special Issue on Machine Learning in Physical Chemistry" The Journal of Physical Chemistry A, v.124, 2020, https://doi.org/10.1021/acs.jpca.0c09205
Hachmann, Johannes and Afzal, Mohammad Atif and Haghighatlari, Mojtaba and Pal, Yudhajit "Building and deploying a cyberinfrastructure for the data-driven design of chemical systems and the exploration of chemical space" Molecular Simulation, v.44, 2018, https://doi.org/10.1080/08927022.2018.1471692
Haghighatlari, Mojtaba and Hachmann, Johannes "Advances of machine learning in molecular modeling and simulation" Current Opinion in Chemical Engineering, v.23, 2019, https://doi.org/10.1016/j.coche.2019.02.009
Haghighatlari, Mojtaba and Vishwakarma, Gaurav and Altarawy, Doaa and Subramanian, Ramachandran and Kota, Bhargava U. and Sonpal, Aditya and Setlur, Srirangaraj and Hachmann, Johannes "ChemML: A machine learning and informatics program package for the analysis, mining, and modeling of chemical and materials data" WIREs Computational Molecular Science, v.10, 2020, https://doi.org/10.1002/wcms.1458
Hanwell, Marcus D. and Harris, Chris and Genova, Alessandro and Haghighatlari, Mojtaba and El Khatib, Muammar and Avery, Patrick and Hachmann, Johannes and de Jong, Wibe Albert "Open Chemistry, JupyterLab, REST, and quantum chemistry" International Journal of Quantum Chemistry, v.121, 2020, https://doi.org/10.1002/qua.26472
Vishwakarma, Gaurav and Sonpal, Aditya and Hachmann, Johannes "Metrics for Benchmarking and Uncertainty Quantification: Quality, Applicability, and Best Practices for Machine Learning in Chemistry" Trends in Chemistry, v.3, 2021, https://doi.org/10.1016/j.trechm.2020.12.004

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Artificial intelligence (AI), machine learning (ML), and data science are playing an increasingly important role in chemical and materials research.

However, data-driven research is still beyond the reach of most chemists and materials scientists for three reasons: there are unresolved questions on how to make AI/ML work for chemical and materials problems, there is a lack of available and accessible software tools, and data science is typically not part of the formal training (and thus the experience and skill set) of this research community.

In this project, we addressed these issues and advanced solutions to help overcome them. We created new AI/ML methods and open-source software for use on chemical and materials problems, and we developed training initiatives to prepare the workforce of the future. This work includes techniques to automate the use of AI/ML as far as possible, to make it easy and safe to use, to tailor it to the specific requirements of chemical and materials studies, and to open the AI/ML "black box" in order to learn what its predictions can tell us.

We tested these new techniques and tools on a number of specific research questions (e.g., the design of new polymer materials for lenses and of active compounds for energy storage devices) and demonstrated how they can accelerate the generation of new findings and provide additional insights. The AI/ML prediction models created in this context allow us to predict the properties of new and previously unknown compounds in a fraction of the time needed by traditional modeling or experimental approaches. In addition, we used data mining to gain a better understanding of the connections between the structures of molecules and materials and their properties. This improved understanding allows us to pursue a more purposeful and targeted design of new compounds with desirable properties.

Many of these efforts have led to collaborations and partnerships with other academic and industry researchers, who found our techniques and tools valuable.

Last Modified: 04/18/2025
Modified by: Johannes Hachmann

