Award Abstract # 1344201
INSPIRE Track 1: UDiscoverIt: Integrating Expert Knowledge, Constraint-Based Reasoning and Learning to Accelerate Materials Discovery

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: CORNELL UNIVERSITY
Initial Amendment Date: September 13, 2013
Latest Amendment Date: July 16, 2014
Award Number: 1344201
Award Instrument: Continuing Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
 (703)292-7347
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 15, 2013
End Date: August 31, 2017 (Estimated)
Total Intended Award Amount: $699,986.00
Total Awarded Amount to Date: $699,986.00
Funds Obligated to Date: FY 2013 = $490,986.00
FY 2014 = $209,000.00
History of Investigator:
  • Carla Gomes (Principal Investigator)
    gomes@cs.cornell.edu
  • Francis DiSalvo (Co-Principal Investigator)
  • Bart Selman (Co-Principal Investigator)
  • Robert van Dover (Co-Principal Investigator)
Recipient Sponsored Research Office: Cornell University
341 PINE TREE RD
ITHACA
NY  US  14850-2820
(607)255-5014
Sponsor Congressional District: 19
Primary Place of Performance: Cornell University
5133 Upson Hall
Ithaca
NY  US  14853-7501
Primary Place of Performance
Congressional District:
19
Unique Entity Identifier (UEI): G56PUALJ3KT5
Parent UEI:
NSF Program(s): OFFICE OF MULTIDISCIPLINARY AC,
Information Technology Researc,
SOLID STATE & MATERIALS CHEMIS,
Info Integration & Informatics,
INSPIRE
Primary Program Source: 01001314DB NSF RESEARCH & RELATED ACTIVIT
01001415DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1640, 7364, 8653
Program Element Code(s): 125300, 164000, 176200, 736400, 807800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

This INSPIRE award is partially funded by the Information Integration and Informatics Program in the Division of Information and Intelligent Systems in the Directorate for Computer and Information Science and Engineering and the Solid State and Materials Chemistry Program in the Division of Materials Research and the Office of Multidisciplinary Activities in the Directorate for Mathematical and Physical Sciences.

The past two decades have seen rapid development of high-throughput experimentation (HTE) methodologies that would be extremely valuable for (i) the discovery of new applied materials with high complexity and (ii) the generation of deep understanding of structure/function, structure/activity and structure/performance relationships. High-photon-flux X-ray techniques in particular have enormous transformative potential in materials discovery. The research team leverages the data being collected at the Cornell High Energy Synchrotron Source (CHESS) and at Caltech's Joint Center for Artificial Photosynthesis (JCAP). While high-throughput inorganic library synthesis is relatively well-established, high-throughput structure determination, which is at the heart of the proposed research, is in its infancy. X-ray diffraction is well-suited for rapidly collecting information on the atomic arrangements in an inorganic sample, but the data do not immediately reveal a crystal structure. The development of data analysis, data mining and interpretation methodologies has not kept pace with the development of experimental capability. Consequently, data acquired in a week can take many months of traditional analysis by researchers. Automation and machine-intelligent processing of the data are essential to maximise the impact of complex multidimensional datasets.

This project addresses this state of affairs head-on: it investigates computational techniques for dealing with the multiparameter space associated with HTE structure determination of materials libraries, through constraint-guided search and optimization, statistical machine learning, and inference techniques in combination with direct human input into the process. Anticipated advances include new probabilistic methods and computational discovery tools that integrate soft and hard constraints capturing the complex background knowledge from the underlying physics and chemistry of materials with insights gained from high-throughput data analytics and machine learning. If the project succeeds in achieving the anticipated enormous efficiency gains in complex structure determination, it could have a transformative impact on materials discovery and complex solid state chemistry and physics.

The ability to reduce complex materials discovery and optimization from timeframes of months or years to hours or days could lead to a paradigm shift in the development of products benefiting society, with technological advances as well as commercial impact on energy, sustainability, health and quality of life. The planned free dissemination of data sets and computational tools to the larger scientific community is likely to enhance the broader impacts of the project. The project facilitates increased interdisciplinary interactions between computer scientists and materials scientists at Cornell University and offers enhanced opportunities for training a new generation of researchers at the interface between the two disciplines.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Bai, Junwen; Bjorck, Johan; Xue, Yexiang; Suram, Santosh K.; Gregoire, John; Gomes, Carla "Relaxation Methods for Constrained Matrix Factorization Problems: Solving the Phase Mapping Problem in Materials Discovery" Fourteenth International Conference on Integration of Artificial Intelligence and Operations Research Techniques in Constraint Programming (CPAIOR) , 2017 , p.104 10.1007/978-3-319-59776-8_9
Ronan Le Bras, Richard Bernstein, John M. Gregoire, Santosh K. Suram, Carla P. Gomes, Bart Selman, R. Bruce van Dover "Challenge in Materials Discovery - Synthetic Generator and Real Dataset" AAAI , 2014 , p.438
Stefano Ermon, Ronan Le Bras, Santosh Suram, John M. Gregoire, Carla Gomes, Bart Selman, and Robert B. van Dover "Pattern Decomposition with Complex Combinatorial Constraints: Application to Materials Discovery" AAAI , 2015 , p.636
Suram, Santosh K.; Xue, Yexiang; Bai, Junwen; Le Bras, Ronan; Rappazzo, Brendan; Bernstein, Richard; Bjorck, Johan; Zhou, Lan; van Dover, R. Bruce; Gomes, Carla P.; Gregoire, John M. "Automated Phase Mapping with AgileFD and its Application to Light Absorber Discovery in the V–Mn–Nb Oxide System" ACS Combinatorial Science , v.19 , 2017 , p.37 10.1021/acscombsci.6b00153
Xue, Yexiang; Bai, Junwen; Le Bras, Ronan; Rappazzo, Brendan; Bernstein, Richard; Bjorck, Johan; Longpre, Liane; Suram, Santosh K.; van Dover, Robert B.; Gregoire, John; Gomes, Carla P. "Phase-Mapper: An AI Platform to Accelerate High Throughput Materials Discovery" Twenty-Ninth IAAI Conference , 2017 , p.4635
Yexiang Xue, Stefano Ermon, Carla Gomes, Bart Selman "Uncovering Hidden Structure through Parallel Problem Decomposition" AAAI , 2014 , p.3144
Yexiang Xue, Stefano Ermon, Ronan Le Bras, Carla P. Gomes and Bart Selman "Variable Elimination in the Fourier Domain" ICML , 2016

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Our vision is to develop new computational methods for dramatically accelerating the discovery of new materials, speeding up the discovery process by several orders of magnitude. More broadly, our INSPIRE project aims to show how a close cross-disciplinary collaboration involving materials scientists, materials chemists, computer scientists, and engineers can significantly advance and enrich the state of the art of each discipline.

In high-throughput materials discovery, one searches for materials with new desirable properties by obtaining measurements on tens of thousands to millions of samples of multi-element systems. X-ray diffraction (XRD) is well-suited for rapidly collecting information on the atomic arrangements in an inorganic sample, but the data do not immediately reveal a crystal structure; indeed, it is not directly possible even to identify whether a sample comprises one, two, or more distinct crystalline phases. The data analysis and interpretation process is time-consuming and highly labor-intensive. The grand challenge is therefore to develop efficient computational methods for data analysis and discovery, in order to gain insights from the deluge of materials science data, capturing and distilling structure and property relations of multicomponent materials. The problem of determining the combination of physical structures present, as a function of chemical composition, is called the phase map identification problem.

The presence of significant background signal and experimental noise is one challenge in the analysis of XRD data. To address this, we developed and demonstrated the effectiveness of an approach in which materials are deposited on ultrathin amorphous membranes prepared on and supported by a crystalline silicon substrate, avoiding contributions to the signal from the substrate itself. An additional improvement to the data collection methodology, as compared with prior work, was to collect measurements as multiple detector images, allowing us to apply statistical methods to remove some noise and artifacts inherent in the experimental configuration.
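As a rough illustration of why collecting multiple detector images helps, the sketch below combines repeated frames pixel by pixel with a median, which discards a one-off artifact that appears in only one frame. The report does not say which statistical methods were used, so the median is an assumed stand-in, and `median_stack` and the toy frames are hypothetical.

```python
from statistics import median


def median_stack(frames):
    """Combine repeated detector frames pixel by pixel using the median."""
    if not frames:
        raise ValueError("need at least one frame")
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[r][c] for frame in frames) for c in range(cols)]
        for r in range(rows)
    ]


# Three hypothetical 2x3 exposures of the same sample. The second frame
# contains one spurious bright pixel (a transient artifact).
frames = [
    [[10, 12, 11], [50, 52, 51]],
    [[11, 900, 10], [51, 50, 52]],
    [[10, 11, 12], [52, 51, 50]],
]
cleaned = median_stack(frames)
```

A single averaged exposure would have absorbed the outlier into the signal; with several frames, the outlier is simply outvoted.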

At the beginning of the project, our state-of-the-art analysis methodology for the phase map identification problem was based on integrating domain-specific scientific background knowledge about the physical and chemical properties of the materials into a satisfiability modulo theories (SMT) reasoning framework based on linear arithmetic and first-order logic. Though this method captured many of the physical characteristics of the problem, runtimes were long, especially for large data sets and data with experimental noise. Many combinatorial optimization problems admit backdoors: small subsets of the solution that, if known, can be used to efficiently deduce the remainder of the solution. We therefore explored a method for identifying solution backdoors for the phase map identification problem that exploits crowdsourcing and human computation. These backdoors were provided to our SMT-based solver as additional constraints.
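The backdoor idea can be illustrated on a toy Boolean formula: once one well-chosen variable is fixed, simple unit propagation deduces all the others without any search. This is only a sketch of the general concept, not the project's SMT encoding; the formula and the `unit_propagate` helper are hypothetical.

```python
def unit_propagate(clauses, assignment):
    """Extend `assignment` by unit propagation; return None on conflict."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var, want = abs(lit), lit > 0
                if var not in assignment:
                    unassigned.append(lit)
                elif assignment[var] == want:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return None  # every literal is false: conflict
            if len(unassigned) == 1:
                # Only one way left to satisfy this clause, so force it.
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment


# Hypothetical formula over variables 1..4 (a positive integer is the
# variable, a negative integer its negation):
# (x1 -> x2), (x2 -> x3), (x3 -> not x4), (x1 or x4)
clauses = [[-1, 2], [-2, 3], [-3, -4], [1, 4]]

without_hint = unit_propagate(clauses, {})        # nothing can be deduced
with_hint = unit_propagate(clauses, {1: True})    # x1 acts as a backdoor
```

Without the hint, propagation deduces nothing; with `x1` supplied, every remaining variable is forced. In the project, human-identified peak relationships played the role of the hint, supplied to the solver as additional constraints.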

We developed two human computing user interfaces to provide this information to our solvers. One was intended for crowdsourcing with non-expert users such as those available through Amazon Mechanical Turk (AMT). In this interface, visualizations of subsets of the data selected by the system are shown to the users, who identify diffraction peaks that are related to each other geometrically. Though the users do not need any background knowledge of the problem, the diffraction peaks they identify are discriminative in differentiating the parts of the signal originating from different crystal structures. In the interface intended for soliciting input from experts, the users are able to explore the full data set to identify those sets of samples in which the relationships between peaks are most evident, before identifying those patterns explicitly. These interfaces are provided through the UDiscoverIt website, along with source code, publications, background information, synthetic data generators, and data sets related to the project.

We also began to explore new approaches based on factor decomposition or nonnegative matrix factorization (NMF). In this model, the XRD signal of a particular sample is represented by a set of basis patterns, which are shared across the data set and each represent the XRD pattern that would be produced by a particular pure crystalline powder, together with their parameters, including mixture weight and scaling. We developed a series of three methods implementing this approach: first a combinatorial method called CombiFD, then the more efficient gradient methods AgileFD and interleaved agile factor decomposition (IAFD). These methods were increasingly efficient and effective at identifying solutions that are physically meaningful, and our latest solver typically runs in seconds or minutes, compared to hours for the prior methods. We developed a web-based interactive user interface called Phase-Mapper for experts to explore data sets and solutions and to run the solver. This system and its solvers have been used by collaborating materials scientists to interpret several new materials systems, including one that led to the discovery of new solar light absorbers and the alloying-based tuning of an important property for their performance.
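The basic factorization idea can be sketched with textbook NMF using multiplicative updates. This is a simplified stand-in, not CombiFD, AgileFD, or IAFD: it ignores the scaling parameters and physical constraints described above, and all names and data are hypothetical. It is written in plain Python to stay dependency-free; a practical implementation would use a numerical library.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iters=500, seed=0, eps=1e-9):
    """Approximate nonnegative v (samples x angles) as w @ h with w, h >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # Update basis patterns: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # Update mixture weights: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


def residual(v, w, h):
    """Sum of squared differences between v and its reconstruction w @ h."""
    approx = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, approx) for a, b in zip(ra, rb))


# Hypothetical data: two "pure" patterns with peaks at different angles,
# mixed in varying proportions across four samples.
pattern_a = [0.0, 5.0, 0.0, 0.0, 1.0, 0.0]
pattern_b = [0.0, 0.0, 3.0, 0.0, 0.0, 4.0]
weights = [[1.0, 0.0], [0.7, 0.3], [0.3, 0.7], [0.0, 1.0]]
v = matmul(weights, [pattern_a, pattern_b])

w, h = nmf(v, rank=2)
```

Here the rows of `h` correspond to the shared basis patterns and the rows of `w` to each sample's mixture weights. Nonnegativity is preserved automatically because every update multiplies by a nonnegative ratio.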


Last Modified: 12/13/2017
Modified by: Carla Gomes

Please report errors in award information by writing to: awardsearch@nsf.gov.
