
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: | |
Initial Amendment Date: | September 13, 2013 |
Latest Amendment Date: | July 16, 2014 |
Award Number: | 1344201 |
Award Instrument: | Continuing Grant |
Program Manager: | Sylvia Spengler, sspengle@nsf.gov, (703)292-7347, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 15, 2013 |
End Date: | August 31, 2017 (Estimated) |
Total Intended Award Amount: | $699,986.00 |
Total Awarded Amount to Date: | $699,986.00 |
Funds Obligated to Date: | FY 2014 = $209,000.00 |
History of Investigator: | |
Recipient Sponsored Research Office: | 341 PINE TREE RD ITHACA NY US 14850-2820 (607)255-5014 |
Sponsor Congressional District: | |
Primary Place of Performance: | 5133 Upson Hall Ithaca NY US 14853-7501 |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | OFFICE OF MULTIDISCIPLINARY AC, Information Technology Researc, SOLID STATE & MATERIALS CHEMIS, Info Integration & Informatics, INSPIRE |
Primary Program Source: | 01001415DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
This INSPIRE award is partially funded by the Information Integration and Informatics Program in the Division of Information and Intelligent Systems in the Directorate for Computer and Information Science and Engineering and the Solid State and Materials Chemistry Program in the Division of Materials Research and the Office of Multidisciplinary Activities in the Directorate for Mathematical and Physical Sciences.
The past two decades have seen rapid development of high-throughput experimentation (HTE) methodologies that would be extremely valuable for (i) the discovery of new applied materials with high complexity and (ii) the generation of deep understanding of structure/function, structure/activity and structure/performance relationships. High photon flux X-ray techniques in particular have enormous transformative potential in materials discovery. The research team leverages the data being collected at the Cornell High Energy Synchrotron Source (CHESS) and at Caltech's Joint Center for Artificial Photosynthesis (JCAP). While high-throughput inorganic library synthesis is relatively well-established, high-throughput structure determination, which is at the heart of the proposed research, is in its infancy. X-ray diffraction is well-suited for rapidly collecting information on the atomic arrangements in an inorganic sample, but the data do not immediately reveal a crystal structure. The development of data analysis, data mining and interpretation methodologies has not kept pace with the development of experimental capability. Consequently, data acquired in a week can take researchers many months to analyze by traditional means. Automation and machine-intelligent processing of the data are necessary to maximize the impact of complex multidimensional datasets.
This project addresses this state of affairs head-on. It investigates computational techniques for dealing with the multiparameter space associated with HTE structure determination of materials libraries, through constraint-guided search and optimization, statistical machine learning, and inference techniques in combination with direct human input into the process. Anticipated advances include new probabilistic methods and computational discovery tools that integrate soft and hard constraints, which capture the complex background knowledge from the underlying physics and chemistry of materials, with insights gained from high-throughput data analytics and machine learning. If the project succeeds in achieving the anticipated efficiency gains in complex structure determination, it could have a transformative impact on materials discovery and complex solid state chemistry and physics.
The ability to reduce complex materials discovery and optimization from timeframes of months or years to hours or days could lead to a paradigm shift in the development of products benefiting society, with technological advances as well as commercial impact on energy, sustainability, health and quality of life. The planned free dissemination of data sets and computational tools to the larger scientific community is likely to enhance the broader impacts of the project. The project facilitates increased interdisciplinary interactions between computer scientists and materials scientists at Cornell University and offers enhanced opportunities for training a new generation of researchers at the interface between the two disciplines.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Our vision is to develop new computational methods for dramatically accelerating the discovery of new materials, speeding up the discovery process by several orders of magnitude. More broadly, our INSPIRE project aims to show how a close cross-disciplinary collaboration involving materials scientists, materials chemists, computer scientists, and engineers can significantly advance and enrich the state of the art of each discipline.
In high-throughput materials discovery, one searches for materials with new desirable properties by obtaining measurements on tens of thousands to millions of samples of multi-element systems. X-ray diffraction (XRD) is well-suited for rapidly collecting information on the atomic arrangements in an inorganic sample, but the data do not immediately reveal a crystal structure; indeed, it is not directly possible even to identify whether a sample comprises one, two, or more distinct crystalline phases. The data analysis and interpretation process is time-consuming and highly labor-intensive. The grand challenge is therefore to develop efficient computational methods for data analysis and discovery, in order to gain insights from the deluge of materials science data, capturing and distilling structure and property relations of multicomponent materials. The problem of determining the combination of physical structures, as a function of chemical composition, is called the phase map identification problem.
The presence of significant background signal and experimental noise is one challenge in the analysis of XRD data. To address this, we developed and demonstrated the effectiveness of an approach in which materials are deposited on ultrathin amorphous membranes prepared on and supported by a crystalline silicon substrate, avoiding contributions to the signal from the substrate itself. An additional improvement to the data collection methodology, as compared with prior work, was to collect measurements as multiple detector images, allowing us to apply statistical methods to remove some noise and artifacts inherent in the experimental configuration.
At the beginning of the project, our state-of-the-art analysis methodology for the phase map identification problem was based on integrating domain-specific scientific background knowledge about the physical and chemical properties of the materials into a satisfiability modulo theories (SMT) reasoning framework based on linear arithmetic and first-order logic. Though this method captured many of the physical characteristics of the problem, runtimes were long, especially in the cases of large data sets and experimental noise. Many combinatorial optimization problems admit backdoors: small subsets of the solution that, if known, can be used to efficiently deduce the remainder of the solution. We therefore explored a method for identifying solution backdoors for the phase map identification problem that exploits crowdsourcing and human computation. These backdoors were provided to our SMT-based solver as additional constraints.
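The backdoor idea can be illustrated with a toy propositional example. The sketch below is not the project's SMT encoding; the clause set, the variable numbering, and the function name `unit_propagate` are all hypothetical and chosen only for illustration. It shows how fixing one well-chosen variable lets a cheap inference rule (unit propagation) deduce every remaining variable, whereas without that hint the rule deduces nothing.

```python
def unit_propagate(clauses, assignment):
    """Extend a partial assignment by repeated unit propagation.

    clauses: list of clauses, each a list of nonzero ints
             (positive = variable is true, negative = variable is false).
    assignment: dict mapping variable number -> bool.
    Returns the extended assignment, or None if a clause is falsified.
    """
    assignment = dict(assignment)  # work on a copy
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var = abs(lit)
                if var in assignment:
                    if assignment[var] == (lit > 0):
                        satisfied = True
                        break
                else:
                    unassigned.append(lit)
            if satisfied:
                continue
            if not unassigned:
                return None  # every literal is false: conflict
            if len(unassigned) == 1:
                # Only one way left to satisfy this clause.
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment


# Hypothetical toy constraints over variables 1..5:
#   x1 -> x2,  x1 -> x3,  x3 -> not x4,  x4 or x5
clauses = [[-1, 2], [-1, 3], [-3, -4], [4, 5]]

without_hint = unit_propagate(clauses, {})
with_hint = unit_propagate(clauses, {1: True})  # variable 1 acts as the backdoor
```

In this toy setting, the hint plays the role of the human-supplied information: a small piece of the solution, supplied as an extra constraint, after which inexpensive deduction completes the rest.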
We developed two human computing user interfaces to provide this information to our solvers. One was intended for crowdsourcing with non-expert users such as those available through Amazon Mechanical Turk (AMT). In this interface, visualizations of subsets of the data selected by the system are shown to the users, who identify diffraction peaks that are related to each other geometrically. Though the users do not need any background knowledge of the problem, they are identifying diffraction peaks that discriminate between the parts of the signal originating from different crystal structures. In the interface intended for soliciting input from experts, the users are able to explore the full data set to identify those sets of samples in which the relationships between peaks are most evident, before identifying those patterns explicitly. These interfaces are provided through the UDiscoverIt website, along with source code, publications, background information, synthetic data generators, and data sets related to the project.
We also began to explore new approaches based on factor decomposition, or nonnegative matrix factorization (NMF). In this model, the XRD signal of a particular sample is represented by a set of basis patterns, together with parameters such as mixture weight and scaling. The basis patterns are shared across the data set, and each represents the XRD pattern that would be produced by a particular pure crystalline powder. We developed a series of three methods implementing this approach, initially a combinatorial method called CombiFD, then more efficient gradient methods, AgileFD and interleaved agile factor decomposition (IAFD). These methods were increasingly efficient and effective at identifying physically meaningful solutions, and our latest solver typically runs in seconds or minutes, compared to hours for the prior methods. We developed a web-based interactive user interface, called Phase-Mapper, for experts to explore data sets and solutions and to run the solver. This system and its solvers have been used by collaborating materials scientists to interpret several new materials systems, including one that led to the discovery of new solar light absorbers and the alloying-based tuning of an important property for their performance.
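As a rough illustration of the factorization model described above, the following sketch implements plain NMF with the standard Lee-Seung multiplicative updates in pure Python. It is not CombiFD, AgileFD, or IAFD, which add physical constraints that plain NMF lacks; the toy patterns, the sample mixtures, and all function names here are assumptions made for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix v (samples x angles) as w times h.

    w (samples x rank) holds per-sample mixture weights;
    h (rank x angles) holds basis patterns shared by all samples.
    Uses multiplicative updates that reduce the squared error.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # h <- h * (w^T v) / (w^T w h), elementwise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # w <- w * (v h^T) / (w h h^T), elementwise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


def reconstruction_error(v, w, h):
    """Sum of squared differences between v and the product of w and h."""
    approx = matmul(w, h)
    return sum((a - b) ** 2
               for row_v, row_a in zip(v, approx)
               for a, b in zip(row_v, row_a))


# Two made-up "pure phase" patterns (intensity at six diffraction angles).
pattern_a = [0.0, 5.0, 0.0, 0.0, 1.0, 0.0]
pattern_b = [0.0, 0.0, 3.0, 0.0, 0.0, 4.0]

# Four samples: each pure phase, plus two mixtures of both.
samples = [
    pattern_a,
    pattern_b,
    [0.5 * a + 0.5 * b for a, b in zip(pattern_a, pattern_b)],
    [0.8 * a + 0.2 * b for a, b in zip(pattern_a, pattern_b)],
]

w, h = nmf(samples, rank=2)
error = reconstruction_error(samples, w, h)
```

Here each row of `h` is a candidate basis pattern and each row of `w` gives the weights with which a sample mixes them. Plain NMF does not model effects such as peak shifting, so a sketch like this should be read only as the starting point that the more specialized solvers build on.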
Last Modified: 12/13/2017
Modified by: Carla Gomes