
NSF Org: IIS (Division of Information & Intelligent Systems)
Recipient:
Initial Amendment Date: July 27, 2018
Latest Amendment Date: June 22, 2020
Award Number: 1815948
Award Instrument: Standard Grant
Program Manager: Sylvia Spengler, sspengle@nsf.gov, (703) 292-7347, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2018
End Date: August 31, 2023 (Estimated)
Total Intended Award Amount: $254,463.00
Total Awarded Amount to Date: $262,463.00
Funds Obligated to Date: FY 2020 = $8,000.00
History of Investigator:
Recipient Sponsored Research Office: 845 N PARK AVE RM 538, TUCSON, AZ 85721, US, (520) 626-6000
Sponsor Congressional District:
Primary Place of Performance: AZ, US, 85721-0001
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01002021DB, NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
Natural language inference (NLI) can support decision-making using information contained in natural language texts (e.g., detecting undiagnosed medical conditions in medical records, finding alternate treatments from scientific literature). This requires gathering facts extracted from text and reasoning over them. Current automated solutions for NLI are largely incapable of producing explanations for their inferences, but this capacity is essential for users to trust their reasoning in domains such as scientific discovery and medicine, where the cost of making errors is high. This project develops natural language inference methods that are both accurate and explainable. They are accurate because they build on state-of-the-art deep learning frameworks, which use powerful, automatically learned representations of text. They are explainable because they aggregate information in units that can be represented in both a human-readable explanation and a machine-usable vector representation. This project will advance methods in explainable natural language inference to enable the application of automated inference methods in critical domains such as medical knowledge extraction. The project will also evaluate the explainability of the inference decisions in collaboration with domain experts.
This project reframes natural language inference as the task of constructing and reasoning over explanations. In particular, inference assembles smaller component facts into a graph (an explanation graph) that it reasons over to make decisions. In this view, generating explanations is an integral part of the inference process rather than a separate post-hoc mechanism. The project has three main goals: (a) develop multiagent reinforcement learning models that can effectively and efficiently explore the space of explanation graphs, (b) develop deep-learning-based aggregation mechanisms that can prevent inference from combining semantically incompatible evidence, and (c) build a continuum of hypergraph-based text representations that combine discrete forms of structured knowledge with their continuous, embedding-based representations. The techniques will be evaluated on three application domains: complex question answering, medical relation extraction, and clinical event detection from medical records. The results of the project will be disseminated through the project website and scholarly venues, and the software and datasets will be made available to the public.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
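As a concrete, heavily simplified illustration of the explanation-graph idea described in the abstract, the sketch below links toy science facts to a hypothesis whenever they share content words. The fact texts, the stopword list, the overlap heuristic, and all function names are assumptions made for this example; the project's actual models learn how to connect and aggregate facts rather than relying on a hand-written lexical rule.

# Toy explanation-graph construction: connect the hypothesis and facts whenever
# they share content words. All facts and heuristics here are illustrative
# assumptions, not the award's actual models.
from itertools import combinations

FACTS = [
    "water has a boiling point of 100 degrees celsius",
    "a metal pot is a good thermal conductor",
    "a stove is a source of heat",
    "objects that are heated increase in temperature",
]

STOPWORDS = {"a", "an", "the", "of", "is", "are", "that", "in", "to", "has", "on"}

def content_words(text):
    """Lowercased tokens with common function words removed."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def build_explanation_graph(hypothesis, facts, min_overlap=1):
    """Return (nodes, edges); an edge connects two nodes that share content words."""
    nodes = [hypothesis] + list(facts)
    edges = []
    for a, b in combinations(nodes, 2):
        shared = content_words(a) & content_words(b)
        if len(shared) >= min_overlap:
            edges.append((a, b, sorted(shared)))
    return nodes, edges

hypothesis = "the water in the pot on the stove boiled because it was heated"
_, edges = build_explanation_graph(hypothesis, FACTS)
for a, b, shared in edges:
    print(f"{shared}: {a!r} <-> {b!r}")

Running the sketch prints the edges of the resulting graph; the research contribution described above lies in learning to search over and aggregate such graphs, not in the lexical heuristic itself.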
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project (Explainable Natural Language Inference) broadly aimed to improve the ability of artificial intelligence systems to correctly answer questions posed by humans while also producing explanations for why a model believes its answers are correct. This ability to explain its reasoning greatly improves the usefulness of such a system.
The work on this award made scientific contributions in designing algorithms that answer questions and build explanations, building data to help AI models learn to build explanations automatically, designing new methods of representing explanations that make inference easier, and designing new ways of measuring a system's ability to perform reasoning. More specifically:
Designing new algorithms that build explanations:
This award constructed new methods and algorithms for building explanations to questions, centrally by combining multiple smaller facts into larger explanatory wholes. One of the main topic areas studied was elementary- and middle-school-level scientific reasoning, where a student might need to combine multiple facts (such as that water has a boiling point of 100 degrees Celsius, that a metal pot is a good thermal conductor, that a stove is a source of heat, and that objects that are heated increase in temperature) to answer a question about why water boiled in a pot on a stove. This project examined a large number of existing and modified algorithms, drawing on deep learning, symbolic learning, reinforcement learning, and other approaches, for their ability to perform this kind of explanatory inference, as illustrated by the sketch below.
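As a rough illustration of what "combining facts into a larger explanatory whole" means computationally, the toy sketch below greedily grows an explanation by repeatedly adding the fact that best overlaps the question and the facts chosen so far. The fact texts, the lexical-overlap scoring, and the stopping rule are simplified assumptions made for this example, not the algorithms developed under the award.

# Toy greedy aggregation: repeatedly add the unused fact that best overlaps the
# question plus the facts selected so far. Scoring and stopping rules are
# simplified assumptions for illustration only.
STOPWORDS = {"a", "an", "the", "of", "is", "are", "that", "in", "to",
             "has", "on", "did", "why", "was", "it"}

def content_words(text):
    return {w for w in text.lower().split() if w not in STOPWORDS}

def greedy_explanation(question, facts, max_facts=4):
    """Grow an explanation one fact at a time, following lexical links."""
    context = content_words(question)
    remaining = list(facts)
    explanation = []
    for _ in range(max_facts):
        scored = [(len(content_words(f) & context), f) for f in remaining]
        score, best = max(scored)
        if score == 0:                    # no remaining fact connects to the context
            break
        explanation.append(best)
        context |= content_words(best)    # the new fact lets the search "hop" further
        remaining.remove(best)
    return explanation

facts = [
    "water has a boiling point of 100 degrees celsius",
    "a metal pot is a good thermal conductor",
    "a stove is a source of heat",
    "a bicycle has two wheels",           # an irrelevant distractor fact
]
print(greedy_explanation("why did the water in the pot on the stove boil", facts))

On this toy input, the three relevant facts are selected and the distractor is excluded; the award's actual systems replace the overlap heuristic with learned models.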
Building data to help AI models learn to build better explanations:
Contemporary artificial intelligence systems typically work by "training" a model on a large number of high-quality examples of a task, and then evaluating how well that model performs the task on a separate set of "testing" data that was hidden during training. Prior to this award, essentially no significant data existed for training or evaluating systems that build large explanations by combining facts together. During this award, a large amount of data for training and evaluating artificial reasoning and explanation construction was produced.
Similarly, while counter-intuitive, machines often do not learn best from raw text; they frequently benefit from clear, logical "structure" in their data that makes the task easier to perform. We developed a new formalism called "Entailment Trees" that represents explanations in a tree-like structure. This kind of representation makes it easier for machines to learn to perform reasoning because it breaks large reasoning problems into smaller steps and makes explicit which facts are required to solve each step and exactly what the model should infer from those facts (see the sketch below).
It is also sometimes very hard to know how well an AI system performs at a task, because it may be tested on very large amounts of data (for example, thousands of questions every day), which is too large or expensive for humans to rate manually. Because of this, scientists typically develop methods of automatic evaluation which, while not as good as human evaluation, tell us enough about average performance to be useful for monitoring day-to-day progress. In this work, we showed that some existing automatic evaluation methods underestimate the ability of some systems to generate explanations, and we developed new evaluation methods that experiments show are better.
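The entailment-tree idea can be pictured as a small nested data structure in which each step lists its premises and the conclusion they entail. The sketch below is a minimal illustration using assumed, simplified names (EntailmentStep, conclusion, premises); it is not the actual formalism or the dataset released by the project.

# Minimal sketch of an entailment-tree-style structure: each step combines
# premises (facts or earlier steps) into an intermediate conclusion.
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class EntailmentStep:
    """One reasoning step: a set of premises entails a conclusion."""
    conclusion: str
    premises: List[Union["EntailmentStep", str]] = field(default_factory=list)

    def show(self, indent=0):
        print(" " * indent + self.conclusion)
        for p in self.premises:
            if isinstance(p, EntailmentStep):
                p.show(indent + 2)
            else:
                print(" " * (indent + 2) + "- " + p)

tree = EntailmentStep(
    conclusion="the water in the pot boiled",
    premises=[
        EntailmentStep(
            conclusion="the water was heated to 100 degrees celsius",
            premises=[
                "a stove is a source of heat",
                "a metal pot is a good thermal conductor",
            ],
        ),
        "water has a boiling point of 100 degrees celsius",
    ],
)
tree.show()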
Designing new ways of measuring a system's ability to perform reasoning:
One of the surprising results of this work is that, while the systems that we and others developed rapidly improved at correctly answering questions (and building explanations) over the years of this award, that knowledge and reasoning tended to be very brittle. For example, large language models eventually scored near an "A" grade on elementary science exams, a major scientific achievement, suggesting that AI systems understood science at the level of a 5th grader. However, we built a virtual environment (similar to a game) that AI models can interact with and that simulates much of the same content found on science exams (such as how to boil water, build simple electronic circuits, or understand basic properties of genetics). Using it, we showed that AI systems able to score extremely well on written tests were unable to demonstrate that same knowledge when tested in a different way, interactively, suggesting that the AI models knew much less than we originally thought. This has spawned a great deal of work on building interactive virtual environments that help train and evaluate the multi-step reasoning capabilities of language models, so that they can begin to perform detailed multi-step processes that are helpful and relevant for human tasks.
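To make "interactive evaluation" concrete, here is a toy sketch of an agent acting in a tiny simulated boiling-water task: instead of answering a written question, the agent must take the right sequence of actions. The environment, its actions, and its scoring are hypothetical stand-ins invented for this illustration; they are not the virtual environment built under this award, whose tasks and interface are far richer.

# Toy text-environment loop illustrating interactive evaluation. The class,
# actions, and reward are invented stand-ins for illustration only.
class ToyBoilingEnv:
    """Tiny simulation: the agent must fill a pot, put it on a stove, and heat it."""

    def __init__(self):
        self.state = {"pot_filled": False, "pot_on_stove": False, "stove_on": False}

    def step(self, action):
        if action == "fill pot with water":
            self.state["pot_filled"] = True
        elif action == "put pot on stove":
            self.state["pot_on_stove"] = True
        elif action == "turn on stove":
            self.state["stove_on"] = True
        done = all(self.state.values())
        observation = "the water boils" if done else "nothing happens yet"
        reward = 1.0 if done else 0.0
        return observation, reward, done

env = ToyBoilingEnv()
# A scripted agent; a language-model agent would instead choose each next
# action from the observation text it receives.
for action in ["fill pot with water", "put pot on stove", "turn on stove"]:
    obs, reward, done = env.step(action)
    print(f"{action} -> {obs} (reward={reward})")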
Last Modified: 07/27/2024
Modified by: Peter A Jansen