
NSF Org: IIS Division of Information & Intelligent Systems
Recipient:
Initial Amendment Date: July 27, 2018
Latest Amendment Date: June 26, 2020
Award Number: 1815358
Award Instrument: Standard Grant
Program Manager: Sylvia Spengler, sspengle@nsf.gov, (703) 292-7347, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2018
End Date: August 31, 2021 (Estimated)
Total Intended Award Amount: $244,537.00
Total Awarded Amount to Date: $252,537.00
Funds Obligated to Date: FY 2020 = $8,000.00
History of Investigator: Niranjan Balasubramanian (Principal Investigator)
Recipient Sponsored Research Office: W5510 Franks Melville Memorial Library, Stony Brook, NY, US 11794, (631) 632-9949
Sponsor Congressional District:
Primary Place of Performance: Department of Computer Science, Stony Brook, NY, US 11794-6999
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01002021DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
Natural language inference (NLI) can support decision-making using information contained in natural language texts (e.g., detecting undiagnosed medical conditions in medical records, finding alternate treatments from scientific literature). This requires gathering facts extracted from text and reasoning over them. Current automated solutions for NLI are largely incapable of producing explanations for their inferences, but this capacity is essential for users to trust their reasoning in domains such as scientific discovery and medicine, where the cost of making errors is high. This project develops natural language inference methods that are both accurate and explainable. They are accurate because they build on state-of-the-art deep learning frameworks, which use powerful, automatically learned representations of text. They are explainable because they aggregate information in units that can be represented in both a human-readable explanation and a machine-usable vector representation. This project will advance methods in explainable natural language inference to enable the application of automated inference methods in critical domains such as medical knowledge extraction. The project will also evaluate the explainability of the inference decisions in collaboration with domain experts.
This project reframes natural language inference as the task of constructing and reasoning over explanations. In particular, inference assembles smaller component facts into a graph (explanation graph) that it reasons over to make decisions. In this view, generating explanations is an integral part of the inference process and not a separate post-hoc mechanism. The project has three main goals: (a) Develop multiagent reinforcement learning models that can effectively and efficiently explore the space of explanation graphs, (b) Develop deep-learning-based aggregation mechanisms that can prevent inference from combining semantically incompatible evidence, and (c) Build a continuum of hypergraph-based text representations that combine discrete forms of structured knowledge with their continuous embedding-based representations. The techniques will be evaluated on three application domains: complex question answering, medical relation extraction, and clinical event detection from medical records. The results of the project will be disseminated through the project website and scholarly venues, and the software and datasets will be made available to the public.
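The explanation-graph framing described above can be illustrated with a minimal sketch: component facts become nodes, links between them record how evidence is aggregated, and the same structure doubles as a human-readable explanation. All class and function names below are illustrative assumptions, not the project's actual code.

from dataclasses import dataclass, field

@dataclass
class Fact:
    """A component fact extracted from text."""
    text: str
    score: float  # model confidence that the fact is relevant

@dataclass
class ExplanationGraph:
    """Facts plus the links used to aggregate them into a decision."""
    facts: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (i, j) pairs of connected facts

    def add_fact(self, fact: Fact) -> int:
        self.facts.append(fact)
        return len(self.facts) - 1

    def connect(self, i: int, j: int) -> None:
        self.edges.append((i, j))

    def readable_explanation(self) -> str:
        # The machine-usable structure also serves as the explanation shown to users.
        return " -> ".join(f.text for f in self.facts)

# Example: two facts assembled to support one inference.
g = ExplanationGraph()
a = g.add_fact(Fact("Aspirin inhibits COX enzymes.", 0.93))
b = g.add_fact(Fact("COX inhibition reduces inflammation.", 0.88))
g.connect(a, b)
print(g.readable_explanation())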
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Explainable natural language inference is critical for building Natural Language Processing systems that are both reliable and trustworthy. This research aimed to develop methods and datasets that can help build explainable NLP systems in information access applications such as question answering and relation extraction. As an example, consider a question answering (QA) system that provides an answer to a question. If it is also able to provide the pieces of information that it used to arrive at its answer, then a user can assess whether the answer is correct. When the system returns incorrect answers, a deployer or developer can attempt to debug why it failed, providing a way to improve the system.
One of the main difficulties in developing explainable models is the lack of training data, i.e., examples of the input, output, and desired explanations. While there were plenty of question-answer pair datasets, explanations are much more laborious to create. A related difficulty is that when systems are trained only to identify the correct answers, they tend to latch on to any artifact or spurious correlation that might exist between questions and answers in the datasets they train on. This results in models that are (a) unreliable, i.e., ones that do not do the correct reasoning we expect of them, and (b) not explainable, i.e., ones that can only provide answers but no useful explanations. This project made multiple contributions to addressing this challenge, which can be described in four main threads.
Thread one focused on building QA models that are structured in such a way that they are forced to take good intermediate steps, i.e., finding useful sentences that contain the information necessary for arriving at the final answer. This naturally lends itself to building models that can provide high-quality explanations. We showed how existing ideas in verifying entailment (figuring out whether one piece of text supports information in another) can be used to assemble a QA system of this kind.
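A minimal sketch of this idea, under stated assumptions: score each context sentence for how well it entails the question/answer hypothesis, then keep only the top-scoring sentences, which double as the explanation. The entailment_score function below is a word-overlap stand-in for a trained entailment model, not the project's actual component.

def entailment_score(premise: str, hypothesis: str) -> float:
    # Placeholder: word-overlap proxy for a learned entailment model.
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1)

def select_supporting_sentences(context: list[str], hypothesis: str, k: int = 2) -> list[str]:
    # The selected sentences are the intermediate steps the answer must rest on.
    scored = sorted(context, key=lambda s: entailment_score(s, hypothesis), reverse=True)
    return scored[:k]

context = [
    "The Amazon is the largest rainforest on Earth.",
    "The Amazon rainforest spans nine countries.",
    "Football is popular in Brazil.",
]
hypothesis = "The largest rainforest spans nine countries."
print(select_supporting_sentences(context, hypothesis))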
Thread two focused on minimizing unreliable reasoning by formalizing one type of bad reasoning in QA models and developing new datasets to discourage this behavior. When there is no supervision for intermediate steps, models can latch on to any artifact in the data and perform what we call disconnected reasoning. If we can catch when models are not identifying or using all the information they are supposed to use when answering a question, then we can address this problem. We developed ways to transform existing datasets to detect and discourage disconnected reasoning.
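A simplified sketch of such a check, in the spirit described above: if a model can still produce the right answer when part of the supporting context is withheld, it never had to connect all of the facts. The model callable and the toy lazy_model are hypothetical names used only for illustration; the project's actual transformations were more involved.

def admits_disconnected_reasoning(model, question, answer, support_parts) -> bool:
    for part in support_parts:
        held_out = [p for p in support_parts if p is not part]
        # Answering correctly with one supporting fact removed suggests the
        # question (or the model) does not require connecting all the facts.
        if model(" ".join(held_out), question) == answer:
            return True
    return False

# Toy usage: a "model" that keys on a single fact still answers correctly,
# which the check flags.
def lazy_model(context: str, question: str) -> str:
    return "Paris" if "capital" in context else "unknown"

facts = ["France's capital is Paris.", "Paris hosted the 1900 Olympics."]
print(admits_disconnected_reasoning(lazy_model, "Which city ...?", "Paris", facts))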
Another important contribution we make here is to show how to construct reliable and explainable QA datasets. A multihop question, i.e., one that requires multiple pieces of information, can be seen as composed of multiple single-hop questions. This naturally provides a way to construct (and filter) a large collection of multihop questions by connecting single-hop questions where the answer of one question is part of another question. More importantly, this gives us questions with identified sub-steps, thus providing examples for models to learn the intermediate steps and supervision for explanations.
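A minimal sketch of the composition step, assuming a simple string-substitution scheme (the function name and output format are illustrative, not the dataset's actual construction pipeline): two single-hop questions chain into one multihop question when the answer of the first appears in the second.

def compose_multihop(q1: str, a1: str, q2: str, a2: str):
    if a1 not in q2:
        return None  # the two questions do not chain
    # Replace the bridging entity with the first question to force both hops.
    multihop_q = q2.replace(a1, f"[the answer to: '{q1}']")
    return {"question": multihop_q, "answer": a2, "sub_questions": [(q1, a1), (q2, a2)]}

print(compose_multihop(
    "Which country is Mount Kilimanjaro in?", "Tanzania",
    "What is the capital of Tanzania?", "Dodoma",
))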
Thread three looked at pushing the scope of what constitutes an explanation for multihop question answering. Previous datasets mostly identified existing spans of text within the inputs to the QA system as explanations. However, this is neither adequate nor concise. A human providing an explanation would summarize the relevant information and not simply read out the relevant sentences. Again, we need examples of such explanations at scale, which are expensive and time-consuming to obtain. We show that we can make use of existing ideas in abstractive summarization to provide useful compressed summaries, which can then be used by a QA model. Further, we introduce a reinforcement learning framework that turns notions of explanation quality into rewards to improve the model's ability to find the most useful information for answering a question.
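One way to picture the reward design is a scalar that mixes answer correctness with a proxy for explanation quality; the components and weights below are illustrative assumptions, not the project's actual reward.

def explanation_reward(pred_answer, gold_answer, summary, gold_support, alpha=0.5):
    answer_reward = 1.0 if pred_answer == gold_answer else 0.0
    # Crude proxy for explanation quality: token recall of the gold
    # supporting facts inside the generated summary.
    gold_tokens = set(" ".join(gold_support).lower().split())
    summary_tokens = set(summary.lower().split())
    coverage = len(gold_tokens & summary_tokens) / max(len(gold_tokens), 1)
    return alpha * answer_reward + (1 - alpha) * coverage

print(explanation_reward(
    "Dodoma", "Dodoma",
    "Dodoma is the capital of Tanzania.",
    ["Tanzania's capital city is Dodoma."],
))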
Thread four looked at explainable biomedical relation extraction. The main contribution is formalizing a new task that forces models not only to identify relations between biomedical entities, but also to identify the biomedical mechanism that connects and justifies the inferred relation. The key challenge is the lack of large-scale datasets. We show how a relatively small amount of domain-expert time can be used to first identify examples of these biomedical mechanisms and then bootstrap the creation of a large, weakly labeled dataset for this task.
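A sketch of the bootstrapping step, under the assumption of simple phrase matching: a small set of expert-identified mechanism phrases seeds weak labels over a large unlabeled corpus. The seed phrases, entity pairs, and sentences below are made up for illustration; the actual dataset construction was more involved.

# Expert-provided seeds: mechanism phrase -> (entity A, entity B).
seed_mechanisms = {
    "cox inhibition": ("aspirin", "inflammation"),
    "ace inhibition": ("lisinopril", "blood pressure"),
}

def weakly_label(sentence: str):
    s = sentence.lower()
    for mechanism, (entity_a, entity_b) in seed_mechanisms.items():
        # A sentence mentioning both entities and the mechanism phrase gets a weak label.
        if mechanism in s and entity_a in s and entity_b in s:
            return {"entities": (entity_a, entity_b), "mechanism": mechanism}
    return None  # no weak label for this sentence

corpus = [
    "Aspirin reduces inflammation via COX inhibition.",
    "Regular exercise improves cardiovascular health.",
]
print([weakly_label(s) for s in corpus])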
Overall, this project made useful algorithmic advances in methods for building explainable NLP systems and produced resources in the form of large-scale datasets for question answering and relation extraction, which we hope will further advances in these areas.
Last Modified: 12/30/2021
Modified by: Niranjan Balasubramanian