Award Abstract # 1815358
III: Small: Collaborative Research: Explainable Natural Language Inference

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: THE RESEARCH FOUNDATION FOR THE STATE UNIVERSITY OF NEW YORK
Initial Amendment Date: July 27, 2018
Latest Amendment Date: June 26, 2020
Award Number: 1815358
Award Instrument: Standard Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
(703)292-7347
IIS  Division of Information & Intelligent Systems
CSE  Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2018
End Date: August 31, 2021 (Estimated)
Total Intended Award Amount: $244,537.00
Total Awarded Amount to Date: $252,537.00
Funds Obligated to Date: FY 2018 = $244,537.00
FY 2020 = $8,000.00
History of Investigator:
  • Niranjan Balasubramanian (Principal Investigator)
    niranjan@cs.stonybrook.edu
Recipient Sponsored Research Office: SUNY at Stony Brook
W5510 FRANKS MELVILLE MEMORIAL LIBRARY
STONY BROOK
NY  US  11794
(631)632-9949
Sponsor Congressional District: 01
Primary Place of Performance: SUNY at Stony Brook
Department of Computer Science,
Stony Brook
NY  US  11794-6999
Primary Place of Performance Congressional District: 01
Unique Entity Identifier (UEI): M746VC6XMNH9
Parent UEI: M746VC6XMNH9
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVITIES
01002021DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 9251, 7923, 075Z, 7364
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Natural language inference (NLI) can support decision-making using information contained in natural language texts (e.g., detecting undiagnosed medical conditions in medical records, finding alternate treatments in the scientific literature). This requires gathering facts extracted from text and reasoning over them. Current automated solutions for NLI are largely incapable of producing explanations for their inferences, but this capacity is essential for users to trust their reasoning in domains such as scientific discovery and medicine, where the cost of making errors is high. This project develops natural language inference methods that are both accurate and explainable. They are accurate because they build on state-of-the-art deep learning frameworks, which use powerful, automatically learned representations of text. They are explainable because they aggregate information in units that can be represented both as a human-readable explanation and as a machine-usable vector representation. This project will advance methods in explainable natural language inference to enable the application of automated inference methods in critical domains such as medical knowledge extraction. The project will also evaluate the explainability of the inference decisions in collaboration with domain experts.

This project reframes natural language inference as the task of constructing and reasoning over explanations. In particular, inference assembles smaller component facts into an explanation graph that it reasons over to make decisions. In this view, generating explanations is an integral part of the inference process, not a separate post-hoc mechanism. The project has three main goals: (a) develop multiagent reinforcement learning models that can effectively and efficiently explore the space of explanation graphs, (b) develop deep learning based aggregation mechanisms that prevent inference from combining semantically incompatible evidence, and (c) build a continuum of hypergraph-based text representations that combine discrete forms of structured knowledge with their continuous embedding-based representations. The techniques will be evaluated on three application domains: complex question answering, medical relation extraction, and clinical event detection from medical records. The results of the project will be disseminated through the project website and scholarly venues, and the software and datasets will be made available to the public.
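
As a rough illustration of the explanation-graph view, the sketch below assembles component facts into a small graph that yields both a human-readable explanation and a simple aggregate score. The Fact and ExplanationGraph classes, the example facts, and the scoring rule are hypothetical illustrations, not the project's actual models.

```python
from dataclasses import dataclass, field

@dataclass
class Fact:
    """A single component fact extracted from text."""
    text: str
    score: float  # how relevant a retriever judged this fact to be

@dataclass
class ExplanationGraph:
    """Facts plus edges linking facts that share entities or terms."""
    facts: list[Fact] = field(default_factory=list)
    edges: list[tuple[int, int]] = field(default_factory=list)

    def add_fact(self, fact: Fact) -> int:
        self.facts.append(fact)
        return len(self.facts) - 1

    def connect(self, i: int, j: int) -> None:
        self.edges.append((i, j))

    def readable_explanation(self) -> str:
        # Human-readable side: the chained fact texts.
        return " -> ".join(f.text for f in self.facts)

    def support_score(self) -> float:
        # Machine-usable side: a toy aggregate of fact relevance.
        return sum(f.score for f in self.facts) / max(len(self.facts), 1)

# Example: two facts chained into one explanation graph.
g = ExplanationGraph()
a = g.add_fact(Fact("Aspirin inhibits COX enzymes.", 0.9))
b = g.add_fact(Fact("COX inhibition reduces inflammation.", 0.8))
g.connect(a, b)
print(g.readable_explanation())
print(round(g.support_score(), 2))
```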

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Cao, Qingqing and Weber, Noah and Balasubramanian, Niranjan and Balasubramanian, Aruna. "DeQA: On-Device Question Answering." ACM International Conference on Mobile Systems, Applications, and Services, v.1, 2019.
Inoue, Naoya and Trivedi, Harsh and Sinha, Steven and Balasubramanian, Niranjan and Inui, Kentaro. "Summarize-then-Answer: Generating Concise Explanations for Multi-hop Reading Comprehension." Conference on Empirical Methods in Natural Language Processing, 2021. https://doi.org/10.18653/v1/2021.emnlp-main.490
Trivedi, Harsh and Kwon, Heeyoung and Khot, Tushar and Sabharwal, Ashish and Balasubramanian, Niranjan. "Repurposing Entailment for Multi-Hop Question Answering Tasks." North American Chapter of the Association for Computational Linguistics: Human Language Technologies, v.1, 2019.
Yang, Xuewen and Liu, Yingru and Xie, Dongliang and Wang, Xin and Balasubramanian, Niranjan. "Latent Part-of-Speech Sequences for Neural Machine Translation." Conference on Empirical Methods in Natural Language Processing, 2019. https://doi.org/10.18653/v1/D19-1072

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Explainable natural language inference is critical for building Natural Language Processing (NLP) systems that are both reliable and trustworthy. This research aimed at developing methods and datasets that help build explainable NLP systems for information access applications such as question answering and relation extraction. As an example, consider a question answering (QA) system that provides an answer to a question. If it can also provide the pieces of information it used to arrive at that answer, a user can assess whether the answer is correct. And when the system returns incorrect answers, a deployer or developer can attempt to debug why the system failed, which provides a way to improve it.

One of the main difficulties in developing explainable models is the lack of training data, i.e., examples of the input, output, and desired explanations. While there are plenty of question-and-answer pair datasets, explanations are much more laborious to collect. A related difficulty is that when we build systems trained only to identify the correct answers, they tend to latch on to any artifact or spurious correlation that might exist between questions and answers in the datasets they train on. This results in models that are (A) unreliable, i.e., ones that do not do the correct reasoning we expect of them, and (B) not explainable, i.e., ones that can only provide answers but no useful explanations. This project made multiple contributions to addressing this challenge, which can be described in four main threads.

Thread one focused on building QA models that are structured so that they are forced to take good intermediate steps, i.e., to find the sentences that contain the information necessary for arriving at the final answer. This structure naturally lends itself to models that can provide high-quality explanations. We showed how existing ideas in verifying entailment (determining whether one piece of text supports the information in another) can be used to assemble a QA system of this kind, as in the sketch below.
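
As a rough sketch of this idea, the snippet below ranks passage sentences by how strongly they "entail" the question paired with a candidate answer. The entailment_score function here is a stand-in based on word overlap; a real system would plug in a trained entailment model, and the passage and question are invented examples.

```python
# Minimal sketch of entailment-based sentence selection for QA.
# entailment_score is a placeholder for an off-the-shelf entailment model;
# it is faked with lexical overlap so the example runs on its own.

def entailment_score(premise: str, hypothesis: str) -> float:
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1)

def select_supporting_sentences(sentences, question, answer, k=2):
    """Rank passage sentences by how well they entail question + answer."""
    hypothesis = f"{question} {answer}"
    ranked = sorted(sentences, key=lambda s: entailment_score(s, hypothesis), reverse=True)
    return ranked[:k]  # the selected sentences double as a human-readable explanation

passage = [
    "The Eiffel Tower is located in Paris.",
    "Paris is the capital of France.",
    "Baguettes are a kind of bread.",
]
print(select_supporting_sentences(passage, "Which country is the Eiffel Tower in?", "France"))
```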

Thread two focused on minimizing unreliable reasoning by formalizing one type of bad reasoning in QA models and developing new datasets that discourage it. When there is no supervision for intermediate steps, models can latch on to any artifact in the data and perform what we call disconnected reasoning. If we can catch when models are not identifying or using all the information they are supposed to use when answering a question, then we can address this problem. We developed ways to transform existing datasets to detect and discourage disconnected reasoning; the sketch below illustrates the idea.
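
The sketch below illustrates one way such a check could look: if a model answers a multihop question correctly from a single supporting fact alone, it is likely reasoning in a disconnected way. The probe_disconnected_reasoning function, the toy model, and the example facts are hypothetical; they are not the dataset transformations developed in the project.

```python
# Sketch of a disconnected-reasoning probe: a model that answers a multihop
# question from only one of its required supporting facts is probably
# exploiting an artifact rather than connecting the facts.

def probe_disconnected_reasoning(model_answer, question, supporting_facts, gold_answer):
    """Return True if any single supporting fact is enough to get the gold answer."""
    for fact in supporting_facts:
        if model_answer(context=[fact], question=question) == gold_answer:
            return True  # answered without combining facts: disconnected
    return False

# Toy model that only "knows" the answer when both facts are present.
def toy_model(context, question):
    text = " ".join(context)
    if "Eiffel Tower" in text and "capital of France" in text:
        return "France"
    return "unknown"

facts = ["The Eiffel Tower is in Paris.", "Paris is the capital of France."]
print(probe_disconnected_reasoning(toy_model, "Which country is the Eiffel Tower in?", facts, "France"))
```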

Another important contribution of this thread is showing how to construct reliable and explainable QA datasets. A multihop question, i.e., one that requires multiple pieces of information, can be seen as a composition of multiple single-hop questions. This provides a natural way to construct (and filter) a large collection of multihop questions by connecting single-hop questions in which the answer to one question appears in another. More importantly, this gives us questions with identified sub-steps, which supplies examples for models to learn the intermediate steps and supervision for explanations. The sketch below illustrates the composition.
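
The sketch below shows the basic composition step under simplifying assumptions: two invented single-hop QA pairs are chained when the answer to the first appears in the second, and the decomposition is retained as supervision. The compose function and its output format are illustrative only, not the project's construction and filtering pipeline.

```python
# Sketch of composing two single-hop QA pairs into one multihop question:
# if the answer to q1 appears in q2, substitute q1 into q2.

def compose(q1, a1, q2, a2):
    """Build a 2-hop question if q2 mentions the answer to q1."""
    if a1.lower() not in q2.lower():
        return None
    multihop_q = q2.replace(a1, f"the answer to '{q1}'")
    return {
        "question": multihop_q,
        "answer": a2,
        "decomposition": [(q1, a1), (q2, a2)],  # supervision for intermediate steps
    }

example = compose(
    q1="Which city is the Eiffel Tower in?", a1="Paris",
    q2="Paris is the capital of which country?", a2="France",
)
print(example["question"])
print(example["decomposition"])
```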

Thread three looked at broadening what counts as an explanation for multihop question answering. Previous datasets mostly identified existing spans of text within the inputs to the QA system as explanations. However, this is neither adequate nor concise. A human providing an explanation would summarize the relevant information rather than simply read out the relevant sentences. Again, we need examples of such explanations at scale, which are expensive and time-consuming to obtain. We show that existing ideas in abstractive summarization can be used to produce compressed summaries that a QA model can then use. Further, we introduce a reinforcement learning framework in which we turn notions of explanation quality into rewards to improve the model's ability to find the most useful information for answering a question; a sketch of such a reward follows.
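
The snippet below sketches how explanation quality might be turned into a reward signal, combining answer correctness with a brevity term. The weights, the brevity formula, and the example are assumptions made for illustration; they do not reproduce the project's reinforcement learning setup.

```python
# Sketch of an explanation-quality reward: correct answers backed by concise
# explanations score highest. Weights and the brevity cap are illustrative.

def explanation_reward(predicted_answer, gold_answer, explanation, max_words=40,
                       w_correct=1.0, w_brevity=0.5):
    """Higher reward for a correct answer with a concise explanation."""
    correct = 1.0 if predicted_answer.strip().lower() == gold_answer.strip().lower() else 0.0
    length = len(explanation.split())
    brevity = max(0.0, 1.0 - length / max_words)  # 1.0 when very short, 0.0 at the cap
    return w_correct * correct + w_brevity * brevity

print(explanation_reward(
    predicted_answer="France",
    gold_answer="France",
    explanation="The Eiffel Tower is in Paris, and Paris is the capital of France.",
))
```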

Thread four looked at explainable biomedical relation extraction. The main contribution is formalizing a new task that requires models not only to identify relations between biomedical entities but also to determine the biomedical mechanism that connects and justifies the inferred relation. The key challenge is the lack of large-scale datasets. We show how a relatively small amount of domain-expert time can be used to identify examples of these biomedical mechanisms, which are then used to bootstrap the creation of a large weakly labeled dataset for this task, along the lines sketched below.
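
The sketch below illustrates the bootstrapping idea with a toy weak labeler: a handful of expert-style mechanism trigger words are matched against sentences that mention both entities. The EXPERT_PATTERNS dictionary, the weak_label function, and the two-sentence corpus are invented for illustration.

```python
# Sketch of weak labeling: expert-curated mechanism patterns are applied to an
# unlabeled corpus to bootstrap a larger (noisier) training set.

EXPERT_PATTERNS = {
    "inhibition": ["inhibits", "blocks", "suppresses"],
    "activation": ["activates", "upregulates", "induces"],
}

def weak_label(sentence, entity_a, entity_b):
    """Assign a mechanism label if both entities and a trigger word co-occur."""
    s = sentence.lower()
    if entity_a.lower() not in s or entity_b.lower() not in s:
        return None
    for mechanism, triggers in EXPERT_PATTERNS.items():
        if any(t in s for t in triggers):
            return {"entities": (entity_a, entity_b), "mechanism": mechanism,
                    "sentence": sentence}
    return None

corpus = [
    "Aspirin inhibits COX-2 in inflamed tissue.",
    "Aspirin is commonly taken with water.",
]
labeled = [ex for s in corpus if (ex := weak_label(s, "Aspirin", "COX-2"))]
print(labeled)
```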

Overall, this project made useful algorithmic advances in methods for building explainable NLP systems and produced resources in the form of large-scale datasets for question answering and relation extraction, which we hope will further advances in these areas.

Last Modified: 12/30/2021
Modified by: Niranjan Balasubramanian

