Award Abstract # 1948322
CRII: III: Capturing Dynamism in Causal Relationships: A New Paradigm for Relationship Extraction from Text

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: TRUSTEES OF INDIANA UNIVERSITY
Initial Amendment Date: May 14, 2020
Latest Amendment Date: June 21, 2021
Award Number: 1948322
Award Instrument: Standard Grant
Program Manager: Hector Munoz-Avila
hmunoz@nsf.gov
 (703)292-4481
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: May 15, 2020
End Date: April 30, 2023 (Estimated)
Total Intended Award Amount: $174,332.00
Total Awarded Amount to Date: $190,332.00
Funds Obligated to Date: FY 2020 = $174,332.00
FY 2021 = $16,000.00
History of Investigator:
  • Sunandan Chakraborty (Principal Investigator)
    sunchak@iu.edu
Recipient Sponsored Research Office: Indiana University
107 S INDIANA AVE
BLOOMINGTON
IN  US  47405-7000
(317)278-3473
Sponsor Congressional District: 09
Primary Place of Performance: Indiana University
535 W Michigan St., IT475
Indianapolis
IN  US  46202-6151
Primary Place of Performance Congressional District: 07
Unique Entity Identifier (UEI): YH86RTW2YVJ4
Parent UEI:
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01002021DB NSF RESEARCH & RELATED ACTIVIT
01002122DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7364, 8228, 9251
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Text mining has made important advances in methods that convert vast, unstructured text data into knowledge. However, the current paradigm of relationship extraction has one major limitation: it models snapshots of information but fails to capture the fundamentally dialogic and dynamic nature of knowledge, in which findings conflict, discoveries prove inconsistent, and claims are refuted, contradicted, reinforced, or confirmed over time. This project aims to capture such fundamental dynamics of knowledge, focusing specifically on causal relationships. Numerous articles, including academic articles, present knowledge and relationships that express causality, but such relationships are not static and can change over time as conditions change. The objective of this project is to identify cues of causal knowledge in text data, quantify the strength of the causal relationship, and model its dynamics under changing conditions. Ultimately, the project aims to model a more holistic view of the knowledge extracted from text. Because text data is used extensively by researchers and practitioners in domains of national importance, including medicine and health, economics, public policy, and journalism, the results of this project seek to provide a foundation that offers practitioners new ways to understand the evolving nature of the causal relationships present in large text datasets. Specifically, the novel approaches developed in the project will be applied to public health data to determine how changing climatic, political, and economic conditions may affect the mental and physical health of the population in different geographic areas.
In addition, the project includes several educational activities: emerging and related topics from this project will be included in the curricula of various courses in the applied data science master's program; undergraduate research will be promoted, specifically by recruiting students from underrepresented and economically disadvantaged communities to work on the project; and a research workshop will be organized to encourage high school students to participate in STEM research.

The project activities include the development of a novel model of causal relationship extraction that leverages a unified deep learning framework combining semantic and syntactic cues. This approach will utilize the key syntactic features of a sentence, represented by the grammatical relationships between nouns, verbs, and other parts of speech, through graphical or tree-like models. This component will determine whether the sentence has a structure that signals causality. Moreover, the sequential component of the model will utilize the semantics and identify the influence of certain words in the sentence to characterize the nature of the causal relationship expressed in the text. This task will capture the strength of the relationship (e.g., using cues like "extremely likely" or "definitely"), any supporting or opposing evidence (e.g., "will lead to" or "does not lead to"), and conditional cues (e.g., "in the presence of"). Quantifying such qualitative properties will lead to the second innovation of this project: causal distance. Causal distance is a time-variant metric that will denote the magnitude of causality between two entities and capture the dynamism of the relationship by changing over time with changing conditions or new evidence. Collectively, the advances pursued in this project will further enhance our understanding of the novel computational approaches needed to unearth and reason about cues of causal relationships embedded in large text datasets. The outcomes of this project, such as datasets, source code, final software, results, and publications, will be shared via publicly accessible URLs and online code repositories. Additionally, all project resources and outcomes will be made available on the project website.
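
As a rough illustration of the three kinds of surface cues named above (strength, supporting or opposing evidence, and conditions), the following Python sketch matches them with plain string lookups. It is not the project's deep learning model: the function name find_cues, the lexicon names, and the numeric strength weights are all hypothetical, and only the cue phrases themselves are taken from the abstract.

```python
import re

# Hypothetical cue lexicons. The phrases are the abstract's own examples;
# the numeric weights are illustrative assumptions, not project values.
STRENGTH_CUES = {"extremely likely": 0.9, "definitely": 1.0}
SUPPORT_CUES = ("will lead to",)
OPPOSE_CUES = ("does not lead to",)
CONDITION_CUES = ("in the presence of",)


def find_cues(sentence):
    """Return the surface cues found in one sentence (case-insensitive)."""
    text = sentence.lower()

    # Strength: keep the largest weight among the matched strength phrases.
    weights = [w for cue, w in STRENGTH_CUES.items() if cue in text]
    strength = max(weights) if weights else None

    # Polarity: check opposing cues first so a negated phrase is not
    # mistaken for support.
    if any(cue in text for cue in OPPOSE_CUES):
        polarity = "oppose"
    elif any(cue in text for cue in SUPPORT_CUES):
        polarity = "support"
    else:
        polarity = None

    # Condition: capture the words after a conditional cue, up to the next
    # punctuation mark or the end of the sentence.
    condition = None
    for cue in CONDITION_CUES:
        match = re.search(re.escape(cue) + r"\s+(.+?)(?:[.,;]|$)", text)
        if match:
            condition = match.group(1).strip()
            break

    return {"strength": strength, "polarity": polarity, "condition": condition}
```

For example, find_cues("Smoking will lead to cancer in the presence of asbestos.") reports a supporting polarity with the condition "asbestos" and no strength cue. A learned model would replace these fixed lexicons; the sketch only makes the cue categories concrete.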

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Gujarathi, Pranav and Reddy, Manohar and Tayade, Neha and Chakraborty, Sunandan "A Study of Extracting Causal Relationships from Text" Intelligent Systems and Applications. IntelliSys 2022. Lecture Notes in Networks and Systems, vol 544. Springer, Cham, 2022 https://doi.org/10.1007/978-3-031-16075-2_59
Gujarathi, Pranav and VanSchaik, Jack T. and Mani Babu Karri, Venkata and Rajapuri, Anushri and Cheriyan, Biju and Thyvalikakath, Thankam P. and Chakraborty, Sunandan "Mining Latent Disease Factors from Medical Literature using Causality" 2022 IEEE International Conference on Big Data (Big Data), 2022 https://doi.org/10.1109/BigData55660.2022.10020994
Gujarathi, Pranav Dhananjay and Reddy, Sai Krishna and Karri, Venkata Mani and Bhimireddy, Ananth Reddy and Rajapuri, Anushri Singh and Reddy, Manohar and Sabbani, Mounika and Cheriyan, Biju and VanSchaik, Jack and Thyvalikakath, Thankam and Chakraborty, "Note: Using Causality to Mine Sjögrens Syndrome related Factors from Medical Literature" COMPASS '22: Proceedings of the 5th ACM SIGCAS/SIGCHI Conference on Computing and Sustainable Societies, 2022 https://doi.org/10.1145/3530190.3534850
VanSchaik, Jack T. and Jain, Palak and Rajapuri, Anushri and Cheriyan, Biju and Thyvalikakath, Thankam P. and Chakraborty, Sunandan "Using transfer learning-based causality extraction to mine latent factors for Sjögren's syndrome from biomedical literature" Heliyon, 2023 https://doi.org/10.1016/j.heliyon.2023.e19265

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

In this project, we sought to find novel ways of extracting causal knowledge from text. Our aim was to have a deeper understanding of how causality is expressed and embedded in the text, which will help to extract such knowledge. In addition, our goal was to capture the dynamism exhibited by these relationships.

The extraction models developed as part of this project have demonstrated two important accomplishments: first, they exhibit improved performance compared to existing methods on established benchmark datasets; second, they introduce unsupervised approaches that can analyze natural language text without the need for annotated datasets, expanding their applicability. To capture the dynamism, we introduced the Pointwise Causal Information metric, which offers a continuous real-valued measurement of causal relationship strength. This metric quantifies causality with greater nuance and precision than previous binary classifications. Our efforts extend to a new annotation scheme, producing more comprehensive datasets that facilitate the extraction of complex relationships. These enriched datasets address intricate causal relationships, including contradiction, conditional, temporal, transitive, and triangular causality. Our exploration into Large Language Models (LLMs) has uncovered insights into how causality is embedded within their parameters. Our evaluation of LLMs has yielded valuable findings about their causal knowledge and understanding capabilities. Leveraging this knowledge, we have developed a transfer learning-based model capable of extracting causality without extensive training on specific data sources. This transfer learning technique has enabled the models, trained on a diverse set of causal sentences, to be used on an out-of-domain dataset with minimal supervision. This model's application in the field of biomedical literature has led to significant results, detecting latent factors of diseases such as symptoms, risk factors, and associated conditions. Our contributions extend beyond technical innovations. We have generated a new dataset to foster further advancements in causality extraction from biomedical literature.

By refining causality detection and quantification, we are advancing NLP applications involving sequence tagging and semantic relationships. Our novel frameworks and approaches are poised to elevate the state-of-the-art in causality detection, addressing previously overlooked scenarios. They are likely to improve the applicability of causality in different domains. The real-valued Pointwise Causal Information metric will strengthen causality applications in domains such as medicine, public health, and economics.

The significance of our work is recognized through four peer-reviewed publications and presentations at prestigious research venues. Regarding mentorship and collaboration, we have supported and funded numerous researchers at different academic levels, including two doctoral students, four graduate research assistants, two REU undergraduate researchers, and three high school participants.

In summary, our collective efforts have made remarkable strides in understanding, detecting, and applying causal relationships. These contributions transcend technical domains, impacting fields as diverse as medicine, public health, economics, and social sciences.


Last Modified: 08/30/2023
Modified by: Sunandan Chakraborty

Please report errors in award information by writing to: awardsearch@nsf.gov.