Award Abstract # 1742702
EAGER: Training Computers and Humans to Detect Misinformation by Combining Computational and Theoretical Analysis

NSF Org: CNS
Division Of Computer and Network Systems
Recipient: THE PENNSYLVANIA STATE UNIVERSITY
Initial Amendment Date: August 14, 2017
Latest Amendment Date: June 27, 2018
Award Number: 1742702
Award Instrument: Standard Grant
Program Manager: Sara Kiesler
skiesler@nsf.gov
 (703)292-8643
CNS
 Division Of Computer and Network Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2017
End Date: August 31, 2020 (Estimated)
Total Intended Award Amount: $300,000.00
Total Awarded Amount to Date: $316,000.00
Funds Obligated to Date: FY 2017 = $300,000.00
FY 2018 = $16,000.00
History of Investigator:
  • Dongwon Lee (Principal Investigator)
  • S. Shyam Sundar (Co-Principal Investigator)
Recipient Sponsored Research Office: Pennsylvania State Univ University Park
201 OLD MAIN
UNIVERSITY PARK
PA  US  16802-1503
(814)865-1372
Sponsor Congressional District: 15
Primary Place of Performance: Pennsylvania State Univ University Park
PA  US  16802-7000
Primary Place of Performance
Congressional District:
Unique Entity Identifier (UEI): NPM2J7MSCF61
Parent UEI:
NSF Program(s): Secure & Trustworthy Cyberspace
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVIT
01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 025Z, 065Z, 114Z, 7434, 7916, 8225, 9178, 9251
Program Element Code(s): 806000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Awareness of misinformation online is becoming an increasingly important issue, especially when information is presented in the format of a news story, because (a) people may over-trust content that looks like news and fail to critically evaluate it, and (b) such stories can be easily spread, amplifying the effect of misinformation. Using machine learning methods to analyze a large database of articles labeled as more or less likely to contain misinformation, along with theoretical analyses from the fields of communication, psychology, and information science, the project team will first characterize what distinguishes stories that are likely to contain misinformation from others. These characteristics will be used to build a tool that calls out characteristics of a given article that are known to correlate with misinformation; they will also be used to develop training materials to help people make these judgments. The tool and training materials will be tested through a series of experiments in which articles are evaluated by the tool and by people both before and after undergoing training. The goal is to have a positive impact on online discourse by improving both readers' and moderators' ability to reduce the impact of misinformation campaigns. The team will make the models, tools, and training materials publicly available for others to use in research, in classes, and online.

The team will use two main approaches to characterize articles that are more likely to contain misinformation. The first is a concept explication approach from the social sciences, based on a deep analysis of research writing around information dissemination and evaluation. The second is a supervised machine learning approach trained on large datasets of labeled articles, including verified examples of misinformation. Both approaches will consider characteristics of the content; of its visual presentation; of the people who create, consume, and share it; and of the networks it moves through. These models will be translated into a set of weighted rules that combine the insights from the two approaches, then instantiated in Markov Logic Networks. These leverage the strengths of both first-order logic and probabilistic graphical models, allow for a variety of efficient inference methods, and have been applied to a number of related problems; the models will be evaluated offline against test data using standard machine learning techniques. Finally, the team will develop training materials based on existing work from the International Federation of Library Associations and Institutions and on heuristic guidelines derived from the modeling work in the first two tasks, evaluate them through the experiments described earlier, and disseminate them online along with the developed models.
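To illustrate the idea of weighted rules in a Markov Logic Network, the following is a minimal sketch, not the project's actual model: each rule is a boolean feature of an article with a learned weight, and a logistic function over the sum of satisfied rules' weights yields a misinformation probability. The rule names and weight values here are purely hypothetical.

```python
import math

# Hypothetical rules and weights (illustrative only, not from the award).
# Positive weights push toward "likely misinformation"; negative weights
# (e.g., a verified publisher) push away from it.
RULES = {
    "clickbait_headline": 1.4,
    "no_named_sources": 0.9,
    "verified_publisher": -2.1,
}

def misinformation_probability(article_features):
    """Logistic score over the weights of rules the article satisfies."""
    score = sum(w for rule, w in RULES.items() if article_features.get(rule))
    return 1.0 / (1.0 + math.exp(-score))

# An article satisfying two positive-weight rules scores high,
# while one from a verified publisher scores low.
suspect = {"clickbait_headline": True, "no_named_sources": True}
vetted = {"verified_publisher": True}

print(round(misinformation_probability(suspect), 3))  # 0.909
print(round(misinformation_probability(vetted), 3))   # 0.109
```

A full MLN would learn these weights from labeled data and perform joint inference over many interdependent rules; this sketch only shows how weighted boolean evidence combines into a single probability.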

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Cui, Limeng and Lee, Dongwon "CoAID: COVID-19 Healthcare Misinformation Dataset" arXiv, 2020
Cui, Limeng and Seo, Haeseung and Tabar, Maryam and Ma, Fenglong and Wang, Suhang and Lee, Dongwon "DETERRENT: Knowledge Guided Graph Attention Network for Detecting Healthcare Misinformation" 2020 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2020 https://doi.org/10.1145/3394486.3403092
Le, Thai and Wang, Suhang and Lee, Dongwon "GRACE: Generating Concise and Informative Contrastive Sample to Explain Neural Network Model's Prediction" 2020 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2020 https://doi.org/10.1145/3394486.3403066
Le, Thai and Wang, Suhang and Lee, Dongwon "MALCOM: Generating Malicious Comments to Attack Neural Fake News Detection Models" 20th IEEE Int'l Conf. on Data Mining (ICDM), 2020
Molina, Maria D. and Sundar, S. Shyam and Le, Thai and Lee, Dongwon "Fake News Is Not Simply False Information: A Concept Explication and Taxonomy of Online Content" American Behavioral Scientist, 2019 https://doi.org/10.1177/0002764219878224
Uchendu, Adaku and Le, Thai and Shu, Kai and Lee, Dongwon "Authorship Attribution for Neural Text Generation" Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2020
Zhang, Jason (Jiasheng) and Lee, Dongwon "PROMO for Interpretable Personalized Social Emotion Mining" Joint European Conf. on Machine Learning and Principles & Practice of Knowledge Discovery in Databases (ECML-PKDD), 2020
Zhang, Jason (Jiasheng) and Lee, Dongwon "TOMATO: A Topic-Wise Multi-Task Sparsity Model" 29th ACM Int'l Conf. on Information and Knowledge Management (CIKM), 2020 https://doi.org/10.1145/3340531.3411972

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The outcomes of this EAGER project include: (1) an improved understanding of the wide spectrum of misinformation and its multiple sub-types via a concept explication from a social science perspective; (2) the identification and validation of computational and operational features of the identified sub-types of misinformation (e.g., clickbait, propaganda, native advertising, fake news) across genres (e.g., politics, health); (3) the design and development of machine-learning-based solutions that accurately detect sub-types of misinformation; (4) the development of foundational techniques for explaining why a piece of information is true or fake using user comments or counter-examples; (5) an improved understanding of people's susceptibility to misinformation and their ability to discriminate misinformation from true news; (6) a demonstration that machine-learning-based detection models can be attacked by adversaries and forced to make wrong predictions about the veracity of news; and (7) suggestions for potential defenses against such attacks on machine-learning-based detection models.

The project has supported and trained three Ph.D. students (one of whom graduated and joined Michigan State University as a faculty member), three REU students (two of whom graduated and joined graduate schools at CMU and UT Austin, respectively), and three undergraduate students (all of whom graduated and joined industry). The project also contributed to developing a public benchmark dataset, FakeNewsNet, for evaluating and comparing the performance of machine-learning-based misinformation detection; it has become one of the most widely used datasets in the research community.


Last Modified: 02/03/2021
Modified by: Dongwon Lee
