Award Abstract # 1717965
CHS: Small: Collaborative Research: Measuring and Promoting the Quality of Online News Discussions

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: OHIO STATE UNIVERSITY, THE
Initial Amendment Date: August 4, 2017
Latest Amendment Date: July 27, 2021
Award Number: 1717965
Award Instrument: Standard Grant
Program Manager: William Bainbridge
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: August 15, 2017
End Date: July 31, 2022 (Estimated)
Total Intended Award Amount: $51,258.00
Total Awarded Amount to Date: $61,248.00
Funds Obligated to Date: FY 2017 = $51,258.00
FY 2018 = $9,990.00
History of Investigator:
  • Robert Garrett (Principal Investigator)
    garrett.258@osu.edu
Recipient Sponsored Research Office: Ohio State University
1960 KENNY RD
COLUMBUS
OH  US  43210-1016
(614)688-8735
Sponsor Congressional District: 03
Primary Place of Performance: Ohio State University
OH  US  43210-1016
Primary Place of Performance Congressional District: 03
Unique Entity Identifier (UEI): DLWBSLWAJWR1
Parent UEI: MN4MDDMN8529
NSF Program(s): HCC-Human-Centered Computing
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVITIES
01001819DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 7367, 7923, 9251
Program Element Code(s): 736700
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

This project will amplify the efforts of people to bring out the best in other people in online conversations, and will make it easier for people to find high quality online conversations. There are numerous concerns about the tone and content of online conversations on public affairs at the present time. At its best, everyday online debate can lead people to consider alternative perspectives and even change their minds. This happens in environments where people may disagree, but where they try to inform and convince each other rather than simply yell at each other. The first goal of the research is to create automated classifiers to measure the quality of everyday online political talk. Classifiers will estimate the quality of online conversations about news articles in public venues such as Twitter, Facebook, Reddit, and the comments sections of news pages. A Conversation Finder tool (a website and a browser extension) will use the automated classifiers to recommend, in real time, venues where particular news articles are being discussed and where the quality scores are high. The second goal of the research is to create a Conversation Coach that helps the general public to improve the quality of conversation spaces they participate in, by helping them craft messages that directly contribute to quality and that indirectly inspire others. It will include a Message Assistant that extracts elements from conversations in order to help people craft messages and a Message Impact Assessor that predicts the likely impact of a draft message on the quality metrics for subsequent conversations.

Quality of online conversations will be measured in terms of a variety of dimensions that communication scholars have articulated as desirable. Training data for the classifiers will be collected from conversation participants in addition to trained coders, and experiments will be conducted to determine the most effective sequence of requests to make of conversation participants in order to maximize motivation to contribute. Creation of the Conversation Recommender will lead to several intellectual contributions, including: (1) developing computational assists that help human raters achieve high inter-rater reliability; (2) identifying methods to motivate conversation participants to act as raters; (3) architecting neural-network based classifiers that achieve high prediction accuracy when trained using the collected ratings as training data; (4) developing techniques to make the classifiers produce interpretable results (explanations). Creation of the Conversation Coach will lead to two intellectual contributions: (1) identifying parts of conversations that can be automatically extracted and that writers find relevant and useful when composing messages; (2) architecting a predictive model that accurately estimates the impact of messages on subsequent conversation quality.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Budak, Ceren and Garrett, R. Kelly and Resnick, Paul and Kamin, Julia "Threading is Sticky: How Threaded Conversations Promote Comment System User Retention" Proceedings of the ACM on Human-Computer Interaction , v.1 , 2017 10.1145/3134662 Citation Details
Budak, Ceren and Garrett, R. Kelly and Sude, Daniel "Better Crowdcoding: Strategies for Promoting Accuracy in Crowdsourced Content Analysis" Communication Methods and Measures , v.15 , 2021 https://doi.org/10.1080/19312458.2021.1895977 Citation Details

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project was devoted to creating processes for assessing and improving the quality of online conversations.

 

  1. We developed and tested a technique for creating “explanations” for the outputs of automated neural-net classifiers that assess whether comments contain personal attacks. The explanation consists of a highlighted set of words in the message. In contrast to prior attention mechanisms, it highlights all attack-bearing phrases rather than a minimal phrase that would be sufficient to conclude the presence of a personal attack. It does so through an "adversarial" training mechanism: the word selector is optimized so that the classifier does a good job using only the selected words *and* a bad job using only the unselected words.
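The adversarial selection objective described above can be illustrated with a minimal sketch. All names here are hypothetical, and the scalar loss below stands in for the full neural training setup, which is not described on this page: the selector is scored so that a classifier succeeds on the selected words and fails on the unselected (complement) words, which pushes it to highlight every attack-bearing phrase rather than a minimal sufficient one.

```python
import math

def bce(logit, label):
    # Binary cross-entropy for a single logit; label is 0 or 1.
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def comprehensive_rationale_loss(selected_logit, complement_logit, label, lam=1.0):
    # Low when the classifier is right using only the selected words
    # AND wrong using only the unselected words. Minimizing this
    # rewards selectors that capture ALL label-relevant phrases,
    # since any attack phrase left in the complement lets the
    # complement classifier succeed and raises the loss.
    return bce(selected_logit, label) - lam * bce(complement_logit, label)

# A selector that captures all attack phrases: the complement
# classifier misses the attack (logit -3.0), so the loss is low.
good = comprehensive_rationale_loss(3.0, -3.0, label=1)

# A minimal selector that leaves attack phrases unselected: the
# complement classifier still detects the attack, so the loss is higher.
minimal = comprehensive_rationale_loss(3.0, 3.0, label=1)
```

Here `good < minimal`, so gradient-based training on this objective would favor the comprehensive selector.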

  2. We developed and applied automated techniques for quantifying the quality of online conversations at large scale.

    1. We studied how political subreddits with different norms for civility reproduce those norms despite turnover of members. Somewhat surprisingly, we found that selection effects are small, as is learning after the first post. Instead, norms are sustained because people who join a subreddit with distinctive civility norms (either much higher or much lower than other political subreddits) deviate from their own personal patterns to match the subreddit's pattern, even in their very first comments.

    2. We quantified the prevalence of political discussion in non-political online communities (about half of all political comments on Reddit occur in non-political subreddits) and found that political conversations were less toxic in non-political subreddits.


    3. We quantified the prevalence of cross-partisan commenting on YouTube. Conservatives were much more likely to comment on left-leaning videos than liberals on right-leaning videos. Both groups had similar toxicity levels overall. Cross-partisan replies were more toxic than co-partisan replies on both left-leaning and right-leaning videos, with cross-partisan replies being especially toxic on the replier’s home turf.

    4. We are developing a public dashboard that will quantify, on an ongoing basis, the prevalence of HOT comments (Hateful, Offensive, Toxic) in conversations about daily news on Reddit, Twitter, and YouTube.

  3. We developed tools for improving the consistency of human labeling of content, an important step in training and evaluating automated classifiers.

    1. In a controlled experiment, we found that specific examples were more useful than detailed codebooks for training human raters.

    2. This prompted the development of a new approach for automatically selecting training examples, based on disagreements among raters.

    3. In a second experiment, we found that a prompt asking raters to provide free-form text reflections after training examples further improved their effectiveness.
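The disagreement-based selection of training examples (item 3.2 above) can be sketched as follows. This is an illustrative reconstruction, not the project's actual implementation: items that many raters label inconsistently are assumed to be the most instructive calibration examples, so we rank items by the fraction of raters who disagree with the majority label and pick the top k.

```python
from collections import Counter

def disagreement(labels):
    # Fraction of raters whose label differs from the majority label.
    # 0.0 means unanimous agreement; values near 0.5 mean a near-even split.
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return 1.0 - majority_count / len(labels)

def select_training_examples(items, k):
    # items: list of (item_id, [rater labels]) pairs.
    # Return the k item ids with the highest rater disagreement,
    # to be used as calibration examples when training new raters.
    ranked = sorted(items, key=lambda it: disagreement(it[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:k]]

items = [
    ("unanimous", [1, 1, 1, 1]),   # no disagreement
    ("split",     [1, 0, 1, 0]),   # maximal disagreement
    ("mostly",    [1, 1, 0, 1]),   # mild disagreement
]
```

With these toy ratings, `select_training_examples(items, 1)` picks the evenly split item, since it has the highest disagreement score.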

 


Last Modified: 12/04/2022
Modified by: R. K Garrett
