Award Abstract # 2013801
TWC SBE: Medium: Context-Aware Harassment Detection on Social Media

NSF Org: CNS (Division of Computer and Network Systems)
Recipient: UNIVERSITY OF SOUTH CAROLINA
Initial Amendment Date: May 28, 2020
Latest Amendment Date: May 28, 2020
Award Number: 2013801
Award Instrument: Standard Grant
Program Manager: Sara Kiesler
skiesler@nsf.gov
(703) 292-8643
CNS: Division of Computer and Network Systems
CSE: Directorate for Computer and Information Science and Engineering
Start Date: December 3, 2019
End Date: June 30, 2020 (Estimated)
Total Intended Award Amount: $13,238.00
Total Awarded Amount to Date: $45,238.00
Funds Obligated to Date: FY 2015 = $13,238.00
FY 2016 = $16,000.00
FY 2017 = $16,000.00
History of Investigator:
  • Amit Sheth (Principal Investigator)
    amit@sc.edu
Recipient Sponsored Research Office: University of South Carolina at Columbia
1600 HAMPTON ST
COLUMBIA
SC  US  29208-3403
(803)777-7093
Sponsor Congressional District: 06
Primary Place of Performance: University of South Carolina at Columbia
SC  US  29208-0001
Primary Place of Performance Congressional District: 06
Unique Entity Identifier (UEI): J22LNTMEDP73
Parent UEI: Q93ZDA59ZAR5
NSF Program(s): Special Projects - CNS,
Secure & Trustworthy Cyberspace
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVITIES
01001617DB NSF RESEARCH & RELATED ACTIVITIES
01001718DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 025Z, 7434, 7924, 9178, 9251
Program Element Code(s): 171400, 806000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

As social media permeates our daily lives, there has been a sharp rise in its use to humiliate, bully, and threaten others, with harmful consequences such as emotional distress, depression, and suicide. An October 2014 Pew Research survey shows that 73% of adult Internet users have observed online harassment and 40% have experienced it. The prevalence and serious consequences of online harassment present both social and technological challenges. This project identifies harassing messages in social media through a combination of text analysis and other clues in the social media context (e.g., indications of power relationships between the sender and receiver of a potentially harassing message). The project will develop prototypes to detect harassing messages on Twitter; the proposed techniques can be adapted to other platforms, such as Facebook, online forums, and blogs. An interdisciplinary team of computer scientists, social scientists, urban and public affairs professionals, and educators, together with participating college and high school students, will ensure that this scientific research has a wide impact on the support for safe social interactions.

This project combines social science theory and human judgment of potential harassment examples from social media, in both school and workplace contexts, to operationalize the detection of harassing messages and offenders. It develops comprehensive and reliable context-aware techniques (using machine learning, text mining, natural language processing, and social network analysis) to glean information about the people involved and their interconnected network of relationships, and to determine and evaluate potential harassment and harassers. The key innovations of this work include: (1) identification of the generic language of insult, characterized by profanities and other general patterns of verbal abuse, and recognition of target-dependent offensive language involving sensitive topics that are personal to a specific individual or social circle; (2) prediction of the harassment-specific emotion evoked in a recipient after reading a message, leveraging the conversation history as well as the sender's emotions; (3) recognition of a sender's malicious intent behind messages based on the aspects of power, truth (approximated by trust), and familiarity; (4) a harmfulness assessment of harassing messages that fuses the aforementioned language, emotion, and intent factors, as sketched below; and (5) detection of harassers from their aggregated behaviors, such as harassment frequency, duration, and coverage measures, for effective prevention and intervention.
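
To make the fusion step (4) and the aggregation step (5) concrete, the following minimal Python sketch shows one way the language, emotion, and intent factors could be combined into a harmfulness score and aggregated per sender. The weighted-sum fusion, the weights, the threshold, and the minimum count are illustrative assumptions, not the project's actual models.

from dataclasses import dataclass

@dataclass
class MessageScores:
    language: float  # offensive-language score in [0, 1], factor (1)
    emotion: float   # predicted recipient distress in [0, 1], factor (2)
    intent: float    # inferred malicious intent in [0, 1], factor (3)

def harmfulness(s: MessageScores, weights=(0.4, 0.3, 0.3)) -> float:
    """Fuse the language, emotion, and intent factors into one score.
    The weighted sum and the weights themselves are assumed for illustration."""
    w_lang, w_emo, w_int = weights
    return w_lang * s.language + w_emo * s.emotion + w_int * s.intent

def is_harasser(sender_scores: list[float],
                threshold: float = 0.7, min_count: int = 3) -> bool:
    """Flag a sender from aggregated behavior: here, simply the frequency
    of high-harm messages (threshold and count are assumed values)."""
    return sum(score >= threshold for score in sender_scores) >= min_count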

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Bhatt, S., Padhee, S., Sheth, A., Chen, K., Shalin, V., Doran, D. and Minnery, B. "Knowledge graph enhanced community detection and characterization." Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 2019.
Gaur, M., Kursuncu, U., Sheth, A., Wickramarachchi, R. and Yadav, S. "Knowledge-infused Deep Learning." Proceedings of the 31st ACM Conference on Hypertext and Social Media, 2020.
Kursuncu, U., Gaur, M., Castillo, C., Alambo, A., Thirunarayan, K., Shalin, V., Achilov, D., Arpinar, I.B. and Sheth, A. "Modeling Islamist extremist communications on social media using contextual dimensions: Religion, ideology, and hate." Proceedings of the ACM on Human-Computer Interaction, 2019.
Rezvan, M., Shekarpour, S., Alshargi, F., Thirunarayan, K., Shalin, V.L. and Sheth, A. "Analyzing and learning the language for different types of harassment." PLOS ONE, 2020.
Rezvan, M., Shekarpour, S., Balasuriya, L., Thirunarayan, K., Shalin, V.L. and Sheth, A. "A quality type-aware annotated corpus and lexicon for harassment research." Proceedings of the 10th ACM Conference on Web Science, 2018.
Rüsenberg, F., Hampton, A.J., Shalin, V.L. and Feufel, M.A. "Stop Words Are Not Nothing: German Modal Particles and Public Engagement in Social Media." International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation, 2018.
Shekarpour, S., Marx, E., Auer, S. and Sheth, A.P. "RQUERY: Rewriting Natural Language Queries on Knowledge Graphs to Alleviate the Vocabulary Mismatch Problem." Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI'17), 2017.
Wijesiriwardene, T., Inan, H., Kursuncu, U., Gaur, M., Shalin, V.L., Thirunarayan, K., Sheth, A. and Arpinar, I.B. "ALONE: A Dataset for Toxic Behavior among Adolescents on Twitter." International Conference on Social Informatics, 2020.
Yazdavar, A.H., Mahdavinejad, M.S., Bajaj, G., Romine, W., Sheth, A., Monadjemi, A.H., Thirunarayan, K., Meddar, J.M., Myers, A., Pathak, J. and Hitzler, P. "Multimodal mental health analysis in social media." PLOS ONE, 2020.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Online harassment and toxic content have many negative impacts on human interaction and society as a whole; in extreme cases they lead individuals to depression and even suicide. Social media companies are trying to tackle this issue and have reportedly hired thousands of human moderators to intercept and purge such content, so far with unsatisfactory results. Automated moderation is difficult due to the context sensitivity of human language, changes in the cultural acceptability of potentially harassing keywords, and the relative sparsity of harassing content.

In this interdisciplinary work, we have studied online harassment by going beyond keyword matching to identify the context of online harassment and toxicity. We employed several key concepts from social psychology, such as conversation analysis, intentionality, and group phenomena, to identify and analyze online harassment. We have developed and provided to the research community two unique datasets: (a) a dataset covering different aspects of the problem (sexual, racial, appearance, intelligence, politics, generic), and (b) a tagged dataset of anonymized online content of adolescent conversations, capturing both context and the adolescents' language preferences. Our own analyses of these datasets enrich the understanding of the harassment problem in at least two respects. First, context matters in the assessment of harassing content, including prior relationships between the participants, community membership (and the insider-outsider phenomenon in particular), cohort linguistic practices, and the presence of exonerating content (to distinguish threatening tweets from superficially harassing, curse-word-laden friendly banter). Second, harassment sparsity not only threatens the practicality of machine learning classifiers but also raises the risk of false alarms, given the high base rate of non-harassing content (i.e., the sparsity of positive cases). Both properties of the harassment problem require a top-down, knowledge-based approach to the detection of harassment, in contrast to the more prevalent data-driven efforts.
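
The false-alarm risk from sparsity can be made concrete with a quick back-of-the-envelope calculation in Python; the 1% prevalence and the 95%/95% classifier figures below are assumed for illustration, not measured in this project.

prevalence = 0.01    # assume only 1% of messages are harassing
sensitivity = 0.95   # assumed true positive rate
specificity = 0.95   # assumed true negative rate

# Per unit of traffic: expected true positives vs. false positives.
tp = prevalence * sensitivity              # 0.0095
fp = (1 - prevalence) * (1 - specificity)  # 0.0495

precision = tp / (tp + fp)
print(f"precision = {precision:.1%}")  # about 16%: most flags are false alarms

Under these assumed figures, even a seemingly accurate classifier is wrong in roughly five of every six flags, which is why the base rate of non-harassing content matters as much as classifier accuracy.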

To overcome data sparsity in harassment detection, we examined detection issues in a related model domain of antisocial behavior: extremist communication. We show how knowledge bases for religion, ideology, and hate support the computational detection of extremists. This knowledge-based approach successfully excludes likely mislabeled users and therefore addresses the false alarm problem; it also enhances computational community detection, which is relevant to the identification of toxic user groups in online communities. The project provided extensive training opportunities for students and postdocs from computer science as well as cognitive science in a highly inclusive, interdisciplinary setting.
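
As a rough illustration of the exclusion idea, the Python sketch below filters out users whose content barely touches a domain knowledge base; the term set and the coverage cutoff are hypothetical stand-ins for the project's religion, ideology, and hate knowledge bases.

# Hypothetical stand-in for the project's domain knowledge bases.
DOMAIN_TERMS = {"term_a", "term_b", "term_c"}  # placeholder vocabulary

def domain_coverage(posts: list[str]) -> float:
    """Fraction of a user's posts that mention any knowledge-base term."""
    if not posts:
        return 0.0
    hits = sum(any(term in post.lower() for term in DOMAIN_TERMS)
               for post in posts)
    return hits / len(posts)

def likely_mislabeled(posts: list[str], min_coverage: float = 0.1) -> bool:
    """A user labeled extremist whose content rarely touches the domain
    is a plausible false positive and can be excluded (assumed cutoff)."""
    return domain_coverage(posts) < min_coverage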

Last Modified: 11/15/2020
Modified by: Amit Sheth
