Award Abstract # 1831848
RIDIR: Collaborative Research: Integrated Communication Database and Computational Tools

NSF Org: SMA
SBE Office of Multidisciplinary Activities
Recipient: UNIVERSITY OF CALIFORNIA, LOS ANGELES
Initial Amendment Date: September 5, 2018
Latest Amendment Date: September 5, 2018
Award Number: 1831848
Award Instrument: Standard Grant
Program Manager: Sara Kiesler
skiesler@nsf.gov
 (703)292-8643
SMA
 SBE Office of Multidisciplinary Activities
SBE
 Directorate for Social, Behavioral and Economic Sciences
Start Date: September 15, 2018
End Date: August 31, 2022 (Estimated)
Total Intended Award Amount: $944,182.00
Total Awarded Amount to Date: $944,182.00
Funds Obligated to Date: FY 2018 = $944,182.00
History of Investigator:
  • Jungseock Joo (Principal Investigator)
    jjoo@comm.ucla.edu
  • Francis Steen (Co-Principal Investigator)
  • Tim Groeling (Co-Principal Investigator)
  • Zachary Steinert-Threlkeld (Co-Principal Investigator)
Recipient Sponsored Research Office: University of California-Los Angeles
10889 WILSHIRE BLVD STE 700
LOS ANGELES
CA  US  90024-4200
(310)794-0102
Sponsor Congressional District: 36
Primary Place of Performance: University of California-Los Angeles
Rolfe Hall
Los Angeles
CA  US  90095-1484
Primary Place of Performance Congressional District: 36
Unique Entity Identifier (UEI): RN64EPNH8JC6
Parent UEI:
NSF Program(s): Secure & Trustworthy Cyberspace, Data Infrastructure
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 025Z, 026Z, 062Z, 065Z, 7433, 7434, 8083, 9178, 9179
Program Element Code(s): 806000, 829400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.075

ABSTRACT

This project will develop an integrated research framework for sociotechnical cybersecurity research and broader investigations of information provenance by behavioral, information, and computer scientists. Currently, researchers are mainly limited to natural language processing of large bodies of online text. This project will make it possible to analyze larger information worlds, including those from such countries as China, and the flow of information, including video and audio information, in newspapers, TV, and online sources. The project addresses a core goal of cybersecurity research, which is to understand the provenance, flow, and termination of information warfare and censorship.

The project is aimed at constructing an integrated and unified information database that combines mass communication data from TV and print sources from six locations, with data from two popular online communication platforms. The project will generate a variety of metadata and time series data on topics, actors, events, and sentiments presented in communications by automated multimodal content analysis using text, image, video, and audio. Variables will be linked to identify trajectories of information flow between communication channels through multiple platforms. It will develop a new class of computational models and algorithms that can automatically analyze both verbal and nonverbal communications data by machine learning, computer vision, deep learning, and natural language processing. This project will allow researchers across the computational and social sciences to access the metadata and time series data through a search interface for qualitative research, a statistical package for quantitative research, and various visualization tools. This project will therefore link previously untapped data sources using cutting-edge computational methods to enable scholars to conduct systematic research on large-scale patterns in the emerging information and communication ecosystem.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 13)
Chen, Yunliang and Joo, Jungseock "Understanding and Mitigating Annotation Bias in Facial Expression Recognition" Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021
Ha, Yui and Park, Kunwoo and Kim, Su Jung and Joo, Jungseock and Cha, Meeyoung "Automatically Detecting Image-Text Mismatch on Instagram with Deep Learning" Journal of Advertising, v.50, 2021 https://doi.org/10.1080/00913367.2020.1843091
Joo, Jungseock and Kärkkäinen, Kimmo "Gender Slopes: Counterfactual Fairness for Computer Vision Models by Attribute Manipulation" FATE/MM '20: Proceedings of the 2nd International Workshop on Fairness, Accountability, Transparency and Ethics in Multimedia, 2020 https://doi.org/10.1145/3422841.3423533
Joo, Jungseock and Steinert-Threlkeld, Zachary C. "Image as Data: Automated Content Analysis for Visual Presentations of Political Actors and Events" Computational Communication Research, v.4, 2022 https://doi.org/10.5117/CCR2022.1.001.JOO
Kärkkäinen, Kimmo and Joo, Jungseock "FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age for Bias Measurement and Mitigation" 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 2021 https://doi.org/10.1109/WACV48630.2021.00159
Lin, Xiaofeng and Kernell, Georgia and Groeling, Tim and Joo, Jungseock and Luo, Jun and Steinert-Threlkeld, Zachary C. "Mask images on Twitter increase during COVID-19 mandates, especially in Republican counties" Scientific Reports, v.12, 2022 https://doi.org/10.1038/s41598-022-23368-6
Lin, Xiaofeng and Kim, Seungbae and Joo, Jungseock "FairGRAPE: Fairness-Aware GRAdient Pruning mEthod for Face Attribute Classification" European Conference on Computer Vision, 2022 https://doi.org/10.1007/978-3-031-19778-9_24
Lu, Yingdan and Schaefer, Jack and Park, Kunwoo and Joo, Jungseock and Pan, Jennifer "How Information Flows from the World to China" The International Journal of Press/Politics, 2022 https://doi.org/10.1177/19401612221117470
Park, Kunwoo and Pan, Zhufeng and Joo, Jungseock "Who Blames or Endorses Whom? Entity-to-Entity Directed Sentiment Extraction in News Text" Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 2021
Sobolev, Anton and Chen, M. Keith and Joo, Jungseock and Steinert-Threlkeld, Zachary C. "News and Geolocated Social Media Accurately Measure Protest Size Variation" American Political Science Review, v.114, 2020 https://doi.org/10.1017/S0003055420000295
Steinert-Threlkeld, Zachary and Joo, Jungseock "MMCHIVED: Multimodal Chile and Venezuela Protest Event Data" Proceedings of the International AAAI Conference on Web and Social Media, v.16, 2022 https://doi.org/10.1609/icwsm.v16i1.19385

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Large-scale datasets and computational analytic tools have been the key drivers of computational social science, but existing resources have mainly focused on single, text-based data sources. The main goal of the project was to construct an integrated, multi-platform international news database that combines traditional mass media and social media data, and to develop computational methods that can systematically track real-world events and analyze complex human and media behaviors using automated multimodal machine learning approaches.


Throughout this project, the interdisciplinary project team has created numerous multimodal media datasets that can be used for various research topics in communication, political science, and computer science. These datasets were sourced from multiple countries and diverse platforms. Individual content items were interlinked by machine learning methods to study inter-media news flows. These resources have enabled many concrete studies utilizing multiple media sources, such as studies of Covid-related information flow between China and the world and global protest event analysis.


The project team has also developed advanced machine learning tools and made them publicly available to researchers and students. These tools allow them to automatically analyze an enormous amount of multimodal data and tackle large-scale research questions. In particular, many of these tools reduce hidden biases in AI models, which is imperative for obtaining objective measures for social inquiries.


Overall, the project has led to more than 20 publications in top academic venues. This project supported more than 10 graduate students, 2 postdoctoral researchers (both are currently tenure-track assistant professors), and more than 50 undergraduate students, who made significant contributions to the project. The products of this project – the datasets and tools – will continue to be maintained and improved to support the broader research community who seek to understand human behaviors in media.

Last Modified: 05/01/2023
Modified by: Jungseock Joo

Please report errors in award information by writing to: awardsearch@nsf.gov.
