Award Abstract # 1838193
BIGDATA: IA: Multiplatform, Multilingual, and Multimodal Tools for Analyzing Public Communication in over 100 Languages

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: TRUSTEES OF BOSTON UNIVERSITY
Initial Amendment Date: September 7, 2018
Latest Amendment Date: September 7, 2018
Award Number: 1838193
Award Instrument: Standard Grant
Program Manager: Sara Kiesler
skiesler@nsf.gov
(703) 292-8643
IIS Division of Information & Intelligent Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: September 15, 2018
End Date: August 31, 2023 (Estimated)
Total Intended Award Amount: $1,000,000.00
Total Awarded Amount to Date: $1,000,000.00
Funds Obligated to Date: FY 2018 = $1,000,000.00
History of Investigator:
  • Margrit Betke (Principal Investigator)
    betke@cs.bu.edu
  • Prakash Ishwar (Co-Principal Investigator)
  • Lei Guo (Co-Principal Investigator)
  • Derry Wijaya (Co-Principal Investigator)
Recipient Sponsored Research Office: Trustees of Boston University
1 SILBER WAY
BOSTON
MA  US  02215-1703
(617)353-4365
Sponsor Congressional District: 07
Primary Place of Performance: Trustees of Boston University
881 Commonwealth Ave
Boston
MA  US  02215-1300
Primary Place of Performance Congressional District: 07
Unique Entity Identifier (UEI): THL6A6JLE1S7
Parent UEI:
NSF Program(s): HCC-Human-Centered Computing,
Big Data Science & Engineering,
Data Infrastructure
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 062Z, 7433, 8083
Program Element Code(s): 736700, 808300, 829400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

In today's information age, understanding public communication flows around the world is important to United States policy and diplomacy. The challenge for research is to collect, analyze, and interpret information as it is presented worldwide: big data that flows at high velocity, in large volumes, and with great variety in perspective, language, and platform. Analytic methods for studying textual and visual public information worldwide are limited by language hurdles. This project aims to solve data analytics problems in the domain of international public information flows by developing methods that effectively leverage natural language processing, machine learning, and computer vision tools.

This research will involve collecting multilingual, multiplatform, and multimodal corpora of text and images originating in the U.S. and reported worldwide; developing an interactive, budget-efficient annotation methodology that scales effectively across experts and crowdworkers; and using machine learning and deep learning techniques that exploit multilingual and multimodal representations to build data analytics tools for entity and frame recognition, sentiment analysis of entities and frames, and curation of balanced real-time content collections in many languages. This project is expected to generate analytical tools for social scientists and others to better examine the international flow of public communications. The annotated data will provide training and benchmark datasets that can propel research in entity and frame recognition, sentiment analysis, and other related natural language processing tasks for many languages.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 31)
Akyürek, Afra Feyza and Guo, Lei and Elanwar, Randa and Ishwar, Prakash and Betke, Margrit and Wijaya, Derry Tanti "Multi-Label and Multilingual News Framing Analysis" Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020. https://doi.org/10.18653/v1/2020.acl-main.763
Burns, Andrea and Kim, Donghyun "Learning to Scale Multilingual Representations for Vision-Language Tasks" European Conference on Computer Vision, 2020. https://doi.org/10.1007/978-3-030-58548-8
Andy, A. and Callison-Burch, C. and Wijaya, D. "Resolving Pronouns in Twitter Streams: Context can Help!" Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference, 2020.
Bhatia, Vibhu and Akavoor, Vidya Prasad and Paik, Sejin and Guo, Lei and Jalal, Mona and Smith, Alyssa and Tofu, David Assefa and Halim, Edward Edberg and Sun, Yimeng and Betke, Margrit and Ishwar, Prakash and Wijaya, Derry Tanti "OpenFraming: Open-sourced Tool for Computational Framing Analysis of Multilingual Data" Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online and Punta Cana, Dominican Republic, 2021. https://doi.org/10.18653/v1/2021.emnlp-demo.28
Coppock, E. and Dionne, D. and Graham, N. and Ganem, E. and Zhao, S. and Lin, S. and Liu, W. and Wijaya, D. "Informativity in Image Captions vs. Referring Expressions" Proceedings of the Probability and Meaning Conference (PaM 2020), 2020.
Elanwar, Randa and Qin, Wenda and Betke, Margrit and Wijaya, Derry "Extracting text from scanned Arabic books: a large-scale benchmark dataset and a fine-tuned Faster-R-CNN model" International Journal on Document Analysis and Recognition (IJDAR), 2021. https://doi.org/10.1007/s10032-021-00382-4
Gao, Ge and Paik, Sejin and Reardon, Carley and Zhao, Yanling and Guo, Lei and Ishwar, Prakash and Betke, Margrit and Wijaya, Derry Tanti "Prediction of People's Emotional Response towards Multi-modal News" Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2022.
Guo, L. and Mays, K. and Zhang, Y. and Wijaya, D. and Betke, M. "What makes gun violence a (less) prominent issue? A computational analysis of compelling arguments and selective agenda setting" AEJMC annual conference, 2019.
Guo, Lei and Mays, Kate and Lai, Sha and Jalal, Mona and Ishwar, Prakash and Betke, Margrit "Accurate, Fast, But Not Always Cheap: Evaluating Crowdcoding as an Alternative Approach to Analyze Social Media Data" Journalism & Mass Communication Quarterly, v.97, 2019. https://doi.org/10.1177/1077699019891437
Guo, Lei and Mays, Kate and Zhang, Yiyan and Wijaya, Derry and Betke, Margrit "What makes gun violence a (less) prominent issue? A computational analysis of compelling arguments and selective agenda setting" Mass Communication and Society, 2021. https://doi.org/10.1080/15205436.2021.1898644

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This award supported the Artificial Intelligence and Emerging Media (AIEM) research group at Boston University to conduct research and foster education in areas related to artificial intelligence and emerging media. The group explores and creates techniques from machine learning, natural language processing, and computer vision to interpret emerging media and their role in mass and interpersonal communication. AIEM studies the human and automated processes by which emerging media are developed, marketed, and shaped and reshaped by users.

Different news articles about the same topic often offer a variety of perspectives: an article about gun violence might emphasize gun control, while another might promote 2nd Amendment rights, and a third might focus on mental health issues. In communication research, these different perspectives are known as "frames" which, when used in news media, influence readers' opinions in different ways. AIEM developed an automated method for effectively detecting frames in news headlines in any language and applied it in a large-scale study of 88,000 news headlines covering gun violence in the U.S. between 2016 and 2018 in English, German, Arabic, and Turkish. The analysis revealed that the U.S. media highly politicized the reporting of gun violence. The research team also applied the frame detection approach to COVID-19 news reports in nine regions of the world in 2020.

AIEM developed an interactive web-based tool called OpenFraming for automatically analyzing and classifying frames in text documents. The goal of providing this tool was to make automatic frame discovery and labeling based on topic modeling and deep learning widely accessible to researchers from a diverse array of disciplines. To this end, the team provided both state-of-the-art pre-trained frame classification models on various issues and a user-friendly pipeline for training novel classification models on user-provided corpora. Researchers can submit their documents and obtain the frames present in them. The degree of user involvement is flexible: a user can run models that have been pre-trained on select issues; submit labeled documents and train a new model for frame classification; or submit unlabeled documents and obtain potential frames of the documents. The code making up the OpenFraming tool is open-sourced and well documented, making the system transparent and expandable.
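As a rough illustration of the kind of frame classification step that OpenFraming automates, the sketch below runs a fine-tuned multilingual transformer over a single headline using the Hugging Face transformers library. The model path and the frame label set are hypothetical placeholders, not the classifiers shipped with OpenFraming, and a user of the hosted tool would not write this code; the web interface wraps the equivalent steps.

# Minimal sketch of headline frame classification (illustrative only; not the
# OpenFraming API). The model path and frame labels are hypothetical placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_PATH = "path/to/finetuned-multilingual-frame-classifier"  # hypothetical
FRAME_LABELS = [
    "2nd Amendment rights", "Gun control/regulation", "Politics",
    "Mental health", "Public safety",
]  # example frame set for the gun-violence issue

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH)
model.eval()

def predict_frame(headline: str) -> str:
    """Return the most probable frame label for one news headline."""
    inputs = tokenizer(headline, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return FRAME_LABELS[int(logits.argmax(dim=-1))]

print(predict_frame("Lawmakers debate background-check bill after shooting"))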

AIEM developed AI-based methods to identify textual and visual news items that will trigger similar versus divergent emotional responses in news consumers. The group published a dataset that can serve as a benchmark for AI methods that predict people's emotional reactions to multi-modal news content (images and headlines).
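To make the prediction task concrete, the sketch below scores an (image, headline) pair against a small set of emotion labels by combining CLIP image and text embeddings with a lightweight classification head. This is an illustrative setup under assumed components, not the AIEM group's published model or label set; the head here is untrained and would need to be fit on annotated (image, headline, emotion) data such as the group's benchmark.

# Illustrative sketch of multi-modal emotion prediction (not the AIEM model).
# Emotion labels and the classification head are hypothetical placeholders.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

EMOTIONS = ["anger", "fear", "sadness", "joy", "surprise"]  # example label set

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

class EmotionHead(nn.Module):
    """Maps concatenated image and headline embeddings to emotion logits."""
    def __init__(self, dim: int = 512, n_labels: int = len(EMOTIONS)):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 256), nn.ReLU(), nn.Linear(256, n_labels)
        )

    def forward(self, img_emb: torch.Tensor, txt_emb: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([img_emb, txt_emb], dim=-1))

head = EmotionHead()  # untrained placeholder; fit on annotated news data

def predict_emotion(image_path: str, headline: str) -> str:
    """Return the highest-scoring emotion label for one news image + headline."""
    inputs = processor(text=[headline], images=Image.open(image_path),
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        img_emb = clip.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = clip.get_text_features(input_ids=inputs["input_ids"],
                                         attention_mask=inputs["attention_mask"])
        logits = head(img_emb, txt_emb)
    return EMOTIONS[int(logits.argmax(dim=-1))]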

Most recently, the AIEM team explored affective responses to, and perceived newsworthiness of, AI-generated imagery for visual journalism. While generative AI offers advantages for newsrooms in terms of producing unique images and cutting costs, the potential misuse of AI-generated news images is a cause for concern. For the study, the team designed a three-part news image codebook for affect-labeling news images based on journalism ethics and photography guidelines. They collected 200 news headlines and accompanying images from a variety of U.S. news sources on the topics of gun violence and climate change, generated corresponding news images with a commercial image-generating AI model, and asked study participants to annotate their emotional responses to the human-selected and AI-generated news images following the codebook. The team also measured how the visual and textual modalities each affected emotional responses. The findings of this study provide insights into the quality and emotional impact of news images produced by people and by generative AI. Further, results of this work can be useful in developing technical guidelines as well as policy measures for the ethical use of generative AI systems in journalistic production.



Last Modified: 03/11/2024
Modified by: Margrit Betke

