Award Abstract # 1718262
RI: Small: Modeling Vividness and Symbolism for Decoding Visual Rhetoric

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: UNIVERSITY OF PITTSBURGH - OF THE COMMONWEALTH SYSTEM OF HIGHER EDUCATION
Initial Amendment Date: July 27, 2017
Latest Amendment Date: April 24, 2019
Award Number: 1718262
Award Instrument: Standard Grant
Program Manager: Jie Yang
jyang@nsf.gov
(703)292-4768
IIS, Division of Information & Intelligent Systems
CSE, Directorate for Computer and Information Science and Engineering
Start Date: August 1, 2017
End Date: July 31, 2021 (Estimated)
Total Intended Award Amount: $449,978.00
Total Awarded Amount to Date: $449,978.00
Funds Obligated to Date: FY 2017 = $449,978.00
History of Investigator:
  • Adriana Kovashka (Principal Investigator)
    kovashka@cs.pitt.edu
  • Rebecca Hwa (Former Co-Principal Investigator)
Recipient Sponsored Research Office: University of Pittsburgh
4200 FIFTH AVENUE
PITTSBURGH
PA  US  15260-0001
(412)624-7400
Sponsor Congressional District: 12
Primary Place of Performance: University of Pittsburgh
Pittsburgh
PA  US  15213-2303
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): MKAGLD59JRL1
Parent UEI:
NSF Program(s): Robust Intelligence
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 7495, 7923
Program Element Code(s): 749500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

This project develops systems for analyzing and inferring the non-literal messages conveyed in the media through persuasive images and text. Computational representations of two persuasive strategies are devised to model the mapping between observable information and underlying messages. First, this project models "vividness": through analyses of how human subjects perceive images and text, the system aims to identify relevant regions in which creative techniques were used to draw the viewer's attention. Second, this project models "symbolism": through analyses of semantic relationships between concrete objects and abstract concepts, the system aims to decode symbolic associations that humans make. The ability to automatically understand vividness and symbolism is key to building computational intelligence that can make inferences about what the media implies. This interdisciplinary project also has an educational component aimed at increasing the media literacy of school students and involving college students from diverse backgrounds in computational research. The work can be used to discover patterns in how visual rhetoric in the media has evolved over time or how it differs across cultures.

This research pursues three directions. First, a framework for judging vividness (i.e., to what degree an image as a whole is vivid; what part of an image is vivid; and whether a text snippet is vivid) is developed. Data about the vividness of a variety of images and text is collected from human annotators. Cues and techniques such as saliency, attention, sentiment, memorability and abnormality are used to build prediction models for vividness. Second, two pipelines for detecting symbolic references are developed. One pipeline hypothesizes potential signifiers from an image, then uses textual resources to map these to signifieds. The other pipeline directly hypothesizes what the signifieds might be, and obtains training data for these from web resources. The outputs from these pipelines are combined to generate the signifier-signified pairs. Third, a method for generating explanations of the strategies is developed, using the vividness and symbolism outputs. Numerous resources to be shared with the research community are developed over the course of the project.
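As a rough, hypothetical illustration of how cue-based vividness prediction could be set up (the cue scores, data, and classifier below are placeholders, not the project's actual pipeline):

    # Minimal sketch: predict whether an image is "vivid" from per-image cue scores.
    # The cue scores (saliency, sentiment, memorability, abnormality) are assumed to
    # come from separate pretrained models; here they are random placeholders.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_images = 200
    # Each row: [saliency, sentiment, memorability, abnormality] for one image.
    cue_features = rng.random((n_images, 4))
    # Binary vividness labels, e.g. a majority vote from human annotators.
    vivid_labels = rng.integers(0, 2, size=n_images)

    model = LogisticRegression().fit(cue_features, vivid_labels)
    new_image_cues = np.array([[0.8, 0.6, 0.7, 0.3]])
    print("P(vivid):", model.predict_proba(new_image_cues)[0, 1])

A classifier of this form is only one possible design; the point is that separately computed perceptual cues can be combined into a single vividness score.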

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Guo, Meiqi and Hwa, Rebecca and Kovashka, Adriana. "Detecting Persuasive Atypicality by Modeling Contextual Compatibility." Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
Thomas, Christopher and Kovashka, Adriana. "Predicting Visual Political Bias Using Webly Supervised Data and an Auxiliary Task." International Journal of Computer Vision, 2021. https://doi.org/10.1007/s11263-021-01506-3
Thomas, Christopher and Kovashka, Adriana. "Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval." Proceedings of the European Conference on Computer Vision (ECCV), 2020. https://doi.org/10.1007/978-3-030-58523-5_19
Unal, Mesut Erhan and Ye, Keren and Zhang, Mingda and Thomas, Christopher and Kovashka, Adriana and Li, Wei and Qin, Danfeng and Berent, Jesse. "Learning to Overcome Noise in Weak Caption Supervision for Object Detection." IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022. https://doi.org/10.1109/TPAMI.2022.3187350
Ye, Keren and Kovashka, Adriana. "A Case Study of the Shortcut Effects in Visual Commonsense Reasoning." Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI), 2021.
Ye, Keren and Kovashka, Adriana. "Linguistic Structures as Weak Supervision for Visual Scene Graph Generation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
Ye, Keren and Zhang, Mingda and Kovashka, Adriana. "Breaking Shortcuts by Masking for Robust Visual Reasoning." Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2021. https://doi.org/10.1109/WACV48630.2021.00356
Zhang, Mingda and Hwa, Rebecca and Kovashka, Adriana. "Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text." British Machine Vision Conference (BMVC), 2018.
Zhang, Mingda and Hwa, Rebecca and Kovashka, Adriana. "How to Practice VQA on a Resource-limited Target Domain." IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023. https://doi.org/10.1109/wacv56688.2023.00443
Zhang, Mingda and Maidment, Tristan and Diab, Ahmad and Kovashka, Adriana and Hwa, Rebecca. "Domain-robust VQA with diverse datasets and methods but no target labels." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
Zhang, Mingda and Ye, Keren and Hwa, Rebecca and Kovashka, Adriana. "Story Completion with Explicit Modeling of Commonsense Knowledge." IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020. https://doi.org/10.1109/CVPRW50498.2020.00196

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project transformed the computer vision field in two ways. First, it emphasized the importance of analyzing persuasive media, e.g. advertisements and images in political articles. We developed a variety of techniques for visual reasoning that tackled the specific challenges of persuasive imagery. Second, the project allowed us to develop methods that examine and leverage the complementarity of visual and textual information, in persuasive media and in general multimodal data. As a related problem, we also examined how visual reasoning methods overfit to specific aspects of the multimodal inputs (e.g. text) and how robust reasoning methods are to shallow changes in the input.

We focused on five separate research thrusts. The first line of research, resulting in a recent acceptance to ICCV 2021, was directly related to symbolism and focused on detecting persuasive atypicality in advertisement images. In earlier work we annotated images in our advertisement dataset (CVPR 2017, TPAMI 2019) as showing objects in a typical or atypical (abnormal) manner. In this project, we developed a new technique to detect atypicality, based on the intuition that the relative position of objects with respect to one another is a strong indicator of atypicality and can be learned by modeling context. We are currently working on evaluating how well language models can parse symbolism, i.e. infer what is being symbolized by a symbol (e.g. "dragon symbolizes ___" where ___ is "danger").
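To make the context intuition concrete, here is a toy sketch (not the ICCV 2021 model; the detections, features, and training data below are hypothetical placeholders) of scoring how typical the relative placement of two detected objects is:

    # Toy sketch of context-based atypicality: score how "typical" the relative
    # placement of two detected objects is, and flag low-scoring pairs.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def pair_features(box_a, box_b):
        """Relative-position features for two boxes given as (x, y, w, h)."""
        xa, ya, wa, ha = box_a
        xb, yb, wb, hb = box_b
        return np.array([(xb - xa) / wa,          # horizontal offset
                         (yb - ya) / ha,          # vertical offset
                         (wb * hb) / (wa * ha)])  # relative size

    # Placeholder training data: pair features from ordinary photos (label 1 =
    # typical) plus perturbed placements (label 0); random numbers stand in here.
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(500, 3)), rng.integers(0, 2, size=500)
    compat = RandomForestClassifier().fit(X, y)

    # A detected pair from an ad, e.g. an object floating above where it belongs.
    feats = pair_features((10, 10, 50, 50), (12, -40, 45, 45)).reshape(1, -1)
    score = compat.predict_proba(feats)[0, 1]
    print("atypical" if score < 0.5 else "typical", score)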

In a second line of research (published in BMVC 2018), we examined the relationship between the image visuals and the slogan appearing in the image. We studied the distinct ways in which these reinforce each other and jointly make a single argument, without necessarily making the exact same point or being redundant. The latter, i.e. literal alignment between image and text, has been commonly studied in vision-language tasks such as image captioning. In contrast, we find that the image and text of a single ad complement each other in more creative ways. For example, either the image or the text can be purposefully ambiguous, in order to capture the viewer's attention as they decode the ambiguity; the image and text can even individually appear to contradict each other, but when viewed together, make a unified argument. In a follow-up work, appearing in a CVPR 2020 workshop, we tackled decoding the allusions that narratives make. Advertisements are a type of narrative, so as a form of preliminary exploration, we first focused on text-only narratives, specifically the task of choosing the correct ending for a story. We examined the connection between a story's context and its candidate endings by looking at relationships between context and ending words, according to a knowledge base resource.
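A minimal, hypothetical sketch of this last idea (the RELATED lookup stands in for queries to a commonsense knowledge base; the real work uses learned models rather than simple counting):

    # Toy sketch of knowledge-base-guided story completion: pick the candidate
    # ending whose words have the most knowledge-base links to the context words.
    RELATED = {("rain", "umbrella"), ("rain", "wet"), ("party", "cake")}

    def related(a, b):
        return (a, b) in RELATED or (b, a) in RELATED

    def score_ending(context_words, ending_words):
        return sum(related(c, e) for c in context_words for e in ending_words)

    context = ["it", "started", "to", "rain", "on", "the", "way", "home"]
    endings = [["she", "opened", "her", "umbrella"],
               ["she", "ordered", "a", "cake"]]
    print(" ".join(max(endings, key=lambda e: score_ending(context, e))))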

The third line of research focused on understanding the relation between images and text in multimodal political articles. We extended our NeurIPS 2019 conference paper into an IJCV 2021 journal version. That work's goal was to infer political bias from images, using text as an auxiliary modality. A follow-up work (presented at ECCV 2020) developed a cross-modal retrieval method which relied on within-modality constraints to help deal with the complementarity of image-text pairs and the diverse appearance of imagery that corresponds to the same topic.
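The following is a simplified sketch of a retrieval objective that combines a cross-modal term with a within-modality term; it is an illustration under assumed margins and a placeholder "same topic" neighborhood definition, not the ECCV 2020 loss itself:

    # Sketch: cross-modal hinge loss on matched image-text pairs, plus a
    # within-modality term keeping same-topic images close to one another.
    import torch
    import torch.nn.functional as F

    def retrieval_loss(img_emb, txt_emb, same_topic, margin=0.2, within_floor=0.5):
        """img_emb, txt_emb: (N, D) L2-normalized embeddings of matched pairs.
        same_topic: (N, N) bool matrix marking images that share a topic."""
        sim = img_emb @ txt_emb.t()                  # cross-modal similarities
        pos = sim.diag().unsqueeze(1)                # matched image-text pairs
        off_diag = ~torch.eye(sim.size(0), dtype=torch.bool)
        # Push mismatched pairs below the matched pair by a margin.
        cross = F.relu(margin + sim - pos)[off_diag].mean()
        # Pull same-topic images above a similarity floor.
        img_sim = img_emb @ img_emb.t()
        within = F.relu(within_floor - img_sim)[same_topic].mean()
        return cross + within

    img = F.normalize(torch.randn(8, 128), dim=1)
    txt = F.normalize(torch.randn(8, 128), dim=1)
    topics = torch.randint(0, 3, (8,))
    print(retrieval_loss(img, txt, topics[:, None] == topics[None, :]))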

A fourth line of work, inspired initially by our work on advertisements, was to train a scene graph generation model from the weak supervision contained in captions. This was a follow-up to our prior work, which trained object detection algorithms from captions. It was presented at CVPR 2021.
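As a loose illustration of what "weak supervision from captions" can mean (a simplistic stand-in for the paper's use of linguistic structure; it assumes the spaCy model en_core_web_sm is installed):

    # Toy sketch: pull (subject, predicate, object) triples from a caption with a
    # dependency parse, to be used as weak labels for scene graph generation.
    import spacy

    nlp = spacy.load("en_core_web_sm")

    def caption_triples(caption):
        doc = nlp(caption)
        triples = []
        for token in doc:
            if token.pos_ == "VERB":
                subjects = [c for c in token.children if c.dep_ == "nsubj"]
                objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
                triples += [(s.lemma_, token.lemma_, o.lemma_)
                            for s in subjects for o in objects]
        return triples

    print(caption_triples("A man rides a horse on the beach."))
    # e.g. [('man', 'ride', 'horse')], paired with the image as a weak label.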

Our fifth line of work was to examine the robustness of visual question answering (VQA) models. In one work, appearing in WACV 2021, we investigated how to make use of external knowledge base information for performing a visual reasoning task on our ads dataset. We discovered that because of the way the evaluation task is set up, it is easy for the model to find shallow "shortcuts" and ignore knowledge pieces, which are needed for reasoning about less-common brands. We tackled the problem through three stochastic masking techniques. In follow-up work, appearing in AAAI 2021, we showed that the shortcut problem exists in other reasoning datasets and that reasoning methods (including recent transformer models) suffer greatly from simple input changes that should not change the meaning of the question and answer. We proposed masking on a curriculum to ameliorate the issue. In our final work, appearing in CVPR 2021, we examined how robust VQA models are to training and testing on different datasets.
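In spirit, stochastic masking works as in the following sketch (the masking probability, mask token, and example question are placeholders; the papers use several masking schemes and a curriculum rather than this single rule):

    # Minimal sketch of stochastic input masking as a shortcut-breaking step:
    # during training, randomly hide tokens so the model cannot lean on a single
    # give-away word.
    import random

    def stochastic_mask(tokens, p=0.15, mask_token="[MASK]"):
        return [mask_token if random.random() < p else t for t in tokens]

    random.seed(0)
    question = "what brand of soda is shown in the ad".split()
    for _ in range(3):
        print(" ".join(stochastic_mask(question)))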

Our work funded four graduate students for multiple semesters; two of them graduated, one is female, and one will pursue a career in academia. It also resulted in two publicly released datasets.


Last Modified: 10/29/2021
Modified by: Adriana Kovashka
