Award Abstract # 1566270
CRII: RI: Automatically Understanding the Messages and Goals of Visual Media

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: UNIVERSITY OF PITTSBURGH - OF THE COMMONWEALTH SYSTEM OF HIGHER EDUCATION
Initial Amendment Date: May 27, 2016
Latest Amendment Date: March 8, 2018
Award Number: 1566270
Award Instrument: Standard Grant
Program Manager: Jie Yang
jyang@nsf.gov
 (703)292-4768
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: June 1, 2016
End Date: May 31, 2019 (Estimated)
Total Intended Award Amount: $174,590.00
Total Awarded Amount to Date: $182,590.00
Funds Obligated to Date: FY 2016 = $174,590.00
FY 2018 = $8,000.00
History of Investigator:
  • Adriana Kovashka (Principal Investigator)
    kovashka@cs.pitt.edu
Recipient Sponsored Research Office: University of Pittsburgh
4200 FIFTH AVENUE
PITTSBURGH
PA  US  15260-0001
(412)624-7400
Sponsor Congressional District: 12
Primary Place of Performance: University of Pittsburgh
University Club
Pittsburgh
PA  US  15213-2303
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): MKAGLD59JRL1
Parent UEI:
NSF Program(s): CRII CISE Research Initiation, Robust Intelligence
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVITIES
01001819DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 7495, 8228, 9251
Program Element Code(s): 026Y00, 749500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

This project develops technologies to interpret the visual rhetoric of images. The project advances computer vision through novel solutions to the problem of decoding the visual messages in advertisements and artistic photographs, bringing computer vision closer to its goal of automatically understanding visual content. From a practical standpoint, understanding visual rhetoric can be used to produce image descriptions for the visually impaired that align with how a human would describe these images, giving them access to the rich content shown in newspapers or on TV. The project is tightly integrated with education: the work is interdisciplinary and can attract undergraduate students from different fields to the research.

This research focuses on three media understanding tasks: (1) understanding the persuasive messages conveyed by artistic images and the strategies those images use to convey their message; (2) exposing a photographer's bias towards their subject, e.g., determining whether a photograph portrays its subject in a positive or negative light; and (3) predicting what part of an artistic photograph a viewer might find most captivating or poignant. To enable decoding of artistic images, a large dataset is collected and annotated with a number of artistic properties and persuasion techniques intended for human understanding; methods are then developed to model visual symbolism in artistic images and to adapt positive/negative affect methods from sentiment analysis. To predict the photographer's bias towards a subject, a dataset of historical and modern portrayals of minorities and foreigners is collected, and an algorithm is created that reasons about body language and the 3D layout and composition of the photo. To predict poignance, eye-tracking data is collected on a set of artistic images from famous photographers, and conflicts in semantics and connotation between the objects in the photographs are analyzed.
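As a concrete illustration of the kind of model the first task calls for (matching an ad image to the persuasive message it conveys), below is a minimal sketch of a joint image-text embedding trained with a triplet ranking loss, written in PyTorch. The encoder choices, feature dimensions, and margin are illustrative assumptions, not the project's published architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedder(nn.Module):
    """Maps image features and statement features into a shared space."""
    def __init__(self, img_dim=2048, txt_dim=300, emb_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)  # e.g., pooled CNN features
        self.txt_proj = nn.Linear(txt_dim, emb_dim)  # e.g., averaged word vectors

    def forward(self, img_feat, txt_feat):
        # L2-normalize so a dot product equals cosine similarity
        img = F.normalize(self.img_proj(img_feat), dim=-1)
        txt = F.normalize(self.txt_proj(txt_feat), dim=-1)
        return img, txt

def triplet_loss(img, pos_txt, neg_txt, margin=0.2):
    """Rank the matching statement above a mismatched one for each image."""
    pos_sim = (img * pos_txt).sum(dim=-1)
    neg_sim = (img * neg_txt).sum(dim=-1)
    return F.relu(margin - pos_sim + neg_sim).mean()

# Toy usage with random features standing in for real encoders
model = JointEmbedder()
img = torch.randn(8, 2048)
pos, neg = torch.randn(8, 300), torch.randn(8, 300)
img_e, pos_e = model(img, pos)
_, neg_e = model(img, neg)
loss = triplet_loss(img_e, pos_e, neg_e)

In practice, the random features above would be replaced by outputs of real image and sentence encoders, and negatives would be drawn from mismatched statements, e.g., the other statements in a multiple-choice set.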

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Christopher Thomas, Adriana Kovashka "Persuasive Faces: Generating Faces in Advertisements" British Machine Vision Conference (BMVC) , 2018
Christopher Thomas and Adriana Kovashka "Artistic Object Recognition by Unsupervised Style Adaptation" Asian Conference on Computer Vision (ACCV) , 2018
Keren Ye, Adriana Kovashka "ADVISE: Symbolism and External Knowledge for Decoding Advertisements" European Conference on Computer Vision (ECCV) 2018 , 2018
Keren Ye, Kyle Buettner, Adriana Kovashka "Story Understanding in Video Advertisements" British Machine Vision Conference (BMVC) , 2018
Nils Murrugarra-Llerena and Adriana Kovashka "Cross-Modality Personalization for Retrieval" IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , 2019
Thomas, Christopher and Kovashka, Adriana "Predicting Visual Political Bias Using Webly Supervised Data and an Auxiliary Task" International Journal of Computer Vision , 2021 https://doi.org/10.1007/s11263-021-01506-3 Citation Details
Unal, Mesut Erhan and Ye, Keren and Zhang, Mingda and Thomas, Christopher and Kovashka, Adriana and Li, Wei and Qin, Danfeng and Berent, Jesse "Learning to Overcome Noise in Weak Caption Supervision for Object Detection" IEEE Transactions on Pattern Analysis and Machine Intelligence , 2022 https://doi.org/10.1109/TPAMI.2022.3187350 Citation Details
Zaeem Hussain, Mingda Zhang, Xiaozhong Zhang, Keren Ye, Christopher Thomas, Zuha Agha, Nathan Ong, Adriana Kovashka "Automatic Understanding of Image and Video Advertisements" Computer Vision and Pattern Recognition (CVPR) 2017 , 2017
Zaeem Hussain, Mingda Zhang, Xiaozhong Zhang, Keren Ye, Christopher Thomas, Zuha Agha, Nathan Ong, Adriana Kovashka. "Automatic Understanding of Image and Video Advertisements" IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , 2017

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The goal of this project is to develop resources and techniques for understanding the "visual rhetoric" of the media. Unlike most work in computer vision, which analyzes the explicit content of photographs, we are primarily interested in what can be "read between the lines" of a photograph, i.e., persuasive messages that are encoded in a photo but require careful analysis of the image content to discover. For example, advertisements and public service announcements visually encode messages and stories to convince the audience to take a certain action. This rhetoric depends on non-photorealistic portrayals of objects, symbolic associations, commonsense reasoning, etc. In addition to understanding advertisements specifically, we also learn to infer what photos in the news imply about their subjects, and what their political bias is. The project also aims to educate graduate and undergraduate students and to inform the computer vision and AI community about the rich challenges that automatic visual media understanding poses.

This award funded nine projects, which were published in CVPR, ICCV, ECCV, NeurIPS, BMVC, and ACCV. The first of these contributed a large, richly annotated dataset and posed several concrete tasks that measure how well a system understands the rhetoric of an ad, ranging from relatively simple ones (classifying topic and sentiment) to more complex ones (answering questions about what the viewer should do based on the ad, and what arguments the ad provides for taking the suggested action). In follow-up work, we proposed a suite of mechanisms for retrieving action-reason statements in a multiple-choice, multi-modal scenario, through novel metric learning techniques. We also developed more general techniques inspired by the challenges of predicting the rhetoric of ads. We proposed a weakly supervised object detection method, inspired by the associations between object regions and properties in our learned feature space; this work relies on captions at the image level to learn to localize objects at the box level. We also examined the relationship between image and text in political articles, where the alignment between the two modalities is very weak. We used the text as a privileged modality at training time to learn to predict the political bias of an image, outperforming a variety of strong baselines. Finally, we developed a domain adaptation approach for recognizing objects in atypical modalities (e.g., sketches and paintings), as a step towards recognizing non-photorealistic objects in advertisements.
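The privileged-modality idea above can be made concrete with a short sketch: an image-only classifier carries an auxiliary head that regresses onto the paired article's text embedding, a signal available only during training. The layer sizes, the two-way label space, and the mean-squared auxiliary loss are assumptions for illustration, not the published model.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasClassifier(nn.Module):
    """Image-only classifier with an auxiliary text-regression head.

    The text head is supervised only during training (the article text is
    the privileged modality); at test time the image branch runs alone.
    """
    def __init__(self, img_dim=2048, txt_dim=768, hidden=512, n_classes=2):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU())
        self.bias_head = nn.Linear(hidden, n_classes)  # e.g., left vs. right
        self.text_head = nn.Linear(hidden, txt_dim)    # predicts text embedding

    def forward(self, img_feat):
        h = self.trunk(img_feat)
        return self.bias_head(h), self.text_head(h)

def train_step(model, img_feat, labels, txt_emb, alpha=0.5):
    """Main classification loss plus auxiliary regression onto the text."""
    logits, txt_pred = model(img_feat)
    return F.cross_entropy(logits, labels) + alpha * F.mse_loss(txt_pred, txt_emb)

# Toy usage with random features standing in for real encoders
model = BiasClassifier()
imgs = torch.randn(16, 2048)
labels = torch.randint(0, 2, (16,))
txt = torch.randn(16, 768)  # embedding of the paired article text
loss = train_step(model, imgs, labels, txt)

The design point is that the auxiliary target shapes the image representation during training but imposes no cost at inference, since prediction uses only the bias head.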

In addition to these research projects, we organized a workshop at CVPR 2018, which included invited talks, posters, and a challenge. We discussed the challenges of ad understanding and brainstormed potential future directions in group exercises. We also showcased our dataset. The dataset has 64,832 annotated images and 3,477 videos, with about 730,000 annotations. It has been accessed in 6,010 unique sessions: 2,425 from the United States (40 different states), 997 from India, 402 from Japan, 306 from France, 301 from China, and the remainder from 73 other countries.

The PI collaborated on the published projects above with thirteen students, four of whom were undergraduates, two of whom are female, and one of whom is Latin American. Four of these students were funded by this project.

In summary, this award funded contributions in vision and language, metric learning, weak supervision, transfer learning and learning with privileged information, domain adaptation and generalization, content and style separation, and video understanding. We identified several future research directions: how best to perform commonsense reasoning, how to model the complementary information of image and text in a weakly supervised fashion, and how to extract robust representations of objects for domain generalization. The project has also spurred collaborations between computer and information scientists and social and political scientists.


Last Modified: 09/16/2019
Modified by: Adriana Kovashka
