Award Abstract # 1633295
BIGDATA: F: Collaborative Research: From Visual Data to Visual Understanding

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL
Initial Amendment Date: August 22, 2016
Latest Amendment Date: September 7, 2021
Award Number: 1633295
Award Instrument: Standard Grant
Program Manager: Hector Munoz-Avila
hmunoz@nsf.gov
 (703)292-4481
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2016
End Date: August 31, 2022 (Estimated)
Total Intended Award Amount: $350,000.00
Total Awarded Amount to Date: $350,000.00
Funds Obligated to Date: FY 2016 = $350,000.00
History of Investigator:
  • Ashok Krishnamurthy (Principal Investigator)
    ashok@renci.org
  • Tamara Berg (Former Principal Investigator)
Recipient Sponsored Research Office: University of North Carolina at Chapel Hill
104 AIRPORT DR STE 2200
CHAPEL HILL
NC  US  27599-5023
(919)966-3411
Sponsor Congressional District: 04
Primary Place of Performance: University of North Carolina at Chapel Hill
201 S. Columbia St
Chapel Hill
NC  US  27599-3175
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): D3LHU66KBLD5
Parent UEI: D3LHU66KBLD5
NSF Program(s): Big Data Science & Engineering
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 8083, 7433
Program Element Code(s): 808300
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The field of visual recognition, which focuses on creating computer algorithms for automatically understanding photographs and videos, has made tremendous gains in the past few years. Algorithms can now recognize and localize thousands of objects with reasonable accuracy, as well as identify other visual content such as scenes and activities. For instance, there are now smartphone apps that can automatically sift through a user's photos and find all party pictures, all pictures of cars, or all sunset photos. However, the type of "visual understanding" done by these methods is still rather superficial, exhibiting mostly rote memorization rather than true reasoning. For example, current algorithms have a hard time telling whether an image is typical (e.g., a car on a road) or unusual (e.g., a car in the sky), or answering simple questions about a photograph, e.g., "what are the people looking at?", "what just happened?", "what might happen next?" A central problem is that current methods lack data about the world outside the photograph. To achieve true human-like visual understanding, computers will have to reason about the broader spatial, temporal, perceptual, and social context suggested by a given visual input. This project uses big visual data to gather large-scale, deep semantic knowledge about events, physical and social interactions, and how people perceive the world and each other. The research focuses on developing methods to capture and represent this knowledge in a way that makes it broadly applicable to a range of visual understanding tasks. This will enable novel computer algorithms that have a deeper, more human-like understanding of the visual world and can effectively function in complex, real-world situations and environments. For example, if a robot can predict what a person might do next in a given situation, then the robot can better aid the person in their task. Broader impacts will include new publicly available software tools and data that can be used for various visual reasoning tasks. Additionally, the project will have a multi-pronged educational component, including incorporating aspects of the research into the graduate curriculum, undergraduate and K-12 outreach, and special mentoring and focused events for the advancement of women in computer science.

The main technical focus of this project is to advance computational recognition toward a general, human-like visual understanding of images and video that can function on previously unseen data, tasks, and settings. The aim is to develop a new large-scale knowledge base, called the visual Memex, that extracts and stores a vast set of visual relationships between data items in a multi-graph representation, with nodes corresponding to data items and edges indicating different types of relationships. This large knowledge base will be used in a lambda-calculus-powered reasoning engine to make inferences about visual data on a global scale. Additionally, the project will test computational recognition algorithms on several visual understanding tasks designed to evaluate progress on a variety of aspects of visual understanding, ranging from linguistic (evaluating understanding of imagery through language tasks such as visual question answering), to purely visual (evaluating understanding of spatial context through visual fill-in-the-blanks), to temporal (evaluating temporal understanding by predicting future states), to physical (evaluating understanding of human-object and human-scene interactions by predicting affordances). Datasets, the knowledge base, and evaluation tools will be hosted on the project web site (http://www.tamaraberg.com/grants/bigdata.html).
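To make the multi-graph representation described above concrete, the following is a minimal illustrative sketch in Python; it is not code from the project. Nodes stand for data items such as images and detected objects, and each edge carries a relationship type, so several differently typed edges can connect the same pair of nodes. The class name VisualMemex and the relation labels used here are hypothetical.

from collections import defaultdict

# Minimal sketch of a "visual Memex"-style knowledge base: a multi-graph whose
# nodes are data items (images, detected objects, phrases) and whose edges
# carry a relationship type. Illustrative only, not the project's implementation.

class VisualMemex:
    def __init__(self):
        self.nodes = {}                    # node_id -> attribute dict
        self.edges = defaultdict(list)     # node_id -> [(relation, other_id)]

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_edge(self, src, dst, relation):
        # Multiple edges between the same pair are allowed, one per relation type.
        self.edges[src].append((relation, dst))

    def neighbors(self, node_id, relation=None):
        # Nodes connected to node_id, optionally filtered by relation type.
        return [dst for rel, dst in self.edges[node_id]
                if relation is None or rel == relation]

if __name__ == "__main__":
    kb = VisualMemex()
    kb.add_node("img_001", kind="image")
    kb.add_node("obj_car", kind="object", label="car")
    kb.add_node("obj_road", kind="object", label="road")
    kb.add_edge("img_001", "obj_car", "contains")
    kb.add_edge("img_001", "obj_road", "contains")
    kb.add_edge("obj_car", "obj_road", "spatially_on")   # typical spatial context
    print(kb.neighbors("img_001", relation="contains"))  # ['obj_car', 'obj_road']

A knowledge base at the scale described in the abstract would of course require a graph database or distributed store rather than in-memory dictionaries; the sketch only shows the node/typed-edge structure.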

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Yu, Licheng and Bansal, Mohit and Berg, Tamara. "Hierarchically-Attentive RNN for Album Summarization and Storytelling." Empirical Methods in Natural Language Processing, 2017. doi:10.18653/v1/D17-1101
Belz, A. and Berg, T. L. and Yu, L. "From image to language and back again." Natural Language Engineering, v.24, 2018. doi:10.1017/S1351324918000086
Feng, Z. and Tu, M. and Xia, R. and Wang, Y. and Krishnamurthy, A. "Self-Supervised Audio-Visual Representation Learning for in-the-wild Videos." IEEE International Conference on Big Data, 2020.
Feng, Zishun and Sivak, Joseph A. and Krishnamurthy, Ashok K. "Two-Stream Attention Spatio-Temporal Network For Classification Of Echocardiography Videos." International Symposium on Biomedical Imaging, 2021. doi:10.1109/ISBI48211.2021.9433773
Feng, Zishun and Sivak, Joseph and Krishnamurthy, Ashok. "Improving Echocardiography Segmentation by Polar Transformation." 2022.
Guo, Yue and Borland, David and McCormick, Caroline and Stein, Jason and Wu, Guorong and Krishnamurthy, Ashok. "Cell Counting with Inverse Distance Kernel and Self-supervised Learning." 2025.
Guo, Yue and Krupa, Oleh and Stein, Jason and Wu, Guorong and Krishnamurthy, Ashok. "SAU-Net: A Unified Network for Cell Counting in 2D and 3D Microscopy Images." IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2021. doi:10.1109/TCBB.2021.3089608
Lei, Jie and Yu, Licheng and Bansal, Mohit and Berg, Tamara L. "TVQA: Localized, Compositional Video Question Answering." Empirical Methods in Natural Language Processing, 2018.
Yu, Licheng and Tan, Hao. "A Joint Speaker-Listener-Reinforcer Model for Referring Expressions." IEEE Conference on Computer Vision and Pattern Recognition, v.1, 2017.
Tommasi, Tatiana and Mallya, Arun and Plummer, Bryan and Lazebnik, Svetlana and Berg, Alexander C. and Berg, Tamara L. "Combining Multiple Cues for Visual Madlibs Question Answering." International Journal of Computer Vision, 2018. doi:10.1007/s11263-018-1096-0
Yu, Licheng and Chen, Xinlei and Gkioxari, Georgia and Bansal, Mohit and Berg, Tamara L. and Batra, Dhruv. "Multi-Target Embodied Question Answering." IEEE Conference on Computer Vision and Pattern Recognition, 2019.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The objective of this project was to develop methods that analyze related multi-modal data to extract useful information. We focused on the biomedical domain, with the goal of extracting useful information from biomedical images and related data such as electronic health records. To accomplish this, we addressed two specific problems.

The first problem was counting cells in 3D images of mouse brains that have been prepared in a specific way. The number, type, and distribution of these cells have important implications for brain health. We developed methods that accomplish the cell counting task efficiently, without the extensive human labor usually needed to prepare training data. The method is in use by neuroscientists.
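The cited cell-counting papers (SAU-Net; inverse distance kernel with self-supervised learning) do not spell out their formulation here, so the following sketch only illustrates the generic density-map approach commonly used for counting: each annotated cell center is spread into a small normalized kernel, and the count is recovered by summing the map. The function name, the 2D toy setting, and the particular inverse-distance weighting are assumptions made for illustration, not the project's method.

import numpy as np

# Hedged sketch: density-map counting on a toy 2D image. Each annotated cell
# center contributes a kernel proportional to 1 / (1 + distance), normalized to
# sum to 1, so that the integral of the density map equals the cell count.

def make_density_map(shape, centers, radius=5):
    h, w = shape
    density = np.zeros((h, w), dtype=np.float64)
    yy, xx = np.mgrid[0:h, 0:w]
    for cy, cx in centers:
        dist = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2)
        kernel = np.where(dist <= radius, 1.0 / (1.0 + dist), 0.0)
        density += kernel / kernel.sum()     # each cell contributes exactly 1
    return density

if __name__ == "__main__":
    centers = [(10, 12), (30, 40), (25, 25)]   # toy point annotations
    dmap = make_density_map((64, 64), centers)
    print(round(dmap.sum()))                   # 3: count == sum of the map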

The second problem we tackled was analyzing echocardiogram video sequences together with electronic health records for early detection of cardiac disease. We focused on cardiac amyloidosis, which requires a high level of clinical expertise to identify. We showed that the methods we developed can be of significant benefit in detecting cardiac amyloidosis and can serve as a cardiology assistant.
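As an illustration of combining an imaging stream with tabular clinical data, here is a hedged PyTorch sketch of a simple late-fusion classifier. It is not the two-stream attention spatio-temporal network from the cited ISBI paper; the module names, feature dimensions, and mean-pooling over frames are all assumptions made for this example.

import torch
import torch.nn as nn

# Hedged sketch: encode an echocardiogram video stream and an EHR record
# separately, then fuse the two representations before classification.

class VideoEHRFusion(nn.Module):
    def __init__(self, frame_feat_dim=512, ehr_dim=32, hidden=128, n_classes=2):
        super().__init__()
        # Video stream: per-frame features are pooled over time, then projected.
        self.video_encoder = nn.Sequential(nn.Linear(frame_feat_dim, hidden), nn.ReLU())
        # EHR stream: tabular record projected to the same hidden size.
        self.ehr_encoder = nn.Sequential(nn.Linear(ehr_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, frame_feats, ehr):
        # frame_feats: (batch, time, frame_feat_dim); ehr: (batch, ehr_dim)
        video_vec = self.video_encoder(frame_feats.mean(dim=1))   # temporal pooling
        ehr_vec = self.ehr_encoder(ehr)
        return self.classifier(torch.cat([video_vec, ehr_vec], dim=-1))

if __name__ == "__main__":
    model = VideoEHRFusion()
    logits = model(torch.randn(4, 16, 512), torch.randn(4, 32))
    print(logits.shape)   # torch.Size([4, 2])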

The methods we developed are general enough to be useful for other biomedical problems as well.

Over the course of the project, we worked with neuroscientists and cardiologists, with a valuable exchange of information between computer science and these fields. Several graduate and undergraduate students also received training during the project.


Last Modified: 02/21/2025
Modified by: Ashok Kumar Krishnamurthy

Please report errors in award information by writing to: awardsearch@nsf.gov.
