Award Abstract # 1751206
CAREER: Weakly-Supervised Visual Scene Understanding: Combining Images and Videos, and Going Beyond Semantic Tags

NSF Org: IIS (Division of Information & Intelligent Systems)
Recipient: UNIVERSITY OF CALIFORNIA, DAVIS
Initial Amendment Date: March 16, 2018
Latest Amendment Date: April 19, 2021
Award Number: 1751206
Award Instrument: Continuing Grant
Program Manager: Jie Yang
jyang@nsf.gov
(703) 292-4768
IIS: Division of Information & Intelligent Systems
CSE: Directorate for Computer and Information Science and Engineering
Start Date: April 1, 2018
End Date: October 31, 2021 (Estimated)
Total Intended Award Amount: $500,499.00
Total Awarded Amount to Date: $470,120.00
Funds Obligated to Date: FY 2018 = $97,993.00
FY 2019 = $77,847.00
FY 2020 = $0.00
FY 2021 = $0.00
History of Investigator:
  • Yong Jae Lee (Principal Investigator)
    yongjaelee@cs.wisc.edu
Recipient Sponsored Research Office: University of California-Davis
1850 RESEARCH PARK DR STE 300
DAVIS
CA  US  95618-6153
(530)754-7700
Sponsor Congressional District: 04
Primary Place of Performance: University of California-Davis
2063 Kemper Hall
Davis
CA  US  95616-5270
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): TX2DAGQPENZ5
Parent UEI:
NSF Program(s): Robust Intelligence
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVIT
01001920DB NSF RESEARCH & RELATED ACTIVIT
01002021DB NSF RESEARCH & RELATED ACTIVIT
01002122DB NSF RESEARCH & RELATED ACTIVIT
01002223DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1045, 9251, 7495
Program Element Code(s): 749500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The internet provides an endless supply of images and videos, replete with weakly-annotated meta-data such as text tags, GPS coordinates, timestamps, and social media sentiments. This huge resource of visual data provides an opportunity to create scalable and powerful recognition algorithms that do not depend on expensive human annotations. The research component of this project develops novel visual scene understanding algorithms that can effectively learn from such weakly-annotated visual data. The main novelty is to learn jointly from images and videos. The developed algorithms could have broad impact in numerous fields, including AI, security, and the agricultural sciences. In addition to its scientific impact, the project carries out complementary educational and outreach activities: it provides mentorship to high school, undergraduate, and graduate students; introduces new undergraduate and graduate computer vision courses that have been lacking at UC Davis; and organizes an international workshop on weakly-supervised visual scene understanding.

This project develops novel algorithms to advance weakly-supervised visual scene understanding in two complementary ways: (1) learning jointly from both images and videos to take advantage of their complementarity, and (2) learning from weak supervisory signals that go beyond standard semantic tags, such as timestamps, captions, and relative comparisons. Specifically, it investigates novel approaches to advance tasks such as fully-automatic video object segmentation, weakly-supervised object detection, unsupervised learning of object categories, and mining of localized patterns in the image/video data that are correlated with the weak supervisory signal. Throughout, the project explores ways to understand and mitigate noise in the weak labels and to overcome the domain differences between images and videos.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 14)
Xiao, Fanyi and Lee, Yong Jae. "Video Object Detection with an Aligned Spatial-Temporal Memory." Proceedings of the European Conference on Computer Vision (ECCV), 2018.
Xiao, Fanyi and Liu, Haotian and Lee, Yong Jae. "Identity From Here, Pose From There: Self-Supervised Disentanglement and Generation of Objects Using Unlabeled Videos." Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019. doi:10.1109/ICCV.2019.00711
Zhou, Mingyang and Cheng, Runxiang and Lee, Yong Jae and Yu, Zhou. "A Visual Attention Grounding Neural Model for Multimodal Machine Translation." Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.
Zou, Xueyan and Xiao, Fanyi and Yu, Zhiding and Lee, Yong Jae. "Delving Deeper into Anti-aliasing in ConvNets." BMVC, 2020.
Bolya, Daniel and Zhou, Chong and Xiao, Fanyi and Lee, Yong Jae. "YOLACT++: Better Real-time Instance Segmentation." IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020. doi:10.1109/TPAMI.2020.3014297
Bolya, Daniel and Zhou, Chong and Xiao, Fanyi and Lee, Yong Jae. "YOLACT: Real-Time Instance Segmentation." Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019. doi:10.1109/ICCV.2019.00925
Gu, Xiuye and Luo, Weixin and Ryoo, Michael and Lee, Yong Jae. "Password-Conditioned Anonymization and Deanonymization with Face Identity Transformers." ECCV, 2020. doi:10.1007/978-3-030-58592-1_43
Li, Yuheng and Singh, Krishna Kumar and Ojha, Utkarsh and Lee, Yong Jae. "MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation." IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. doi:10.1109/CVPR42600.2020.00806
Ojha, Utkarsh and Singh, Krishna Kumar and Hsieh, Cho-Jui and Lee, Yong Jae. "Elastic-InfoGAN: Unsupervised Disentangled Representation Learning in Class-Imbalanced Data." NeurIPS, 2020.
Ren, Zhongzheng and Yu, Zhiding and Yang, Xiaodong and Liu, Ming-Yu and Lee, Yong Jae and Schwing, Alexander G. and Kautz, Jan. "Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection." IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. doi:10.1109/CVPR42600.2020.01061
Singh, K. K. and Mahajan, D. and Grauman, K. and Lee, Y. J. and Feiszli, M. and Ghadiyaram, D. "Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. doi:10.1109/CVPR42600.2020.01108
