Award Abstract # 2132724
RI: Small: SM-An Active Approach for Data Engineering to Improve Vision-Language Tasks

NSF Org: IIS (Division of Information & Intelligent Systems)
Recipient: ARIZONA STATE UNIVERSITY
Initial Amendment Date: August 26, 2021
Latest Amendment Date: September 5, 2023
Award Number: 2132724
Award Instrument: Continuing Grant
Program Manager: Jie Yang
jyang@nsf.gov
 (703)292-4768
IIS: Division of Information & Intelligent Systems
CSE: Directorate for Computer and Information Science and Engineering
Start Date: April 1, 2022
End Date: March 31, 2026 (Estimated)
Total Intended Award Amount: $499,903.00
Total Awarded Amount to Date: $515,903.00
Funds Obligated to Date: FY 2021 = $159,256.00
FY 2023 = $356,647.00
History of Investigator:
  • Yezhou Yang (Principal Investigator)
    yz.yang@asu.edu
  • Chitta Baral (Co-Principal Investigator)
Recipient Sponsored Research Office: Arizona State University
660 S MILL AVENUE STE 204
TEMPE
AZ  US  85281-3670
(480)965-5479
Sponsor Congressional District: 04
Primary Place of Performance: Arizona State University
PO Box 876011
Tempe
AZ  US  85281-6011
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): NTLHJXM55KZ6
Parent UEI:
NSF Program(s): Robust Intelligence
Primary Program Source: 01002122DB NSF RESEARCH & RELATED ACTIVIT
01002223DB NSF RESEARCH & RELATED ACTIVIT
01002324DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7495, 7923, 9251
Program Element Code(s): 749500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Intelligent systems that can robustly process vision and language data are necessary to enable integrated AI applications (such as automated driving and robotic home assistants) and to improve quality of life. However, such systems typically operate in open and highly uncertain environments, for which physical and geometric understanding, semantic robustness, and hypothetical reasoning become essential. This project will result in a publicly available software suite that can assist with training and validating robust Vision and Language (V&L) systems. In particular, the resulting semantic transformations will be packaged as an API service that companies and universities can quickly adopt. The resulting benchmark challenges will be made publicly available for further V&L research. Finally, the proposed study will stimulate educational activities at ASU, training graduate and undergraduate students in AI/ML/CV/NLP with a "post-dataset era" vision. The project will also train two Ph.D. students and several thesis-track master's students, develop a new seminar course, recruit underrepresented minority participants at all levels, and reach K-12 students with modules that explain the challenges of developing robust intelligent systems.

Robust intelligent systems such as home-assistant robots fundamentally depend on highly correlated vision and language components and fine-grained data alignment. Even though existing approaches demonstrate success on carefully collected benchmarks, that success is not sufficient to establish the robustness, reliability, and out-of-distribution generalization required for deployment in real-world applications. The project will conduct a systematic study of intelligent and active data engineering to boost the performance and robustness of V&L systems. By investigating a novel, active perspective on vision and language data engineering, the project will address the following three fundamental research tasks: 1) development of data generators to hallucinate training data from existing data with low-level vision; 2) development of data generators to hallucinate training data with hypothetical actions; and 3) design of training paradigms that incorporate the newly generated data, with the goal of increasing the resulting systems' generalization capability and robustness.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Chatterjee, Agneet; Gokhale, Tejas; Baral, Chitta; Yang, Yezhou. "On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation." CVPR, 2024. https://doi.org/10.1109/CVPR52733.2024.00270
Gokhale, Tejas; Chaudhary, Abhishek; Banerjee, Pratyay; Baral, Chitta; Yang, Yezhou. "Semantically Distributed Robust Optimization for Vision-and-Language Inference." Findings of ACL, 2022. https://doi.org/10.18653/v1/2022.findings-acl.118
Patel, Maitreya; Gokhale, Tejas. "CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering." Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Patel, Maitreya; Gokhale, Tejas; Baral, Chitta; Yang, Yezhou. "ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models." Proceedings of the AAAI Conference on Artificial Intelligence, v.38, 2024. https://doi.org/10.1609/aaai.v38i13.29371
Patel, Maitreya; Kim, Changhoon; Cheng, Sheng; Baral, Chitta; Yang, Yezhou. "ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations." CVPR, 2024. https://doi.org/10.1109/CVPR52733.2024.00866
Gokhale, Tejas; Anirudh, Rushil. "Improving Diversity With Adversarially Learned Transformations for Domain Generalization." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023.
