
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | August 7, 2017 |
Latest Amendment Date: | June 30, 2022 |
Award Number: | 1734938 |
Award Instrument: | Standard Grant |
Program Manager: | Kenneth Whang, kwhang@nsf.gov, (703)292-5149, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | August 15, 2017 |
End Date: | September 30, 2023 (Estimated) |
Total Intended Award Amount: | $1,000,000.00 |
Total Awarded Amount to Date: | $1,000,000.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 2550 NORTHWESTERN AVE # 1100 WEST LAFAYETTE IN US 47906-1332 (765)494-1055 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 465 Northwestern Ave. West Lafayette IN US 47907-2035 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | GVF - Global Venture Fund, ECR-EDU Core Research, IntgStrat Undst Neurl&Cogn Sys |
Primary Program Source: | 04001718DB NSF Education & Human Resource |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
It is often said that a picture is worth a thousand words. Yet searching for what is needed, whether images or objects within those images, usually requires words. Getting accurate labels for efficient searches is a longstanding goal of computer vision, but progress has been slow. This project employs new methods to significantly change how picture-word labeling is accomplished by taking advantage of the best picture recognizer available, the human brain. Through functional magnetic resonance imaging and electroencephalography, brain activity of humans looking at pictures/videos is recorded and then used to improve performance on artificial intelligence (AI) tasks involving computer vision and natural language processing. Current systems use machine learning to train computers to recognize objects (nouns) and activities (verbs) in images/video, which are then used to describe events. Reasoning tasks (e.g., solving math problems) can then be done. These systems are trained on specially prepared datasets with samples of nouns for objects, verbs for activities, sentences describing events, and exam questions and answers. A novel paradigm in which humans perform the same tasks while their brains are scanned allows determination of the neural patterns associated with those tasks. The brain activity patterns, in turn, are used to train better computer systems.
The central hypothesis is that understanding human processing of grounded language involving predication and its use during reasoning will materially improve engineered computer vision, natural language processing, and AI systems that perform image/video captioning, visual question answering, and problem solving. Scientific and engineering goals include developing models of human language grounding and reasoning consistent with neuroimaging, to improve engineered systems integrating language and vision that support automated reasoning. The main scientific question is to understand mechanisms by which predicates and arguments are identified, linked, and used for reasoning by the human brain. The hypothesis that predicate-argument linking in visual and linguistic representations is accomplished similarly, and that this then supports reasoning and problem solving, will be tested using multiple neuroimaging modalities and machine learning algorithms to decode "who did what to whom" from brain scans of subjects processing linguistic and visual stimuli. The iterative approach will involve understanding information integration at the neural level, to improve machine learning performance on AI tasks by training computers to perform increasingly complex tasks with neuroimaging data from stimuli derived from large-scale natural tasks. Using identical datasets for human and machine performance will support translation of scientific advances to engineering practice involving integration of computer vision and natural language processing.
This award is cofunded by the Office of International Science and Engineering.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
We studied how the human brain processes and represents people and objects (nouns) and activities (verbs) across different modalities, including vision (video) and language (both spoken and written). We particularly studied how that brain activity allows forming compound concepts by combining simpler concepts like nouns and verbs into sentence structure (subject-verb-object). We developed methods for computer programs to decode both simple and compound concepts from neuroimaging data (EEG and fMRI) recorded from subjects viewing short video clips, hearing spoken presentation of sentences, and reading sentences, all depicting compound concepts. Our methods demonstrate two key findings: first, the pattern of brain activity recorded from the same concepts (stimuli) is similar, but not identical, across different people, and second, the pattern of brain activity recorded from the same concepts (stimuli) is similar, but not identical, across different modalities. The pattern of brain activity associated with combining simple concepts into compound concepts is also similar, but not identical, across different people and modalities.
In another major study, we discovered flaws in the experimental method used to collect data in a large number of prominent published papers. These flaws mean that the conclusions and claims of those papers are not valid (and cannot be used). We published our findings in the same places so that others would be aware of the problem.
Last Modified: 01/27/2024
Modified by: Jeffrey M Siskind