NSF Award Search: Award # 1551866 - CompCog: The edge of the lexicon: Productive knowledge and direct experience in the acquisition and processing of multiword expressions

Award Abstract # 1551866

NSF Org:	BCS Division of Behavioral and Cognitive Sciences
Recipient:	MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Initial Amendment Date:	August 22, 2016
Latest Amendment Date:	August 22, 2016
Award Number:	1551866
Award Instrument:	Standard Grant
Program Manager:	Tyler Kendall BCS Division of Behavioral and Cognitive Sciences SBE Directorate for Social, Behavioral and Economic Sciences
Start Date:	August 15, 2016
End Date:	January 31, 2021 (Estimated)
Total Intended Award Amount:	$329,233.00
Total Awarded Amount to Date:	$329,233.00
Funds Obligated to Date:	FY 2016 = $329,233.00
History of Investigator:	Roger Levy (Principal Investigator) rplevy@mit.edu
Recipient Sponsored Research Office:	Massachusetts Institute of Technology 77 MASSACHUSETTS AVE CAMBRIDGE MA US 02139-4301 (617)253-1000
Sponsor Congressional District:	07
Primary Place of Performance:	Massachusetts Institute of Technology 77 Massachusetts Ave Cambridge MA US 02139-4301
Primary Place of Performance Congressional District:	07
Unique Entity Identifier (UEI):	E2NYLCDML6V1
Parent UEI:	E2NYLCDML6V1
NSF Program(s):	Linguistics, Perception, Action & Cognition, Robust Intelligence
Primary Program Source:	01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	1311, 7252, 7495, 9179
Program Element Code(s):	131100, 725200, 749500
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.075

ABSTRACT

Language is the most discrete, measurable cultural record of the human mind, and is uniquely expressive among the communicative systems found in nature. Every day we comprehend hundreds of sentences that we hear or read but have never encountered before, and we produce hundreds more. Yet our success at these many acts of communication belies the difficulty of the task: language is rife with ambiguity, our attention is limited, our environments may be noisy, and we often have incomplete information about the shared knowledge and beliefs of the people we engage with. This ability, unique to our species, poses profound challenges for our scientific understanding of the capabilities of the human mind. Deepening our understanding of these capabilities requires a combination of ideas and methods from linguistics, psychology, and computer science. Advances in this area help lay the groundwork for improvements in natural language technologies such as document summarization, paraphrasing, question answering, and machine translation, and for better identification, diagnosis, and treatment of language disorders.

Within this broader research enterprise, this project focuses on the "edge of the lexicon": it elucidates the conditions under which a linguistic expression comes to be stored in the mind of a native speaker who uses it, and the consequences of the expression being stored as a holistic unit. Native speakers know both productive rules, which license and allow interpretation of phrases and sentences they have never before encountered, and a rich inventory of lexical items that can be combined through these productive rules. Many of these lexical items are individual words, but there is evidence that specific, frequent multi-word expressions, such as "meat and potatoes" or "large majority", may also be stored in the lexicon. This project combines artificial intelligence-based computational models, large linguistic datasets, and controlled psychological experimentation to explore the edge of the lexicon, probing how direct experience with specific multi-word expressions leads to their being stored in one's mental lexicon, how such storage is reconciled with productive knowledge in language comprehension and production, and how these expressions emerge and change over time.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Veronica Boyce, Richard Futrell, and Roger Levy. "Maze Made Easy: Better and easier measurement of incremental processing difficulty." Journal of Memory and Language, v.111, 2020, p.1-13

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project involved an in-depth study of the knowledge contained in and deployed from the lexicon: the repository of words and other elementary units that speakers, listeners, readers, and writers must learn and use in order to function in the language. Although one might think that a speaker's mental lexicon is simply a listing of words together with their meanings, it is much richer: words have relationships with each other and are deeply intertwined with the grammar of the language, which governs how individual words and other elementary units are used together to encode meaning. Additionally, there are rich co-occurrence patterns among words, yielding multi-word expressions. For example, "out and about" is much more common and natural sounding than "out and around", and likewise "salt and pepper" compared with "pepper and salt"; such preferences must be learned, and enrichments of meaning can become associated with them. Furthermore, it turns out that not all words and multi-word expressions are represented in the same way. Words and expressions with which a speaker has extensive experience seem to have particularized information represented more robustly in their own right; words and expressions with which a speaker has less experience seem to have representations that rely more on other parts of the language.

In this project we conducted numerous experiments on child and adult native speakers to gain insight into learning and processing at the "edge of the lexicon", where an expression's particularized information and its relations with the rest of the language are revealed most clearly. These studies involved not only English but also Mandarin Chinese and gestural (pre-sign) systems, which allowed us to test hypotheses about learning biases and architectural constraints on language production that could not have been tested using English alone. As part of this project we also developed a new experimental psycholinguistics method, the automated "Maze" task, that has advantages over existing methods for web-based experimental delivery, which helps us reach diverse populations and facilitates psycholinguistic research when in-person experiments may not be feasible. Finally, the project yielded a set of psycholinguistic experimental results that have helped differentiate between major theories of human language production, that is, theories of which factors chiefly determine the moment-to-moment choices speakers make in converting their thoughts to spoken words.

This project offered numerous training and professional development experiences to postdoctoral researchers, graduate students, and undergraduate researchers, and it supported a large part of one recent PhD graduate's research.

Last Modified: 06/01/2021
Modified by: Roger P Levy

Please report errors in award information by writing to: awardsearch@nsf.gov.
