Award Abstract # 1457992
CompCog: Modeling syntactic priming in language production according to corpus data

NSF Org: BCS (Division of Behavioral and Cognitive Sciences)
Recipient: THE PENNSYLVANIA STATE UNIVERSITY
Initial Amendment Date: June 2, 2015
Latest Amendment Date: June 2, 2015
Award Number: 1457992
Award Instrument: Standard Grant
Program Manager: William Badecker
BCS (Division of Behavioral and Cognitive Sciences)
SBE (Directorate for Social, Behavioral and Economic Sciences)
Start Date: July 15, 2015
End Date: December 31, 2016 (Estimated)
Total Intended Award Amount: $75,000.00
Total Awarded Amount to Date: $75,000.00
Funds Obligated to Date: FY 2015 = $75,000.00
History of Investigator:
  • David Reitter (Principal Investigator)
Recipient Sponsored Research Office: Pennsylvania State Univ University Park
201 OLD MAIN
UNIVERSITY PARK
PA  US  16802-1503
(814)865-1372
Sponsor Congressional District: 15
Primary Place of Performance: Pennsylvania State Univ University Park
316D IST Building
University Park
PA  US  16802-6823
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI): NPM2J7MSCF61
Parent UEI:
NSF Program(s): Linguistics
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 1311, 9179
Program Element Code(s): 131100
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.075

ABSTRACT

How do humans learn, understand, and produce language? What are the mental representations necessary to compose natural-sounding sentences? Addressing such fundamental questions from cognitive linguistics is key to improving second-language learning in a nation that is becoming increasingly linguistically diverse, and to developing better natural-language computer interfaces. This project follows a big-data approach: it looks at adaptation between speakers and authors in recorded conversations and large text databases in order to infer mental representations, on the basic premise that adaptation indicates the presence of such mental structures. With this methodology, the researchers will use large text datasets as a keyhole into the human mind. They will create an unbiased and largely automatic way to evaluate computational models that describe how the mind achieves fast, fluent, near-perfect language production.

The goal of the project is to develop a psycholinguistic, computational model that spells out precisely the steps and representations necessary for language production. Such models can be compared and improved incrementally because they are tested on large-scale language data. The project will develop a cognitive model of alignment in language production as found in natural dialogue in speech corpora. As a basis for alignment at the structural level, it will explain and predict syntactic "priming" effects: for example, "The linguist gave the lab keys to his student" primes a listener to mirror the sentence structure with "The student showed his results to the editor" (the target), rather than "... showed the editor his results". The model will simulate language production with general cognitive operations studied by cognitive psychology, such as cue-based memory retrieval. It will account for key characteristics of priming, including rapid decay, long-term persistence and convergence, lexical boost effects, and sensitivity to interference from intervening sentences. The model will be based on a cognitive framework, ACT-R, thereby integrating language processing with general quantitative and computational accounts of memory. A broad-coverage, lexicalized syntax formalism will be used to account for real-life language data.
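In ACT-R's declarative memory, both the rapid decay and the long-term persistence of priming fall out of the standard base-level learning equation, B = ln(sum_j (t - t_j)^-d), where each past use of a chunk (here, a syntactic rule) contributes activation that fades as a power law. A minimal sketch of that equation follows; the function name and toy timings are illustrative and not taken from the project itself:

```python
import math

def base_level_activation(presentation_times, now, decay=0.5):
    """ACT-R base-level learning: B = ln(sum_j (now - t_j) ** -decay).

    `presentation_times` lists the past moments at which a chunk (e.g., a
    syntactic rule used in a prime sentence) was retrieved or created.
    Power-law decay makes activation drop quickly at first (rapid decay of
    priming) while never quite vanishing (long-term persistence).
    """
    return math.log(sum((now - t) ** -decay
                        for t in presentation_times if t < now))

# A rule primed once at t=0: activation shortly afterwards vs. much later.
recent = base_level_activation([0.0], now=1.0)    # rule just primed
later = base_level_activation([0.0], now=100.0)   # decayed, but not to zero
```

Repeated presentations simply add more terms to the sum, which is one way such a model can capture cumulative adaptation over a long dialogue.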

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

In this exploratory project, we trialed a way to use large-scale datasets to model how language is processed in the brain. It brings together several scientific fields: computational linguistics, which has developed algorithms and representations that describe the structure of sentences with considerable success; computational psychology, which provides models formalizing what we know about the neurobiology of the brain and the psychology of cognition; and data science, which has studied how large-scale datasets can be analyzed and made useful across many fields.

This project combined knowledge from all of these areas and implemented computer programs that simulate the process of forming an English-language sentence. They were optimized and tested on several thousand sentences from a dataset of recorded and transcribed spoken conversations between people connected by nothing but a phone line. The programs created under the project were evaluated by what they reveal about the details of this process, chiefly how it uses working memory and other forms of memory. Understanding this is important, for example, for teaching foreign languages better, or for helping someone recover from a stroke or other brain damage.

Of course, this is only the beginning of a process that may soon revolutionize the way we do science that studies the psychology of language. Rather than examining individual constructions in how speakers form sentences under different conditions, we can, in the future, use computer models to study millions of sentences produced by speakers of many languages to systematically characterize how language works, and how to teach it better.
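Adaptation in a transcribed dialogue corpus can be quantified by asking how often a syntactic rule recurs a fixed number of utterances after someone uses it; priming predicts higher recurrence at short distances than at long ones. A hypothetical sketch of such a corpus measure, assuming each utterance has already been annotated with the grammar rules of its parse (the data format and function name are illustrative, not the project's actual code):

```python
def repetition_rate(utterance_rules, distance):
    """Estimate how often a syntactic rule recurs `distance` utterances
    after an utterance that used it -- a simple corpus-based priming measure.

    `utterance_rules` is a list of sets of rule labels, one set per
    utterance in dialogue order, as obtained from a parsed speech corpus.
    """
    hits = total = 0
    for i in range(len(utterance_rules) - distance):
        for rule in utterance_rules[i]:
            total += 1
            if rule in utterance_rules[i + distance]:
                hits += 1
    return hits / total if total else 0.0

# Toy dialogue: rules recur more often at distance 1 than further away.
dialogue = [{"NP -> Det N"}, {"NP -> Det N"}, {"VP -> V NP"}, {"NP -> Det N"}]
near = repetition_rate(dialogue, distance=1)
far = repetition_rate(dialogue, distance=2)
```

Comparing such rates across distances, and against a chance baseline for each rule's overall frequency, gives the kind of decay curve that a cognitive model of priming can then be fit to and evaluated against.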


Last Modified: 03/30/2017
Modified by: David T Reitter


