Award Abstract # 2219843
Enhancing research on speech and deep learning through holistic acoustic analysis

NSF Org: DRL
Division of Research on Learning in Formal and Informal Settings (DRL)
Recipient: NORTHWESTERN UNIVERSITY
Initial Amendment Date: August 4, 2022
Latest Amendment Date: August 4, 2022
Award Number: 2219843
Award Instrument: Standard Grant
Program Manager: Gregg Solomon
gesolomo@nsf.gov
(703) 292-8333
DRL, Division of Research on Learning in Formal and Informal Settings
EDU, Directorate for STEM Education
Start Date: August 15, 2022
End Date: July 31, 2026 (Estimated)
Total Intended Award Amount: $1,000,000.00
Total Awarded Amount to Date: $1,000,000.00
Funds Obligated to Date: FY 2022 = $1,000,000.00
History of Investigator:
  • Matthew Goldrick (Principal Investigator)
    matt-goldrick@northwestern.edu
Recipient Sponsored Research Office: Northwestern University
633 CLARK ST
EVANSTON
IL  US  60208-0001
(312)503-7955
Sponsor Congressional District: 09
Primary Place of Performance: Northwestern University
2016 Sheridan Rd.
Evanston
IL  US  60208-4090
Primary Place of Performance Congressional District: 09
Unique Entity Identifier (UEI): EXZVPWZBLUE8
Parent UEI:
NSF Program(s): IntgStrat Undst Neurl&Cogn Sys, ECR-EDU Core Research
Primary Program Source: 01002223DB NSF RESEARCH & RELATED ACTIVITIES
04002223DB NSF Education & Human Resources
Program Reference Code(s): 014Z, 8089, 8091, 8551, 8817
Program Element Code(s): 862400, 798000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.075, 47.076

ABSTRACT

You can guess a lot about a person from the way they pronounce words. Remarkably, human listeners can tell whether a talker likely learned English as a first or a second language, or whether the talker might have a brain injury that makes it difficult to speak. Such intuitions rely on human listeners' holistic pattern recognition abilities; these allow us to perceive the important, meaningful, yet subtle differences between pronunciations. However, the methods scientists currently use to measure speech objectively, which are based on a small number of properties of speech sounds, fail to capture these differences, hampering our ability to use speech to learn about the mind and brain. This project brings together speech scientists, computer scientists, and neuroscientists to test a radically different approach to this problem. Machine learning will be used to discover a new method for quantifying differences between spoken utterances based on holistic pattern recognition. This method will be tested against new and existing data from bilingual speakers. If successful, this will yield a fully general method that can be applied to speech from any language or any domain of language usage, allowing scientists to capitalize on the wealth of information in speech to develop powerful new insights into the mind and brain. Improved detection of subtle problems with pronunciation, such as those that occur in Alzheimer's disease, will advance our understanding of the brain mechanisms that humans use to produce speech. The results of this testing will also allow computer scientists to advance our understanding of how machine learning algorithms process sounds, driving improvements in the algorithms and supporting applications in any area of speech and language technology that relies on spoken language processing.

Speech variability across talkers provides a treasure trove of information for cognitive neuroscientists, leading to important insights into the cognitive mechanisms underlying language processing and potentially providing early signs of brain dysfunction. Current studies of speech are hamstrung by analyses that require preselecting specific temporal scales and acoustic dimensions. We propose a radically different approach: using unsupervised deep learning to discover a representational space for analysis of acoustic variation. To test this highly general approach, the method will be compared to current state-of-the-art methods for analyzing individual variation in bilingual speech. This includes using the acoustic variation in second language speech to predict intelligibility and to detect difficulties in code-switching, particularly the challenges faced by individuals with Alzheimer's disease. The results will inform development of deep learning and cognitive neuroscience. The machine learning algorithm is fully general; it can be applied to speech from any language or any domain of language usage, expanding the range of populations and contexts that can be served by speech technology or studied by cognitive neuroscientists. The project's integrative approach will allow computer scientists to advance our understanding of the extent to which modern deep learning architectures do or do not approximate human speech processing and allow cognitive neuroscientists to further our understanding of how meaningful acoustic distinctions are represented in speech perception and production.
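
To make this concrete, the sketch below illustrates one way a holistic, utterance-level comparison can be computed in a learned representational space. It is an illustrative example under stated assumptions, not the project's actual pipeline: it uses an off-the-shelf self-supervised wav2vec 2.0 model from torchaudio to stand in for the discovered representational space, mean-pools its frame-level features, and takes the cosine distance between utterances; the audio file names are hypothetical placeholders.

    # Illustrative sketch only: a pretrained wav2vec 2.0 model (via torchaudio)
    # stands in for the learned representational space; file names are hypothetical.
    import torch
    import torchaudio

    bundle = torchaudio.pipelines.WAV2VEC2_BASE   # self-supervised speech model
    model = bundle.get_model().eval()

    def embed(path: str) -> torch.Tensor:
        """Map one utterance to a single holistic embedding vector."""
        waveform, sr = torchaudio.load(path)
        waveform = waveform.mean(dim=0, keepdim=True)       # downmix to mono
        if sr != bundle.sample_rate:                        # model expects 16 kHz
            waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
        with torch.inference_mode():
            features, _ = model.extract_features(waveform)  # per-layer frame features
        return features[-1].mean(dim=1).squeeze(0)          # mean-pool over time

    # Holistic acoustic distance between two utterances (e.g., the same sentence
    # read by a first-language and a second-language talker).
    a, b = embed("talker_a.wav"), embed("talker_b.wav")
    distance = 1 - torch.cosine_similarity(a, b, dim=0)
    print(f"holistic acoustic distance: {distance.item():.3f}")

Mean pooling deliberately discards temporal detail; when frame-by-frame comparison matters, alignment over the same features (e.g., dynamic time warping) is a natural alternative within the same representational space.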

This project is funded by the Integrative Strategies for Understanding Neural and Cognitive Systems (NCS) program, which is jointly supported by the Directorates for Computer and Information Science and Engineering (CISE), Education and Human Resources (EHR), Engineering (ENG), and Social, Behavioral, and Economic Sciences (SBE).

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Chernyak, Bronya R. and Bradlow, Ann R. and Keshet, Joseph and Goldrick, Matthew. "A perceptual similarity space for speech based on self-supervised speech representations." The Journal of the Acoustical Society of America, v.155, 2024. https://doi.org/10.1121/10.0026358
Goldrick, Matthew and Cole, Jennifer. "Advancement of phonetics in the 21st century: Exemplar models of speech production." Journal of Phonetics, v.99, 2023.
Goldrick, Matthew and Gollan, Tamar H. "Inhibitory control of the dominant language: Reversed language dominance is the tip of the iceberg." Journal of Memory and Language, v.130, 2023. https://doi.org/10.1016/j.jml.2023.104410
Kim, Seung-Eun and Chernyak, Bronya R. and Seleznova, Olga and Keshet, Joseph and Goldrick, Matthew and Bradlow, Ann R. "Automatic recognition of second language speech-in-noise." JASA Express Letters, v.4, 2024. https://doi.org/10.1121/10.0024877
