Award Abstract # 1555079
CAREER: Biologically inspired neural network models for robust speech processing

NSF Org: IIS (Division of Information & Intelligent Systems)
Recipient: THE TRUSTEES OF COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK
Initial Amendment Date: June 6, 2016
Latest Amendment Date: July 22, 2020
Award Number: 1555079
Award Instrument: Continuing Grant
Program Manager: Kenneth Whang
kwhang@nsf.gov
(703)292-5149
IIS (Division of Information & Intelligent Systems)
CSE (Directorate for Computer and Information Science and Engineering)
Start Date: June 1, 2016
End Date: May 31, 2021 (Estimated)
Total Intended Award Amount: $502,210.00
Total Awarded Amount to Date: $502,210.00
Funds Obligated to Date: FY 2016 = $94,594.00
FY 2017 = $97,431.00
FY 2018 = $100,354.00
FY 2019 = $180,641.00
FY 2020 = $29,190.00
History of Investigator:
  • Nima Mesgarani (Principal Investigator)
    nm2764@columbia.edu
Recipient Sponsored Research Office: Columbia University
615 W 131ST ST
NEW YORK
NY  US  10027-7922
(212)854-6851
Sponsor Congressional District: 13
Primary Place of Performance: Columbia University
New York
NY  US  10027-6902
Primary Place of Performance Congressional District: 13
Unique Entity Identifier (UEI): F4N1QNPB95M4
Parent UEI:
NSF Program(s): Robust Intelligence
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVITIES
01001718DB NSF RESEARCH & RELATED ACTIVITIES
01001819DB NSF RESEARCH & RELATED ACTIVITIES
01001920DB NSF RESEARCH & RELATED ACTIVITIES
01002021DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 1045, 7495, 8091
Program Element Code(s): 749500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Recent parallel breakthroughs in deep neural network models and neuroimaging techniques have significantly advanced the state of both artificial and biological computing. However, there has been little interaction between the two disciplines, resulting in simplistic models of neural systems with limited prediction, learning, and generalization abilities. The goal of this project is to create a coherent theoretical and mathematical framework for understanding the computational role of the distinctive features of biological neural networks and their contribution to the formation of robust signal representations, and to model and integrate those features into current artificial neural networks. These new bio-inspired models and algorithms will have adaptive and cognitive abilities, will better predict experimental observations, and will advance knowledge of how the brain processes speech. In addition, the performance of these models should approach human abilities on tasks that mimic cognitive functions, and the models will motivate new experiments that can further impose realistic constraints on them.

This interdisciplinary project lies at the intersection of neurolinguistics, speech engineering, and machine learning, uniting the historically separate disciplines of neuroscience and engineering. The proposed approach integrates methods and expertise across disciplines, including system identification, signal processing, neurophysiology, and systems neuroscience. The aim is to analyze and transform artificial neural network models so that they accurately reflect the computational and organizational principles of biological systems, through three specific objectives: I) to create analytic methods that provide insight into the transformations that occur in artificial neural network models by examining their representational properties and feature encoding; II) to model and implement the local, bottom-up, adaptive neural mechanisms that appear ubiquitously in biological systems; and III) to model the top-down, knowledge-driven ability of cognitive systems to implement new computations in response to task requirements. Accurate computational models of these neural transformations will have an overarching impact on many disciplines, including artificial intelligence, neurolinguistics, and systems neuroscience. More realistic neural network models will not only yield human-like pattern recognition technologies and a better understanding of how the brain accomplishes speech perception, but can also help explain how these processes are impaired in people with speech and language disorders. The project will therefore advance the state of the art in multiple disciplines.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Akbari, H., Khalighinejad, B., Herrero, J.L., Mehta, A.D., Mesgarani, N. "Towards reconstructing intelligible speech from the human auditory cortex" Scientific Reports, v.9, 2019. 10.1038/s41598-018-37359-z
Chen, Z., Luo, Y., Mesgarani, N. "Deep attractor network for single-microphone speaker separation" IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017
Han, C., Luo, Y., Mesgarani, N. "Real-time binaural speech separation with preserved spatial cues" IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, p.6404-6408. 10.1109/ICASSP40776.2020.9053215
Han, C., Luo, Y., Mesgarani, N. "Online deep attractor network for real-time single-channel speech separation" IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019
Han, C., O'Sullivan, J., Luo, Y., Herrero, J., Mehta, A.D., Mesgarani, N. "Speaker-independent auditory attention decoding without access to clean speech sources" Science Advances, v.5, 2019. 10.1126/sciadv.aav6134
Akbari, H., Arora, H., Cao, L., Mesgarani, N. "Lip2AudSpec: Speech reconstruction from silent lip movements video" IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, p.2516-2520. 10.1109/ICASSP.2018.8461856
Keshishian, M., Akbari, H., Khalighinejad, B., Herrero, J.L., Mehta, A.D., Mesgarani, N. "Estimating and interpreting nonlinear receptive field of sensory neural responses with deep neural network models" eLife, v.9, 2020

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

In this research, we aimed to form a better understanding of how artificial neural networks compute and what representations they use. The second objective was to create neurally inspired neural network models and to compare the representational and computational characteristics of biological and artificial neural networks. To achieve these goals, we formulated a computational framework for learning and interpreting neural network models that can accurately predict neural responses to sound, in particular the nonlinear transformations that the brain applies to perceive sound. Moreover, we proposed several neurally inspired mechanisms that can be implemented in artificial neural network models to increase their efficacy and robustness.

In a complementary approach, we addressed the general source separation problem with novel deep learning frameworks, including the "attractor network" and the "time-domain audio separation network". Our attractor model works by first generating a high-dimensional embedding for each time-frequency bin. We then form a reference point (attractor) for each source in the embedding space that pulls all the features belonging to that source toward itself. This method performed particularly well on a standard benchmark used for this task.

In addition, we directly addressed several problems inherent in most speech separation algorithms that use spectrograms as their representation. Instead, we proposed a fully convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation. Conv-TasNet uses a linear encoder to generate a representation of the speech waveform optimized for separating individual speakers. The proposed algorithm significantly outperforms previous time-frequency methods on both objective and subjective tests, even when compared to the separation quality of several ideal time-frequency masks of the speakers.

This research also enabled the training of several graduate students, who gained first-hand knowledge about the brain and became familiar with the latest computational modeling approaches and state-of-the-art speech processing methodologies.
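To make the attractor mechanism concrete, here is a minimal NumPy sketch of a single forward pass. It is a simplified illustration under stated assumptions, not the published implementation: the embeddings are given rather than produced by a learned network, and the oracle one-hot source assignments available during training are used to form the attractors. All names and shapes are illustrative.

```python
import numpy as np

def attractor_masks(embedding, source_assignment):
    """Soft separation masks from embeddings and attractors (illustrative).

    embedding:         (TF, K) array; a K-dimensional embedding for each of
                       TF time-frequency bins (learned in the real model).
    source_assignment: (TF, C) one-hot array assigning each bin to one of
                       C sources (oracle labels, available during training).
    Returns:           (TF, C) array of soft masks, one column per source.
    """
    # Attractor for each source: the centroid of the embeddings of the
    # bins assigned to that source.
    counts = source_assignment.sum(axis=0, keepdims=True)        # (1, C)
    attractors = embedding.T @ source_assignment / counts        # (K, C)

    # Similarity of each bin to each attractor; a softmax over sources
    # turns similarities into competing masks, so each attractor "pulls"
    # the bins of its source toward itself.
    logits = embedding @ attractors                              # (TF, C)
    logits -= logits.max(axis=1, keepdims=True)                  # stability
    masks = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return masks

# Toy usage: 6 time-frequency bins, 4-dimensional embeddings, 2 sources.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 4))
assign = np.eye(2)[np.array([0, 1, 0, 1, 1, 0])]
print(attractor_masks(emb, assign).round(2))
```

At test time, oracle assignments are unavailable, so the published model estimates the attractors by other means (for example, by clustering the embeddings); the sketch above mirrors only the training-time computation.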
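The Conv-TasNet pipeline can likewise be sketched schematically as an encoder-separator-decoder chain. In the sketch below, the linear encoder and decoder are random matrices standing in for the learned bases, and the separator (a stack of dilated 1-D convolutions in the published model) is reduced to a placeholder emitting random masks; all dimensions and names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

L, N, C = 16, 32, 2                  # window length, encoder filters, speakers
x = rng.normal(size=4000)            # mock single-channel mixture waveform

# --- Linear encoder: strided framing followed by a basis transform.
hop = L // 2
frames = np.stack([x[i:i + L] for i in range(0, len(x) - L + 1, hop)])  # (T, L)
U = rng.normal(size=(L, N))          # encoder basis (learned in the real model)
w = np.maximum(frames @ U, 0.0)      # (T, N) nonnegative mixture representation

# --- Separator: placeholder producing one softmax mask per speaker.
logits = rng.normal(size=(C, *w.shape))
masks = np.exp(logits) / np.exp(logits).sum(axis=0)                     # (C, T, N)

# --- Linear decoder: map masked representations back and overlap-add.
V = rng.normal(size=(N, L))          # decoder basis (learned in the real model)
est = np.zeros((C, len(x)))
for c in range(C):
    recon = (masks[c] * w) @ V       # (T, L) masked, reconstructed frames
    for t, frame in enumerate(recon):
        est[c, t * hop:t * hop + L] += frame

print(est.shape)                     # (2, 4000): two estimated speaker waveforms
```

The point of the time-domain formulation is that the encoder, masks, and decoder are trained jointly on the separation objective, so the representation is optimized for separating speakers rather than fixed in advance the way a spectrogram is.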


Last Modified: 12/16/2021
Modified by: Nima Mesgarani
