
NSF Org: IIS Division of Information & Intelligent Systems
Recipient:
Initial Amendment Date: June 6, 2016
Latest Amendment Date: July 22, 2020
Award Number: 1555079
Award Instrument: Continuing Grant
Program Manager: Kenneth Whang, kwhang@nsf.gov, (703) 292-5149, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering
Start Date: June 1, 2016
End Date: May 31, 2021 (Estimated)
Total Intended Award Amount: $502,210.00
Total Awarded Amount to Date: $502,210.00
Funds Obligated to Date: FY 2017 = $97,431.00; FY 2018 = $100,354.00; FY 2019 = $180,641.00; FY 2020 = $29,190.00
History of Investigator:
Recipient Sponsored Research Office: 615 W 131ST ST, NEW YORK, NY, US 10027-7922, (212) 854-6851
Sponsor Congressional District:
Primary Place of Performance: New York, NY, US 10027-6902
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): Robust Intelligence
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVIT; 01001819DB NSF RESEARCH & RELATED ACTIVIT; 01001920DB NSF RESEARCH & RELATED ACTIVIT; 01002021DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
The recent parallel breakthroughs in deep neural network models and neuroimaging techniques have significantly advanced the state of both artificial and biological computing. However, there has been little interaction between these two disciplines, resulting in simplistic models of neural systems with limited prediction, learning, and generalization abilities. The goal of this project is to create a coherent theoretical and mathematical framework to understand the computational role of distinctive features of biological neural networks and their contribution to the formation of robust signal representations, and to model and integrate these features into current artificial neural networks. These new bio-inspired models and algorithms will have adaptive and cognitive abilities, will better predict experimental observations, and will advance knowledge of how the brain processes speech. In addition, the performance of these models should approach human abilities in tasks mimicking cognitive functions and will motivate new experiments that can further impose realistic constraints on the models.
This interdisciplinary project lies at the intersection of neurolinguistics, speech engineering, and machine learning, uniting the historically separate disciplines of neuroscience and engineering. The proposed approach integrates methods and expertise across disciplines, including system identification, signal processing, neurophysiology, and systems neuroscience. The aim of this proposal is to analyze and transform artificial neural network models to accurately reflect the computational and organizational principles of biological systems through three specific objectives: I) to create analytic methods that can provide insights into the transformations that occur in artificial neural network models by examining their representational properties and feature encoding; II) to model and implement the local, bottom-up, adaptive neural mechanisms that appear ubiquitously in biological systems; and III) to model the top-down, knowledge-driven abilities of cognitive systems to implement new computations in response to task requirements. Accurate computational models of these neural transformations will have an overarching impact on many disciplines, including artificial intelligence, neurolinguistics, and systems neuroscience. More realistic neural network models will not only result in human-like pattern recognition technologies and a better understanding of how the brain solves speech perception, but will also help explain how these processes are impaired in people with speech and language disorders. The proposed project will therefore advance the state of the art in multiple disciplines.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available without a charge during the embargo (administrative interval). Some links on this page may take you to non-federal websites, whose policies may differ from those of this site.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
In this research, we aimed to form a better understanding of how artificial neural networks compute and what representations they use. The second objective was to create neurally inspired neural network models and to compare the representational and computational characteristics of biological and artificial neural networks. To achieve these goals, we formulated a computational framework for learning and interpreting neural network models that can accurately predict neural responses to sound, in particular the nonlinear transformations that the brain applies to perceive sound. Moreover, we proposed several neurally inspired mechanisms that can be implemented in artificial neural network models to increase their efficacy and robustness.

In a complementary approach, we addressed the general source separation problem with novel deep learning frameworks, including the "attractor network" and the "time-domain audio separation network." Our proposed model works by first generating a high-dimensional embedding for each time-frequency bin. We then form a reference point (attractor) for each source in the embedding space that pulls all the features belonging to that source toward itself. This method performed particularly well on a standard benchmark for this task. In addition, we directly addressed several inherent problems of most speech separation algorithms that use spectrograms as their representation. Instead, we proposed a fully convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation. Conv-TasNet uses a linear encoder to generate a representation of the speech waveform optimized for separating individual speakers. The proposed speech separation algorithm significantly outperforms previous time-frequency methods on both objective and subjective tests, even when compared to the separation quality of several ideal time-frequency masks of the speakers.
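The attractor idea described above can be sketched in a few lines of NumPy. This is an illustrative toy, not the authors' implementation: the array shapes, the use of per-source centroids as attractors, and the softmax masking step are assumptions made for the example.

```python
import numpy as np

# Toy sketch of an attractor-style separation step (assumed shapes,
# not the authors' implementation). Each time-frequency (T-F) bin is
# mapped to a D-dimensional embedding; an attractor per source is the
# centroid of the embeddings assigned to that source, and soft masks
# follow from each embedding's similarity to each attractor.

rng = np.random.default_rng(0)
n_bins, emb_dim, n_src = 6, 4, 2              # T-F bins, embedding size, sources

V = rng.standard_normal((n_bins, emb_dim))    # one embedding per T-F bin
Y = np.zeros((n_bins, n_src))                 # binary source assignment
Y[:3, 0] = 1.0                                # first three bins -> source 0
Y[3:, 1] = 1.0                                # remaining bins  -> source 1

# Attractors: per-source centroids of the assigned embeddings.
A = (Y.T @ V) / Y.sum(axis=0, keepdims=True).T    # shape (n_src, emb_dim)

# Similarity of each bin to each attractor -> soft separation masks
# (row-wise softmax so the masks for each bin sum to 1).
logits = V @ A.T                                  # shape (n_bins, n_src)
masks = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

print(masks.shape)    # prints (6, 2)
```

In practice the embeddings come from a trained network and the attractors are estimated rather than computed from oracle assignments, but the geometry is the same: bins cluster around the attractor of their source, and the masks fall out of that clustering.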
This research has also enabled the training of several graduate students who gained first-hand knowledge about the brain, and also became familiar with the latest computational modeling approaches and state-of-the-art speech processing methodologies.
Last Modified: 12/16/2021
Modified by: Nima Mesgarani
Please report errors in award information by writing to: awardsearch@nsf.gov.