
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: | NORTHWESTERN UNIVERSITY |
Initial Amendment Date: | August 1, 2014 |
Latest Amendment Date: | June 4, 2015 |
Award Number: | 1420971 |
Award Instrument: | Standard Grant |
Program Manager: | Ephraim Glinert, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | October 1, 2014 |
End Date: | September 30, 2018 (Estimated) |
Total Intended Award Amount: | $498,736.00 |
Total Awarded Amount to Date: | $514,261.00 |
Funds Obligated to Date: | FY 2014 = $498,736.00; FY 2015 = $15,525.00 |
History of Investigator: | Bryan Pardo (Principal Investigator) |
Recipient Sponsored Research Office: | 633 CLARK ST, EVANSTON, IL 60208-0001, US, (312) 503-7955 |
Sponsor Congressional District: | |
Primary Place of Performance: | 2145 Sheridan Road, Tech, Evanston, IL 60208-3109, US |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | HCC-Human-Centered Computing |
Primary Program Source: | 01001516DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Algorithms that separate audio sources have many potential uses, such as extracting important audio data from historic recordings or helping people with hearing impairments select what to amplify and what to suppress in their hearing aids. Computer processing of audio content can potentially isolate the sound sources of interest and improve audio clarity whenever the content exhibits interference from multiple sound sources, for example by extracting a single voice of interest from a room full of voices. However, current sound source identification and separation methods are reliable only when there is a single predominant sound. This project will develop the science and technology needed to isolate a single sound source from audio content with multiple competing sources, and to build interactive computer systems that guide users through an interactive source separation process, permitting the separation and recombination of sound sources in a manner that is beyond the reach of existing audio software. The outcomes of the project will improve the prospects for speech recognition in environments with multiple talkers, will be useful for scientific inquiries such as biodiversity monitoring through automated analysis of field recordings, and will be broadly useful whenever manual tagging of audio data is not practical.
While many computational auditory scene analysis algorithms have been proposed to separate audio scenes into individual sources, current methods are brittle and difficult to use and, as a result, have not been broadly adopted by potential users. The methods are brittle in that each algorithm relies on a single cue to separate sources; if the cue is not reliable, the method fails. The methods are difficult to use because it is hard to predict which audio scenes a given algorithm is likely to work on, so the user does not know which method to apply in any given case. They are also difficult to use because their control parameters are hard to understand for users who lack expertise in signal processing. This project will research how to integrate multiple source separation algorithms into a single framework, and how to improve ease of use by exploring interfaces that let users interactively define what they wish to isolate in an audio scene and that let the system guide users in selecting a tool and setting the necessary parameters. The project will produce an open-source audio source separation tool that embodies these research outcomes.
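To make the shared core of these algorithms concrete: whatever cue a separation method uses, it typically produces a mask over a time-frequency representation of the mixture. The sketch below is illustrative only (a toy two-source mixture and an "ideal" binary mask built from known references, not any algorithm from this project); real systems must estimate the mask from the mixture alone.

    # Illustrative only: separation as masking a time-frequency representation.
    import numpy as np
    from scipy.signal import stft, istft

    sr = 16000
    t = np.arange(sr * 2) / sr
    voice = np.sin(2 * np.pi * 220 * t)    # stand-in for a target source
    noise = np.sin(2 * np.pi * 3000 * t)   # stand-in for an interferer
    mix = voice + noise

    # Analyze the mixture and each reference source with the same STFT.
    f, frames, MIX = stft(mix, fs=sr, nperseg=1024)
    _, _, V = stft(voice, fs=sr, nperseg=1024)
    _, _, N = stft(noise, fs=sr, nperseg=1024)

    # Keep each time-frequency bin where the target dominates the interferer.
    mask = (np.abs(V) > np.abs(N)).astype(float)
    _, voice_est = istft(MIX * mask, fs=sr, nperseg=1024)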
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Overview: A fundamental problem in computer audition is audio source separation: the process of extracting elements of interest (like individual voices) from an audio scene (like a cocktail party). While many algorithms have been proposed to separate audio scenes into individual sources, these methods are brittle and difficult to use, and potential users have therefore not broadly adopted the technology. Audio source separation methods are brittle because each algorithm relies on a single cue to separate sources; when the cue is not reliable, the method fails. Methods are difficult to use because it is hard to predict which audio scenes a given algorithm is likely to work on, so the user does not know which method to apply in any given case. They are also difficult to use because their control parameters are hard to understand for users who are not experts in signal processing.
In this work we addressed algorithm brittleness by developing methods to integrate multiple source separation algorithms into a single framework. We developed new interfaces that let users easily and interactively define what they wish to separate from an audio scene. We also developed methods for computer algorithms to automatically learn evaluation measures for audio quality, so that a system can guide the user on tool selection and parameter settings.
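For context, the standard objective yardstick in this area is the source-to-distortion ratio (SDR); learned and crowdsourced measures such as those developed here aim to track human judgments better than fixed formulas like SDR do. A minimal sketch of the simple scale-invariant time-domain SDR (not the project's learned measures) follows.

    # Source-to-distortion ratio: energy of the reference-aligned part of
    # the estimate relative to everything else (noise, interference, artifacts).
    import numpy as np

    def sdr(reference: np.ndarray, estimate: np.ndarray) -> float:
        alpha = np.dot(estimate, reference) / np.dot(reference, reference)
        target = alpha * reference       # projection onto the reference
        distortion = estimate - target
        return 10 * np.log10(np.sum(target**2) / np.sum(distortion**2))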
The outcomes of this research were: (1) audio source separation algorithms that adaptively combine multiple cues to robustly separate sounds in cases where single-cue approaches fail; (2) interfaces that let the user guide the separation process toward a goal without having to understand the complex internals of the algorithms; (3) methods to automatically learn evaluation measures from past user interactions, so that systems can suggest approaches and settings likely to work in the current case; and (4) open-source audio source separation tools that embody these outcomes.
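The report does not spell out the fusion rule behind outcome (1); purely as a schematic of the multi-cue idea (the project's actual algorithms are described in its publications), one could weight per-cue soft masks by an estimated confidence:

    # Hypothetical sketch of multi-cue combination: a confidence-weighted
    # average of soft masks, one per cue. Names below are illustrative.
    import numpy as np

    def combine_masks(masks, confidences):
        """masks: list of soft masks in [0, 1], all the same shape.
        confidences: one nonnegative weight per mask (e.g., how reliable
        that cue appears to be on this particular recording)."""
        w = np.asarray(confidences, dtype=float)
        w = w / w.sum()
        return sum(wi * m for wi, m in zip(w, masks))

    # e.g., a repetition-based mask and a pitch-based mask (hypothetical),
    # trusting the repetition cue more on this recording:
    # combined = combine_masks([repet_mask, melody_mask], [0.7, 0.3])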
This work is intellectually transformative in bringing together techniques from the disparate fields of signal processing and human-computer interaction in an integrated and synergistic manner. The work advanced knowledge by developing a new unified framework for multi-approach source separation, as well as new approaches that deeply integrate cutting-edge signal processing with interactive interfaces that learn from users. This is of interest to researchers in signal processing, artificial intelligence, speech recognition, multimedia processing, and human-computer interaction.
This work has broad impact because current sound tagging methods are reliable only when there is a single predominant sound. Robust source separation can segment audio into meaningful chunks that existing techniques can then recognize and transcribe. This will be transformative for speech recognition in environments with multiple talkers, for biodiversity monitoring (automatic identification of species in field recordings containing multiple concurrent species), and for search through existing audio/video collections where manual tagging of the data is not practical.
Algorithms that separate audio sources could be used to remix legacy audio content, upmix stereo to surround sound, or help the hearing-impaired select what to amplify and what to suppress. Musicians could remix and edit recordings without needing an individual microphone on each musician. More broadly, source separation algorithms can be applied wherever signals exhibit interference from multiple sources, including biomedical imaging and telecommunications.
The work funded by this grant resulted in 15 peer-reviewed publications in internationally respected journals and conferences. These publications can be found on the Northwestern University Interactive Audio Lab website: music.cs.northwestern.edu.
In addition to publications, this grant funded a number of software products, available on GitHub (www.github.com). The three primary repositories are the Web Unmixing Toolbox (WUT), the Northwestern University Source Separation Library (nussl), and the Crowdsourced Audio Quality Evaluation (CAQE) Toolkit.
The Web Unmixing Toolbox (WUT) is an open-source, browser-based, interactive source separation application for end users. It is the first application that lets the end user edit audio using a two-dimensional Fourier transform of the spectrogram.
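The report does not include WUT's internals; the sketch below only illustrates why editing in the 2D Fourier transform (2DFT) of a spectrogram is powerful: repeating structure collects into isolated peaks there, so suppressing a peak removes that repeating pattern across the whole recording. This is schematic, with an assumed STFT configuration, and is not WUT's actual processing.

    # Schematic of 2DFT-based spectrogram editing.
    import numpy as np
    from scipy.signal import stft, istft

    sr = 16000
    audio = np.random.randn(sr * 4)          # stand-in for a real recording
    f, frames, X = stft(audio, fs=sr, nperseg=1024)

    mag, phase = np.abs(X), np.angle(X)
    MAG2 = np.fft.fft2(mag)                  # 2D transform of the spectrogram

    # Suppress the strongest 2DFT peaks (away from the DC bin), which carry
    # the most strongly repeating structure in the spectrogram.
    peaks = np.abs(MAG2) > np.percentile(np.abs(MAG2), 99.9)
    peaks[0, 0] = False                      # keep the overall DC term
    MAG2[peaks] = 0

    edited_mag = np.abs(np.fft.ifft2(MAG2))  # back to a (real) spectrogram
    _, edited = istft(edited_mag * np.exp(1j * phase), fs=sr, nperseg=1024)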
The Northwestern University Source Separation Library (nussl) is a flexible, object-oriented Python audio source separation library created by the PI's lab. It provides implementations of common source separation algorithms, as well as an easy-to-use framework for prototyping and adding new algorithms.
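A minimal usage sketch, assuming nussl's interface at the time of this award (an AudioSignal wrapper plus algorithm classes such as Repet with run() and make_audio_signals()); the library's API has evolved since, so consult the repository for current usage.

    # Assumed nussl-era interface; check the nussl repository for the
    # current API before running.
    import nussl

    mixture = nussl.AudioSignal('mixture.wav')   # any mixed recording
    repet = nussl.Repet(mixture)                 # repetition-based separation
    repet.run()
    background, foreground = repet.make_audio_signals()
    foreground.write_audio_to_file('foreground.wav')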
The Crowdsourced Audio Quality Evaluation (CAQE) Toolkit is a software package that enables researchers to easily run perceptual audio quality evaluations over the web.
Last Modified: 12/30/2018
Modified by: Bryan A Pardo
Please report errors in award information by writing to: awardsearch@nsf.gov.