
NSF Org: | IIS Division of Information & Intelligent Systems |
Initial Amendment Date: | August 16, 2011 |
Latest Amendment Date: | August 16, 2011 |
Award Number: | 1116384 |
Award Instrument: | Standard Grant |
Program Manager: | Ephraim Glinert, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2011 |
End Date: | August 31, 2016 (Estimated) |
Total Intended Award Amount: | $499,804.00 |
Total Awarded Amount to Date: | $499,804.00 |
Recipient Sponsored Research Office: | 633 CLARK ST EVANSTON IL US 60208-0001 (312)503-7955 |
Primary Place of Performance: | 633 CLARK ST EVANSTON IL US 60208-0001 |
NSF Program(s): | HCC-Human-Centered Computing |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
The United States is a world leader in software and in multimedia content (e.g. music, film). To remain so, we must continually raise the bar in both software and media production. Software tools for media production (e.g. the audio production suite Pro Tools) often have complex interfaces, conceptualized in ways that make it difficult for any but the most expert users to realize the power of these tools. Complex interfaces and steep learning curves can discourage creative people from doing their best work with such tools. Here, we focus on audio production tools. We propose a user-centered approach to bridge the disconnect between existing audio production tools and the conceptual frameworks within which many people work, both expert musicians and the broader public. The tools we develop will automatically adapt to the user's conceptual framework, rather than forcing the user to adapt to the tools. Where appropriate, the tools will speed and enhance this adaptation using active learning informed by interaction with previous users (transfer learning). The tools will also automatically build a crowdsourced audio concept map. This will help provide facilities for computer-aided, directed learning, so that tool users can expand their conceptual frameworks and abilities. By letting people manipulate audio on their own terms and enhancing their knowledge of such tools with directed learning, we expect to transform the interaction experience, making the computer a device that supports and enhances creativity, rather than an obstacle.
This work will have a number of broader impacts. The tools developed will be directly usable by practicing musicians and will also facilitate learning and creativity for the general public. These techniques will also be applicable to the personalization of hearing aids and to new diagnostic systems for audiologists. Our approach to tool personalization is core work in human-computer interaction and should generalize to other creative activities (e.g. image manipulation). Resulting advances in active and transfer learning will be of great value to machine learning researchers. Finding the relationships between quantifiable parameters of audio and the language and metaphors used by practicing musicians to describe sound is central to this work and is of great interest to cognitive scientists, linguists, artificial intelligence researchers, and engineers. Concept maps for audio terms should also prove useful for machine translation. Broad application of techniques that map human descriptive terms onto machine-manipulable parameters will change expectations for both artists and scientists. Artists will be able to explore new lines of creativity that currently require significant investments of time in vastly disparate fields (e.g. signal processing and painting). This has the potential to transform information science and lead to new cognitive models of creativity, forming the basis for new approaches to education and research in both technology and art.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Audio production is essential to many forms of media, including music recordings, podcasts, radio dramas, television programs, and film. Audio production tools for effects like reverberation, equalization, and dynamic range compression are used to process audio after it is recorded, transforming raw recordings into polished final products. These tools are often difficult to use, as they are parameterized and controlled in terms (e.g. spectral tilt and “Q”) that are non-intuitive to many people. On the other hand, many potential users of audio production tools (e.g. acoustic musicians, podcast creators) have sonic ideas that they cannot express in technical terms. As a result, there is a cognitive gap between the tools and those who would use them.
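As a concrete illustration of the low-level controls mentioned above, the following sketch (not project code) shows a single equalizer band whose parameters are a center frequency, a gain in dB, and a Q factor: exactly the kind of technical vocabulary that non-experts find hard to connect to the sound they want. It assumes NumPy/SciPy and uses the widely known RBJ "Audio EQ Cookbook" peaking-filter formulas.

# A minimal sketch of a single peaking-EQ band parameterized by center
# frequency, gain in dB, and Q (RBJ Audio EQ Cookbook coefficients).
import numpy as np
from scipy.signal import lfilter

def peaking_eq(audio, fs, center_hz, gain_db, q):
    """Apply one peaking-EQ band to a mono signal."""
    A = 10 ** (gain_db / 40.0)          # amplitude from dB gain
    w0 = 2 * np.pi * center_hz / fs     # normalized center frequency
    alpha = np.sin(w0) / (2 * q)        # bandwidth term controlled by Q
    cos_w0 = np.cos(w0)

    b = np.array([1 + alpha * A, -2 * cos_w0, 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * cos_w0, 1 - alpha / A])
    return lfilter(b / a[0], a / a[0], audio)

# Example: boost 3 kHz by 6 dB with a fairly narrow Q on one second of noise.
fs = 44100
signal = np.random.randn(fs)
brighter = peaking_eq(signal, fs, center_hz=3000, gain_db=6.0, q=2.0)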
To bridge this gap, we developed a new interaction paradigm that allows users to control audio production tools using evaluative feedback and natural language. For example, the user specifies a word describing the quality of the sound they seek, such as making a recording “brighter” or “warmer.” We created SocialEQ, an audio production tool that lets the user teach the tool the meaning of a sound adjective (e.g. “tinny”) by presenting alternative manipulations of a sound and letting the user rate how well each manipulation embodies the desired goal (e.g. how “tinny” the sound is now). We also created Audealize, a tool that bridges the gap between the low-level parameters of existing audio production tools and programmatic goals (e.g. “make my guitar sound ‘underwater’”). Users modify the audio by clicking on the word in a 2-D word map that best describes how they want the sound to change. This example audio production tool is available for the public to try and download.
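To give a feel for how evaluative feedback can be turned into an equalization setting, here is a minimal sketch in Python. It is an assumption-laden stand-in for the rating-based interaction described above, not SocialEQ's actual code: probe EQ curves and user ratings are simulated, and a simple ridge regression recovers a per-band weighting for the adjective that can then be scaled into an EQ curve.

# A sketch of learning a sound adjective from ratings of probe EQ curves.
# The probe/rating setup and the ridge-regression step are illustrative
# assumptions, not the project's exact algorithm.
import numpy as np

rng = np.random.default_rng(0)
n_bands, n_probes = 40, 25

# Random probe EQ curves (gain in dB per band) played to the user one by one.
probes = rng.normal(0.0, 3.0, size=(n_probes, n_bands))

# Hypothetical user ratings of how "tinny" each probe sounded, simulated here
# from a hidden stand-in curve purely for illustration.
true_curve = np.linspace(-1.0, 1.0, n_bands)
ratings = probes @ true_curve + rng.normal(0, 0.5, n_probes)

# Ridge regression: estimate the per-band weighting the adjective implies.
lam = 1.0
w = np.linalg.solve(probes.T @ probes + lam * np.eye(n_bands),
                    probes.T @ ratings)

# Scaling w gives an EQ curve that makes the sound more (or less) "tinny".
make_tinnier = 6.0 * w / np.max(np.abs(w))   # dB gains per band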
We performed the first user study comparing a word map interface to traditional audio production interfaces. A study of 432 non-experts found that they favored the word map over traditional interfaces, and absolute performance measures show that those using the word map interface produced results that equaled or exceeded those produced with traditional interfaces. This indicates that language, in concert with a meaningful word map, is an effective interaction paradigm for audio production by non-experts, and it points the way to similar interfaces for media production in other domains. For example, one could make a version of Photoshop that lets the user specify in language how they want the image to look.
To develop our word map, we required a large vocabulary of descriptive terms with known associations to the audio effects created by production tools. We therefore performed the first large-scale collection of a vocabulary describing audio effects, mapping these words to specific settings of the three most widely used audio effects: equalization, reverberation, and dynamic range compression. Data on the strength of association between words and the actual settings of these audio effects were collected for a set of 4297 words drawn from 1233 people. This dataset is two orders of magnitude larger than any previous similar data collection and has been made available to the public.
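The sketch below illustrates one simple way crowdsourced word/setting contributions of this kind can be aggregated. The field names, example values, and the agreement-based association score are illustrative assumptions, not the project's actual data format or scoring method.

# Aggregating (word, effect, parameter-vector) contributions into a
# representative setting and a rough strength-of-association score per word.
from collections import defaultdict
import numpy as np

# Illustrative contributions only; real data pairs thousands of words with
# equalization, reverberation, and compression settings.
contributions = [
    ("warm",  "equalization", [2.0, 1.0, -1.0, -2.0]),
    ("warm",  "equalization", [3.0, 0.5, -1.5, -2.5]),
    ("tinny", "equalization", [-3.0, -1.0, 2.0, 4.0]),
]

by_word = defaultdict(list)
for word, effect, params in contributions:
    by_word[(word, effect)].append(np.asarray(params, dtype=float))

for (word, effect), settings in by_word.items():
    settings = np.stack(settings)
    mean_setting = settings.mean(axis=0)
    # Lower spread across contributors -> stronger word/setting association.
    agreement = 1.0 / (1.0 + settings.std(axis=0).mean())
    print(word, effect, mean_setting.round(2), round(agreement, 3))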
We collected this word association data in both English and Spanish, enabling a new kind of automated translation of descriptive adjectives for audio between these languages. Resources such as the Oxford English Dictionary (OED) typically list the “audio sense” for only a small subset of the words commonly used to describe sound. For example, “warm” is a very commonly used sound adjective, yet the OED does not mention its audio sense. Directly translating the predominant (i.e. first) sense of a sound adjective into another language therefore often results in an incorrect translation. We developed a system that builds a translation map between the sound adjectives of two languages, English and Spanish. When an English word and a Spanish word are both used to describe the same audio effect, they are considered a translation pair; the more frequently a pairing between two words occurs, the more certain the translation. This work points the way to a new kind of machine translation that, rather than relying on paired texts, relies on common associations to the same media object (e.g. a sound file or image).
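A minimal sketch of this pairing idea follows. The example vocabulary and the raw count-based scoring are illustrative assumptions, but the core step, counting how often an English word and a Spanish word describe the same effect, follows the description above.

# Building candidate translation pairs from words that describe the same
# audio effect, ranked by how often each pairing occurs.
from collections import Counter, defaultdict

# Word descriptions keyed by the id of the effect setting they described
# (illustrative data only).
english = {"fx1": ["warm", "dark"], "fx2": ["bright", "tinny"], "fx3": ["warm"]}
spanish = {"fx1": ["cálido"], "fx2": ["brillante"], "fx3": ["cálido", "suave"]}

pair_counts = Counter()
for fx_id, en_words in english.items():
    for en in en_words:
        for es in spanish.get(fx_id, []):
            pair_counts[(en, es)] += 1

# Rank Spanish candidates for each English word by pairing frequency.
candidates = defaultdict(list)
for (en, es), count in pair_counts.items():
    candidates[en].append((es, count))
for en, pairs in candidates.items():
    pairs.sort(key=lambda p: -p[1])
    print(en, "->", pairs)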
The audio production tools we developed will facilitate creativity for the general public. Our approach to tool building is generalizable to a variety of activities and disciplines. For example, audiologists could use this approach to translate between lay vocabulary and actionable hearing aid adjustments. Finding the relationships between quantifiable parameters of audio and the language people use to describe sound is at the core of this work. This is of great interest to cognitive scientists, linguists, artificial intelligence researchers, and engineers. Our work also demonstrates techniques that can be used to develop more natural user interfaces, and is of value to human-computer interaction researchers.
Last Modified: 12/05/2016
Modified by: Bryan A Pardo