
Award Abstract # 1614291
EAPSI: Audio Attendant: A User Interface for Learning Peripheral Sounds

NSF Org: OISE
Office of International Science and Engineering
Initial Amendment Date: July 26, 2016
Latest Amendment Date: July 26, 2016
Award Number: 1614291
Award Instrument: Fellowship Award
Program Manager: Anne Emig
OISE - Office of International Science and Engineering
O/D - Office of the Director
Start Date: June 15, 2016
End Date: May 31, 2017 (Estimated)
Total Intended Award Amount: $5,400.00
Total Awarded Amount to Date: $5,400.00
Funds Obligated to Date: FY 2016 = $5,400.00
History of Investigator:
  • Kristin Williams (Principal Investigator)
Recipient Sponsored Research Office: Williams Kristin
Pittsburgh, PA 15217-1442, US
Sponsor Congressional District: 12
Primary Place of Performance: HCI & Design Lab, Seoul National University
1 Gwanak-ro, Gwanak-gu, Seoul
KS (South Korea)  151-7-46
NSF Program(s): EAPSI
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 5942, 5978, 7316
Program Element Code(s): 731600
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.079

ABSTRACT

The Korean orthography makes the relationship between written and spoken language explicit, facilitating phonological decoding of its written form. When a person learns a language, they learn to represent the sounds of the language in meaningful categories. Insofar as dyslexia is a struggle with mapping sounds to their written form, the expression of dyslexia in the Korean language is likely to differ significantly from that in other languages such as English or Spanish. This project proposes to design and create a user interface, accessible in a mobile context, that facilitates phonological awareness in the Korean language. The research will be performed in collaboration with Dr. Joonhwan Lee of the Human Computer Interaction and Design Lab at Seoul National University in Korea. The project will inform development of a lightweight dyslexia screener that could be deployed by nonexperts around the world, for free or at minimal cost, to support access to needed resources for people with dyslexia. The work will advance research areas such as language-based assistive technology, human-computer interaction for individuals with cognitive disabilities, and crowdsourcing, and will contribute to identifying individuals with dyslexia irrespective of their native language, helping make the web accessible to persons with language impairments around the world.

Prior work highlights a role for a system that can help a person implicitly learn sound categories that are specific to a target language, even though those categories may not be at the center of the person’s attention or part of their lexicon. This project will investigate how auditory cues can be integrated in a contextually appropriate manner while balancing those cues against attending to conversation. Building on Edge et al., we will extend contextual vocabulary learning of a target language to contextual learning of sound categories. We will determine which sound categories should be acquired by adapting an importance-scoring system, previously shown to work well for dynamic visual displays while driving, to an auditory attention service that reflects the learner’s goals and contextual importance. Finally, we will design and create a user interface, based on this prior work, for learning contextual audio categories in a mobile setting during conversation.

This award under the East Asia and Pacific Summer Institutes program supports summer research by a U.S. graduate student and is jointly funded by NSF and the National Research Foundation of Korea.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Speaking and writing communicate in fundamentally different ways. Yet current speech-to-text transformation renders spoken language into a written code of the words alone, irrespective of the verbal context or the speaker’s intonation. For a person who relies solely on the text to interpret what is said, this process omits important cues to the speaker’s meaning. A mobile chat service that actively makes use of audio processing of spoken language could address these limitations by using the visual features of chat to draw attention to features of spoken language. To design such a service, we modified a kinetic typography engine to incorporate features of spoken mood and energy into a typographic library.

We examined whether acoustic information that is peripheral to spoken content—like vocal signatures of emotion—can be encoded in kinetic typography and recognized by second language learners. Recognition of these peripheral cues could facilitate language learners’ development of sociolinguistic competence. This competence consists in the ability to know when it is contextually appropriate to use a language’s vocabulary, and it captures a sense of socio-political knowledge that is tacitly learned when immersed in a language’s spoken environment. Sociolinguistic sensitivity characterizes a speaker’s ability to actively listen and to know both when and how to participate in conversation. The Korean orthography does not directly convey sociolinguistic aspects that are implicitly expressed by the speaker. Yet these cues carry important meaning in Korean culture, such as when it is appropriate to use honorifics, internet speak, and gendered forms of communication. We modified a kinetic typography engine to support learning these peripheral cues embedded in spoken language.

INTELLECTUAL MERIT AND BROADER IMPACTS

Our work contributes a kinetic typography library that interfaces with the web browser audio API and is compatible with speech recognition algorithms to facilitate peripheral attention to desired categories of spoken language. We created a dataset of 18 sentences spoken by male and female voice actors with 9 different emotions. We also created a script that extracts audio signatures of arousal and valence based on the circumplex model of emotion and passes these as parameters to a kinetic typography library. Together, the dataset and script allow a kinetic typography library running in a web browser to render animated type alongside the audio file and draw visual attention to the emotional cues in the spoken audio. They contribute an approach for rendering kinetic typography alongside spoken language that could be used in both Korean and English and could support individuals with reading disabilities, second language learners, and people who are deaf or hard of hearing. These techniques could be readily integrated into web chat services, captioning of online video-streaming content, or other language-based assistive technology.
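
As a concrete illustration of how such a pipeline might be wired together in a browser, the sketch below derives rough per-frame arousal and valence proxies from an HTML audio element using the Web Audio API and hands them to a caller-supplied kinetic typography callback. It is a minimal sketch in TypeScript, not the project's released script: the feature choices (RMS energy as an arousal proxy, spectral centroid as a valence proxy) and the AffectParams, animateTypeFromAudio, and render names are illustrative assumptions.

// Illustrative sketch (assumed names: AffectParams, animateTypeFromAudio).
// Streams audio features from an <audio> element via the Web Audio API and
// reports crude arousal/valence proxies once per animation frame.

interface AffectParams {
  arousal: number; // 0..1, proxied here by frame RMS energy
  valence: number; // 0..1, proxied here by normalized spectral centroid
}

type KineticRenderer = (params: AffectParams) => void;

function animateTypeFromAudio(
  audioEl: HTMLAudioElement,
  render: KineticRenderer,   // e.g. adjusts font weight or motion each frame
): () => void {
  const ctx = new AudioContext();                  // may need a user gesture to start
  const source = ctx.createMediaElementSource(audioEl);
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 2048;

  source.connect(analyser);
  analyser.connect(ctx.destination);               // keep the audio audible

  const timeData = new Float32Array(analyser.fftSize);
  const freqData = new Float32Array(analyser.frequencyBinCount);
  let rafId = 0;

  const tick = () => {
    analyser.getFloatTimeDomainData(timeData);
    analyser.getFloatFrequencyData(freqData);

    // Arousal proxy: root-mean-square energy of the current frame.
    let sumSq = 0;
    for (const s of timeData) sumSq += s * s;
    const rms = Math.sqrt(sumSq / timeData.length);

    // Valence proxy: spectral centroid (a crude "brightness" measure);
    // a real system would use a model trained on labeled emotional speech.
    let weighted = 0;
    let total = 0;
    const binHz = ctx.sampleRate / analyser.fftSize;
    for (let i = 0; i < freqData.length; i++) {
      const mag = Math.pow(10, freqData[i] / 20);  // dB -> linear magnitude
      weighted += mag * i * binHz;
      total += mag;
    }
    const centroid = total > 0 ? weighted / total : 0;

    render({
      arousal: Math.min(1, rms * 4),               // rough rescaling into 0..1
      valence: Math.min(1, centroid / (ctx.sampleRate / 2)),
    });
    rafId = requestAnimationFrame(tick);
  };

  rafId = requestAnimationFrame(tick);
  return () => cancelAnimationFrame(rafId);        // call to stop the animation loop
}

A renderer passed to animateTypeFromAudio might, for example, map arousal to animation speed or font weight and valence to color or letter spacing; this is the kind of per-frame parameterization the dataset and script described above are meant to drive.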


Last Modified: 03/15/2017
Modified by: Kristin Williams

