
NSF Org: BCS Division of Behavioral and Cognitive Sciences
Initial Amendment Date: June 18, 2012
Latest Amendment Date: June 18, 2012
Award Number: 1160639
Award Instrument: Standard Grant
Program Manager: Shobhana Chelliah, BCS Division of Behavioral and Cognitive Sciences, SBE Directorate for Social, Behavioral and Economic Sciences
Start Date: July 1, 2012
End Date: December 31, 2014 (Estimated)
Total Intended Award Amount: $101,501.00
Total Awarded Amount to Date: $101,501.00
Recipient Sponsored Research Office: 3451 WALNUT ST STE 440A, PHILADELPHIA, PA, US 19104-6205, (215) 898-7293
Primary Place of Performance: 3600 MARKET ST STE 810, PHILADELPHIA, PA, US 19104-2653
NSF Program(s): DEL
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.075
ABSTRACT
Language Preservation 2.0
The purpose of this pilot project is to demonstrate the feasibility of a new approach to documenting endangered languages.
To allow wide-ranging investigation of a language even after it is no longer spoken, we need the equivalent of the million words of extant biblical Hebrew texts, or the five million words of extant classical Latin. But for endangered languages without a significant culture of literacy, diverse text collections on this scale seem out of reach.
Given typical speaking rates of about 10,000 word-equivalents per hour, a hundred hours of recorded speech -- conversations, narratives, or oral histories -- would give us the equivalent of a million words of text. With community involvement, hundreds of hours of such recordings are easily within reach.
However, transcribing such large audio collections is a daunting task, given the small number of literate native speakers and the time-consuming nature of transcription, which can take 200 hours of work for every hour of audio. We propose to sidestep this bottleneck by substituting re-speaking and oral translation for direct transcription: one or more native speakers repeat each phrase of a recording, speaking slowly and carefully, and then translate it into a better-documented language.
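The arithmetic behind these figures can be checked with a short script. Both rates come straight from the text above; they are rough working estimates, not measurements:

```python
# Back-of-the-envelope check of the figures quoted above.
WORDS_PER_HOUR = 10_000          # typical speaking rate (word-equivalents per hour)
TRANSCRIPTION_RATIO = 200        # hours of expert work per hour of recorded audio

def words_for_hours(hours_of_audio: float) -> int:
    """Word-equivalents captured in a given number of recorded hours."""
    return int(hours_of_audio * WORDS_PER_HOUR)

def transcription_hours(hours_of_audio: float) -> int:
    """Conventional transcription effort implied by that much audio."""
    return int(hours_of_audio * TRANSCRIPTION_RATIO)

print(words_for_hours(100))      # 100 hours of speech -> 1000000 word-equivalents
print(transcription_hours(100))  # but 20000 hours of conventional transcription
```

The contrast between the two outputs is the whole argument: the corpus itself is cheap to record, while conventional transcription of it is prohibitively expensive.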
The utility of translated passages as a way to analyze otherwise-unknown languages has been demonstrated many times, starting with the Rosetta Stone. This aspect of our task is easier, since at least a grammatical sketch will in general be available.
Our goal in this project is to demonstrate the utility of re-speaking. We believe that linguists, starting out with relatively little knowledge of a language, can produce phonetic transcriptions that will be good enough to support subsequent analysis resulting in coherent texts, in a process analogous to (but easier than) the process that allowed previous generations of scholars to learn to read ancient Egyptian or Sumerian.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Thousands of the world's languages are not adequately documented, and the languages are falling out of use more rapidly than linguists can record and transcribe them. This project investigated the problem of scaling up the language documentation effort through crowdsourcing, engaging the members of speech communities to record, respeak, and orally translate their linguistic heritage.
The software, Aikuma, is available from aikuma.org, and won the Open Source Software World Challenge Grand Prize 2013. Field tests were conducted in Papua New Guinea, Brazil, and Nepal. Laboratory experiments demonstrated that the audio collected by the phones is of sufficient quality to support later scientific study.
The project established an effective new way to avoid the usual transcription bottleneck which prevents linguists from transcribing more than a few hours of recordings for any language studied. Instead, the method relies on a protocol known as "careful respeaking", in which someone listens to a previously made recording and carefully repeats what was said, phrase by phrase. Aikuma permits the user to start respeaking at any stage during playback and records what was said, aligning it with the original source. Oral translation works in the same way. Accordingly, each source is associated with additional recordings that can be used by future linguists to perform their transcription and translation work, even once no speakers of the language remain.
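The association the report describes, between a source recording and its phrase-by-phrase respoken and translated versions, can be pictured with a small data-structure sketch. The class and field names here are invented for exposition and do not reflect Aikuma's actual implementation:

```python
from dataclasses import dataclass, field

# Illustrative sketch only: names and structure are hypothetical,
# not taken from Aikuma's source code.

@dataclass
class AlignedSegment:
    start_s: float      # offset into the source recording, in seconds
    end_s: float
    audio_path: str     # respoken or translated audio covering this span

@dataclass
class SourceRecording:
    audio_path: str
    language: str
    respeakings: list[AlignedSegment] = field(default_factory=list)
    translations: list[AlignedSegment] = field(default_factory=list)

# A phrase at 12.0-15.5 s, respoken carefully and translated orally:
rec = SourceRecording("story01.wav", "source-language")
rec.respeakings.append(AlignedSegment(12.0, 15.5, "story01_respeak_004.wav"))
rec.translations.append(AlignedSegment(12.0, 15.5, "story01_translation_004.wav"))
print(len(rec.respeakings), len(rec.translations))  # 1 1
```

The key property is that every respoken or translated clip stays time-aligned to the original source, so a future linguist can work phrase by phrase even when no speakers remain to consult.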
The app is being used in a variety of ongoing language documentation work, more effectively leveraging the human resources of local speech communities, and contributing significantly to the preservation of endangered languages.
Last Modified: 05/19/2015
Modified by: Steven G Bird