Award Abstract # 1500738
Collaborative Research: Contributions of Endangered Language Data for Advances in Technology-enhanced Speech Annotation

NSF Org: IIS (Division of Information & Intelligent Systems)
Recipient: SRI INTERNATIONAL
Initial Amendment Date: June 18, 2015
Latest Amendment Date: June 1, 2020
Award Number: 1500738
Award Instrument: Standard Grant
Program Manager: D. Langendoen
IIS (Division of Information & Intelligent Systems)
CSE (Directorate for Computer and Information Science and Engineering)
Start Date: July 1, 2015
End Date: October 31, 2020 (Estimated)
Total Intended Award Amount: $221,505.00
Total Awarded Amount to Date: $286,358.00
Funds Obligated to Date: FY 2015 = $237,505.00
FY 2017 = $16,000.00
FY 2019 = $32,853.00
History of Investigator:
  • Andreas Kathol (Principal Investigator)
    kathol@speech.sri.com
  • Vikramjit Mitra (Co-Principal Investigator)
Recipient Sponsored Research Office: SRI International
333 RAVENSWOOD AVE
MENLO PARK
CA  US  94025-3493
(609)734-2285
Sponsor Congressional District: 16
Primary Place of Performance: SRI International
333 Ravenswood Ave.
Menlo Park
CA  US  94025-3493
Primary Place of Performance Congressional District: 16
Unique Entity Identifier (UEI): SRG2J1WS9X63
Parent UEI: SRG2J1WS9X63
NSF Program(s): Robust Intelligence
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVITIES
01001718DB NSF RESEARCH & RELATED ACTIVITIES
01001920DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 1311, 7298, 7495, 7719, 7791, 9179, 9251
Program Element Code(s): 749500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Linguists have increased their efforts to collect authentic speech materials from endangered and little-studied languages in order to document linguistic diversity. However, transcribing this speech into written form to facilitate analysis is a daunting challenge, both because of the sheer quantity of digitally collected speech that must be transcribed and because of the difficulty of unpacking the sounds of spoken language.

Linguist Andreas Kathol and computer scientist Vikramjit Mitra of SRI International and linguist Jonathan D. Amith of Gettysburg College will team up to create software that can substantially reduce the language transcription bottleneck. Using Yoloxochitl Mixtec, an endangered language from the state of Guerrero, Mexico, as a test case, the team will develop a software tool that uses previously transcribed Yoloxochitl Mixtec speech data both to train a new generation of native speakers in practical orthography and to develop automatic speech recognition software. The output of the recognition software will serve as a preliminary transcription that native speakers will correct, as necessary, to create additional high-quality training data. This recursive method will create a corpus of transcribed speech large enough that the software will be able to carry out automatic transcription of newly collected speech materials.
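
This recursive workflow amounts to a bootstrapping loop: train a recognizer on whatever has been verified so far, let it draft transcriptions for new audio, have native speakers correct the drafts, and fold the corrections back into the training data. The sketch below is illustrative only; all function names are hypothetical placeholders, not the project's actual software.

```python
# Illustrative sketch of the recursive transcription workflow described
# above. All three helpers are hypothetical stubs standing in for real
# ASR training, recognition, and human-correction steps.

def train_asr(corpus):
    """Placeholder: train an ASR model on (audio, transcription) pairs."""
    return {"trained_on": len(corpus)}

def recognize(model, audio):
    """Placeholder: run the recognizer and return a draft transcription."""
    return f"<draft transcription of {audio}>"

def correct_by_hand(audio, draft):
    """Placeholder: a native speaker corrects the ASR draft."""
    return draft  # in practice, human-verified text comes back here

def bootstrap_corpus(seed_corpus, audio_batches):
    """Grow a transcribed corpus by alternating ASR and human correction."""
    corpus = list(seed_corpus)               # (audio, transcription) pairs
    for batch in audio_batches:
        model = train_asr(corpus)            # retrain on all verified data
        for audio in batch:
            draft = recognize(model, audio)        # ASR proposes a draft
            final = correct_by_hand(audio, draft)  # speaker corrects it
            corpus.append((audio, final))    # corrected draft joins the data
    return corpus
```

Each pass should yield a better recognizer, and therefore drafts that need less correction, which is what makes the loop cheaper than transcribing every recording from scratch.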

The project will include the training of undergraduate and graduate students in software development and the analysis of the Yoloxochitl Mixtec sound system. The project will also train native speakers as documenters in an interactive fashion that systematically introduces them to the transcription conventions of their language. This software tool will help in establishing literacy in Yoloxochitl Mixtec among a broader base of speakers.

The results of this project will be available at the Archive of Indigenous Languages of Latin America (University of Texas, Austin), Kaipuleohone (University of Hawai'i Digital Language Archive), and at the Linguistic Data Consortium (University of Pennsylvania).

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

V. Mitra, A. Kathol, J. D. Amith, and R. Castillo García, "Automatic Speech Transcription for Low-Resource Languages – The Case of Yoloxóchitl Mixtec (Mexico)," Interspeech 2016, 2016, pp. 3076-3080.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The documentation of endangered languages has become an increasing priority for researchers and for those native speaker communities that wish to maintain and revitalize their language and culture. Estimates vary as to the number of extant languages, the rate of language disappearance and cultural loss, and even the meaning of cultural loss and language death (versus change and shift, as when an Indigenous language is increasingly impacted by colonial languages and cultures). It nevertheless appears certain that the majority of the approximately 6,000-7,000 languages presently spoken will effectively disappear within this century. With this loss, the diversity of linguistic expression is diminished and the vast reservoir of human knowledge is impoverished.

Yet our ability to document endangered languages and cultures is inhibited by a "transcription bottleneck": hundreds of hours of high-quality recordings can be produced in a short amount of time but take months to process. For example, in one week in the village of Yoloxochitl (Guerrero, Mexico), Jonathan Amith, a project co-PI, recorded 30 hours of conversations among a dozen native speakers. An excellent native speaker linguist, Rey Castillo Garcia, would require close to a year to produce an archival-quality transcription and translation of this material. To address this issue, researchers have looked to automatic speech recognition (ASR), the computer-generated production of a written transcription of an audio recording. This project has explored the possibility of using ASR to achieve a high level of accuracy in the transcription of Yoloxochitl Mixtec (YM), an endangered tonal language of west-central Mexico.

The project's primary goal has been to address the "transcription bottleneck" described above. This was accomplished in part by developing, under the direction of co-PI Andreas Kathol, a YM speech recognizer based on recent advances in ASR. Early versions of this recognizer yielded a word error rate (WER) of 30.1% on a held-out test set. By project end the WER had been reduced to 19%.
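
For reference, word error rate is the standard ASR metric: the minimum number of word-level substitutions, insertions, and deletions needed to turn the hypothesis into the reference transcription, divided by the number of words in the reference. The following minimal computation illustrates the metric itself; it is not the project's evaluation code.

```python
# Minimal word error rate (WER) computation via Levenshtein distance
# over words; illustrative only.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                                  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                                  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat down"))  # one insertion -> 0.333...
```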

A secondary set of goals has been to (1) TRAIN native speakers to write their language through a progression of increasingly challenging transcription tasks; (2) TEST native speakers on their success in learning as they progressed through increasingly difficult lessons; and (3) allow users to ANNOTATE, or correct, a transcription proposed by the ASR algorithm. To accomplish these tasks, a TTA (Training, Testing, Annotation) tool was created at SRI International and installed on a Chromebook laptop. The Training and Testing modes were evaluated with the participation of a native speaker, Esteban Guadalupe Sierra, who finished all lessons and then began to transcribe directly from audio. His transcriptions now agree at a level of 97% with those of Rey Castillo Garcia, the expert native speaker linguist.

A final goal was to implement ASR (along with the TTA tool) on Chromebook computers to be used in Yoloxochitl. Amith, along with Castillo and Guadalupe, would work with native speakers to obtain audio recordings. The speech would then be processed locally through the ASR system on the Chromebook, and the resulting transcription would be loaded into the TTA tool in Annotation mode, where the original speaker would correct the computer-generated transcription as well as possible. Correcting mistakes in tone, nasalization, or glottal stops is still challenging for all but the best transcribers. However, speakers can be expected to reliably eliminate words that the ASR has mistakenly inserted or to add words that it has failed to transcribe. The transcription so corrected by the narrator would then be given to Castillo to be finalized.
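
Expressed schematically, this field pipeline has one automatic pass followed by two human passes. The sketch below is a hypothetical outline of that ordering; every function is a placeholder for an on-device or manual step, not the project's actual tooling.

```python
# Hypothetical outline of the field transcription pipeline described above.

def run_asr_locally(model, audio):
    """Placeholder: on-device recognition on the Chromebook."""
    return f"<ASR draft for {audio}>"

def narrator_pass(audio, draft):
    """Placeholder: the original speaker, in the TTA tool's Annotation
    mode, removes spurious words and restores missing ones."""
    return draft

def expert_pass(audio, text):
    """Placeholder: the expert transcriber corrects tone, nasalization,
    and glottal stops, then finalizes the transcription."""
    return text

def field_transcribe(audio, model):
    draft = run_asr_locally(model, audio)   # automatic first pass
    first = narrator_pass(audio, draft)     # speaker's correction pass
    return expert_pass(audio, first)        # expert finalization
```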

We fully expect recognition accuracy to improve as ASR technology advances. But even at the accuracy achieved so far, this project has shown the utility of an integrated approach: Training and Testing native speakers, recording and running ASR locally, and then having the original speaker take the first step in reviewing his or her computer-generated transcription.

Last Modified: 02/26/2021
Modified by: Andreas Kathol
