NSF Award Search: Award # 1911603

Award Abstract # 1911603

Integrating, Disseminating, and Archiving Components of the Shoshoni Language Project

NSF Org:	BCS Division of Behavioral and Cognitive Sciences
Recipient:	UNIVERSITY OF UTAH
Initial Amendment Date:	August 26, 2019
Latest Amendment Date:	December 27, 2023
Award Number:	1911603
Award Instrument:	Standard Grant
Program Manager:	Rachel M. Theodore rtheodor@nsf.gov (703)292-4770 BCS Division of Behavioral and Cognitive Sciences SBE Directorate for Social, Behavioral and Economic Sciences
Start Date:	August 15, 2019
End Date:	January 31, 2025 (Estimated)
Total Intended Award Amount:	$197,424.00
Total Awarded Amount to Date:	$197,424.00
Funds Obligated to Date:	FY 2019 = $197,424.00
History of Investigator:	Marianna Di Paolo (Principal Investigator) dipaolo@anthro.utah.edu
Recipient Sponsored Research Office:	University of Utah 201 PRESIDENTS CIR SALT LAKE CITY UT US 84112-9049 (801)581-6903
Sponsor Congressional District:	01
Primary Place of Performance:	University of Utah 260 S Central Campus, GC4525 Salt Lake City UT US 84112-9199
Primary Place of Performance Congressional District:	01
Unique Entity Identifier (UEI):	LL8GLEVH6MG3
Parent UEI:
NSF Program(s):	DEL
Primary Program Source:	01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	9178, 9251, 1311, 7719
Program Element Code(s):	771900
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.075

ABSTRACT

The Native American Languages Act, passed by the U.S. Congress in 1990, recognizes the unique status and value of Native American languages. Shoshoni [ISO 639-3 shh] is the northernmost member of the Uto-Aztecan language family, whose languages are spoken from Wyoming to Central America. The Shoshoni language today continues to be an important component of Goshute and Shoshone tribal identity. In the 1960s and 1970s, the late Wick R. Miller of the University of Utah tape-recorded speakers of several varieties of Shoshoni (born between about 1875 and 1920). These recordings represent the most extensive documentary corpus of any Great Basin language and are of vital cultural, historical, and linguistic importance to several tribal communities in the Western states. Past linguistic studies of Shoshoni have largely focused on the internal structure of sentences in isolation and on the structure of words, while this project will focus on its sound system and discourse-level structure. Broader impacts include the availability of the two corpora as free online resources from the Marriott Library (University of Utah) and the California Language Archive (UC-Berkeley). The project will also provide undergraduates from Shoshoni-speaking tribal communities with valuable experience on a computational linguistic research project, and enhance interactions between these young people and the two native-speaker elders collaborating on the project. The team will also produce a print version and an easy-to-read electronic version of a subset of the traditional stories from the Wick R. Miller Collection and disseminate them to the three communities collaborating on the project: the South Fork Band Council of the Te-Moak Tribe, the Confederated Tribes of the Goshute Reservation, and the Ely Shoshone Tribe.

While Shoshoni is fairly well documented for a Native American language, its discourse structure and its phonetics and phonology are relatively understudied. This project will address these significant gaps by developing two corpora. First, the 36 stories will be marked up to produce an electronically searchable database valuable for sentence-level as well as discourse-level linguistic studies. Second, the project will build a corpus valuable for phonological and phonetic research, consisting of audio-TextGrid pairs of word- and sentence-sized recordings that will be force-aligned and fine-tuned. In the resulting corpus, the phonemes representing each vowel and consonant will be aligned with the corresponding part of the sound file, allowing researchers to automate the acoustic phonetic analysis of each sound. Such text-to-audio aligned corpora already exist for majority languages such as English, German, Japanese, and Spanish, making their sound systems relatively easy to study and thus leading to the development of electronic products that can quickly process spoken language. These majority-language corpora are prepared using costly, language-specific computational tools called forced aligners. Our project will train the Montreal Forced Aligner to align the text of 4,000-5,000 Shoshoni words and short sentences to sound. Doing so will provide a model of how to inexpensively use a generic forced aligner to align text-to-audio data for any small, understudied language. The resulting forced-aligned Shoshoni corpus will greatly speed up the acoustic analysis of this phonologically complex language and lead to many relatively inexpensive, but in-depth, scientifically sound research studies.
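
To illustrate the kind of automated analysis that a phoneme-aligned corpus makes possible, the following is a minimal Python sketch, not the project's actual tooling. It assumes the aligned intervals have already been read out of a TextGrid phone tier into (start, end, label) tuples; the interval values, the phone labels, and the function name mean_durations are all invented for illustration and are not Shoshoni data.

```python
from collections import defaultdict

# Hypothetical phone-tier intervals as (start_sec, end_sec, label).
# These values are made up for illustration; in practice they would be
# read from the TextGrid files produced by a forced aligner.
intervals = [
    (0.00, 0.08, "k"),
    (0.08, 0.20, "a"),
    (0.20, 0.26, "k"),
    (0.26, 0.42, "a"),
    (0.42, 0.50, ""),  # an empty label marks silence
]


def mean_durations(intervals):
    """Return the mean duration in seconds for each non-empty phone label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for start, end, label in intervals:
        if not label:
            continue  # skip silences and unlabeled stretches
        totals[label] += end - start
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


durations = mean_durations(intervals)
for label, seconds in sorted(durations.items()):
    print(f"{label}: {seconds:.3f} s")
```

Because each interval already carries its start and end times, the same loop could be extended to pass each time span to an acoustic measurement routine (for example, formant or pitch extraction) without any manual segmentation.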

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

