
NSF Org: | BCS Division of Behavioral and Cognitive Sciences |
Recipient: |
|
Initial Amendment Date: | September 13, 2022 |
Latest Amendment Date: | September 13, 2022 |
Award Number: | 2243445 |
Award Instrument: | Continuing Grant |
Program Manager: | Jorge Valdes Kroff; jvaldesk@nsf.gov; (703)292-7920; BCS Division of Behavioral and Cognitive Sciences; SBE Directorate for Social, Behavioral and Economic Sciences |
Start Date: | September 15, 2022 |
End Date: | January 31, 2024 (Estimated) |
Total Intended Award Amount: | $304,441.00 |
Total Awarded Amount to Date: | $104,311.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 2145 N TANANA LOOP FAIRBANKS AK US 99775-0001 (907)474-7301 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 2145 N. TANANA LOOP FAIRBANKS AK US 99775-7880 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | ASSP-Arctic Social Science |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.075, 47.078 |
ABSTRACT
One locus of crosslinguistic variation in how languages build words is whether meaning is encoded in free morphemes ('units of meaning') that stand alone as words, or whether those morphemes must combine with other morphemes to become words. While English has many free morphemes, the Alaska Native language St. Lawrence Island/Siberian Yupik uses the second strategy with very complex words, often sentence-sized. These properties are known as agglutination and polysynthesis. Researchers will document critical structures in the language, digitize existing Yupik materials, and build computational tools to help the community and other researchers. The data from Yupik are extremely important to language science, since many of the phenomena displayed in the language are rare and not well understood. Creating computational tools for languages with very complex words, like Yupik, is of additional benefit to computer scientists and language scientists in that it helps researchers improve computational tools for languages like English. The Native American Languages Act, passed by the U.S. Congress in 1990, enacted into policy the recognition of the unique status and importance of Native American languages. This project will build and improve tools like a morphological analyzer, a spellchecker, and a searchable dictionary, of value to the community in revitalizing their language. Graduate students will be trained in these methods, and researchers will hold outreach meetings with high school students in the language community to teach them important computer and coding skills that will enable them to build further tools. All data gathered will be permanently archived at the Alaska Native Language Archive.
This project is a collaboration of language scientists and computer scientists from the University of Illinois at Urbana-Champaign and George Mason University. It involves three interconnected parts: digitization of existing materials on and in Yupik for use by community members and researchers; recording and analyzing the speech of Yupik speakers; and working with the community to build computer tools for Yupik and teaching students how to do so. A successful computational model of Yupik linguistic phenomena has implications for unsupervised and semi-supervised methods in morphology induction and grammar induction because the types of morphophonological change in Yupik are pervasive, much more so than in the models used in other approaches to unsupervised morphology induction. This work is likely to have important implications regarding appropriate computational modeling of polysynthetic agglutinative morphosyntax. The team will access materials at several archives, scan them, and clean and process the scans so that they are digitally accessible and searchable. This will create a digital corpus of Yupik materials for use by the community and for linguistic investigations into grammatical mood, tense, and aspect to better understand these complex morphosemantic constructions. The data will also improve the computational tools being developed in this project, providing the Yupik community with access to modern tools like spellcheckers, electronically searchable dictionaries, and electronic books. Finally, in its tight integration of field work and the development of computational tools for the analysis of the language, this project will serve as a model for future collaborations of this kind.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Overview
This project was a comprehensive, integrated, multi-disciplinary approach to Yupik language documentation.
The project had three major components:
- Digitization and dissemination of existing Yupik-language materials from the 1970s, 1980s, and 1990s
- Development of computational tools for Yupik
- Documentation of Yupik derivational morphology
This project was a collaborative effort by Dr. Lane Schwartz and his graduate research assistant Emily Chen, in conjunction with Dr. Sylvia Schreiner and her graduate research assistant Ben Hunt, with additional work by graduate research assistant Hayley Park.
This work began at the University of Illinois at Urbana-Champaign, and was concluded at the University of Alaska Fairbanks.
Digitization and dissemination
During this project, the following Yupik language materials from the 1970s, 1980s, and 1990s were scanned, digitized, and distributed in electronic form to members of the St. Lawrence Island community:
- 12 pre-K and early elementary readers
- 4 mid-elementary readers
- 3-volume Lore of St. Lawrence Island books
- Existing Yupik grammars and dictionaries
- Over 50 additional elementary-level and middle-school-level books and workbooks
Development of computational tools for Yupik
During this project, the following computational tools were developed for Yupik:
- Basic rule-based spell-checker
- Finite-state morphological analyzer (adapted from the existing Yupik grammar by Steve Jacobson, 2001)
- Online dictionary (adapted from the existing Yupik dictionary by Badten et al., 2008)
Language documentation
During this project, the following language documentation was developed:
- Doctoral dissertation on Yupik derivational morphology by graduate research assistant Emily Chen
- Corpus of Yupik sentences with interlinear glosses, compiled by graduate research assistant Emily Chen
- Corpus of dependency-parsed Yupik sentences, compiled by graduate research assistant Hayley Park
Last Modified: 05/08/2024
Modified by: Lane Schwartz