Award Abstract # 2243445
NNA: Collaborative Research: Integrating Language Documentation and Computational Tools for Yupik, an Alaska Native Language

NSF Org: BCS
Division of Behavioral and Cognitive Sciences
Recipient: UNIVERSITY OF ALASKA FAIRBANKS
Initial Amendment Date: September 13, 2022
Latest Amendment Date: September 13, 2022
Award Number: 2243445
Award Instrument: Continuing Grant
Program Manager: Jorge Valdes Kroff
jvaldesk@nsf.gov
(703)292-7920
BCS Division of Behavioral and Cognitive Sciences
SBE Directorate for Social, Behavioral and Economic Sciences
Start Date: September 15, 2022
End Date: January 31, 2024 (Estimated)
Total Intended Award Amount: $304,441.00
Total Awarded Amount to Date: $104,311.00
Funds Obligated to Date: FY 2018 = $104,311.00
History of Investigator:
  • Lane Schwartz (Principal Investigator)
    lane.schwartz@alaska.edu
Recipient Sponsored Research Office: University of Alaska Fairbanks Campus
2145 N TANANA LOOP
FAIRBANKS
AK  US  99775-0001
(907)474-7301
Sponsor Congressional District: 00
Primary Place of Performance: University of Alaska Fairbanks Campus
2145 N. TANANA LOOP
FAIRBANKS
AK  US  99775-7880
Primary Place of Performance Congressional District: 00
Unique Entity Identifier (UEI): FDLEQSJ8FF63
Parent UEI:
NSF Program(s): ASSP-Arctic Social Science
Primary Program Source: 0100XXXXDB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 072Z, 1311, 7719, 9179
Program Element Code(s): 522100
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.075, 47.078

ABSTRACT

One locus of crosslinguistic variation in how languages build words is whether meaning is encoded in free morphemes ('units of meaning') that stand alone as words, or whether those morphemes must combine with other morphemes to become words. While English has many free morphemes, the Alaska Native language St. Lawrence Island/Siberian Yupik uses the second strategy with very complex words, often sentence-sized. These properties are known as agglutination and polysynthesis. Researchers will document critical structures in the language, digitize existing Yupik materials, and build computational tools to help the community and other researchers. The data from Yupik are extremely important to language science, since many of the phenomena displayed in the language are rare and not well understood. Creating computational tools for languages with very complex words, like Yupik, is of additional benefit to computer scientists and language scientists in that it helps researchers improve computational tools for languages like English. The Native American Languages Act, passed by the U.S. Congress in 1990, enacted into policy the recognition of the unique status and importance of Native American languages. This project will build and improve tools like a morphological analyzer, a spellchecker, and a searchable dictionary, of value to the community in revitalizing their language. Graduate students will be trained in these methods, and researchers will hold outreach meetings with high school students in the language community to teach them important computer and coding skills that will enable them to build further tools. All data gathered will be permanently archived at the Alaska Native Language Archive.

This project is a collaboration of language scientists and computer scientists from the University of Illinois at Urbana-Champaign and George Mason University. It has three interconnected parts: digitizing existing materials on and in Yupik for use by community members and researchers; recording and analyzing the speech of Yupik speakers; and working with the community to build computer tools for Yupik and teaching students how to do so. A successful computational model of Yupik linguistic phenomena has implications for unsupervised and semi-supervised methods of morphology induction and grammar induction, because morphophonological change is far more pervasive in Yupik than the models used in other approaches to unsupervised morphology induction assume. This work is likely to have important implications for the appropriate computational modeling of polysynthetic agglutinative morphosyntax. The team will access materials at several archives, scan them, and clean and process the scans so that they are digitally accessible and searchable. This will create a digital corpus of Yupik materials for use by the community and for linguistic investigations into grammatical mood, tense, and aspect, to better understand these complex morphosemantic constructions. The data will also improve the computational tools being developed in this project, providing the Yupik community with access to modern tools like spellcheckers, electronically searchable dictionaries, and electronic books. Finally, in its tight integration of field work and the development of computational tools for the analysis of the language, this project will serve as a model for future collaborations of this kind.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 16)
Chen, Emily and Park, Hyunji Hayley and Schwartz, Lane "Improved Finite-State Morphological Analysis for St. Lawrence Island Yupik Using Paradigm Function Morphology" LREC proceedings, 2020
Chen, Emily and Schwartz, Lane "A Morphological Analyzer for St. Lawrence Island / Central Siberian Yupik" LREC proceedings, 2018
Hunt, Benjamin and Chen, Emily and Schreiner, Sylvia L.R. and Schwartz, Lane "Community lexical access for an endangered polysynthetic language: An electronic dictionary for St. Lawrence Island Yupik" Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), 2019 https://doi.org/10.18653/v1/N19-4021
Koonooka, Christopher Petuwaq and Schreiner, Sylvia L.R. and Soldati, Giulia Masella and Schwartz, Lane and Hunt, Benjamin and Haas, Preston and Chen, Emily and Park, Hyunji Hayley "Akuzipik/Yupik (St. Lawrence Island, Alaska, USA; Chukotka, Russia) - Language Snapshot" Language documentation and description, v.20, 2021 https://doi.org/10.25895/ldd43
Park, Hyunji and Schwartz, Lane and Tyers, Francis "Expanding Universal Dependencies for Polysynthetic Languages: A Case of St. Lawrence Island Yupik" Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, 2021 https://doi.org/10.18653/v1/2021.americasnlp-1.14
Park, Hyunji Hayley and Zhang, Katherine J. and Haley, Coleman and Steimel, Kenneth and Liu, Han and Schwartz, Lane "Morphology Matters: A Multilingual Language Modeling Analysis" Transactions of the Association for Computational Linguistics, v.9, 2021 https://doi.org/10.1162/tacl_a_00365
Schreiner, Sylvia L.R. and Schwartz, Lane and Hunt, Benjamin and Chen, Emily "Multidirectional leveraging for computational morphology and language documentation and revitalization" Language documentation and conservation, v.14, 2020
Schwartz, Lane "Language Shift, Language Technology, and Language Revitalization: Challenges and Possibilities for St. Lawrence Island Yupik" Proceedings of the International Conference Language Technologies for All (LT4All): Enabling Linguistic Diversity and Multilingualism Worldwide, 2019
Schwartz, Lane "Primum Non Nocere: Before working with Indigenous data, the ACL must confront ongoing colonialism" Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, v.2, 2022 https://doi.org/10.18653/v1/2022.acl-short.82
Schwartz, Lane and Chen, Emily "Liinnaqumalghiit: A Web-based Tool for Addressing Orthographic Transparency in St. Lawrence Island / Central Siberian Yupik" Language documentation and conservation, v.11, 2017 http://hdl.handle.net/10125/24736
Schwartz, Lane and Chen, Emily and Hunt, Benjamin and Schreiner, Sylvia L.R. "Bootstrapping a Neural Morphological Analyzer for St. Lawrence Island Yupik from a Finite-State Transducer" 3rd Workshop on Computational Methods for Endangered Languages, v.1, 2019

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Overview

This project was a comprehensive, integrated, multi-disciplinary approach to Yupik language documentation. 

The project had three major components: 

  • Digitization and dissemination of existing Yupik-language materials from the 1970s, 1980s, and 1990s
  • Development of computational tools for Yupik
  • Documentation of Yupik derivational morphology

This project was a collaborative effort by Dr. Lane Schwartz and his graduate research assistant Emily Chen, in conjunction with Dr. Sylvia Schreiner and her graduate research assistant Ben Hunt, with additional work by graduate research assistant Hayley Park.

This work began at the University of Illinois at Urbana-Champaign and concluded at the University of Alaska Fairbanks.

Digitization and dissemination

During this project, the following Yupik language materials from the 1970s, 1980s, and 1990s were scanned, digitized, and distributed in electronic form to members of the St. Lawrence Island community:

  • 12 pre-K and early elementary readers
  • 4 mid-elementary readers
  • 3-volume Lore of St. Lawrence Island books
  • Existing Yupik grammars and dictionaries
  • Over 50 additional elementary-level and middle-school-level books and workbooks

Development of computational tools for Yupik

During this project, the following computational tools were developed for Yupik:

  • Basic rule-based spell-checker
  • Finite-state morphological analyzer (adapted from the existing Yupik grammar by Steve Jacobson, 2001)
  • Online dictionary (adapted from the existing Yupik dictionary by Badten et al., 2008)

Language documentation

During this project, the following language documentation was developed:

  • Doctoral dissertation on Yupik derivational morphology by graduate research assistant Emily Chen
  • Corpus of Yupik sentences with interlinear glosses, compiled by graduate research assistant Emily Chen
  • Corpus of dependency-parsed Yupik sentences, compiled by graduate research assistant Hayley Park

Last Modified: 05/08/2024
Modified by: Lane Schwartz
