Award Abstract # 0753321
INTEROP: Lexicon Enhancement via the GOLD Ontology (LEGO)

NSF Org: BCS
Division of Behavioral and Cognitive Sciences
Recipient: EASTERN MICHIGAN UNIVERSITY
Initial Amendment Date: September 2, 2008
Latest Amendment Date: July 27, 2010
Award Number: 0753321
Award Instrument: Continuing Grant
Program Manager: Joan Maling
BCS
 Division of Behavioral and Cognitive Sciences
SBE
 Directorate for Social, Behavioral and Economic Sciences
Start Date: September 1, 2008
End Date: February 28, 2013 (Estimated)
Total Intended Award Amount: $636,443.00
Total Awarded Amount to Date: $636,443.00
Funds Obligated to Date: FY 2008 = $325,000.00
FY 2009 = $125,000.00
FY 2010 = $186,443.00
History of Investigator:
  • Helen Aristar-Dry (Principal Investigator)
    hdry@linguistlist.org
  • Anthony Aristar (Co-Principal Investigator)
  • Jeffrey Good (Co-Principal Investigator)
Recipient Sponsored Research Office: Eastern Michigan University
203 PIERCE HALL
YPSILANTI
MI  US  48197-2264
(734)487-3090
Sponsor Congressional District: 06
Primary Place of Performance: Eastern Michigan University
203 PIERCE HALL
YPSILANTI
MI  US  48197-2264
Primary Place of Performance Congressional District: 06
Unique Entity Identifier (UEI): STFNT4KCCDU3
Parent UEI:
NSF Program(s): Linguistics,
Robust Intelligence,
DATA INTEROPERABILITY NETWORKS
Primary Program Source: 01000809DB NSF RESEARCH & RELATED ACTIVIT
01000910DB NSF RESEARCH & RELATED ACTIVIT
01001011DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 0000, 7495, 7752, 9110, 9139, OTHR
Program Element Code(s): 131100, 749500, 770100
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.075

ABSTRACT

This project will furnish several 'building blocks' for data interoperability within linguistics and all disciplines that use language data. The work will be based on the General Ontology for Linguistic Description (GOLD), a machine-readable information structure which allows computers to process and 'understand' linguistic concepts and the relations among them. Using GOLD, the project will develop an extensive network of ontology-aware lexical items drawn from sixteen different projects and over 3000 languages. Thus, computers will be able to understand the relationships between linguistic categories across languages, and interpret their linguistic function when they appear in texts. In addition, the project will develop a set of low-barrier data requirements which lexicon creators can implement in order to join this ontology-based network. It will also create architecture to integrate network data into frameworks developed by major international standards initiatives. Finally, the project will establish DevSpace, an online facility designed to promote continuing information- and resource-sharing among linguists and developers interested in augmenting the network with additional tools and services.

Such a project is important because cross-linguistic language data is central to many research communities. Language history and language comparison can provide critical insights into the genetics, culture, migrations, and contacts of human populations. And natural language data is indispensable to major computational research initiatives, such as multilingual text processing. In providing linguistically interpreted lexical data from so many underdescribed languages, LEGO will ultimately aid in meaning extraction from texts even of languages far too small to justify a full-scale natural language processing system. Thus from both a computational perspective and a Humanities and Social Sciences perspective, the LEGO project will create a research resource of remarkable breadth and diversity, one which will serve multiple disciplines.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The major goal of this project was to create a sustainable, accessible data network of lexicons of endangered languages, with a multi-lexicon search facility based on the GOLD (General Ontology for Linguistic Description) ontology. Specifically, the LEGO project had the goal of making available to the public a significant number of lexicons of endangered languages, in a standardized format, with grammatical information mapped to the GOLD ontology, as well as a significant number of wordlists of endangered languages, in a standardized XML format. These languages included Shoshone, Western Pantar, Western Sisaala, Tamashek, Fulfulde, Archi, Potawatomi, Mocovi, Biao Min, Qiang, VerbMobil German, Ibibio, Nhirrpi, Titan, Jarawara, Mbodomo, and Medumba. While most of the material was uploaded by project participants, the project also wrote an uploader that allows a linguist to join the datanet independently by uploading a lexicon and mapping it to GOLD.

To make this material usable and accessible, a multi-lexicon/wordlist browsing and search facility was written, supporting search by language, language code, lexical item, gloss, and grammatical information. 
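As a rough illustration of what searching across several lexicons by these five fields could look like, here is a minimal Python sketch. It is not the LEGO implementation: the `Entry` record, its field names, the `search` function, and the matching rules (exact match for language, code, and grammatical category; substring match for lexical item and gloss) are all assumptions made for this example. The sample entries are placeholders, not real lexical data, and use language codes from the ISO 639-3 private-use range.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Entry:
    """One lexical entry. All field names are hypothetical."""
    language: str        # language name
    language_code: str   # language code (placeholders use private-use codes)
    lexical_item: str    # headword
    gloss: str           # English gloss
    gold_concept: str    # grammatical category label mapped to the ontology


def search(
    entries: Iterable[Entry],
    *,
    language: Optional[str] = None,
    language_code: Optional[str] = None,
    lexical_item: Optional[str] = None,
    gloss: Optional[str] = None,
    gold_concept: Optional[str] = None,
) -> List[Entry]:
    """Return entries matching every criterion that was supplied.

    Assumed matching rules: case-insensitive equality for language,
    language code, and grammatical category; case-insensitive substring
    match for lexical item and gloss. Omitted criteria match everything.
    """
    def eq(value: str, wanted: Optional[str]) -> bool:
        return wanted is None or value.casefold() == wanted.casefold()

    def has(value: str, wanted: Optional[str]) -> bool:
        return wanted is None or wanted.casefold() in value.casefold()

    return [
        e for e in entries
        if eq(e.language, language)
        and eq(e.language_code, language_code)
        and has(e.lexical_item, lexical_item)
        and has(e.gloss, gloss)
        and eq(e.gold_concept, gold_concept)
    ]


# Placeholder data only: invented names, forms, and private-use codes.
SAMPLE = [
    Entry("Language A", "qaa", "form1", "water", "Noun"),
    Entry("Language A", "qaa", "form2", "to drink", "Verb"),
    Entry("Language B", "qab", "form3", "water", "Noun"),
]

if __name__ == "__main__":
    # Find everything glossed "water", across both placeholder lexicons.
    for hit in search(SAMPLE, gloss="water"):
        print(hit.language, hit.lexical_item, hit.gold_concept)
```

Because every lexicon in this sketch shares one category vocabulary, a single query such as `gold_concept="Noun"` can span lexicons whose original grammatical labels differed; that is the kind of cross-lexicon search that mapping to a common ontology is meant to allow.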

Over its five years, the LEGO project made publicly available on the Internet 25 lexicons of endangered languages (4 more are awaiting approval by their authors, and 5 more will be added this summer) and 2817 wordlists from understudied languages, supplemented by downloadable schema and stylesheets for converting lexicons into the format required by the LEGO datanet (LL-LIFT).

 


Last Modified: 06/06/2013
Modified by: Helen Aristar-Dry

Please report errors in award information by writing to: awardsearch@nsf.gov.
