
NSF Org: | BCS Division of Behavioral and Cognitive Sciences |
Recipient: |
|
Initial Amendment Date: | September 2, 2008 |
Latest Amendment Date: | July 27, 2010 |
Award Number: | 0753321 |
Award Instrument: | Continuing Grant |
Program Manager: | Joan Maling, BCS Division of Behavioral and Cognitive Sciences, SBE Directorate for Social, Behavioral and Economic Sciences |
Start Date: | September 1, 2008 |
End Date: | February 28, 2013 (Estimated) |
Total Intended Award Amount: | $636,443.00 |
Total Awarded Amount to Date: | $636,443.00 |
Funds Obligated to Date: | FY 2009 = $125,000.00; FY 2010 = $186,443.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: | 203 PIERCE HALL YPSILANTI MI US 48197-2264 (734)487-3090 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 203 PIERCE HALL YPSILANTI MI US 48197-2264 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Linguistics, Robust Intelligence, DATA INTEROPERABILITY NETWORKS |
Primary Program Source: |
01000910DB NSF RESEARCH & RELATED ACTIVIT 01001011DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.075 |
ABSTRACT
This project will furnish several 'building blocks' for data interoperability within linguistics and all disciplines that use language data. The work will be based on the General Ontology for Linguistic Description (GOLD), a machine-readable information structure that allows computers to process and 'understand' linguistic concepts and the relations among them. Using GOLD, the project will develop an extensive network of ontology-aware lexical items drawn from sixteen different projects and over 3000 languages. Computers will thus be able to understand how linguistic categories relate across languages and to interpret the linguistic function of those categories when they appear in texts. In addition, the project will develop a set of low-barrier data requirements that lexicon creators can implement in order to join this ontology-based network. It will also create an architecture for integrating network data into frameworks developed by major international standards initiatives. Finally, the project will establish DevSpace, an online facility designed to promote continuing information- and resource-sharing among linguists and developers interested in augmenting the network with additional tools and services.
Such a project is important because cross-linguistic language data is central to many research communities. Language history and language comparison can provide critical insights into the genetics, culture, migrations, and contacts of human populations. And natural language data is indispensable to major computational research initiatives, such as multilingual text processing. In providing linguistically interpreted lexical data from so many underdescribed languages, LEGO will ultimately aid in meaning extraction from texts even of languages far too small to justify a full-scale natural language processing system. Thus from both a computational perspective and a Humanities and Social Sciences perspective, the LEGO project will create a research resource of remarkable breadth and diversity, one which will serve multiple disciplines.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The major goal of this project was to create a sustainable, accessible data network of lexicons of endangered languages, with a multi-lexicon search facility based on the GOLD (General Ontology for Linguistic Description) ontology. Specifically, the LEGO project aimed to make publicly available a significant number of lexicons of endangered languages in a standardized format, with grammatical information mapped to the GOLD ontology, as well as a significant number of wordlists of endangered languages in a standardized XML format. These languages included Shoshone, Western Pantar, Western Sisaala, Tamashek, Fulfulde, Archi, Potawatomi, Mocovi, Biao Min, Qiang, VerbMobil German, Ibibio, Nhirrpi, Titan, Jarawara, Mbodomo, and Medumba. While most of the material was uploaded by project participants, the project also wrote an uploader that allows a linguist to join the datanet independently by uploading a lexicon and mapping it to GOLD.
To make this material usable and accessible, a multi-lexicon/wordlist browsing and search facility was written, supporting search by language, language code, lexical item, gloss, and grammatical information.
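As a rough illustration of what searching across several lexicons by these fields could look like, the following is a minimal Python sketch. It is not the LEGO implementation: the `Entry` record, the `search` function, the sample languages, codes, lexical items, and the category labels are all hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Entry:
    """One lexicon or wordlist entry (hypothetical record layout)."""
    language: str        # language name
    language_code: str   # short language identifier (placeholder values below)
    lexical_item: str    # headword
    gloss: str           # translation or definition
    gold_concept: str    # grammatical category after mapping to an ontology label


def search(
    entries: Iterable[Entry],
    *,
    language: Optional[str] = None,
    language_code: Optional[str] = None,
    lexical_item: Optional[str] = None,
    gloss: Optional[str] = None,
    gold_concept: Optional[str] = None,
) -> List[Entry]:
    """Return entries matching every criterion that was supplied.

    Matching is case-insensitive. The gloss is matched as a substring;
    all other fields must match exactly. Omitted criteria are ignored.
    """
    exact = {
        "language": language,
        "language_code": language_code,
        "lexical_item": lexical_item,
        "gold_concept": gold_concept,
    }
    results = []
    for entry in entries:
        # Skip the entry if any supplied exact-match field differs.
        if any(
            value is not None
            and getattr(entry, field).casefold() != value.casefold()
            for field, value in exact.items()
        ):
            continue
        # Skip the entry if a gloss was supplied and does not occur in it.
        if gloss is not None and gloss.casefold() not in entry.gloss.casefold():
            continue
        results.append(entry)
    return results


# Placeholder data: invented languages, codes, and items, not LEGO content.
SAMPLE = [
    Entry("Language A", "lga", "item-1", "water", "Noun"),
    Entry("Language B", "lgb", "item-2", "water", "Noun"),
    Entry("Language B", "lgb", "item-3", "to drink water", "Verb"),
]

if __name__ == "__main__":
    # Nouns glossed with "water", regardless of which lexicon they come from.
    for hit in search(SAMPLE, gloss="water", gold_concept="Noun"):
        print(hit.language, hit.lexical_item)
```

Because every entry carries the same ontology label for its grammatical category, a single query can cut across lexicons that originally used different category names, which is the kind of cross-language comparison the shared mapping is meant to support.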
Over its five years, the LEGO project made publicly available on the Internet 25 lexicons of endangered languages (4 more are awaiting approval by their authors, and 5 more will be added this summer) and 2817 wordlists from understudied languages, supplemented by downloadable schema and stylesheets for converting lexicons into the format required by the LEGO datanet (LL-LIFT).
Last Modified: 06/06/2013
Modified by: Helen Aristar-Dry