Award Abstract # 0720122
Collaborative Research: Implementing the GOLD Community of Practice: Laying the Foundations for a Linguistics Cyberinfrastructure

NSF Org: BCS
Division of Behavioral and Cognitive Sciences
Recipient: EASTERN MICHIGAN UNIVERSITY
Initial Amendment Date: August 17, 2007
Latest Amendment Date: July 8, 2008
Award Number: 0720122
Award Instrument: Continuing Grant
Program Manager: Joan Maling
  BCS: Division of Behavioral and Cognitive Sciences
  SBE: Directorate for Social, Behavioral and Economic Sciences
Start Date: September 1, 2007
End Date: August 31, 2011 (Estimated)
Total Intended Award Amount: $87,140.00
Total Awarded Amount to Date: $87,140.00
Funds Obligated to Date: FY 2007 = $42,926.00
FY 2008 = $44,214.00
History of Investigator:
  • Helen Aristar-Dry (Principal Investigator)
    hdry@linguistlist.org
  • Anthony Aristar (Co-Principal Investigator)
Recipient Sponsored Research Office: Eastern Michigan University
203 PIERCE HALL
YPSILANTI
MI  US  48197-2264
(734)487-3090
Sponsor Congressional District: 06
Primary Place of Performance: Eastern Michigan University
203 PIERCE HALL
YPSILANTI
MI  US  48197-2264
Primary Place of Performance Congressional District: 06
Unique Entity Identifier (UEI): STFNT4KCCDU3
Parent UEI:
NSF Program(s): Linguistics
Primary Program Source: app-0107 
01000809DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 0000, OTHR
Program Element Code(s): 131100
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.075

ABSTRACT

The empirical component of the linguistic sciences has seen a rapid increase in the amount of data available in digital form. Though there have been recent advances in markup languages, Web protocols, and techniques for data management, linguistics as a whole has not been able to take full advantage of them. For instance, individual sets of linguistic data are often encapsulated in forms that are not compatible with others: linguistic data are not generally interoperable. This is in part because linguistics has only begun to develop field-wide, best-practice resources for managing its data, including common software tools, Web infrastructures, and knowledge components such as ontologies. Such resources would, in fact, act as the backbone for any field-wide cyberinfrastructure effort. Towards such a goal, then, this collaborative project will implement the GOLD Community of Practice, a Web architecture for linking on-line linguistic data to linguistic knowledge captured by the General Ontology for Linguistic Description (GOLD). The first component of the project, led by Fei Xia and William D. Lewis, will address the issue of legacy data by harvesting large amounts of interlinear glossed text from the Web. The results will be transformed into a best-practice format and stored in the Online Database of INterlinear text (ODIN). Second, Helen Aristar-Dry and Anthony Aristar will focus on the direct creation of best-practice data by further developing FIELD, a tool that allows field linguists to produce high-quality lexical data. Finally, Scott Farrar will instantiate the resulting best-practice data in the GOLD framework, thus integrating data from the first two components. The research team will then demonstrate the efficacy of the project by implementing an ontology-driven search facility that combines the general knowledge of linguistics with the specific knowledge captured by the data instances.
To ensure that the resulting architecture gets wide exposure, we will house the results of this project at the LINGUIST List, where they can be seen and evaluated by the linguistics community as a whole.

This project will allow ordinary working linguists and anyone with an interest in human language to search and see generalizations across large amounts of linguistic data. It will directly address the key issues involved in the comparison and integration of data that were not originally intended to be comparable. These include leveraging existing resources (i.e., legacy data from the Web), taking advantage of best-practice data standards, and utilizing field-wide knowledge. These issues present significant technological challenges, as there are no general off-the-shelf solutions for given domains such as linguistics. The success of the project requires a deep understanding of linguistic data objects and structures. In fact, the project will demonstrate how the fundamental data structures of the field can be utilized in a broader framework. At a time when the world stands to lose much of its linguistic diversity, this project will result in a community-wide resource usable for its intrinsic value as a search tool to explore the structure of all kinds of human languages. At present, there are no such search tools available for linguistic data. When users see the value in contributing to such an effort, they will be more likely to embrace the accompanying data standards and tools. Thus, what the project will achieve is a community of linguists dedicated to the production of quality data resources for the common goal of effecting the next great advance in our understanding of the structures of language.
