Award Abstract # 0749062
SGER: Automatic Processing of Natural Language Code Switching

NSF Org: BCS
Division of Behavioral and Cognitive Sciences
Recipient: THE TRUSTEES OF COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK
Initial Amendment Date: August 27, 2007
Latest Amendment Date: August 27, 2008
Award Number: 0749062
Award Instrument: Standard Grant
Program Manager: Eric H. Potsdam
BCS
 Division of Behavioral and Cognitive Sciences
SBE
 Directorate for Social, Behavioral and Economic Sciences
Start Date: September 1, 2007
End Date: February 28, 2009 (Estimated)
Total Intended Award Amount: $0.00
Total Awarded Amount to Date: $40,467.00
Funds Obligated to Date: FY 2007 = $40,467.00
History of Investigator:
  • Mona Diab (Principal Investigator)
    mdiab@andrew.cmu.edu
  • Owen Rambow (Co-Principal Investigator)
  • Nizar Habash (Co-Principal Investigator)
Recipient Sponsored Research Office: Columbia University
615 W 131ST ST
NEW YORK
NY  US  10027-7922
(212)854-6851
Sponsor Congressional District: 13
Primary Place of Performance: Columbia University
615 W 131ST ST
NEW YORK
NY  US  10027-7922
Primary Place of Performance Congressional District: 13
Unique Entity Identifier (UEI): F4N1QNPB95M4
Parent UEI:
NSF Program(s): Linguistics, Robust Intelligence
Primary Program Source: app-0107 
Program Reference Code(s): 0000, 7495, 9237, OTHR
Program Element Code(s): 131100, 749500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.075

ABSTRACT

Code switching is a natural linguistic phenomenon in which a speaker mixes two or more languages or dialects, or two or more linguistic registers of the same language. Extensive sociolinguistic studies have been dedicated to this widespread phenomenon, and there has been some prior work in formal linguistics, but to date it has not been considered a problem of interest to the computational linguistics community. However, in this age of globalization and with the current explosion in information and web access, more and more spontaneously generated linguistic data from around the world are being made available to the computational research community. Such data abound with code switching in its different forms, so there is a real need for computational linguists to address code switching as a central research problem.

This exploratory research effort addresses the issue of how to process code switching automatically. It examines the different aspects of code switching, allowing for the creation of better-principled algorithms based on a clear understanding of the phenomenon. The main questions revolve around morphological and syntactic constraints on switching and how these constraints can be modeled computationally. One of the outcomes of this research is the annotation of significant amounts of data exhibiting code switching in different languages, most likely Arabic, Hindi and Spanish. This research aims to initiate a formal study of code switching in a computational framework, both to increase our understanding of the phenomenon and to develop algorithms for processing natural language data that manifest code switching.
