Award Abstract # 0534217
LETRAS: A Learning-based Framework for Machine Translation of Low Resource Languages

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: CARNEGIE MELLON UNIVERSITY
Initial Amendment Date: July 24, 2006
Latest Amendment Date: July 14, 2009
Award Number: 0534217
Award Instrument: Continuing Grant
Program Manager: Tatiana Korelsky
  IIS, Division of Information & Intelligent Systems
  CSE, Directorate for Computer and Information Science and Engineering
Start Date: August 1, 2006
End Date: July 31, 2011 (Estimated)
Total Intended Award Amount: $0.00
Total Awarded Amount to Date: $763,000.00
Funds Obligated to Date:
  FY 2006 = $181,000.00
  FY 2007 = $260,000.00
  FY 2008 = $262,000.00
  FY 2009 = $60,000.00
History of Investigator:
  • Jaime Carbonell (Principal Investigator)
    jgc@cs.cmu.edu
  • Alon Lavie (Co-Principal Investigator)
Recipient Sponsored Research Office: Carnegie-Mellon University
5000 FORBES AVE
PITTSBURGH
PA  US  15213-3815
(412)268-8746
Sponsor Congressional District: 12
Primary Place of Performance: Carnegie-Mellon University
5000 FORBES AVE
PITTSBURGH
PA  US  15213-3815
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): U3NKNFLNQ613
Parent UEI: U3NKNFLNQ613
NSF Program(s): ASSP-Arctic Social Science,
HUMAN LANGUAGE & COMMUNICATION,
International Research Collab,
Catalyzing New Intl Collab,
EAPSI,
Robust Intelligence
Primary Program Source:
  app-0106
  app-0107
  01000809DB NSF RESEARCH & RELATED ACTIVIT
  01000910DB NSF RESEARCH & RELATED ACTIVIT
  0100CYXXDB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 0000, 5905, 5913, 5926, 5974, 5976, 5977, 7495, 7496, 7715, 9215, 9216, 9218, 9251, HPCC, OTHR
Program Element Code(s): 522100, 727400, 729800, 729900, 731600, 749500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The LETRAS project investigates novel approaches to the development of Machine
Translation (MT) technology, with the goal of establishing a general framework
that supports building MT prototype systems for languages for which only
limited amounts of data and resources in electronic form are available. The
research focus of the project is on automatically learning translation
transfer rules from limited amounts of elicited bilingual data. A new run-time
translation "engine" maps source-language sentences to their target-language
equivalents by building a large structure of possible partial translations
and then applying effective search techniques to recover the best
translation. In the last stage, an automatic rule refinement module helps the
system learn how to correct and improve its imperfect translation rules, based
on user feedback about translation errors. MT prototype systems for several
language pairs are being built as an integral part of the project and in
collaboration with external research groups. The prototypes guide our
research and serve as testbeds for our new ideas. At the same time, our
collaborations with local researchers and native communities promote the
development of information technology for native languages and train local
researchers in state-of-the-art MT research. The prototypes include a
Hebrew-to-English MT system (with the University of Haifa, Israel); an
Inupiaq-to-English MT system (with the University of Alaska Fairbanks and the
Inupiat Heritage Center in Barrow, Alaska); and a Karitiana-to-Portuguese MT
system (with the University of Sao Paulo, Brazil). Support for the Alaska
collaboration is being provided by NSF's Office of Polar Programs (OPP), and
support for the collaborations with Israel and Brazil is being provided by
NSF's Office for International Science and Engineering (OISE). OISE is also
providing funding for a planning trip to Bolivia to explore a possible
Aymara-to-Spanish project. The potential long-term impact of the project
is profound: it could enable the development of Machine Translation for many
languages of the world, which in turn would open the door to active participation
of native and minority communities in the information-rich activities of the
21st century.
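As a rough illustration of the run-time engine described above (build a structure of possible partial translations, then search it for the best translation), the Python sketch below assumes a much simplified, monotone lattice: each arc covers a contiguous span of source words, carries one target string and a single additive log-style score, and the search is plain Viterbi-style dynamic programming for the highest-scoring sequence of arcs covering the whole sentence. The `Arc` class, the `best_translation` function and the toy scores are hypothetical. The project's actual engine also applies learned transfer rules and uses richer scoring (for example a target-language model), none of which this sketch models.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Arc:
    """Hypothetical partial translation covering source words [start, end)."""
    start: int
    end: int
    target: str
    score: float  # additive log-style score; higher is better


def best_translation(arcs: List[Arc], n: int) -> Optional[Tuple[float, str]]:
    """Return (score, text) for the best monotone arc sequence covering words 0..n-1.

    best[j] stores the best total score of any arc sequence covering source
    words 0..j-1, together with the last arc on that path (a back-pointer).
    Returns None if no sequence of arcs covers the whole sentence.
    """
    best: Dict[int, Tuple[float, Optional[Arc]]] = {0: (0.0, None)}
    # Visiting arcs in order of end position guarantees best[arc.start] is
    # final when used: every arc ending at arc.start (< arc.end) came earlier.
    for arc in sorted(arcs, key=lambda a: a.end):
        if not (0 <= arc.start < arc.end <= n):
            raise ValueError(f"arc span out of range: {arc}")
        if arc.start not in best:
            continue  # this arc's start position is unreachable
        candidate = best[arc.start][0] + arc.score
        if arc.end not in best or candidate > best[arc.end][0]:
            best[arc.end] = (candidate, arc)
    if n not in best:
        return None
    # Follow back-pointers from the end of the sentence back to position 0.
    pieces: List[str] = []
    pos = n
    while pos > 0:
        arc = best[pos][1]
        pieces.append(arc.target)
        pos = arc.start
    return best[n][0], " ".join(reversed(pieces))


# Toy lattice over a hypothetical three-word source sentence (positions 0..3).
arcs = [
    Arc(0, 1, "the", -0.2),
    Arc(1, 2, "house", -0.7),
    Arc(2, 3, "big", -0.9),
    Arc(1, 3, "big house", -1.0),  # one arc translating two source words together
]
print(best_translation(arcs, 3)[1])  # prints: the big house
```

Here the two-word arc wins (total -1.2) over the word-by-word path (total -1.8). Keeping all alternatives in the structure and deciding only during search is what lets a better multi-word translation override a locally plausible word-by-word one.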

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Ambati, V. and A. Lavie. "Improving Syntax-Driven Translation Models by Re-structuring Divergent and Non-isomorphic Parse Tree Structures." In Proceedings of the Student Research Workshop at the Conference of the Association for Machine Translation in the Americas (AMTA-2008), 2008, p. 1.
Bills, A., L. S. Levin, L. D. Kaplan and E. A. MacLean. "Finite-state Morphology for Inupiaq." In 7th SaLTMiL Workshop on Creation and Use of Basic Lexical Resources for Less-Resourced Languages, LREC 2010, 2010, p. 19.
Font Llitjos, A. and S. Vogel. "A Walk on the Other Side: Adding Statistical Components to a Transfer-Based Translation System." In Proceedings of the Workshop on Syntax and Structure in Statistical Translation (SSST) at HLT-NAACL 2007, 2007.
Font Llitjos, A., J. Carbonell and A. Lavie. "Improving Transfer-Based MT Systems with Automatic Refinements." In Proceedings of MT Summit XI, Copenhagen, Denmark, 2007.
Hanneman, G. and A. Lavie. "Decoding with Syntactic and Non-Syntactic Phrases in a Syntax-Based Machine Translation System." In Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation at the 2009 Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT-2009), 2009, p. 1.
Hanneman, G., E. Huber, A. Agarwal, V. Ambati, A. Parlikar, E. Peterson and A. Lavie. "Statistical Transfer Systems for French-English and German-English Machine Translation." In Proceedings of the Third Workshop on Statistical Machine Translation at ACL-2008, 2008, p. 163.
Hanneman, G., V. Ambati, J. H. Clark, A. Parlikar and A. Lavie. "An Improved Statistical Transfer System for French-English Machine Translation." In Proceedings of the Fourth Workshop on Statistical Machine Translation at the 2009 Meeting of the European Chapter of the Association for Computational Linguistics (EACL-2009), 2009, p. 1.
Lavie, A. "Stat-XFER: A General Search-based Syntax-driven Framework for Machine Translation." Invited paper in Proceedings of CICLing-2008, v. LNCS 49, 2008, p. 362.
Lavie, A., A. Parlikar and V. Ambati. "Syntax-Driven Learning of Sub-Sentential Translation Equivalents and Translation Rules from Parsed Parallel Corpora." In Proceedings of the ACL-08: HLT Second Workshop on Syntax and Structure in Statistical Translation (SSST-2), 2008, p. 87.
